Blog post

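Code Analysis of Hadoop-Based CF in Mahout 0.5

Mahout's Taste framework is an implementation of collaborative filtering (CF) algorithms. It supports DataModel implementations backed by files, databases, NoSQL stores, and so on, and it also supports Hadoop MapReduce. This post mainly analyzes the MapReduce-based implementation in Mahout 0.5.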

Last updated on 01/24/2019 - 16:00
Article

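Hadoop 0.22.0 and Its RAID Deployment

I have been using the 0.20.X series of Hadoop for almost a year, working mainly with HDFS. During that time I helped deploy a Hadoop cluster (1 server + 20 PCs) and also took part in analyzing the HDFS source code.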

Last updated on 01/24/2019 - 16:00
Blog post

Part #1 - Tuning Java Garbage Collection for HBase

In part #1 of this multi-part post, we look at how to tune Java garbage collection (GC) for HBase, focusing on 100% YCSB reads. In part #2, we will look at 100% writes, and finally, in part #3, we will tune Java GC for a 50/50 mix of reads and writes. As already mentioned, we are using YCSB, which seems to be the de facto NoSQL workload. We won't go into much detail on how to install, configure...
Authored by Eric Kaczmarek (Intel) Last updated on 06/14/2017 - 16:10
Video

Intel Software Optimization of Java* Virtual Machine and OpenJDK Community Announcement (OOW '14)

At Oracle OpenWorld 2014, Michael Greene talks about the role of his organization in helping optimize the Java* Virtual Machine and about Intel's announcement that it is joining the Java OpenJDK community...

Last updated on 03/27/2019 - 14:04
Video

Big Data Java Optimization

This video provides an overview of the Java programming language and its benefits for enterprise applications.

Authored by admin Last updated on 06/14/2017 - 08:55
Blog post

Experience and Lessons Learned for Large-Scale Graph Analysis using GraphX

While GraphX provides nice abstractions and dataflow optimizations for parallel graph processing on top of Apache Spark*, there are still many challenges in app

Authored by Mike P. (Intel) Last updated on 06/14/2017 - 15:44
Article

Intel Keynote and Intel technical presentations at Spark Summit West 2015

To find new trends and strong patterns in large, complex data sets, a strong analytics foundation is needed. Intel is working closely with Databricks, AMPLab, and the Spark community and its ecosystem to advance these analytics capabilities…
Authored by Mike P. (Intel) Last updated on 06/07/2017 - 09:33
Article

How to Use Intel® DAAL in Java Applications

Intel® Data Analytics Acceleration Library (Intel® DAAL) provides a Java API that makes it easy to use for Java programmers. This article discusses how to build and run applications with the Eclipse IDE (one of the most popular Java IDEs). The procedures outlined in this article should also be applicable to other Java IDEs. If you want to build and run Java applications from the command line, see...
Authored by Zhang, Zhang (Intel) Last updated on 10/03/2018 - 07:24
Article

A Walk-Through of Online Processing Using Intel® DAAL

Intel® Data Analytics Acceleration Library (Intel® DAAL) is a new, highly optimized library targeting data mining, statistical analysis, and machine learning applications. It provides advanced building blocks supporting all data analysis stages. Intel DAAL supports three processing modes: batch processing, online processing, and distributed processing. Online processing, a.k.a. streaming, is...
Authored by Zhang, Zhang (Intel) Last updated on 06/07/2017 - 10:33
Article

A Walk-Through of Distributed Processing Using Intel® DAAL

Intel® Data Analytics Acceleration Library (Intel® DAAL) is a new highly optimized library targeting data mining, statistical analysis, and machine learning applications. It provides advanced building blocks supporting all data analysis stages (preprocessing, transformation, analysis, modeling, decision making) for offline, streaming and distributed analytics usages. Intel DAAL support...
Authored by Ying H. (Intel) Last updated on 10/04/2018 - 04:16