博客

mahout 0.5 基于 hadoop 的 CF 代码分析

mahout的taste框架是协同过滤算法的实现。它支持DataModel,如文件、数据库、NoSQL存储等,也支持hadoop的MapReduce。这里主要分析mahout0.5中的基于MR的实现。

作者: 最后更新时间: 2019/01/24 - 16:00
博客

Part #1 - Tuning Java Garbage Collection for HBase

Part #1 of a multi-parts post, we will take a look on how to tune Java garbage collection (GC) for HBase focusing on 100% YCSB reads. In part #2, we will look at 100% writes and finally in part #3, we will tune Java GC for a mix of 50/50 read/writes. As already mentioned, we are using YCSB which seems to be the de facto NoSQL workload. We wont go into much details on how to install, configure...
作者: Eric Kaczmarek (Intel) 最后更新时间: 2017/06/14 - 16:10
博客

Experience and Lessons Learned for Large-Scale Graph Analysis using GraphX

While GraphX provides nice abstractions and dataflow optimizations for parallel graph processing on top of Apache Spark*, there are still many challenges in app

作者: Mike P. (Intel) 最后更新时间: 2017/06/14 - 15:44
博客

Hadoop RPC机制+源码分析

 一、RPC基本原理

作者: 最后更新时间: 2019/07/03 - 20:08
博客

The JITter Conundrum - Just in Time for Your Traffic Jam

In interpreted languages, it just takes longer to get stuff done - I earlier gave the example where the Python source code a = b + c would result in a BINARY_ADD byte code which takes 78 machine instructions to do the add, but it's a single native ADD instruction if run in compiled language like C or C++. How can we speed this up? Or as the performance expert would say, how do I decrease...
作者: David S. (Blackbelt) 最后更新时间: 2019/07/04 - 20:00
博客

Getting Started with Tachyon by Use Cases

In-memory computing has become an irreversible trend in big data technology, for which the wide popularity of Spark provides a good evidence. Meanwhile, memory storage and management for large data sets are still posing challenges. Out of numerous solutions, Tachyon, a memory-centric distributed storage, well solves the problems faced by many application scenarios. For example, it avoids severe...
作者: 最后更新时间: 2019/06/07 - 16:01
博客

按照使用案例开始使用 Tachyon

In-memory computing has become an irreversible trend in big data technology, for which the wide popularity of Spark provides a good evidence. Meanwhile, memory storage and management for large data sets are still posing challenges. Out of numerous solutions, Tachyon, a memory-centric distributed storage, well solves the problems faced by many application scenarios. For example, it avoids severe...
作者: 最后更新时间: 2019/06/07 - 16:00
博客

Intel® Data Analytics Acceleration Library

The Intel® Data Analytics Acceleration Library (Intel® DAAL) helps speed big data analytics by providing highly optimized algorithmic building blocks for all data analysis stages (Pre-processing, Transformation, Analysis, Modeling, Validation, and Decision Making) for offline, streaming and distributed analytics usages. It’s designed for use with popular data platforms including Hadoop*, Spark*,...
作者: James R. (Blackbelt) 最后更新时间: 2019/08/27 - 13:50
博客

英特尔® 数据分析加速库

The Intel® Data Analytics Acceleration Library (Intel® DAAL) helps speed big data analytics by providing highly optimized algorithmic building blocks for all data analysis stages (Pre-processing, Transformation, Analysis, Modeling, Validation, and Decision Making) for offline, streaming and distributed analytics usages. It’s designed for use with popular data platforms including Hadoop*, Spark*,...
作者: James R. (Blackbelt) 最后更新时间: 2019/08/27 - 13:50