Blog posts

Part #1 - Tuning Java Garbage Collection for HBase

In part #1 of a multi-part post, we look at how to tune Java garbage collection (GC) for HBase, focusing on a 100% read YCSB workload. In part #2, we will look at 100% writes, and finally in part #3, we will tune Java GC for a 50/50 read/write mix. As already mentioned, we are using YCSB, which seems to be the de facto NoSQL benchmark workload. We won't go into much detail on how to install, configure...
Author: Eric Kaczmarek (Intel), last updated 14/06/2017 - 16:10

Experience and Lessons Learned for Large-Scale Graph Analysis using GraphX

While GraphX provides nice abstractions and dataflow optimizations for parallel graph processing on top of Apache Spark*, there are still many challenges in app...

Author: Mike P. (Intel), last updated 14/06/2017 - 15:44

The JITter Conundrum - Just in Time for Your Traffic Jam

In interpreted languages, it simply takes longer to get work done. Earlier, I gave the example of the Python statement a = b + c, which produces a BINARY_ADD bytecode that takes 78 machine instructions to perform the addition, whereas in a compiled language like C or C++ it is a single native ADD instruction. How can we speed this up? Or, as a performance expert would put it, how do I decrease...
Author: David S. (Blackbelt), last updated 04/07/2019 - 20:00
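To see the bytecode the teaser refers to, here is a minimal sketch using CPython's standard `dis` module. It compiles the statement `a = b + c` and lists the opcode names the interpreter would execute. The opcode name depends on the CPython version: releases before 3.11 emit `BINARY_ADD`, while 3.11 and later emit the generic `BINARY_OP` with an "add" argument. The helper name `add_opcodes` is hypothetical and used only for illustration.

```python
import dis


def add_opcodes(source: str = "a = b + c") -> list:
    """Return the opcode names CPython emits for a module-level statement."""
    # Compile as module-level code, so names use LOAD_NAME/STORE_NAME.
    code = compile(source, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


if __name__ == "__main__":
    # Print the full disassembly (the exact listing varies by Python version).
    dis.dis(compile("a = b + c", "<example>", "exec"))
    print(add_opcodes())
```

Each of these opcodes is dispatched by the interpreter loop, which is why a single source-level add costs far more than one native instruction. The disassembly is shown without an expected listing because extra instructions (for example, `RESUME` on 3.11+) differ between versions.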

Getting Started with Tachyon by Use Cases

In-memory computing has become an irreversible trend in big data technology, as the wide popularity of Spark shows. Meanwhile, storing and managing large data sets in memory still poses challenges. Among numerous solutions, Tachyon, a memory-centric distributed storage system, addresses the problems faced in many application scenarios. For example, it avoids severe...
Author not listed, last updated 07/06/2019 - 16:01

Intel® Data Analytics Acceleration Library

The Intel® Data Analytics Acceleration Library (Intel® DAAL) helps speed up big data analytics by providing highly optimized algorithmic building blocks for all data analysis stages (Pre-processing, Transformation, Analysis, Modeling, Validation, and Decision Making) for offline, streaming, and distributed analytics use cases. It's designed for use with popular data platforms including Hadoop*, Spark*,...
Author: James R. (Blackbelt), last updated 27/08/2019 - 13:50