Pachyzoom: Understanding and Optimizing Apache Hadoop* Servers With Intel® VTune™ Amplifier Platform Profiler

Overview

Twitter* collaborated with Intel to find ways to increase the storage density of Apache Hadoop* nodes. The project began with a focus on Intel® Cache Acceleration Software (Intel® CAS) and Intel® Optane™ Solid State Drives, but evolved into a deeper dive into Twitter's existing Apache Hadoop infrastructure using Intel® VTune™ Amplifier Platform Profiler and internal tooling. As each bottleneck was removed, a new one took its place, which repeatedly shifted the focus of our testing. By working with experts on Apache Hadoop, storage, caching, and telemetry from both companies, we were able to challenge several assumptions about Twitter's desired balance of compute and storage. The result of many months of reconfiguration, benchmark testing, and analysis was a clear direction for the shape of Twitter's next generation of Apache Hadoop hardware. The presentation will discuss the evolution of the project and the key results of the collaboration.

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804