Costin Iancu is a scientist at Lawrence Berkeley National Laboratory (LBNL), where he spent 15 years performing research in the areas of programming models and code optimization for large scale parallel systems.
Khaled Ibrahim is a computer scientist at Lawrence Berkeley Laboratory. His research interest spans runtime systems, virtualized computing, high performance computing, and computer architectures. He has numerous publications in these areas in top tier journals and conferences. He is also an HPC challenge awardee in supercomputing 2014.
To extend their mission and to open new science frontiers, operators of large scale supercomputers have a vested interest to deploy existing data analytics frameworks such as Spark or Hadoop. So far, this deployment has been hampered by the differences in system architecture, which are reflected in the design and optimization approaches for the data analytics stacks. HPC systems and data centers have opposing design constraints and optimization goals. In a data center, the file system is optimized for latency (with local disks) and the network is optimized for bandwidth. In a supercomputer, the file system is optimized for bandwidth (without local disk) while the network is optimized for latency. Tightly coupling the compute nodes allows leveraging high performance centralized storage space with globally visible file directory structures.
We propose research and development efforts to ensure the successful adoption of data analytics frameworks in the HPC ecosystem, with emphasis on Spark. Our initial evaluation on a Cray XC30 and on an InfiniBand cluster indicates that performance is hampered by the interaction with Lustre: we observe 2-4X slowdown that can be directly attributed to it.
Scalability is the main target of our project. Currently, data analytics frameworks scale up to O(100) cores. We plan to demonstrate scalability up to O(10,000) cores. As a reference, we plan to use data center installations such as Amazon EC2. Initial R&D efforts will be performed on Cray XC30 and IB clusters with server class processors. In the later stages of the project we plan to experiment with the Intel® Xeon Phi processor (also known as Knights Landing) based system (Cori) at NERSC.
In the first stage of the project we plan to systematically re-design the Spark internals to accommodate the different performance characteristics of Lustre based systems. Examples include hiding the latency of expensive metadata operations associated with IO or exploiting the global name space. The goal is to at least match Amazon EC2 performance at this stage. This will provide us with a decent starting point and we can initiate collaborations with external application groups.
In the second stage of the project, we plan to re-examine the memory management in Spark to accommodate the complex vertical memory hierarchies present in HPC systems. Data center nodes provide two logical levels of storage (disk and memory) and data management is Spark reflects this hierarchy. In a sense, Lustre already provides a third level of storage in HPC systems. Newer systems already incorporate an extra level of NVRAM memory, such as the expected NERSC Cori system. We plan to explore Spark specific modifications: 1) using compositions of existing layers (e.g. Tachyon) as a memory manager for vertical hierarchies; 2) exploring a HDFS emulation layer that targets NVRAM storage. We also plan to explore low-level memory management policies at the OS level. We feel there is an exciting research opportunity to re-examine the data coherency models in both Lustre and Spark and propose parallel file system coherency modes for data analytics.