Speaker: Prabhat, LBNL
In this work, we present parallel and highly optimized kd-tree-based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning search space and improving load balancing and partitioning among nodes and threads. Using terabyte-sized datasets from astrophysics, plasma physics, and particle physics, we show that our implementation can construct kd-tree of 189 billion particles in 48 seconds utilizing around 50,000 cores. We also demonstrate computation of KNN of 19 billion queries in 12 seconds. We demonstrate almost linear speedup both for shared and distributed memory computers. Our algorithms outperform earlier implementations by more than order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art big data analytics problems. In addition, we showcase performance and scalability on the recently released Intel® Xeon Phi™ processor showing that our algorithm scales well even on massively parallel architectures.