Experience and Lessons Learned for Large-Scale Graph Analysis using GraphX

While GraphX provides useful abstractions and dataflow optimizations for parallel graph processing on top of Apache Spark*, applying it in an Internet-scale production setting still poses many challenges: both the graph algorithms and the underlying framework must be optimized for billions of graph edges and thousands of iterations. This presentation describes our efforts in building real-world, large-scale graph analysis applications using GraphX for some of the largest organizations and websites in the world, covering both algorithm-level and framework-level optimizations, such as minimizing graph state replication and optimizing long RDD lineages.

 

Download the PDF slide set for more details.

Attachment: Spark_Summit_East.pdf (352.19 KB)