While GraphX provides useful abstractions and dataflow optimizations for parallel graph processing on top of Apache Spark, applying it in an Internet-scale production setting still poses many challenges: for example, graph algorithms and the underlying framework must be optimized for billions of graph edges and thousands of iterations. This presentation shows our efforts in building real-world, large-scale graph analysis applications using GraphX for some of the largest organizations and websites in the world, covering both algorithm-level and framework-level optimizations such as minimizing graph state replication and optimizing long RDD lineages.
Download the PDF slide set for more details.