Performance and Agility with Big Data in a Containerized Environment

Enterprise software developers no longer need to choose between performance and agility for big data analytics. The BlueData EPIC* software platform provides the flexibility and cost-efficiency benefits of Docker* containers while ensuring bare-metal performance. Data science teams can gain on-demand access to big data environments while leveraging enterprise-grade data governance and security in a multi-tenant architecture.


There is great business value in the insights that can be gained from analyzing large data sets with Apache Hadoop*, Spark*, and other big data frameworks. But large data volumes can take hours to process on large compute clusters, and those compute resources can be costly. Because the cost of a job is inversely related to its throughput, performance is of the utmost importance.
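
To make the cost-throughput relationship concrete, the short Python sketch below works through the arithmetic with made-up numbers (the hourly cluster cost, data volume, and throughput values are illustrative assumptions, not benchmark data): for a fixed data volume, doubling throughput halves the cost of the job.

```python
# Hypothetical illustration of job cost versus cluster throughput.
# All numbers are made-up assumptions, not measured or published figures.

CLUSTER_COST_PER_HOUR = 200.0   # assumed hourly cost of the compute cluster (USD)
DATA_VOLUME_GB = 10_000.0       # assumed size of the data set to process (GB)

def cost_per_job(throughput_gb_per_hour: float) -> float:
    """Job cost = hours needed to process the data x hourly cluster cost."""
    hours = DATA_VOLUME_GB / throughput_gb_per_hour
    return hours * CLUSTER_COST_PER_HOUR

print(cost_per_job(500.0))    # 4000.0 -- 20 hours at 500 GB/hour
print(cost_per_job(1000.0))   # 2000.0 -- doubling throughput halves the cost
```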

To ensure the highest possible performance, many enterprises deployed on-premises big data analytics using bare-metal physical servers. Until recently, many IT departments were reluctant to use virtual machines or containers for big data implementations due to processing overhead and I/O latency.

As a result, most on-premises big data initiatives have limited agility. Deployments on a traditional bare-metal setup often take weeks or even months to implement. This has impacted the adoption of Apache Hadoop, Spark, and other enterprise big data deployments. The need for greater agility has also led more data scientists to use the public cloud for big data – despite any potential performance loss that may entail, since most cloud services run on virtual machines.

Intel Collaboration with BlueData EPIC Software Platform

Intel entered into an investment and collaboration agreement with BlueData to address these challenges. BlueData’s EPIC* software platform uses Docker containers to accelerate big data deployments, leveraging the inherent agility and deployment flexibility of containers. Container-based clusters in the BlueData platform look and feel like standard physical clusters in a bare-metal deployment, with no modification to Hadoop or other big data frameworks. The platform can be deployed on-premises, in the public cloud, or in a hybrid architecture.
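
As an informal illustration of that portability claim, here is a minimal, generic PySpark job with nothing BlueData-specific in it; the HDFS path and application name are hypothetical. Because a container-based cluster presents the same standard Spark and HDFS interfaces as a physical one, the same unmodified script can be submitted to either environment.

```python
# A minimal, generic PySpark word count. Nothing here is BlueData-specific;
# the HDFS path and application name are hypothetical. The same unmodified
# script can be submitted (via spark-submit) to a bare-metal cluster or to a
# container-based cluster that exposes standard Spark and HDFS interfaces.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("wordcount-portability-check")
    .getOrCreate()  # master URL and cluster manager come from spark-submit / environment
)

lines = spark.read.text("hdfs:///data/sample.txt")  # hypothetical input path

counts = (
    lines.rdd
    .flatMap(lambda row: row.value.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```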

With BlueData, enterprises can quickly and easily deploy big data – providing a Big-Data-as-a-Service experience with self-service, elastic, and on-demand Apache Hadoop or Spark clusters – while at the same time reducing costs. And the BlueData platform is specifically tailored to the performance needs of big data. For example, BlueData boosts the I/O performance and scalability of container-based clusters with hierarchical data caching and tiering. It also allows multiple user groups to securely share the same cluster resources, avoiding the complexity of each group requiring its own dedicated big data infrastructure.

As part of the strategic technology and business collaboration, Intel has helped to test, benchmark, and enhance the BlueData EPIC software platform to help ensure flexible, elastic, and high-performance big data deployments. We’ve worked closely with BlueData to prove — using validated and quantified benchmarking results — that their software innovations could deliver comparable performance to bare-metal deployments for Apache Hadoop, Spark, and other big data workloads.

Enterprises no longer need to choose between performance and agility. Now they can gain the flexibility and cost-efficiency benefits of Docker containers while ensuring bare-metal performance. As a result, the BlueData EPIC software platform running on Intel® architecture is becoming the solution stack of choice for many big data initiatives.

Learn More

Intel ran benchmark tests to determine the performance of on-premises big data workloads running on BlueData (using containers) versus the same workloads running in a bare-metal environment. The most recent tests used the BigBench benchmarking kit, with identical configurations on Intel® Xeon® processor-based architecture in both test environments to provide an apples-to-apples comparison.
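
As a rough sketch of what such a comparison looks like (the query names and run times below are placeholders, not the published BigBench results), each workload's outcome can be expressed as a bare-metal-to-container run-time ratio, where a value near 1.0 indicates performance parity.

```python
# Hypothetical comparison helper: the query names and run times are placeholder
# values for illustration only, not the published BigBench results.

def relative_performance(bare_metal_seconds: float, container_seconds: float) -> float:
    """Ratio of bare-metal run time to container run time.
    ~1.0 means parity; > 1.0 means the containerized run finished faster."""
    return bare_metal_seconds / container_seconds

# (bare-metal seconds, container seconds) per BigBench-style query -- placeholders.
sample_runs = {
    "query_05": (812.0, 820.0),
    "query_16": (431.0, 436.0),
    "query_22": (297.0, 295.0),
}

for query, (bare_metal, container) in sample_runs.items():
    print(f"{query}: {relative_performance(bare_metal, container):.3f}")
```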

Download Bare-Metal Performance for Big Data Workloads on Docker Containers to learn about the performance benchmark results.

Related Content

Develop Advanced Analytics Solutions with AI at Scale Using Apache Spark* and Analytics Zoo: The need for AI-fused analytics development—to harness large data sets and extract useful insights at scale—has never been greater.

Open Source Software Drives HPC Innovation: We’re on the forefront of converging AI, analytics, simulation and modeling, and other HPC workloads that will drive the industry toward the next era.

Derive Value from Data Analytics and AI at Scale: How will organizations turn the data deluge into value for a sustainable competitive advantage, at scale?

Author

Michael Greene is Intel vice president and general manager of System Technologies and Optimization in the Intel Architecture, Graphics and Software organization. Greene leads a worldwide division responsible for a broad range of development, validation, enabling, and architecture analysis efforts for Intel® platforms, including pre-silicon software, virtual platform modeling and simulation solutions, and power-performance analysis, to increase development velocity and time to market. Greene joined Intel in 1990, after graduating from the Massachusetts Institute of Technology, and has managed several new product developments, research efforts, and engineering groups. He has served as Intel’s initiative owner for power efficiency and pre-silicon software readiness, and has driven new technology benchmarking throughout his career. Greene is also chairman of the board for the National GEM Consortium. GEM is a national non-profit that provides programming and full fellowships to increase the number of under-represented individuals who pursue a master’s or doctorate degree in science or engineering. Follow Michael on Twitter*.

For more complete information about compiler optimizations, see our Optimization Notice.