Ways to Speed up your Cloud Environment and Workload Performance on Intel® Architecture

By Quoc-Thai V Le, Published: 09/25/2012, Last Updated: 09/25/2012

Setting up a cloud environment is complicated. It involves multiple elements such as databases, network infrastructure, and security, depending on your needs. How do you increase the performance of this environment? Start small: focus on one element at a time and try to speed it up. In this blog, I will share two popular cloud workloads and some key things that were done to increase their performance, followed by a list of resources that you can reference for more optimization ideas. These different approaches may trigger creative ideas for what you can do with your own cloud environment.

Hadoop:  Hardware and Software selections matter

Apache Hadoop is an open-source software framework that enables applications to work with thousands of independent computers and petabytes of data. The article “Optimizing Hadoop Deployments” focuses on the combination of hardware and software settings to use when setting up a Hadoop cluster.

Hadoop may be ideal in situations where your goal is to manage and analyze “big data.” Big data is a combination of structured and unstructured data. Structured data is like information in a database. Unstructured data is everything that is not in a formal database, and it can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files. With traditional databases alone, big data is essentially unusable in any systematic way. Hadoop allows you to combine all of your data and examine it as if it were one database.

  • Make all your data profitable – Hadoop can analyze the data 24/7 and provide options based on hard data over years of transactions.
  • Leverage all types of data, from all types of systems – Hadoop can handle all types of data (structured, unstructured, log files, pictures, audio files, communications records, email, etc.).
  • Scale beyond anything you have today – Hadoop increases storage and compute capacity by adding additional commodity servers.  It automatically incorporates the servers into the cluster. 

The following overview assumes that you are familiar with Hadoop setup:

Server Hardware Configuration

  1. Choosing a server platform – Focus on platform cost and power-efficient servers.  Dual-socket servers are optimal for Hadoop deployments.
  2. Selecting and configuring the hard drives – A relatively large number of hard drives per server (4 to 6) is recommended for Hadoop applications. The balance between cost and performance is generally achieved with 7200 RPM SATA drives.
  3. Memory sizing – Typical Hadoop applications require 12 to 24 gigabytes (GB) of RAM for servers based on the Intel® Xeon® 5600 series.  For the best performance, we recommend that you populate your memory banks with error-correcting code (ECC) dual in-line memory modules (DIMMs) across all available channels.
  4. Choosing processors – The Intel® Xeon® 5600 series and beyond provide excellent performance for highly distributed workloads. The latest Intel Xeon processors are not only faster at individual Hadoop tasks, they also deliver higher throughput. We define throughput as the number of tasks completed per minute when the Hadoop cluster is at 100% utilization processing multiple Hadoop jobs.
  5. Networking – The Intel® 82576 Gigabit Ethernet Controller, which can be found on select server boards or on the Intel® Gigabit ET Dual Port Server Adapter, is recommended.
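
The throughput definition in item 4 is simple arithmetic, and it can help to write it down when comparing clusters. The following is a minimal sketch in Python; the function name and the sample numbers are hypothetical and are not taken from the referenced article.

```python
def tasks_per_minute(tasks_completed: int, elapsed_seconds: float) -> float:
    """Throughput as defined above: tasks completed per minute of wall-clock time.

    The measurement is only meaningful if the cluster was kept at full
    utilization (multiple jobs running) for the whole interval.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tasks_completed * 60.0 / elapsed_seconds


# Hypothetical numbers, for illustration only:
# 1,800 tasks finished during a 15-minute fully loaded run.
example_throughput = tasks_per_minute(1800, 15 * 60)
```

Comparing this figure across two hardware configurations, using the same job mix, gives a like-for-like view of cluster capacity rather than of single-job speed.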

System Software Selection and Configuration

  1. Selecting the operating system and Java Virtual Machine - A Linux* distribution based on kernel version 2.6.30 or later, together with Sun* JVM 6u14 or later, is recommended when deploying Hadoop on current-generation servers.  This is due to the energy and threading efficiency optimizations included in those versions.
  2. Choosing Hadoop Versions and Distributions - When selecting a version of Hadoop for your implementation, you must balance the enhancements available from the most recent release against the stability of more mature versions.
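
The version minimums in item 1 are easy to check automatically before adding a node to a cluster. The sketch below is one possible way to do that in Python. The helper names are hypothetical, and it assumes the kernel release string looks like `2.6.32-431.el6.x86_64` and the JVM version string uses the `1.6.0_14` style reported by Sun JVMs of that era.

```python
import re

MIN_KERNEL = (2, 6, 30)  # kernel minimum recommended in the text
MIN_JVM = (6, 14)        # Sun JVM 6u14, as (major, update)


def parse_kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def parse_jvm_version(version: str) -> tuple:
    """Extract (major, update) from a '1.<major>.<minor>_<update>' string."""
    match = re.match(r"1\.(\d+)\.\d+(?:_(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized JVM version: {version!r}")
    major, update = match.groups()
    return (int(major), int(update or 0))


def kernel_meets_recommendation(release: str) -> bool:
    # Tuples compare element by element, so (2, 6, 32) >= (2, 6, 30).
    return parse_kernel_version(release) >= MIN_KERNEL


def jvm_meets_recommendation(version: str) -> bool:
    return parse_jvm_version(version) >= MIN_JVM
```

On a Linux node, the kernel release string can be obtained with `platform.release()` from the Python standard library, and the JVM version string from the output of `java -version`.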

Hadoop configurations and tuning - Achieving the best results from a Hadoop implementation begins with choosing the correct hardware and software stacks. Fine-tuning the environment beyond that requires an in-depth analysis, which can take time. For more details on Hadoop tuning and configuration, see the article "Optimizing Hadoop* Deployments."

Memcached:  optimizing the software to deliver higher performance

  • Memcached is an open-source, multi-threaded, distributed, key-value caching solution used for delivering software-as-a-service. It reduces service latency and traffic to database and computational servers.  To get better performance for this (or any) workload, tuning was required to overcome thread-scaling limitations and to enable more effective utilization of servers with many cores.  The article “Enhancing the Scalability of Memcached” describes those optimizations in detail.
  • Memcached reduces page load time, giving the user a faster experience. Remember, speed is a feature.  Without memcached, each page load needs to perform database queries and HTML rendering over and over again. 
  • Memcached queries return in milliseconds, which is usually 100x faster than a database query or a complex render (depending on load and complexity). This speed makes it practical to cache and share the results of database queries or renders that are common to some of your users.
  • If you’re starting with an empty cache, the first page load will execute in the same time as it would without memcached. However, after the expensive operations have been cached, that same page will load more quickly on each successive request until your key either expires or is evicted from the cache.
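
The behavior described in the bullets above (a slow first load, then fast loads until the key expires or is evicted) is often called the cache-aside pattern. The following is a minimal sketch in Python. `TinyCache` is a hypothetical in-process stand-in with a get/set interface, not a real memcached client, and `render_page` and `build_page` are illustrative names only.

```python
import time


class TinyCache:
    """In-process stand-in for a cache server (hypothetical, not a client API)."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry time on the monotonic clock)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: never cached, or already removed
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def render_page(page_id, cache, build_page, ttl_seconds=60):
    """Return cached HTML if present; otherwise build it once and cache it."""
    key = f"page:{page_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # fast path: no database query, no rendering
    html = build_page(page_id)  # slow path: queries plus rendering
    cache.set(key, html, ttl_seconds)
    return html
```

With a real memcached deployment the cache lives on separate servers and is shared by all web servers, so a page built by one server becomes a cache hit for the others.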

The following is a high-level overview of the tuning techniques that were applied to memcached; these optimizations are now available for download from GitHub.

  1. Examine the architecture of the workload in question
  2. Identify the bottlenecks.  You want the workload to take full advantage of the multithreaded execution on many-core servers.
  3. Eliminate global cache locks if possible, and apply multithreaded programming techniques for enhancing the scalability of the workload.

For more details about the analysis, the speedup, and the optimized memcached algorithm, read the article “Enhancing the Scalability of Memcached” and see the optimized memcached v1.6 source code.

In the optimized memcached version 1.6, the speed increase is attributed to a redesigned hash table and Least Recently Used (LRU) eviction mechanism that use parallel data structures. These data structures remove the locks for GET requests and mitigate lock contention for SET and DELETE requests, enabling linear speedup when measured from 1 to 16 cores on a 2-socket Intel® system.
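
To illustrate the general idea of replacing one global lock with finer-grained locking, here is a minimal Python sketch of lock striping, where each group of hash buckets has its own lock. This is a generic technique shown under stated assumptions and is not the optimized memcached code: the class name and stripe count are hypothetical, this sketch still takes a lock on reads (unlike the lock-free GET path described above), and Python's interpreter lock means it demonstrates the structure rather than a real multi-core speedup.

```python
import threading


class StripedDict:
    """Hash table guarded by several locks instead of one global lock."""

    def __init__(self, num_stripes=16):
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._buckets = [{} for _ in range(num_stripes)]

    def _stripe(self, key):
        # Keys that hash to different stripes never contend for the same lock.
        return hash(key) % len(self._locks)

    def set(self, key, value):
        i = self._stripe(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)

    def delete(self, key):
        """Remove key; return True if it was present."""
        i = self._stripe(key)
        with self._locks[i]:
            if key in self._buckets[i]:
                del self._buckets[i][key]
                return True
            return False
```

With a single global lock, every SET and DELETE waits on every other one. With striping, two threads block each other only when their keys land in the same stripe, so contention drops as the number of stripes grows.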


Depending on your cloud environment, focus on optimizing the hardware, the software stack, and the workloads to get the most performance. To get even greater performance out of your workloads, spend more time analyzing and tuning them to make use of all of your cores. Also consider how to overcome CPU bottlenecks (by eliminating cache locks or lock contention where possible) and I/O bottlenecks (by using faster disks, faster network processors, multiplexing I/O across many servers as done with Hadoop, and using data compression where possible). If you have successfully optimized your software using other approaches and setups, I encourage you to share them with us.


LRU (Least Recently Used) -- This is the eviction scheme used in memcached to determine which items are removed from the cache to free space for additional cache items.
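
As an illustration of the LRU policy itself, here is a minimal Python sketch built on the standard library's `OrderedDict`. The class name and the capacity measured in item count are assumptions made for the example; memcached's own implementation is written in C and differs in detail.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used item."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)  # an update counts as a use
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

For example, with a capacity of 2, setting `a` and `b`, reading `a`, and then setting `c` evicts `b`, because `b` is the item that has gone longest without being used.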

Reference information for developers interested in multithreaded programming:

Essential Tools for Threading

Intel® Software Development Products
