Setting up a cloud environment is complicated. It involves multiple elements such as databases, network infrastructure, and security, depending on your needs. How do you increase the performance of such an environment? Start small: focus on one element at a time and try to speed it up. In this blog, I will share two popular cloud workloads and some key things that were done to increase their performance, followed by a list of resources that you can reference for more optimization ideas. These different approaches to increasing performance may trigger creative ideas for what you can do with your own cloud environment.
Hadoop: Hardware and Software selections matter
Apache Hadoop is an open-source software framework that enables applications to work with thousands of independent computers and petabytes of data. The article “Optimizing Hadoop Deployments” focuses on the ideal combination of hardware and software knobs for setting up a Hadoop cluster.
Hadoop may be ideal in situations where your goal is to manage and analyze “big data.” Big data is a combination of structured and unstructured data. Structured data is information held in a database. Unstructured data is everything that is not in a formal database, and it can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software, and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files, and Flash video files. With traditional databases alone, the unstructured part of big data is essentially unusable in any systematic way. Hadoop allows you to combine all of your data and examine it as if it were one database.
The following overview assumes that you are familiar with Hadoop setup:
Server Hardware Configuration
System Software Selection and Configuration
Hadoop configurations and tuning
Achieving the best results from a Hadoop implementation begins with choosing the correct hardware and software stacks. Fine-tuning the environment then requires an in-depth analysis, which can take time. For more details on Hadoop tuning and configuration, see the article "Optimizing Hadoop* Deployments."
Memcached: optimizing the software to deliver higher performance
The following is a high-level overview of the tuning techniques that were applied to memcached; these optimizations are now available for download from GitHub.
In the optimized memcached version 1.6, the speed increase is attributed to redesigning the hash table and the Least Recently Used (LRU) eviction mechanism to use parallel data structures. These data structures remove the locks for GET requests and reduce lock contention for SET and DELETE requests, enabling linear speedup when measured from 1 to 16 cores on a 2-socket Intel® system.
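To make the locking idea concrete, here is a minimal Python sketch of a striped hash table: SET and DELETE lock only the stripe that owns the key, and GET takes no lock at all. This is not the optimized memcached code (which is written in C); the class name `StripedTable`, the stripe count, and the `demo` helper are all hypothetical, and the lock-free read relies on CPython's global interpreter lock rather than on a true lock-free structure.

```python
import threading


class StripedTable:
    """Toy hash table: writers lock one stripe, readers take no lock."""

    def __init__(self, stripes=16):
        # One dict and one lock per stripe; a key always maps to the same stripe.
        self._buckets = [dict() for _ in range(stripes)]
        self._locks = [threading.Lock() for _ in range(stripes)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def get(self, key, default=None):
        # No lock taken. Assumption: CPython, where a single dict lookup
        # cannot observe a half-finished write from another thread.
        return self._buckets[self._index(key)].get(key, default)

    def set(self, key, value):
        i = self._index(key)
        with self._locks[i]:  # contends only with writers on the same stripe
            self._buckets[i][key] = value

    def delete(self, key):
        i = self._index(key)
        with self._locks[i]:
            if key in self._buckets[i]:
                del self._buckets[i][key]
                return True
            return False

    def size(self):
        return sum(len(bucket) for bucket in self._buckets)


def demo(threads=8, per_thread=1000):
    """Fill one table from several threads and return the final item count."""
    table = StripedTable()

    def worker(tid):
        for n in range(per_thread):
            table.set((tid, n), n)

    pool = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return table.size()
```

With more stripes, two writers are less likely to need the same lock, which is the general reason a design like this can scale better than a single global lock.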
Depending on your cloud environment, focus on optimizing the hardware, the software stack, and the workloads to get the most performance. To get even greater performance out of your workloads, you will want to spend more time analyzing performance and tuning to maximize use of all of your cores. You will also want to consider how to overcome CPU bottlenecks (by eliminating cache and lock contention where possible) and I/O bottlenecks (by using faster disks, faster network processors, multiplexing I/O across many servers as done with Hadoop, and using data compression where possible). If you have successfully optimized your software using other approaches and setups, I encourage you to share them with us.
LRU (Least Recently Used) -- This is the eviction scheme used in memcached to determine which items are removed from the cache to free space for additional cache items.
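As an illustration of the eviction scheme described above, here is a minimal single-threaded LRU cache in Python. It is only a sketch of the general idea, not memcached's implementation; the `LRUCache` class and its `capacity` parameter (counted in items rather than bytes) are assumptions made for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` items; evicts the least recently used one."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._items


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used item
cache.set("c", 3)  # cache is full, so "b" is evicted
```

After the last call, "a" and "c" remain in the cache and "b" is gone, because "b" was the item that had gone unused the longest.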
Reference information for developers interested in multithreaded programming:
Essential Tools for Threading