Data center planning and optimization
The world of IT has been evolving at an exceptional pace over the past decade. Not long ago, machine virtualization was considered a cutting-edge technology that revolutionized IT operations; it is now a basic tool in the extensive IT toolbox. Big Data frameworks such as Hadoop* and Spark* have tackled the parallelization problem by introducing new programming paradigms and SW tools. The availability of such tools has made it possible to implement very complex functions and services.
On the other hand, the question of performance, or in other words "how well is the service implemented?", remains a critical aspect. For example, how many nodes are needed in the data center to achieve the desired latency? What type of storage device is needed? How should the SW stack be configured to get the most out of the HW?
A simple solution would be to over-provision the system, but this can lead to unjustified capital and operational expenditure. Performance projections should therefore be a priority during planning phases as well as during HW refresh cycles. Projecting performance is a tricky task because it involves intricate SW/HW interactions that are not always well understood or anticipated.
What we can learn from the chip design industry
There is one industry that deals with very high capex and complex systems and that has managed to tackle this very type of problem: the chip design industry. Designing a modern chip is a complicated process that requires a lot of time and money. The semiconductor industry, and especially tool-chain vendors, have invested tremendous effort in improving the analysis capabilities and efficiency of design tools, so that chip performance can be accurately estimated early in the design phase, long before the first prototypes are available.
One methodology that enables this is modeling. Modeling is the art of building a more or less abstract representation of a system that can serve many different purposes, depending on the abstraction level and the model description type: functional or behavioral. Functional models emphasize data-processing accuracy; they are usually used to validate algorithms and control flow. Behavioral models focus on data flow and are often used to study system performance in terms of latencies and bandwidths. Abstract modeling is extensively applied in the chip design industry. Models are used at the chip specification level to validate target system performance. They are used during the middleware implementation phase, where the chip is replaced by a model, in other words a virtual chip; the software is then written against this virtual chip, making it ready and validated before the physical chip is available. Models are also used during the verification phase, shortly before tape-out. Applications of modeling are numerous, and they cover the whole chip design flow.
How it can help for data center planning and optimization
Designing and planning a data center is an optimization problem whose answer depends on the ability to predict system performance. The modeling and simulation approach has many benefits. Here are a few:
- For “built-from-the-ground-up data centers”, projections can be done prior to any HW/SW infrastructure investment.
- For data center refreshes and/or scale-up, projections can be obtained without putting at risk existing service QoS.
- No more best-case/worst-case sizing. Since the system is simulated dynamically as a whole, the analysis encompasses mechanisms that are usually unaccounted for, such as resource contention.
- Rapidly test as many scenarios as possible, with various SW, HW, and workload variations. One of the main benefits of abstract modeling is that no time is spent simulating behaviors that do not impact performance, so simulation is very fast.
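To make the idea of fast scenario sweeps concrete, here is a deliberately simplified behavioral model in Python. It is a minimal sketch under assumed parameters, not Intel CoFluent Technology's actual API: jobs arrive randomly, queue for a pool of identical nodes, and the model reports mean latency, so sweeping the node count answers a toy version of "how many nodes are needed to meet a latency target?" while naturally capturing queueing contention.

```python
import heapq
import random

def simulate(num_nodes, arrival_rate, service_time, num_jobs=10_000, seed=42):
    """Hypothetical toy behavioral model (illustrative parameters only).

    Jobs arrive as a Poisson process and are served FIFO by the node
    that frees up first, each taking a fixed service_time.
    Returns the mean per-job latency (queueing delay + service).
    """
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * num_nodes          # time at which each node is next free
    heapq.heapify(free_at)
    total_latency = 0.0
    for _ in range(num_jobs):
        t += rng.expovariate(arrival_rate)   # next job arrival time
        earliest = heapq.heappop(free_at)    # node that frees up first
        finish = max(t, earliest) + service_time
        heapq.heappush(free_at, finish)
        total_latency += finish - t
    return total_latency / num_jobs

# Sweep cluster sizes: each run takes milliseconds, so many
# "what if" scenarios can be explored almost instantly.
for nodes in range(4, 10):
    latency = simulate(nodes, arrival_rate=5.0, service_time=1.0)
    print(f"{nodes} nodes: mean latency {latency:.2f}")
```

Even this crude sketch exhibits the behavior the list above describes: latency includes contention effects that a static best-case/worst-case estimate would miss, and adding nodes shows diminishing returns once queueing disappears. A production-grade model would, of course, capture far more of the SW stack and HW behavior.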
But as one might expect, there are a few issues with this approach as well:
- Modeling requires expertise. Accurate models can be built only if target systems are very well understood.
- Modeling requires plenty of time, especially for data center workloads with huge and complex SW stacks.
- Modeling is tricky, and it is very easy to end up in a dead end.
So the cost of modeling must be well balanced against the potential benefits.
To make sure our customers get the most out of their applications, Intel offers Intel® CoFluent™ Technology, which applies abstract modeling, as used in the chip design industry, to simulate complex data center workloads with a focus on big data. Thanks to its broad coverage of the Hadoop* and Spark* SW stacks, target workloads and data centers can be modeled and simulated very rapidly, lowering the high cost barrier to modeling a data center from the ground up.
Today, it is used to provide data center design and optimization guidance and recommendations.