My current gig is mostly about performance. I manage a group of software engineers dedicated to the languages becoming really important to the cloud and the datacenter.
One of those languages is Python*. It's truly astonishing to me how this little open source language with a 25-year history has become the language of choice for cloud infrastructure systems, OpenStack* in particular.
We usually like to measure ourselves on real customer workloads, rather than "micro" benchmarks. What's a micro benchmark? It exercises one particular function of the system or some small algorithm, like generating the Mandelbrot set. It usually isn't at all like what a real paying customer would care about. These micros are useful in their own right for optimizing some particular function. But nobody will pay you good money just to optimize something unless it means more throughput or better response time in their own datacenter.
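To make the idea concrete, here is a minimal sketch of what such a micro benchmark might look like. It is not taken from our test suite; the function names, grid size and iteration limits are all made up for illustration. It times one small Mandelbrot kernel with the standard library's timeit module and nothing else.

```python
import timeit


def escape_count(c, max_iter=100):
    """Return the iteration at which z = z*z + c escapes |z| > 2, or max_iter."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render(width=40, height=20, max_iter=50):
    """Sum escape counts over a small grid (illustrative sizes only)."""
    total = 0
    for row in range(height):
        for col in range(width):
            # Map the grid onto the region [-2, 1] x [-1, 1] of the complex plane.
            c = complex(-2.0 + 3.0 * col / width, -1.0 + 2.0 * row / height)
            total += escape_count(c, max_iter)
    return total


def time_render(repeats=3):
    """Return the best wall-clock time in seconds over a few runs."""
    return min(timeit.repeat(render, number=1, repeat=repeats))


if __name__ == "__main__":
    print("best of 3: %.4f s" % time_render())
```

A number like this tells you how fast one tight loop runs under a given interpreter. It says nothing about how a storage cluster behaves under load, which is why we prefer the real workload.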
When you measure yourself with real customer workloads, you would be delighted to find a software change which would net you a 10% improvement.
What about 100% or more?
At this spring's OpenStack Summit in Austin, we presented our latest Python work jointly with our partner SwiftStack. We showed a truly amazing 111% throughput improvement with no source code changes in the OpenStack code. (Here's a link to the video of our talk).
How did we pull this off?
First, we started with Swift, which is OpenStack's object storage project. Swift is being used at such sites as Wikipedia.org, eBay.com, pac12.com and ancestry.com plus many others. For example, we understand that the hosting company ovh.com manages 75 petabytes of storage using Swift.
When we ran "ssbench", the Swift project's benchmark, we observed that at close to 100% CPU utilization, 70-80% of the cycles were being spent in Python. We also observed that the processor was spending about half its cycles stalled in the front end of the pipeline.
A lot of this stalling effect is because interpreted languages like Python have a very large code footprint. For example, we did an experiment where we added two integers together. In native code, this takes 1 instruction. In Python, it takes on average 76 instructions.
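If you want a rough feel for where that overhead comes from, the standard library's dis module will show the bytecode the interpreter has to dispatch for a simple addition. This is only a sketch and not the experiment we ran: bytecode operations are not machine instructions, the opcode names differ between Python versions, and the helper name opnames is made up for this example. Each bytecode operation is carried out by interpreter code that performs many machine instructions of its own, such as checking operand types and creating the result object.

```python
import dis


def add(a, b):
    # A single "+" in the source...
    return a + b


def opnames(func):
    """Return the names of the bytecode operations compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # ...becomes several bytecode operations (load operands, add, return).
    # The exact listing depends on the interpreter version.
    dis.dis(add)
    print(opnames(add))
```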
So we thought, why not use a JIT?
There are several Python JITs available. We chose PyPy because of its 10-year history, its broad support for both Python 2 and 3, and its outstanding speed.
The results speak for themselves. Not only a 111% improvement in throughput, but up to an 87% response time improvement. Response time is really meaningful for most users, since faster response time translates into Wikipedia pages which load faster and seem snappier.
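For a sense of scale, here is the arithmetic behind those percentages as a small sketch. It assumes that the "87% response time improvement" means an 87% reduction in response time, which is one reading of the figure and is not spelled out above. The baseline numbers in the usage section are invented purely for illustration and are not measurements.

```python
def throughput_multiplier(improvement_pct):
    """A 111% improvement means the new throughput is 2.11x the old one."""
    return 1.0 + improvement_pct / 100.0


def remaining_latency_fraction(reduction_pct):
    """Assuming 'improvement' means reduction: 87% leaves 13% of the old latency."""
    return 1.0 - reduction_pct / 100.0


if __name__ == "__main__":
    # Hypothetical baselines, chosen only to make the arithmetic concrete.
    baseline_requests_per_sec = 1000.0
    baseline_latency_ms = 100.0

    print("throughput: %.0f req/s" % (baseline_requests_per_sec * throughput_multiplier(111)))
    print("latency:    %.0f ms" % (baseline_latency_ms * remaining_latency_fraction(87)))
```

In other words, the same hardware does a little more than twice the work, and under this reading the best-case response takes roughly one eighth of the time it used to.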
What does the future hold? We're really interested in seeing PyPy reintroduced in the gate for Swift. We're also working on a proof of concept with all of OpenStack running under PyPy. This work is currently underway. For example, we're seeing a 37% throughput improvement with Keystone, which is the main user authentication project in OpenStack. So we think we can boost the overall performance of OpenStack with no code changes, simply by switching to PyPy.
Ultimately I believe that PyPy should be the default way that everyone uses Python. We're trying to give some more engineering love to PyPy and maybe we'll see some dramatic changes which should benefit everyone.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804