Running .NET on Intel® Xeon® processors: The Fastest Platform in Town

by Alan Zeichick


Introduction

The Intel® Xeon® processor is the top-end member of the processor family based on Intel's new NetBurst™ micro-architecture. Numerous features of the Intel NetBurst micro-architecture make the Intel Xeon processor an optimal platform for running .NET applications. These features accelerate overall system performance as well as the handling of XML documents, which are central to .NET's design. In addition, encryption routines run decidedly faster thanks to other innovations discussed in this article.

But first, it's necessary to clarify what chip the term "Xeon" refers to. In the past, "Xeon" was appended to the name of the Pentium generation, as in "Pentium III Xeon." This appellation indicated a server chip (generally with a large cache) that internally employed the micro-architecture of the corresponding Pentium processor. As of the release of the Pentium 4 processor, Intel is splitting apart the Xeon and Pentium lines. "Pentium" now refers only to desktop processors designed for use in uniprocessor systems, while "Xeon" refers to the processors in dual- and multi-processing systems. "Xeon" will no longer be preceded by the Pentium modifier. So, in the context of this article, Xeon refers specifically to the current generation of server processors, which relies on the Intel NetBurst micro-architecture and is the successor to the Pentium® III Xeon® processor.


New Intel® Xeon® Processor Features

Intel® Xeon® processors exhibit many features that make them an appealing choice when running applications based on Microsoft's* emerging .NET platform, especially when compared with Pentium III Xeon processors. The most obvious improvement is the clock speed: the Pentium® III Xeon® processor tops out at 900MHz, while the Intel Xeon processor's clock currently runs at 2GHz. But this difference tells only part of the story. The real details are under the covers in the NetBurst micro-architecture. The NetBurst name refers to a revamped design within the microprocessor core that comprises numerous disparate features, including a faster front-side bus, a more efficient Level 2 (L2) cache, revamped algorithms for pre-execution instruction fetching and processing, and new facilities for handling integer and floating-point computation.

First, let's look at the front-side bus. Without getting into hardware details that only a motherboard designer would care about, the Intel Xeon processor sports a 400MHz system bus that can transfer as much as 3.2 GB/sec. of data into and out of the processor. That compares to a maximum 1.06 GB/sec. transfer rate on the Pentium III Xeon processor's 133MHz system bus. Such capacious bandwidth will not make much difference if the server is just handling print jobs or serving static HTML pages, because tasks of this kind simply don't tax the bus's capacity. However, if a busy .NET server is simultaneously handling requests for Active Server Pages (ASP) scripts, searching database indices, and running server-side applications, performance will improve because the new bus moves data between the processor and main memory about three times as fast as the Pentium III Xeon processor's bus does.
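
The two bandwidth figures follow from simple arithmetic. The sketch below assumes a 64-bit (8-byte) data path on both buses, a width this article does not state, and the function name is purely illustrative.

```python
# Peak bus bandwidth = transfers per second * bytes per transfer.
# ASSUMPTION: both buses carry 8 bytes (64 bits) per transfer; the article
# gives only the clock rates and the resulting bandwidth figures.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gb_per_sec(effective_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Return peak bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    return effective_mhz * 1_000_000 * width_bytes / 1_000_000_000


xeon = peak_bandwidth_gb_per_sec(400)              # Intel Xeon system bus
pentium_iii_xeon = peak_bandwidth_gb_per_sec(133)  # Pentium III Xeon bus
ratio = xeon / pentium_iii_xeon

print(f"Xeon:             {xeon:.2f} GB/sec")
print(f"Pentium III Xeon: {pentium_iii_xeon:.2f} GB/sec")
print(f"Ratio:            {ratio:.2f}x")
```

These are peak rates; sustained throughput on a real server depends on the memory subsystem and the workload.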

The L2 cache is a key ingredient of Intel's 32-bit processors. The role of the L2 cache is to store information retrieved from main memory and feed it to the processor core at high speed. (Level 2 refers to the fact that the cache, which is really just high-speed memory, is located on the same die as the processor core but not in the core itself. Cache located in the processor core is called Level 1, or L1, cache.) With each generation of IA-32 processors, the L2 cache has become more efficient. In the Intel Xeon processor, a new dedicated 256-bit channel is placed between the L2 cache and the processor core, making data movement considerably more efficient than on the Pentium III Xeon chip. Intel calls this technology the Advanced Transfer Cache design. A 256-bit channel moves 32 bytes per transfer, so the cache can deliver 48 GB/sec. on a 1.5GHz Xeon processor, compared with only 16 GB/sec. on a 1GHz Pentium III processor. Additionally, the L2 cache retains commonly used code, so that frequently executed routines can be run without ever touching main memory or tying up the front-side bus.

The Xeon processor also aggressively prefetches and decodes instructions from the L2 cache. Employing a technique called Advanced Dynamic Execution, the processor can view the incoming software stream as far as 126 instructions ahead and make predictions about the likelihood of the code taking various branches. Based on these predictions, the processor preloads instructions from the most likely branches into the pipeline. The Xeon processor's branch pipeline is twice as deep as the Pentium III Xeon processor's.

By using a 4KB buffer that stores the history of past branches and by incorporating a more advanced branch-prediction algorithm, the Xeon processor mispredicts fewer branches than previous Xeon processors did.
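
To see how a table of branch history cuts down on wrong guesses, consider the following sketch of a textbook two-bit saturating-counter predictor. It is a generic teaching model, not the Xeon processor's actual algorithm (which this article does not describe), and the class name, table size, and branch address are all hypothetical.

```python
# A textbook two-bit saturating-counter branch predictor.
# ASSUMPTION: generic teaching model, not Intel's actual prediction logic.

class TwoBitPredictor:
    def __init__(self, table_size=4096):
        # Hypothetical table size; each entry is a counter from 0 to 3.
        # Values 0-1 predict "not taken", values 2-3 predict "taken".
        self.table_size = table_size
        self.counters = [1] * table_size

    def predict(self, branch_address):
        return self.counters[branch_address % self.table_size] >= 2

    def update(self, branch_address, taken):
        # Move the counter toward the observed outcome, saturating at 0 and 3.
        index = branch_address % self.table_size
        if taken:
            self.counters[index] = min(3, self.counters[index] + 1)
        else:
            self.counters[index] = max(0, self.counters[index] - 1)


def count_mispredictions(predictor, outcomes, branch_address=0x400):
    """Feed a sequence of taken/not-taken outcomes for a single branch."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_address) != taken:
            misses += 1
        predictor.update(branch_address, taken)
    return misses


# A loop branch taken 9 times and then falling through, repeated 100 times.
loop_outcomes = ([True] * 9 + [False]) * 100
misses = count_mispredictions(TwoBitPredictor(), loop_outcomes)
print(f"{misses} mispredictions out of {len(loop_outcomes)} branches")
```

After a brief warm-up, the predictor guesses wrong only when the loop exits, which is the kind of repetitive pattern that server code tends to produce.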

.NET applications benefit from these features because core code relating to the Windows kernel and the Common Language Runtime managed execution environment is likely to be executed repeatedly. Thanks to the efficiency of the Advanced Dynamic Execution algorithms, the Xeon processor can actually "learn" the branching sequences and improve prefetch performance.

And what about math? Intel gave the Xeon processor two arithmetic logic units, each of which can handle integer operations in only half a clock cycle. Also, SSE-2 (the second generation of SSE extensions to the instruction set) adds 144 new instructions that further the Xeon processor's capability of executing a single instruction across multiple data items (SIMD). These two features reduce the number of instructions (and clock ticks) required to perform on-the-fly encryption and decryption; multimedia, video, and speech processing; dynamic server-side graphics or reports; XML parsing; and database analysis.
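
The SIMD idea can be illustrated in a few lines. The sketch below emulates one packed add across four 32-bit lanes, assuming a 128-bit register that holds four 32-bit integers. It is pure Python, the function names are hypothetical, and nothing in it executes real SSE-2 instructions; it only shows why packing data reduces the instruction count.

```python
# Emulates a packed 32-bit integer add across four lanes.
# ASSUMPTION: illustration only; a 128-bit register is modeled as a
# four-element list, and no real SIMD instructions are executed.

LANES = 4            # four 32-bit lanes fit in one 128-bit register
MASK32 = 0xFFFFFFFF  # each lane wraps around at 32 bits


def packed_add(a, b):
    """Add two 4-lane vectors lane by lane, as one SIMD instruction would."""
    assert len(a) == LANES and len(b) == LANES
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def add_arrays(a, b):
    """Add two equal-length arrays, counting scalar vs. packed operations."""
    assert len(a) == len(b) and len(a) % LANES == 0
    result = []
    packed_ops = 0
    for i in range(0, len(a), LANES):
        result.extend(packed_add(a[i:i + LANES], b[i:i + LANES]))
        packed_ops += 1
    scalar_ops = len(a)  # one add per element without SIMD
    return result, scalar_ops, packed_ops


data_a = list(range(16))
data_b = [10] * 16
total, scalar_ops, packed_ops = add_arrays(data_a, data_b)
print(total)
print(f"scalar adds: {scalar_ops}, packed adds: {packed_ops}")
```

In this model, sixteen element additions take four packed operations instead of sixteen scalar ones.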

The standard version of Windows .NET Server can run on uniprocessor Pentium 4-based systems or dual-processor servers using the Xeon processor. Windows .NET Enterprise Server (which replaces the older Windows 2000 Advanced Server) can leverage as many as eight Xeon processors. For heavy-duty mission-critical tasks, organizations can turn to Windows .NET Datacenter Server, which has the built-in capability to work with 8 to 32 Xeon processors.


Leveraging Microsoft* .NET

All the Xeon® processor's technological improvements come to the fore when running .NET applications on Windows servers. However, for the most part, .NET developers need not take any specific action to optimize their applications for the hardware, because Intel is working with Microsoft to build Xeon-processor-specific optimizations into the Windows .NET operating-system kernel and into the code for the Common Language Runtime managed execution environment. In fact, Microsoft's compilers for the Visual Studio.NET integrated development environment (such as those for its Visual Basic, Visual C#, and Visual C++ languages) do not even include options for processor-specific compiler optimizations. Rather, they provide a consistent, abstracted view of the computer to software applications and handle the optimizations themselves. In other words, Windows and .NET do the heavy lifting, so run-time applications don't have to.

Similar benefits accrue, by the way, when running Java applications on a Windows .NET-based Java Virtual Machine (JVM). If the JVM designers incorporate the Xeon-processor-specific compiler optimizations into the managed execution environment, all Java applications running on such a JVM on a Xeon processor-based server automatically realize the benefits of the Xeon processor's improved capabilities.

Even so, there are areas where the Xeon processor's design offers exceptional benefits for .NET applications, including processing XML data, on-the-fly cryptography, and arithmetic-intensive computation.


XML

The processing of XML-formatted information is increasingly central to modern servers, and in particular to Microsoft's .NET Server and .NET Framework. XML is at the foundation of Microsoft's BizTalk server, and is used in SQL Server, ASP.NET, and in the .NET Framework. Along with SOAP (Simple Object Access Protocol) and WSDL (Web Services Description Language), XML is used extensively for data exchange between servers across a local-area network or the Internet.

XML is transmitted in the form of documents, where embedded tags describe not only the content of the document but also the meaning of that content (a concept referred to as metadata). Parsing XML documents to extract the content and storing new content into XML-formatted structures are computationally complex tasks that involve numerous branches of code. In many ways, the structure of an XML document can be likened to a binary tree, although it is more complex because an element can have any number of children. The Xeon processor's fast L2 cache and advanced ability to predict the execution of future code paths enable the server to process XML data more efficiently than the Pentium® III Xeon® processor can.
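
The tree-like structure, and the branching that walking it involves, can be seen in a short sketch. Python's standard xml.etree.ElementTree module is used here purely for illustration; the order document and the walk function are invented for this example and have nothing to do with how .NET parses XML internally.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the order document below is invented for illustration.
SAMPLE = """<order id="1001">
  <customer>Example Corp</customer>
  <items>
    <item sku="A-1" qty="2">Widget</item>
    <item sku="B-7" qty="1">Gadget</item>
  </items>
</order>"""


def walk(element, depth=0, visited=None):
    """Depth-first traversal recording (depth, tag, text) for each element."""
    if visited is None:
        visited = []
    text = (element.text or "").strip()
    visited.append((depth, element.tag, text))
    for child in element:  # a node may have any number of children
        walk(child, depth + 1, visited)
    return visited


root = ET.fromstring(SAMPLE)
nodes = walk(root)
for depth, tag, text in nodes:
    print("  " * depth + tag + (f": {text}" if text else ""))
```

Every element visited involves tests (is there text, are there children, how deep is the nesting), which is why branch prediction and a fast cache matter for this kind of work.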


Security

While Internet messages may be in the form of XML-encoded data, those messages themselves are likely to be encrypted using Secure Sockets Layer (SSL), digital certificates, or any of several Internet public-key cryptographic schemes. Applications can work with encrypted data using Microsoft's CryptoAPI, and the Windows .NET servers feature built-in services to issue and validate digital certificates, meaning that few programmers need to become involved in the nitty-gritty of those encryption specifications. However, they do need to worry about the performance penalty imposed by the constant computing that encryption requires. This is especially true for e-commerce servers and for secure remote access via virtual private networks (VPNs). With these systems, lagging performance is especially costly, so deployment on Xeon® processors, with their advanced capabilities for handling math, is an important option to consider. The Xeon processors are likely the most effective hardware solution to this problem short of dedicated encryption hardware.
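
To give a sense of where that math comes from, the sketch below implements square-and-multiply modular exponentiation, the arithmetic at the heart of RSA-style public-key operations. It is a teaching example with toy numbers and a hypothetical function name; real SSL and CryptoAPI code uses far larger keys and hardened library routines.

```python
# Square-and-multiply modular exponentiation.
# ASSUMPTION: teaching sketch with a toy key; not suitable for real security.

def mod_pow(base, exponent, modulus):
    """Return ((base ** exponent) % modulus, number of multiplications)."""
    result = 1 % modulus
    base %= modulus
    multiplications = 0
    while exponent > 0:
        if exponent & 1:                    # low bit set: multiply it in
            result = (result * base) % modulus
            multiplications += 1
        base = (base * base) % modulus      # square for the next bit
        multiplications += 1
        exponent >>= 1
    return result, multiplications


# Toy RSA-style key built from p=61 and q=53: n=3233, e=17, d=2753.
n, e, d = 3233, 17, 2753
message = 65
ciphertext, enc_mults = mod_pow(message, e, n)
recovered, dec_mults = mod_pow(ciphertext, d, n)
print(ciphertext, recovered, enc_mults, dec_mults)
```

Even with this tiny key, each operation takes a chain of multiplications and remainders. With production-size keys the operands run to hundreds of digits and a busy server repeats the work for every secure connection, which is where faster integer hardware pays off.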

The best news, with all of these improvements, is that programming code does not have to change at all to exploit the Xeon processor's capabilities. Intel and Microsoft are ensuring that Windows .NET's operating-system kernel, the Common Language Runtime managed execution environment, the core kernel libraries, and other resources leverage Xeon processors for running server applications. The first fruits of that collaboration can be found in Windows .NET beta three, released in November 2001, which contains some initial Pentium 4 and Xeon processor optimizations. Intel and Microsoft are continuing to work closely to add more of these processor optimizations to the .NET platform.

Intel has also enhanced its VTune performance analyzer to work with Xeon's enhanced instruction set, and for those occasions when it's necessary to code in assembly language for maximum performance, VTune's Code Coach can help developers optimize code using the SSE-2 instructions.

Learn more about the Xeon processor, and about Intel and third-party tools for developing applications which leverage the Xeon's advanced capabilities, at http://www.intel.com/products/server/processors/index.htm.


About the Author

Alan Zeichick is principal analyst at Camden Associates, an independent technology research firm focusing on networking, storage, and software development. He can be reached at zeichick@camdenassociates.com.

