Last Modified On: October 17, 2008 1:45 PM PDT
by Andrew Binstock
Modern enterprise applications require servers that deliver significant processing power on a sustained basis. Only systems with the muscle power to crunch large volumes of data, store numerous transactions in cache, and scale seamlessly to mainframe levels can be trusted to perform high-volume transactions, large-scale encryption, and near real-time analysis of large data sets. Systems that cannot deliver at this level are destined to serve as point solutions in non-critical settings.
The Itanium® 2 processor, released in mid-2002, is Intel's high-end, enterprise-oriented processor. It is designed to respond aggressively to large-scale IT needs by delivering performance that sets new records for database transactions, security calculations, and data analysis, according to a variety of benchmarks published by several independent standards organizations. (This Intel website has all the benchmarks publicly available.) These results follow directly from several advances that are the key features of this second generation of Intel's 64-bit Explicitly Parallel Instruction Computing (EPIC) architecture.
An important reason for enterprises to evaluate migration to 64-bit systems is to obtain better performance from databases and data-analysis software. To provide database systems with the kind of throughput enterprises need, Intel made several interesting design decisions in the Itanium® 2 processor. The most important of these is the capacious 3MB on-die level-3 cache. Intel is the first major manufacturer to use a three-tier cache system in its chips.
Caches hold data and instructions that the processor's execution engine is just about to access or has just finished accessing. Level 1 cache is located in the processor core. It holds the immediate next instructions and data the processor will work on; it is generally small and very fast. Level 2 cache, which like level 1 is common to all modern processors, acts as a high-speed buffer that feeds the level 1 cache and, on most processors, is itself fed by reads from and writes to main memory. When people talk about processor cache on traditional processors, they are generally referring to this level 2 cache. It is larger and slightly slower than level 1 cache (on the order of a few clock ticks), but much faster than standard RAM (on the order of hundreds to thousands of clock ticks).
On Itanium 2-based systems, the level 2 cache does not interact directly with memory. Rather, it is fed by a very large level 3 cache (also located on the processor). An access to the level 3 cache takes a few clock ticks more than an access to the level 2 cache. Intel wanted a large on-chip cache to hold database transaction elements, among other things, greatly reducing how often the processor must go to memory or disk to get these same elements. The Itanium 2 processor's ability to hold and quickly act upon such large quantities of instructions and data results in substantial performance improvements. To enable this, Intel implemented a new 3-level caching system consisting of the in-core level 1 cache and the on-chip level 2 and level 3 caches.
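The benefit of an extra cache tier can be sketched with the standard average-access-time calculation, in which each level's cost is its own latency plus its miss rate times the cost of the next level out. The latencies and miss rates below are illustrative assumptions chosen only to match the rough orders of magnitude described above; they are not measured Itanium 2 figures, and the function name is hypothetical.

```python
def average_access_cycles(levels, memory_cycles):
    """Average cost of one access, in clock ticks.

    levels: list of (hit_latency_cycles, miss_rate) pairs, ordered from
            level 1 outward. All values used here are assumptions.
    memory_cycles: cost of going all the way to main memory.
    """
    cost = memory_cycles
    # Work inward from memory: each level adds its own latency and
    # pays the outer cost only on a miss.
    for latency, miss_rate in reversed(levels):
        cost = latency + miss_rate * cost
    return cost

MEMORY_CYCLES = 200                       # assumed: "hundreds" of ticks
two_level = [(1, 0.05), (5, 0.20)]        # assumed L1 and L2 behavior
three_level = two_level + [(12, 0.25)]    # assumed L3 catching most L2 misses

two = average_access_cycles(two_level, MEMORY_CYCLES)
three = average_access_cycles(three_level, MEMORY_CYCLES)
print(f"two-level: {two:.2f} ticks, three-level: {three:.2f} ticks")
```

Under these assumed numbers the third tier lowers the average cost because most level 2 misses are served on-chip rather than from memory; real gains depend entirely on the workload's actual hit rates.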
Because of the 3MB size of the level 3 cache, Itanium 2-based systems are likely to be able to hold database records in cache for the entire duration of the transaction, which enables the I/O portion of the transaction to occur at speeds faster than memory access. (Once the transaction is complete, the record is written back to memory and disk at the most opportune time for system performance, but always in time for changes to be recorded before access by a different process or thread.)
Data movement between the caches and system memory has itself been highly optimized by a special bus designed for the processor and its supporting chipsets. The Itanium 2 processor pushes system bus capacity to 6.4GB/sec throughput. It is hard to conceive of any system exceeding this bandwidth capacity even under the highest peak loads. (As a point of reference, disk drives in PCs today have a theoretical peak rate, that is, a rate that can be reached but not sustained, of 37MB/sec. Roughly 175 disks delivering data at peak throughput would therefore be required to consume this bandwidth.)
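The disk comparison above is simple division, and it can be checked with a short sketch. The two rates come from the text; treating 1GB as 1000MB is an assumption, and the function name is hypothetical.

```python
import math

BUS_MB_PER_S = 6400   # 6.4GB/sec from the text, assuming 1GB = 1000MB
DISK_MB_PER_S = 37    # per-disk theoretical peak rate from the text

def disks_to_saturate(bus_mb_per_s, disk_mb_per_s):
    """Smallest whole number of disks whose combined peak rate
    meets or exceeds the bus rate."""
    return math.ceil(bus_mb_per_s / disk_mb_per_s)

disks = disks_to_saturate(BUS_MB_PER_S, DISK_MB_PER_S)
print(disks)  # 173
```

The exact quotient is about 173 disks, which is consistent with the rounded figure quoted in the text.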
The effect of system buses this wide and of the large three-tier caches shows up in the Itanium 2 processor's results in database benchmarks. In SAP AG's own benchmarks for its ERP software (specifically, the 2-tier SD benchmark published in July 2002 on http://www.sap.com/solutions/benchmark/*), 4-way Itanium 2 systems from Hewlett-Packard set a new record of 470 users, the best result among 4-way systems in its class.
Architectures with 64-bit addresses can store reasonably large databases in memory and access them there with little thrashing or paging overhead. This is often done for databases that are constantly being accessed and for databases that serve as the basis for complex analysis. The theoretical maximum of 16TB for memory has never been tested, but multigigabyte databases are frequently run on 64-bit architectures. On machines using the Itanium 2 processor, in-memory databases will gain a performance edge because of the size of memory pages, which can now be as large as 4GB. Pages this large mean that little, if any, page swapping occurs in work on in-memory databases.
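To make the page-size point concrete, the sketch below counts how many pages are needed to map an in-memory database at two page sizes. The 8GB database size is a hypothetical example, and the 4KB comparison size is an assumption (it is the conventional page size on IA-32 systems); only the 4GB page size comes from the text.

```python
KIB = 1024
GIB = 1024 ** 3

def pages_needed(data_bytes, page_bytes):
    """Number of pages required to map data_bytes, rounded up to whole pages."""
    return -(-data_bytes // page_bytes)  # ceiling division on integers

db_bytes = 8 * GIB                       # hypothetical in-memory database
small = pages_needed(db_bytes, 4 * KIB)  # assumed conventional 4KB pages
large = pages_needed(db_bytes, 4 * GIB)  # 4GB pages, as described in the text
print(small, large)  # 2097152 2
```

With so few pages to track, the operating system has far less page-table bookkeeping to do and far fewer opportunities to swap a page out mid-query.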
Besides performance, the issue that most challenges enterprises today is security. The goals of enterprise security—locking malfeasant individuals outside the company firewall while making it simple for legitimate customers and employees to access the systems they need—rely in large part on encryption and cryptography. Encryption is used for two primary purposes (and many secondary ones, as well). These are authentication (making sure someone asking for access is who they say they are) and data protection (making sure that data sent back and forth during a transaction cannot be read or tampered with in transit).
Encrypted transactions generally encrypt the entire data stream, and in some cases this encryption occurs at two levels simultaneously. For example, the IPsec standard, commonly used to provide security for virtual private networks (VPNs), performs encryption on every packet sent over the network. These packets frequently contain data that is itself encrypted. As a result, enterprises need platforms that can encrypt and decrypt data very quickly. Anything short of top-flight performance can have serious repercussions on system throughput and can, in severe cases, cause users to abort transactions.
However, encryption/decryption unfortunately requires complex calculations. Depending on the security protocol in place, encryption can involve floating-point calculations or repeated integer transformations of data bytes. In addition, because the keys used for encryption are generally no less than 128 bytes, the data items involved are very large and in some ways cumbersome.
As a result, software that handles secure data requires hardware with excellent math and arithmetic capabilities to perform encryption and decryption on the fly in real time. Intel has provided for this need particularly well in its Itanium processor family. For example, when the original Itanium processor was released in 2001, it immediately took over the top spot in floating-point performance (as reported in the January 2002 issue of Microprocessor Report, the leading analysis newsletter of the semiconductor industry).
The Itanium 2 processor advances this performance through considerably expanded arithmetic capabilities. This new generation of the chip sports 2 (rather than 1) floating-point units capable of SIMD math (a high-performance technique in which one instruction executes simultaneously across multiple data items), 2 memory/ALU units, 2 integer ALU units, and 3 branch execution units. The 2 memory/ALU units can perform integer computations when not busy. In addition, the Itanium 2 processor has 128 integer and 128 floating-point registers. This much horsepower has had a visible effect on the processor's performance. In the SPEC tests, the industry-standard benchmarks for measuring processor and system performance on integer and floating-point operations, the 1 GHz Itanium 2 processor (in a uniprocessor HP rx2600 server) earned a SPECint_base2000 rating of 810 and a SPECfp_base2000 rating of 1356 (as posted on http://www.spec.org*). These benchmarks were attained in July 2002, and they place the Itanium 2 processor ahead of RISC processors on comparable systems. Moreover, the floating-point result is an all-time record. Itanium 2 has the fastest math and arithmetic capabilities available today, making it ideally suited for encryption and security at IT sites.
Similar results were obtained when the SPEC benchmark for security performance (SPECweb99_SSL) was run. A 4-way Itanium 2 system scored 1520 simultaneous conforming connections, establishing a new performance record for 4-way systems.
In addition to its leading performance, the Itanium 2 processor is characterized by broad scalability. The chip can be used in single- or dual-processor workstations, multiway workgroup servers, enterprise systems, and even supercomputers. For example, the largest American-built supercomputer, the TeraGrid project sponsored by the National Center for Supercomputing Applications (NCSA), is being constructed with Itanium and Itanium 2 processors. This project, a joint venture of the University of Illinois, the Argonne National Labs, the California Institute of Technology (Caltech), and the University of California in San Diego, plans to employ more than 3,000 Itanium 2 chips by completion in 2003. When finally deployed, the computer, which is designed as a grid between the four institutions, will be capable of more than 14 trillion floating-point operations per second (14 teraflops) and will support 750 TB of storage. This capacity will place it second among all supercomputers, and significantly ahead of all other US supercomputers. (For more information on TeraGrid, see Grid Supercomputer Demonstrates Itanium 2 Processor Prowess.)
A common perception is that an enterprise-scale processor capable of setting so many performance records is likely to be expensive. The 3MB on-die level 3 cache has furthered this expectation. However, the Itanium 2 processor is significantly less expensive than its competitors. When like system configurations are compared, the Itanium 2-based system is consistently a more attractive buy on dollar costs alone, quite apart from the favorable price-performance ratio.
The Itanium 2 processor further protects investments by supporting three distinct 64-bit operating systems: Linux; HP-UX 11i (Hewlett-Packard's variant of UNIX); and Windows .NET Enterprise Server 2003 and Windows .NET Data Center Server 2003 (RC1, Aug 2002), Microsoft's versions of Windows for 64-bit systems. No other 64-bit platform currently on the market enjoys the support of three widely accepted operating systems from three different vendors.
Finally, Intel's commitment to protecting prior investment is apparent in the Itanium and Itanium 2 processors' native support for IA-32 (generally called x86) binaries, the same binaries that run on Pentium and Intel Xeon processor-based systems. These binaries can execute without modification on both Itanium and Itanium 2 processors. In fact, both x86 and 64-bit binaries can run simultaneously on the same processor. This allows a system that is only partially converted to 64 bits to run on an Itanium 2-based server. Dual native instruction sets are a unique design aspect that no other processor today offers. It is, however, recommended that applications be fully ported to run natively on the Itanium 2-based platform to reap all the benefits that 64-bit computing offers.
By supporting multiple operating systems and two different instruction sets, the Itanium 2 processor provides IT sites with remarkable deployment flexibility and protection against application lock-in.
Over the years I've learned that developers of enterprise applications generally want their software to run on the fastest, most scalable machines. Only systems that meet both requirements are of real interest, because software vendors do not want their software compromised by hardware limitations. As I have just discussed, the performance and scalability of Itanium 2 processors exceed those of contemporary RISC processors and do so at a lower cost.
The benefits of enjoying the support of multiple operating systems, backwards compatibility with x86 binaries, and the very favorable price-performance ratio suggest that developers committing to the Itanium 2-based platform are likely to find IT sites interested in running their software.
Intel® Itanium® Processor — Manuals
Introduction to Microarchitectural Optimization for Itanium® Processors
Andrew Binstock is the principal analyst at Pacific Data Works LLC. He was previously a senior technology analyst at PricewaterhouseCoopers, and earlier editor in chief of UNIX Review and C Gazette. He is the lead author of "Practical Algorithms for Programmers," from Addison-Wesley Longman, which is currently in its 12th printing and in use at more than 30 computer-science departments in the United States.
