With 64-bit processing becoming both more prevalent and more affordable, developers trying to stay on the cutting edge would be wise to look at 64-bit migration scenarios. There are a number of factors to consider when deciding whether to port or rebuild applications to take advantage of 64-bit processing. The key is to target the appropriate processor for the task at hand.
The Right Tool for the Right Job
A large number of educational and business server applications are rooted in 32-bit architecture. In many cases, it may be best to leave them on that platform for now rather than incur the overhead of moving to 64-bit. For 32-bit applications, impressive speed increases can be achieved by optimizing for the Intel® Xeon® processor. If an application really needs the horsepower of 64-bit processing, the abundance of tools and processor features available for the Itanium® processor, together with Intel's solid track record, makes the Itanium processor the obvious choice.
Both the Intel Xeon and Itanium processor families make use of an L3 cache. The Itanium processor, however, is the more appropriate target for large applications such as database servers, online banking, and other areas where large amounts of data must be processed quickly. For databases that need to access data beyond the 64 GB physical-memory limit of 32-bit processors (reached through 36-bit Physical Address Extension), the terabytes of data that 64-bit processors can address are a compelling reason to move to an Itanium processor-based platform. For applications that don't need the additional math horsepower and processing capacity of the Itanium processor, moving to the 64-bit platform may not make sense, and the 32-bit Intel Xeon processor is most likely the best choice.
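The memory limits above follow from simple powers of two: n address bits reach 2^n bytes. The following C sketch (the function name `address_space_gb` is purely illustrative) converts a bit count into GB of 2^30 bytes, showing why 32-bit addressing tops out at 4 GB, 36-bit Physical Address Extension at 64 GB, and why 64-bit addressing leaves room for terabytes and beyond. Actual processors and operating systems implement fewer physical address bits than the architectural 64, so treat the 64-bit figure as a theoretical ceiling.

```c
/* Size of the address space reachable with 'bits' address bits, expressed in
   GB of 2^30 bytes. Valid for 30 <= bits <= 64; returns 0 outside that range.
   The largest result, 2^34 GB for 64 bits, still fits in unsigned long long. */
unsigned long long address_space_gb(unsigned bits)
{
    if (bits < 30 || bits > 64)
        return 0;
    return 1ULL << (bits - 30);
}
```

For example, `address_space_gb(32)` returns 4, `address_space_gb(36)` returns 64, and `address_space_gb(40)` returns 1024, or one terabyte.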
Benefits of the Intel® Xeon® Processor (32-bit)
A large number of single- and dual-processor servers and workstations are based on the Intel Xeon processor family. Available in speeds of 3.06, 2.80, 2.66, 2.40, and 2.0 GHz, the processor has a 533 MHz system bus that supports greater memory, I/O, and graphics bandwidth. Based on the Intel NetBurst® microarchitecture, the Intel Xeon processor also includes a 512 KB L2 cache (3.06 GHz processors are available with 1 MB of cache) and Hyper-Threading Technology.
Hyper-Threading Technology improves processor performance for multi-threaded applications and multi-tasking environments by allowing each physical processor to run multiple software threads simultaneously. Initially introduced for dual-processor and multiprocessor servers, Hyper-Threading Technology is now available for Pentium 4 processor-based platforms. One of the key strengths of both the Pentium 4 processor and the Intel Xeon processor is Streaming SIMD Extensions 2 (SSE2), which enables double-precision floating-point calculations to be performed in parallel. This improves the performance of both floating-point-intensive and multimedia applications.
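To make the SSE2 point concrete, here is a minimal C sketch using the standard SSE2 intrinsics from `<emmintrin.h>` (`_mm_loadu_pd`, `_mm_add_pd`, `_mm_storeu_pd`). Each 128-bit SSE2 register holds two double-precision values, so one `_mm_add_pd` performs two additions at once. The function name `add_doubles_sse2` is hypothetical, and the sketch assumes an x86 compiler with SSE2 code generation enabled (for example, `-msse2` with GCC on IA-32).

```c
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* out[i] = a[i] + b[i], processing two doubles per SSE2 addition. */
void add_doubles_sse2(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);          /* load a[i], a[i+1] (unaligned OK) */
        __m128d vb = _mm_loadu_pd(b + i);          /* load b[i], b[i+1] */
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb)); /* both sums in one instruction */
    }
    for (; i < n; i++)                              /* scalar tail when n is odd */
        out[i] = a[i] + b[i];
}
```

The scalar tail loop keeps the function correct for any length; a tuned version would typically also align the arrays to 16 bytes and use the aligned load and store variants.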
Benefits of the Intel® Itanium® Processor with 64-bit Capability
The Intel® Itanium® processor with 6MB L3 cache is designed for demanding enterprise and technical applications. It is a socket-compatible successor to the original Itanium processor, delivering investment protection for OEMs and end-users. In addition, it is binary-compatible with existing Itanium-based software and can provide performance increases of 30 to 50 percent or more over the original Itanium processor. With its execution resources, 6.4 GB per second system bus bandwidth, 6MB integrated L3 cache, and 1.50 GHz core speed, the latest Itanium processor is a force to be reckoned with on both price and performance.
Itanium architecture increases performance by offering high levels of parallelism for enterprise and technical applications. The architecture's floating-point performance enhances analytic and scientific design and visualization applications, and its 64-bit addressing and resources combine to provide a platform that can handle terabytes of data with reduced memory latency and fewer branch mispredictions.
For applications that include large databases, large-scale data analysis, Mechanical Computer Aided Engineering (MCAE), Electronic Design Automation (EDA), and 3D rendering, the Itanium processor provides flexibility through its support of a range of operating systems that include Windows Server* 2003, HP-UX*, and Linux*, as well as a rich ecosystem of applications targeted at high-end enterprise and technical computing environments. The Itanium family of processors is currently available in speeds of 1.50 GHz, 1.40 GHz, and 1.30 GHz; contains a Level 3 (L3) cache of 6MB, 4MB, and 3MB; 256 KB of Level 2 cache; and 32 KB of Level 1 cache. It also features an Enhanced Machine Check Architecture (MCA) with extensive Error Correcting Code (ECC) and a 400 MHz, 128-bit wide, 6.4 GB/s bandwidth system bus.
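The 6.4 GB/s figure is the product of the bus's transfer rate and its width: 400 million transfers per second times 128 bits (16 bytes) per transfer. A minimal C sketch of that arithmetic follows; the function name is illustrative, and the result is a theoretical peak rather than a measured throughput.

```c
/* Theoretical peak bus bandwidth in GB/s (10^9 bytes per second):
   transfers per second multiplied by bytes moved per transfer. */
double bus_bandwidth_gbs(double mega_transfers_per_sec, unsigned width_bits)
{
    double bytes_per_transfer = width_bits / 8.0;
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9;
}
```

Calling `bus_bandwidth_gbs(400, 128)` evaluates to 6.4, matching the system bus figure quoted above.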
Serving Smaller Markets
It's important to carefully evaluate the needs of users to determine the right platform on which to run your software. It's unlikely, for example, that an end-user will purchase an Itanium server to run office applications such as a word processor.
At present, the 64-bit platform is generally best suited for heavy-duty server applications such as databases. Great care should be taken in deciding to target desktop applications for the 64-bit platform, since many of these programs, and their users, don't need the power of 64-bit machines. If applications need to run in both 32- and 64-bit environments, as is often the case with workstation applications like CAD and 3D animation, it's best to optimize for each Intel platform individually to take full advantage of platform features.
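One common way to keep a single code base while still optimizing for each platform is conditional compilation on the compiler's predefined architecture macros, with platform-tuned code placed in each branch. The sketch below only reports which branch was selected and how wide pointers are; the function names are hypothetical, and the macros shown are the ones GCC-compatible and Microsoft-compatible compilers predefine for these architectures.

```c
#include <limits.h>   /* CHAR_BIT */

/* Reports which Intel platform this code was compiled for. __ia64__
   (GCC-compatible compilers) or _M_IA64 (Microsoft-compatible compilers)
   identifies Itanium; __i386__ or _M_IX86 identifies IA-32. Platform-tuned
   code paths would go in the corresponding branches. */
const char *build_target_name(void)
{
#if defined(__ia64__) || defined(_M_IA64)
    return "Itanium (64-bit)";
#elif defined(__i386__) || defined(_M_IX86)
    return "IA-32";
#else
    return "other";
#endif
}

/* Pointer width in bits: 32 on IA-32 builds, 64 on Itanium builds. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Because the selection happens at compile time, each build carries only the code path for its own processor, and no run-time check is needed.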
Easing Portability and Migration
The appropriate platform for any given application will be determined primarily by performance and data-capacity needs. However, the cost of moving software to a new platform is not a small consideration. If and when the decision is made to move a 32-bit application to a 64-bit environment, the shift will be much easier and less costly if you take advantage of Intel's development tools and move from Intel Xeon to Itanium processor-based systems. A number of tools, including high-performance libraries, are available for both processors. Using these tools allows a single code base to run best in either a 32-bit or 64-bit environment, shortening time-to-market while decreasing development costs.
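A typical hazard when moving a 32-bit C code base to a 64-bit environment is code that assumes a pointer fits in an `int`. The following is a general C sketch of the portable alternatives; it is not an Intel tool or API, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* IA-32 uses the ILP32 data model: int, long, and pointers are all 32 bits,
   so legacy code that stores a pointer in an int happens to work. 64-bit
   Itanium platforms use LP64 (Linux, HP-UX) or LLP64 (64-bit Windows), where
   pointers are 64 bits and (int)ptr silently truncates the address. */

/* Portable: uintptr_t (C99 <stdint.h>) is an unsigned integer type that can
   hold any object pointer converted to it, on both 32- and 64-bit builds. */
uintptr_t pointer_to_handle(void *p)
{
    return (uintptr_t)p;
}

void *handle_to_pointer(uintptr_t h)
{
    return (void *)h;
}

/* Portable: size_t, not int, for object sizes, counts, and indexes, so that
   buffers larger than 2 GB can be described on 64-bit platforms. */
size_t count_nonzero(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            n++;
    return n;
}
```

Code written this way compiles unchanged for both IA-32 and Itanium targets, which is what lets one source tree serve both platforms.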
- Intel® Compilers. Appropriate compiler use is the easiest and, in most cases, the single best way to take advantage of Intel Architecture performance. Intel compilers enable threading by supporting both OpenMP* and auto-parallelization. OpenMP is effective at threading both loop-level and function-level parallelism. The Intel® C++ Compiler supports OpenMP API version 1.0 and performs code transformation for shared-memory parallel programming. The compiler also supports auto-parallelization for the automatic threading of loops: it detects loops that can safely be executed in parallel and automatically generates code that creates threads for them. Intel Compilers help make your software run optimally on Intel 32-bit processors (including Intel Xeon and Pentium 4 processors), 64-bit Intel Itanium processors, and Intel® Personal Internet Client Architecture (Intel PCA) processors. Optimizations include support for Streaming SIMD Extensions 2 (SSE2) in the Pentium® 4 processor and software pipelining on Itanium processors. Inter-procedural optimization (IPO) and profile-guided optimization (PGO) can provide further gains in application performance. Intel's compiler family includes the Intel C++ Compiler 7.1 for Windows, Intel C++ Compiler 7.1 for Linux, Intel Fortran* Compiler 7.1 for Windows, and Intel Fortran Compiler 7.1 for Linux.
- VTune™ Performance Analyzer enables developers to tune the performance of an application for Intel architectures. It provides time and event-based sampling, hotspot analysis, call graph profiling, an integrated view of the source code with detailed sampling information for each line of code, and event ratio displays. VTune analyzer can also save developers time by suggesting optimization approaches for processors that include the Pentium 4, Intel Xeon, and Itanium processors.
- Intel® Integrated Performance Primitives (IPP) save developers from the time-intensive task of hand-coding processor-specific optimizations, increase application portability, and speed time-to-market. IPP provides more than two thousand primitives for signal and image processing, some of which are already threaded. IPP supports Pentium 4, Intel Xeon, and Itanium processors, as well as processors based on Intel® StrongARM* technology.
- Intel® Math Kernel Library (MKL) Version 6.0 extends the functionality of the prior version, adding a Vector Statistical Library (VSL) and Discrete Fourier Transforms (DFTs) to the linear algebra functionality (LAPACK and BLAS) and vector transcendental functions (Vector Math Library, VML). It also provides additional processor optimizations across various parts of the library. Its royalty-free license allows you to redistribute unlimited copies of the Intel MKL run-time libraries with your software products. Intel MKL supports the Pentium 4, Mobile Intel® Pentium® 4 Processor - M, Intel Xeon, and Itanium processors. Built-in parallelism (threading) provides excellent scaling opportunities for many applications. MKL provides a substantial subset of LAPACK for IA-32 and a full set for the Itanium processor / Linux* versions, and all level 2 and level 3 BLAS functions are threaded using OpenMP.
- Intel® Thread Checker is a plug-in component of the VTune analyzer environment and shares the look and feel of the VTune Performance Analyzer. It enables analysis, debugging, and verification of thread "correctness," tracing any errors back to the actual line in the source code. Intel Thread Checker supports C/C++ and FORTRAN* applications that use OpenMP, Win32* threads and fibers, and POSIX* threads.
- Thread Profiler, an activity of the VTune analyzer environment, presents multiple views of OpenMP application performance data to help identify performance bottlenecks. It uses an instrumented version of the OpenMP runtime libraries to generate the runtime data required for analysis; the instrumented libraries are selected either at compile time or at run time. Thread Profiler presents the resulting runtime statistics in various views, breaking down application performance by thread or by region. The profile shows time spent in serial regions, parallel regions, and critical sections, and reveals synchronization overhead.
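A minimal C sketch of the loop-level OpenMP threading described in the compiler entry follows. The `reduction(+:sum)` clause gives each thread a private partial sum and combines the partial sums when the loop ends; writing the same loop without that clause would make every thread update the shared `sum` concurrently, which is the kind of data race a thread-correctness tool such as Thread Checker is meant to report. The function name `dot` is illustrative. Build it with your compiler's OpenMP switch (for example, `-openmp` for Intel compilers of this generation on Linux, or `-fopenmp` for GCC); without the switch the pragma is ignored and the loop runs serially.

```c
/* Dot product of a and b, threaded with an OpenMP work-sharing loop.
   reduction(+:sum) gives each thread a private copy of sum initialized to 0
   and adds the copies together at the end of the loop. Without OpenMP
   support the pragma is ignored and the loop runs serially; with it, the
   floating-point result may differ in the last bits because the additions
   happen in a different order. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    int i;
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

A reduction is usually preferable to protecting `sum` with a critical section, because a critical section inside the loop serializes every update, which shows up as the synchronization overhead Thread Profiler reports.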
No Speed Limit
Choosing and optimizing for the correct Intel platform means serving the broadest market with applications built specifically for the processors that will run them. Intel has a strong roadmap with architectures optimized for each market segment. Its long history of leadership in the microprocessor market is evidence that it considers both end-users and developers to be customers that deserve the highest quality products and service. It offers tools that help developers target Intel Xeon, Itanium, and all of its other processor families. When migration is necessary between platforms, the process is eased by Intel's cross-platform software development products. In addition, Intel Architecture now leads on key performance measures on servers ranging from two to 64 processors. Whether your application runs in a 32-bit or 64-bit environment, Intel Xeon and Itanium processor-based systems provide the flexibility, scalability, tools, and speed to meet customer needs today and tomorrow.
- Dual vs. Multiprocessor chips: What's the difference?
Which Intel® Xeon® processor family is right for a particular application? This article gives you what you need to choose between dual and multiprocessor family platforms.
- Advanced OpenMP Programming
This final paper in this series discusses library functions, environment variables, how to debug your application when things go wrong, and some tips for maximizing performance.
- Developing Multithreaded Applications: A Platform Consistent Approach
From the engineers who created the Intel® Threading Tools, this guide supplies the practical advice and code you need to adopt platform consistent threading practices. Covers tools, threading practices, synchronization and memory management.
- Tuning Strategies for World-Class Consumer Media Applications
Build a better multimedia application, and the world will beat a path to your door; Intel Application Engineer Kiefer Kuah shares best practices for increasing code performance.
Services and Products
- The Intel® Software Partner Program provides software vendors with Intel's latest technologies, helping member companies to improve product lines and grow market share.
- For evaluation downloads and information about Intel software development products, including Compilers, Performance Analyzers, Performance Libraries, and Threading Tools, visit the Intel® Software Development Products home page.
- IT@Intel, through a series of white papers, case studies, and other materials, describes the lessons it has learned in identifying, evaluating, and deploying new technologies.
- The Intel® Academic Community provides a one-stop shop at Intel for training developers on leading-edge software-development technologies. Training consists of online and instructor-led courses covering all Intel® architectures, platforms, tools, and technologies.
About the Author
George Walsh is a veteran tech editor and writer with experience in fields ranging from embedded systems programming to CAD. As a freelance researcher and writer, he has provided his expertise to over 30 clients in a wide variety of markets.