Simplify Threading for Linux* and Itanium® Processors

Last Modified On :   August 17, 2009 12:10 PM PDT

Introduction
By Matt Gillespie

Developers must thread their applications properly in order to get the best performance out of the parallel architectures in Intel® processors. Threading is particularly essential for large enterprise applications that handle many simultaneous transactions, such as those deployed on Itanium®-based systems.

While threading is extremely powerful, it must be done properly to avoid introducing errors or severe performance penalties. For example, if multiple threads simultaneously update the same global variable without synchronization, a race condition occurs, which can corrupt or lose data. If the developer attempts to address race conditions using improper synchronization techniques, the overhead associated with that synchronization can waste enough processor resources to cause unacceptable performance deficits. An improperly threaded application can therefore perform far worse than the serial version it replaces.
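The race described above, and one common way to close it, can be sketched in C with POSIX threads. This is a minimal illustration under stated assumptions, not code taken from the tools discussed in this article; the names `counter`, `counter_lock`, `add_to_counter`, and `run_counter_demo` are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Shared global state: every thread updates this one variable. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body. The mutex makes each read-modify-write of `counter`
 * exclusive. Removing the lock/unlock pair turns this loop into a
 * race condition, and increments can then be lost. */
static void *add_to_counter(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs NUM_THREADS workers to completion and returns the final count,
 * or -1 if a thread could not be created or joined. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;
    int failed = 0;

    counter = 0;
    while (started < NUM_THREADS) {
        if (pthread_create(&threads[started], NULL, add_to_counter, NULL) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++) {
        if (pthread_join(threads[i], NULL) != 0)
            failed = 1;
    }
    return failed ? -1 : counter;
}
```

With the mutex in place, `run_counter_demo()` returns NUM_THREADS * INCREMENTS_PER_THREAD (400000 with the values shown). Locking on every increment is correct but costly, which is the synchronization overhead the paragraph above warns about.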

Intel® Thread Checker and Thread Profiler are plug-ins to the Intel® VTune™ Performance Analyzer that enable developers to detect, analyze, and resolve threading issues. Version 2.1 of these tools provides support for applications running on the Itanium Processor Family under Linux*, as well as applications running on 32-bit Intel® architecture under both Linux and Windows*. These advances significantly extend developers' ability to realize the advantages of threading in their applications.

This article provides developers and decision makers with background information to support the decision to adopt Intel Threading Tools in their environments. In particular, it introduces the expanded platform support in the version 2.1 releases of these tools, which now covers the Itanium Processor Family and Linux.


Intel® Thread Checker 2.1 Detects Threading Errors
The first step in resolving threading issues is to identify them. Intel Thread Checker 2.1 allows developers to locate threading issues such as race conditions, deadlocks, and thread stalls. These issues can otherwise be very difficult to identify. Intel Thread Checker can therefore provide substantial cost savings, and with the release of version 2.1, those benefits are extended to Linux applications for both 32-bit and 64-bit Intel architecture. Version 2.1 of the tool includes support for OpenMP* and POSIX* threads on 32-bit and 64-bit Linux applications, in addition to the support for 32-bit Windows applications that was available in previous versions.
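As an illustration of one issue class named above, a deadlock typically arises when two threads acquire the same pair of locks in opposite orders. The sketch below shows a common avoidance technique, acquiring locks in a fixed global order (here, by address). It is a hypothetical example, not output or code from Intel Thread Checker; `account_t`, `account_init`, and `transfer` are assumed names.

```c
#include <pthread.h>
#include <stdint.h>

/* A hypothetical shared record protected by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

/* Initializes an account; returns 0 on success, as pthread_mutex_init does. */
int account_init(account_t *a, long balance)
{
    a->balance = balance;
    return pthread_mutex_init(&a->lock, NULL);
}

/* Moves `amount` from one account to another.
 *
 * If one thread ran transfer(x, y) while another ran transfer(y, x),
 * and each simply locked `from` and then `to`, each thread could end up
 * holding one lock while waiting forever for the other. Always locking
 * the lower-addressed account first gives every thread the same
 * acquisition order, so that cycle cannot form. */
void transfer(account_t *from, account_t *to, long amount)
{
    if (from == to)
        return; /* nothing to move; also avoids locking one mutex twice */

    account_t *first = ((uintptr_t)from < (uintptr_t)to) ? from : to;
    account_t *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    from->balance -= amount;
    to->balance += amount;

    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

Ordering by address is only one possible convention; any total order that every thread follows serves the same purpose.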

Intel Thread Checker identifies specific source-code lines associated with runtime errors. If the source code is available, the tool supports source instrumentation using compiler switches such as the -tcheck compiler option on Intel® compilers for Linux. Developers can instrument and build application code for Linux on the Itanium Processor Family using the Intel® C++ compiler or the Intel® Fortran compiler. For 32-bit applications, the tool supports Intel or gcc compilers under Linux and Intel or Microsoft compilers under Windows.

Source instrumentation allows Intel Thread Checker to drill down to the specific variable that caused the error. If the source code is not available, the tool supports binary instrumentation, which identifies the line of source code associated with the error, so that the developer can identify the specific variable through external debugging measures. Using either type of instrumentation, Intel Thread Checker monitors the threading behavior of an application during execution and generates diagnostic reports. Those reports categorize errors and provide full explanations (incorporating context from other threads) that aid in their resolution.

While Intel Thread Checker can collect data from applications running locally on Windows machines, it uses the Remote Data Collector to collect data on Linux machines. During installation of Intel Thread Checker (on a Windows machine), an HTML page appears that allows the developer to access a setup file for the Remote Data Collector that can be transferred via FTP to the Linux target machine. After transferring that setup file, the developer extracts it and runs an installation script to install the Remote Data Collector. Note that firewalls between the Windows host machine and the Linux remote machine may interfere with data collection.

In addition to detecting threading errors, it is vital to ensure that an application's threads run efficiently. Developers can measure and improve that efficiency using Thread Profiler, which is described below.


Intel® Thread Profiler 2.1 Simplifies Performance Tuning for Threading
Performance issues associated with threading can be exceedingly complex, and developers need the right tools to resolve them. Thread Profiler is another plug-in for the Intel VTune Performance Analyzer, version 7.1 or higher, that analyzes the parallel performance of applications in real time. This analysis reveals performance issues such as excessive synchronization or threading overhead, and it gauges how evenly the application distributes work among threads (load balance).
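To make the two issues named above concrete, the following sketch sums an array with POSIX threads in a way that keeps both under control: the elements are divided so that slice sizes differ by at most one (load balance), and each thread accumulates privately and takes the shared lock only once (limited synchronization). It is an illustrative sketch under stated assumptions, not a Thread Profiler feature; `slice_t`, `sum_slice`, `parallel_sum`, and `MAX_THREADS` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Work description handed to each thread. */
typedef struct {
    const int *data;
    size_t begin;                /* first index of this thread's slice */
    size_t end;                  /* one past the last index */
    long *total;                 /* shared result */
    pthread_mutex_t *total_lock; /* protects *total */
} slice_t;

/* Sums one slice into a private variable, then takes the lock once.
 * Locking inside the loop instead would serialize the threads. */
static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    long local = 0;
    for (size_t i = s->begin; i < s->end; i++)
        local += s->data[i];
    pthread_mutex_lock(s->total_lock);
    *s->total += local;
    pthread_mutex_unlock(s->total_lock);
    return NULL;
}

/* Sums data[0..n) using nthreads threads. Returns 0 and stores the sum
 * in *out on success, or -1 on a bad argument or a pthread failure. */
int parallel_sum(const int *data, size_t n, int nthreads, long *out)
{
    pthread_t threads[MAX_THREADS];
    slice_t slices[MAX_THREADS];
    pthread_mutex_t total_lock;
    long total = 0;
    int started = 0;
    int status = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (pthread_mutex_init(&total_lock, NULL) != 0)
        return -1;

    /* The first `extra` threads get one additional element each, so no
     * slice is more than one element longer than any other. */
    size_t base = n / (size_t)nthreads;
    size_t extra = n % (size_t)nthreads;
    size_t begin = 0;

    for (int t = 0; t < nthreads; t++) {
        size_t len = base + ((size_t)t < extra ? 1 : 0);
        slices[t].data = data;
        slices[t].begin = begin;
        slices[t].end = begin + len;
        slices[t].total = &total;
        slices[t].total_lock = &total_lock;
        begin += len;
        if (pthread_create(&threads[t], NULL, sum_slice, &slices[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++) {
        if (pthread_join(threads[t], NULL) != 0)
            status = -1;
    }
    pthread_mutex_destroy(&total_lock);
    if (status == 0)
        *out = total;
    return status;
}
```

Equal slice sizes balance the load only when each element costs about the same to process; uneven per-element cost would call for a different division of work.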

Like threading errors, threading performance issues can be quite difficult to detect and troubleshoot, which translates into potentially large expenses for developers. Thread Profiler makes tuning the threading behavior of applications more efficient by helping developers detect, analyze, and correct performance issues. Version 2.1 of the tool extends this functionality to applications for the Linux environment on both 32-bit and 64-bit Intel architectures, in addition to the 32-bit Windows support provided by earlier versions.

Thread Profiler runs as an activity in the VTune environment that provides runtime statistics for OpenMP threads on 32-bit and 64-bit Intel architecture, as well as POSIX threads on 32-bit Intel architecture. It presents those statistics in a number of views that allow for flexible analysis of application performance. Developers may choose to view statistics by thread or by region of the application. By drilling down into the behavior of a specific section of code, one can identify hotspots in serial regions, parallel regions, and critical sections, as well as the impact of various synchronization tasks. That analysis allows the developer to identify and prioritize specific sections of code for tuning. The Thread Profiler/VTune environment provides online tuning advice to aid in the resolution of specific threading performance issues.

For analysis of OpenMP threads on Itanium®-based systems or IA-32 systems running Linux, developers may select instrumented versions of the libraries either at run time or at compile time. To do so at run time, compile with the -openmp option and run the application within the VTune environment with Thread Profiler. To instrument OpenMP threads at compile time, build the application with the Intel compiler -openmp_profile option, which links the instrumented libraries directly to the compiled code. (The source files must still be compiled with the -openmp flag in order to use the -openmp_profile flag at link time.) Running the application from the command line then generates a runtime statistics file that can be viewed from within the Thread Profiler tool.
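For readers unfamiliar with OpenMP, the kind of source that the OpenMP option described above acts on can be sketched as follows. This is a hypothetical example, not taken from Intel documentation; the function name `sum_of_squares` is assumed, and the exact command line depends on the compiler version, so none is shown here.

```c
/* Returns 1*1 + 2*2 + ... + n*n.
 *
 * The pragma asks an OpenMP-enabled compiler to split the loop
 * iterations among threads. The reduction clause gives each thread a
 * private copy of `sum` and combines the copies when the loop ends, so
 * no explicit lock is needed. A compiler invoked without OpenMP
 * support ignores the pragma and runs the loop serially, producing the
 * same result. */
long sum_of_squares(int n)
{
    long sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += (long)i * i;

    return sum;
}
```

An integer reduction is used here so that the result does not depend on the order in which threads combine their partial sums; floating-point reductions can differ slightly in rounding between runs.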

Developers analyze applications on Linux machines with Thread Profiler by using the Remote Data Collector, which is installed and used in the same fashion as described above in the discussion of Intel Thread Checker.


Conclusion
Threading is a must to leverage the parallelism in Intel® architecture-based server hardware. With the new support for the Itanium Processor Family and Linux, version 2.1 of Intel Thread Checker and Thread Profiler extends the ability of the VTune Performance Analyzer to troubleshoot and tune the performance of enterprise applications. Developers and ISVs can now benefit more than ever from the efficiencies and performance available using Intel Threading Tools.

With the upcoming introduction of processors that incorporate multiple processor cores on the same die in the Itanium Processor Family, the parallelism of this microarchitecture will be further enhanced. The processor code-named Montecito will be the first dual-core processor in the Itanium Processor Family, followed by Tanglewood, which will be a multi-core processor.

These advances make the case for threading applications stronger than ever, since the performance gains attainable through threading are greater than ever. As these platforms become more pervasive, ISVs that have incorporated threading thoroughly into their designs will benefit competitively, as will individual programmers who have built their threading expertise.

The resources listed below provide the basis for rich expertise in application threading that allows developers to meet the challenges of this complex topic. Threading applications will give a competitive advantage to those ISVs that stay the course and master the threading technologies and tools provided by Intel. By comparing the performance of a threaded application against its serial version, developers and ISVs can gauge the success of their threading efforts. Intel threading tools can improve that level of success.


Other Threading Resources from Intel
Because of the importance and complexity of threading, Intel provides a wide variety of resources for developers to help with threading issues. The Intel® Developer Services Threading Developer Center provides a wealth of information, including manuals, white papers, and an industry discussion forum moderated by Intel experts.

The Threading KnowledgeBase provides concise answers to common threading issues in an efficient-to-use Challenge and Solution format. Each KnowledgeBase item also includes a link to its larger source document, which navigates users to more in-depth information.

Developing Multithreaded Applications: A Platform Consistent Approach (PDF 2.1MB) provides an in-depth approach that addresses design recommendations for the development of threaded applications across the full range of Intel architecture, including symmetric multiprocessor (SMP) systems and systems with Hyper-Threading Technology. This guide provides threading-performance optimization guidelines that are applicable across hardware architectures.

Support resources for Intel threading tools include self-help Web sites for Intel® Thread Checker for Windows*, Intel® Thread Checker for Linux*, and Intel® Thread Profiler for Windows* that provide answers to frequently-asked questions, product documentation, product errata, and known issues and solutions. More individualized support is available from Intel® Premier Support, which allows the submission of issues directly to a technical support engineer, who will provide a response within 24 hours.



About the Author
Matt Gillespie is an independent technical author and editor working out of the Chicago area and specializing in emerging hardware and software technologies. Before going into business for himself, Matt developed training for software developers at Intel Corporation and worked in Internet Technical Services at California Federal Bank. He spent his early years as a writer and editor in the fields of financial publishing and neuroscience.