These Windows Perfmon* techniques should be your first line of defense to proactively monitor system performance and identify potential resource-contention issues.
The Windows Performance Monitor (Perfmon) is one of the most powerful yet most underutilized features of the Windows operating system. The performance-benchmarking community relies on Perfmon daily to identify system-level and application-level performance issues. Most end users, however, either ignore the tool completely or fail to exploit its potential, and many IT professionals never realize the full value Perfmon can add by enabling advanced performance monitoring and analysis of their Windows infrastructures.
Intel® Solution Services uses several software tools to stress, monitor, analyze, and tune enterprise solutions during customer engagements. This article describes how to get the most out of Perfmon, one of its more popular choices.
Getting Started with Perfmon
To launch Perfmon, simply type "start perfmon" at the command prompt. You can also invoke Perfmon by navigating to Start | Programs | Administrative Tools | Performance Monitor or Start | Run | Perfmon.
After you have launched Perfmon, you must select a target system, or computer, to monitor. This system can be the localhost from which you launched Perfmon or another Windows system on the local network. Because of the overhead associated with running Perfmon, it is recommended that Perfmon run on a remote computer while scrutinizing production servers for performance issues across a network. To select a system, click on the plus sign icon in the Perfmon toolbar. This action invokes a network browser showing localhost as the default monitoring target.
After you have selected the system you wish to monitor, choose an object, or subsystem, to monitor. These include such components as system, memory, network interfaces, or disk I/O subsystems. Next, choose the counters you wish to monitor. A counter is a quantitatively measurable indicator of a subsystem's performance, such as percent CPU time, pages in and out per second for virtual memory, or packets sent and received per second for the network interface. Figure 1 shows the Perfmon dialog box for selecting hosts, objects, and counters.
Figure 1. Perfmon Host, Object, and Counter Selection Dialog Box
After selecting the target system, its objects, and the counters to monitor, you can run Perfmon in any of three modes:
- Chart mode: graphically displays information on the counters of a selected object and allows you to monitor the system in real time.
- Log mode: collects all counter data from selected objects, allowing you to analyze the logged data later.
- Report mode: allows raw data from counters to be easily imported to spreadsheets like Microsoft Excel* for the creation of graphs and charts.
Getting the Most Out of Perfmon
Perfmon users commonly fail to utilize the tool fully. Of the three modes, most users run Perfmon in the default Chart mode and never explore the other two modes. Charting in real-time mode poses two significant limitations:
- Data is not captured, but merely viewed for a short time before it simply falls off the display.
- Detailed analysis of on-screen data in real time is difficult, if not impossible.
The Log and Report modes overcome these limitations by allowing the user to capture the data, rather than simply viewing it for a limited time interval.
Chart Mode Provides a Quick Glance Only
In Chart mode, Perfmon displays information on a selected object’s counters in either a horizontal graph (default) or a vertical histogram. The default graph view is usually adequate, although the histogram may be useful for comparing the values of many instances, such as the thread state of many threads that belong to a single process. The horizontal graph fails to present this type of comparison in a way that is easily understandable.
In Chart mode with the vertical histogram view, you can adjust the sampling rate, increasing or decreasing the rate at which the tool samples and displays data. The default rate is one second; that is, Perfmon takes and displays samples at one-second intervals. The vertical histogram offers only real-time information from the last sample taken. Once a counter changes value, the previous data is gone and irretrievable; Perfmon does record an average, but only over a limited period.
In Chart mode with the horizontal graph view, you can monitor the levels of specific counters in real time and receive history information. The amount of history available is extremely limited, however, because the display size is relatively small.
In either case, Chart mode is not intended to capture data. Its main purpose is to give the user a quick glance at a few metrics over a short period; it does not provide in-depth monitoring capability over time.
Log Mode Captures Information
In Chart mode, you can only monitor the system in real time. Even if you decrease the sample rate from the default of one second, you are not capturing information. To analyze system performance properly, you must log the data from the counters for analysis later.
To log counter data, follow these steps:
- Bring up the log dialog box by selecting the disk icon on the Perfmon toolbar.
- Select the plus sign icon on the toolbar to display the browser, which allows you to select both the host and the objects you wish to capture.
- From the Options menu in the Perfmon menu bar, select Log.
- In the log dialog box, name your log file and specify the path where you wish to store it. Specify a sample rate, and then save the log file.
- From the plus sign icon on the toolbar, select the objects you wish to monitor. All of the counters associated with a particular object will be captured and available for analysis later.
- Select Start Log in the log dialog box to start logging system performance. Once the log has been started, its status changes from "closed" to "collecting," meaning that Perfmon is gathering object and counter information and saving the data to your log file. You can return to Chart mode while the log is collecting, to view counters of interest in real time. The disk icon appears in the lower right-hand corner of the chart along with the log's current size, reminding you that a logging session is underway and that the file will continue to grow until you stop the collection process.
- Select the log dialog box and click Stop Log to halt the collection process. Your log file is now ready for immediate analysis in Chart mode. To view its contents, select Chart from the drop-down menu on the toolbar and then select Data From. The choices are "current activity" (real time) or "log file"; choosing the latter opens a file browser dialog in which you can locate your newly created Perfmon log.
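The interactive steps above can also be scripted. A minimal sketch, assuming the typeperf command-line utility that ships with newer Windows releases (Windows XP/Server 2003 and later); the counter paths and file name here are illustrative:

```python
def build_typeperf_cmd(counters, interval_sec, sample_count, out_file):
    """Build an argument list for Windows' typeperf utility that logs
    the given counters to a CSV file at a fixed sample interval."""
    cmd = ["typeperf"]
    cmd.extend(counters)                       # e.g. r"\Processor(_Total)\% Processor Time"
    cmd.extend(["-si", str(interval_sec)])     # sample interval in seconds
    cmd.extend(["-sc", str(sample_count)])     # number of samples to collect
    cmd.extend(["-f", "CSV", "-o", out_file])  # output format and log-file path
    return cmd

cmd = build_typeperf_cmd(
    [r"\Processor(_Total)\% Processor Time", r"\Memory\Pages/sec"],
    interval_sec=15, sample_count=240, out_file="perflog.csv")
print(" ".join(cmd))
# On a Windows system, you would run it with: subprocess.run(cmd, check=True)
```

The resulting CSV log can then be opened in a spreadsheet, much like Perfmon's own Report mode output.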
To obtain the greatest benefit, you should log several objects on a particular system, depending on what application is running and what subsystems are involved in moving and processing data. For example, Web server applications tend to stress processor and network resources, but not memory or disk resources. Database applications tend to stress processor, memory, and disk resources, but rarely network resources. Messaging applications tend to stress all of the above.
You should begin in log mode by monitoring more objects rather than fewer, at least until you have determined that particular subsystems are not at risk for potential performance issues.
After capturing the right data by logging the appropriate objects on a particular system, you can move to the analysis stage. To help narrow your area of focus, you should analyze the counters of a single object. Do not bring up counters across multiple objects for analysis until you have properly analyzed each object separately and documented all your findings (data).
Because Perfmon was written with hardware and software subsystems in mind, it is logically organized to lend itself to views and presentations sorted by subsystem type. For example, the System object enables you to monitor counters such as % Processor Time, % Privileged Time, System Calls/sec, Interrupts/sec, and so on. These counters are all related to hardware and software system-level resources. The Memory object allows you to scrutinize counters such as the paging file reads or writes per second, memory utilization, and cache faults per second.
When logging, choosing the right objects to capture is not your sole consideration. It is important to choose a sample rate that best suits your needs. The log-file size will also depend upon the sample interval, as well as the number of objects you monitor. If you are monitoring a production server for performance issues over a normal workday (8 to 10 hours), your sample interval should be somewhat longer than if you were running a 10-minute benchmarking test.
A fairly frequent sample interval, such as 5, 10, or 15 seconds, captures specific events and produces very granular, detailed information. It also generates a lot of data, and can therefore lead to massive log files when you sample systems over long periods of time. Since disk space can be a scarce resource, keep this trade-off in mind.
If you log with sample intervals that are too long, on the other hand, you may miss crucial system events or transitions. For benchmarking scenarios, in which tests rarely last longer than 30 to 60 minutes, a log-file sample interval between 5 and 15 seconds is ideal. For longer tests, or for monitoring production servers over several hours, you should adjust the interval to between 15 and 30 seconds, and in some cases to one minute or more. There is no substitute for experience, so experiment often to determine which interval is best for your particular needs.
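The interval-versus-size trade-off can be estimated up front. A rough sketch; the bytes-per-counter figure is an illustrative assumption, not a Perfmon specification, and actual log sizes vary with the objects selected:

```python
def estimate_log_size(duration_hours, interval_sec, num_counters,
                      bytes_per_counter=100):
    """Estimate sample count and log size for a monitoring session.
    bytes_per_counter is an assumed average, for rough sizing only."""
    samples = int(duration_hours * 3600 / interval_sec)
    total_bytes = samples * num_counters * bytes_per_counter
    return samples, total_bytes

# A 10-hour workday at a 5-second interval vs. a 30-second interval:
for interval in (5, 30):
    samples, size = estimate_log_size(10, interval, num_counters=50)
    print(f"{interval:>2}s interval: {samples} samples, ~{size / 1e6:.0f} MB")
```

Even with these assumed figures, the estimate shows why a short benchmark can afford a 5-second interval while a full-day production log usually cannot.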
To end the logging process, return to the log dialog box and select Stop. The log file then closes and is ready for analysis in Chart mode. Figure 2 shows the log dialog box for selecting sample rate, log path, and filename.
Figure 2. Perfmon Log Dialog Box for Selecting Sample Rate, Log Path, and Filename
Report Mode Supports Data Export and Formatting
Report mode simply shows raw data as taken from Perfmon, enabling the user to import the data into a spreadsheet program for analysis, chart creation, and so forth. Its purpose is primarily to enable the formatting of data rather than to collect it. Report mode displays the raw data from object counters in a table-like format, which lends itself to documentation more than to charting, since it makes visual comparison difficult and relationships hard to recognize.
Proceeding to Analysis
You should use all three modes together to attain the full benefit of Perfmon. Log mode is the most useful for capturing data for detailed analysis in the chart mode. Report mode is most useful for viewing and formatting raw data. The input of the chart can be toggled between the current system activity and data from a log file captured earlier.
After properly logging with Perfmon, you can begin analysis in Chart mode. The best method is to select each object individually, analyze it, write down all the displayed data, and determine whether a performance problem exists. Figure 3 shows some typical Perfmon objects and associated counters that Intel Solution Services engineers often monitor during performance-analysis work.
| Object | Counters |
|---|---|
| System | % CPU Time, % Privileged Time, File Control Operations/sec, File Data Operations/sec, Demand Zero Faults/sec |
| Logical Disk | Average Disk Queue Length, Average Disk sec/Transfer, Average Disk Bytes/Transfer |
| Network Interface | Bytes Total/sec, Output Queue Length |
| Web Service | CGI Requests/sec, Current Anonymous Users |
| Active Server Pages | Requests/sec, Request Wait Time, Request Queue Length, Request Execution Time |
| SQL Server General Stats | Cache Hit Ratio, I/O Lazy Writes/sec, Max Tempdb Space Used, RA (read ahead) Pages Found in Cache |
| SQL Server Locks | Page Locks – Exclusive, Max Users Blocked, Total Blocking Locks, Lock Wait Time |

Figure 3. Popular Perfmon Objects and Counters
Note that this list is not exhaustive and does not define individual counters. For a complete list of all counters and their technical definitions, click on the Explain button in the counter-selection dialog box within Perfmon.
You can begin the analysis stage by pulling up Perfmon in Chart mode and choosing the Data From option, which lets you select your log file as the input, or source. Next, begin analysis by adding counters from a single object; you may want to start with the System object. If you choose System, begin adding the associated counters mentioned in Figure 3, such as % CPU Time, % Privileged Time, and so on. Figure 4 shows 12 popular counters and their recommended ranges for normal system operation.
| Counter (Parent Object) | Recommended Range |
|---|---|
| % CPU Time (System) | 0-90% (> 90% indicates a potential processor bottleneck; may also indicate a thread-contention problem; investigate Context Switches/sec and System Calls/sec for potential thread issues) |
| % Privileged Time (System) | 0-40% (> 40% indicates excessive system activity; correlate with System Calls/sec) |
| Context Switches/sec (System) | 0-10,000 (> 10,000 may indicate too many threads contending for resources; correlate with System Calls/sec and the threads counter in Windows Task Manager to identify the process responsible) |
| File Control Operations/sec (System) | Ratio dependent (the combined rate of file-system operations that are neither reads nor writes [file control/manipulation only, non-data related]; the inverse of File Data Operations/sec) |
| File Data Operations/sec (System) | Ratio dependent (the combined rate of all read/write operations for all logical drives; the inverse of File Control Operations/sec) |
| System Calls/sec (System) | 0-20,000 (> 20,000 indicates potentially excessive Windows system activity; correlate with Context Switches/sec and the threads counter in Windows Task Manager to identify the process responsible) |
| Interrupts/sec (System) | 0-5,000 (> 5,000 indicates possible excessive hardware interrupts; justification depends on device activity) |
| Pages/sec (Memory) | 0-200 (> 200 warrants investigation of the memory subsystem; distinguish reads (pages in) from writes (pages out); check for proper paging-file and resident disk configuration; may indicate application memory-allocation or heap-management issues) |
| Average Disk Queue Length (Logical Disk) | 0-2 (> 2 indicates a potential disk I/O bottleneck due to a growing I/O subsystem request queue; correlate with Average Disk sec/Transfer) |
| Average Disk sec/Transfer (Logical Disk) | 0-0.020 (> 0.020 seconds indicates excessive request-transfer latency and a potential disk I/O bottleneck; distinguish reads/sec from writes/sec; correlate with Average Disk Queue Length) |
| Bytes Total/sec (Network Interface) | Depends upon interface type (10baseT, 100baseT); a potential network I/O bottleneck exists when throughput approaches the theoretical maximum for the interface type (for example, 10baseT theoretical maximum = 10 Mbits/sec divided by 8 = 1.25 Mbytes/sec) |
| Packets/sec (Network Interface) | Depends upon interface type (10baseT, 100baseT) |

Figure 4. Common Counters and Recommended Ranges
Note that these general guidelines are meant only to help detect a potential performance issue. Further investigation will confirm or deny whether a performance issue actually exists.
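The guideline ceilings above can be applied mechanically to the averages taken from a log. A minimal sketch; the threshold values mirror Figure 4, and exceeding one only flags a counter for further investigation:

```python
# Guideline ceilings from Figure 4 (ratio-dependent and interface-dependent
# counters are omitted, since they have no single numeric ceiling).
THRESHOLDS = {
    "% CPU Time": 90.0,
    "% Privileged Time": 40.0,
    "Context Switches/sec": 10_000.0,
    "System Calls/sec": 20_000.0,
    "Interrupts/sec": 5_000.0,
    "Pages/sec": 200.0,
    "Average Disk Queue Length": 2.0,
    "Average Disk sec/Transfer": 0.020,
}

def flag_counters(averages):
    """Return the counters whose logged average exceeds its guideline ceiling."""
    return [name for name, value in averages.items()
            if name in THRESHOLDS and value > THRESHOLDS[name]]

flags = flag_counters({"% CPU Time": 95.2, "Pages/sec": 40.0,
                       "Average Disk Queue Length": 3.1})
print(flags)  # ['% CPU Time', 'Average Disk Queue Length']
```

A flagged counter is a starting point for correlation with the related counters listed in the table, never a conclusion by itself.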
Resolving Performance Issues
The performance monitor logs may show that one or more of your target’s subsystems appear to be excessively utilized or that some component has reached its saturation point. Your first instinct could be to provide additional hardware resources at each bottleneck; but this strategy is not always the correct approach.
If you have now identified a resource contention issue on your system, you should properly isolate the root cause of that contention. First, you must understand what the hardware needs to do and why. The operating system simply facilitates and carries out requests from applications.
Applications can request that a system (and its hardware) perform work in a variety of ways, and there are good and bad ways to perform work or move data on any operating system. Solid, source-controlled network operating systems (NOSs) rarely exhibit true scalability problems; whether they do or not is largely immaterial, because the operating system is static, beyond your control, and fixed for a given release or service release. Real scalability issues, by contrast, live in the software applications written for a particular NOS. You should therefore be at least as familiar with the applications you are running, including their behaviors and architectures, as with the operating system and hardware you run them on.
Testing and Tuning Application Performance with Intel® Solution Services
Intel Solution Services uses a laboratory environment that is designed to help customers and software vendors test and tune their applications and solutions before those applications and solutions go live. Intel Solution Services engineers are specifically trained in the performance-tuning disciplines and adhere to a strict process methodology.
The heart of the Intel Solution Services tuning methodology is a process known as the "Top-Down, Closed-Loop Approach," which focuses on scrutinizing the entire solution stack from the top down. The approach starts at the system level (CPU, memory, I/O), works down through the application level (application code and logic), and concludes at the microarchitecture level by testing for Intel® processor instruction-level performance issues (related to Level 1 and Level 2 cache, branch prediction, single-instruction multiple-data, stream processing, and so on).
An Intel Solution Services engagement involves the installation and configuration of a customer solution or vendor application on Intel® architecture-based servers and RAID storage hardware, followed by stress testing to evaluate performance. Stress testing involves one of several workload or benchmarking tools to generate stresses that are representative of real-world client/server activity.
Although hardware subsystems often require adjustment, tuning, or modification to defeat resource-contention issues, you must examine all the factors carefully before tackling a particular performance foe. In addition, you should store all of your performance monitor logs for future reference, as both hardware and software technology change at a rapid pace. This practice will help you to identify problems in the future by comparing the performance information of a properly tuned system with one you believe to be performance-impaired.
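Stored logs lend themselves to simple automated comparison. A hypothetical sketch comparing a tuned system's baseline averages against a suspect system's; the counter names, figures, and ratio are illustrative:

```python
def compare_to_baseline(baseline, current, ratio=1.5):
    """Return counters whose current average exceeds the tuned-system
    baseline by more than the given ratio, with (baseline, current) pairs."""
    return {name: (baseline[name], current[name])
            for name in baseline
            if name in current and current[name] > baseline[name] * ratio}

regressions = compare_to_baseline(
    {"Pages/sec": 20.0, "Average Disk Queue Length": 0.8},
    {"Pages/sec": 180.0, "Average Disk Queue Length": 0.9})
print(regressions)  # {'Pages/sec': (20.0, 180.0)}
```

Comparing against a known-good baseline in this way narrows the investigation to the subsystems that have actually drifted, rather than those that merely look busy.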
One key to a healthy system is to use Windows Perfmon often to measure and monitor its performance for potential resource-contention issues. Let Perfmon be your first line of defense. Use lower-level tools (such as Filemon*) when necessary, to increase the granularity and detail of your data, as well as your understanding of the root causes of these problems, and also for data-correlation purposes. Never allow one piece of data from a single source to convince you that you have found a potential performance problem or resolution!
You should also consider the Microsoft Platform Software Development Kits (SDK) and Windows Resource Kits for a wealth of low-level tools to help you identify hardware and software performance issues.
The following resources provide additional information about performance monitoring:
- MSDN Performance Tuning Overview*
- Microsoft Platform Software Development Kit (SDK)*
- Windows Resource Kits Tools Downloads*
- Microsoft Windows* SysInternals*
- Microsoft* Winternals*