Detecting CPU-bound Applications in Server Systems

Applications in data centers process huge workloads every day. Many of them are CPU intensive, disk I/O intensive, network I/O intensive, or a combination thereof. Maintaining a data center is challenging because the amount of work being run and the volume of data being processed keep growing, which can lead to bottlenecks. When an application hits a bottleneck (CPU, disk I/O, or network), the whole system's performance may degrade. This blog is part of a small series of articles sharing basic system and workload analysis concepts so that users can quickly identify underperforming processes and bottlenecks and take proper action. It is intended as a guideline for analyzing CPU-bound applications and mentions some commonly available system tools; be aware that there are also commercial tools that let you track down CPU problems more precisely.

This first blog shares some useful Linux* commands users can run to determine whether their applications are CPU-bound. Subsequent blogs will discuss disk bottleneck and networking bottleneck issues.

The Linux command uptime displays the system load averages over the last 1, 5, and 15 minutes. The load average is the average number of processes running and waiting (for CPU, disk, or network I/O). These numbers should be interpreted relative to the number of cores in the system; on a single-core system, ideally we want them below 1.0. The load average numbers give an indication of system performance: in general, high load averages (relative to the number of cores) may indicate a potential problem.

On a single-core system, a load average of 0 means no process is running on the CPU. A load average of 1.0 means that, on average, there is always exactly one process using CPU time, so the core is fully occupied. A load average greater than 1.0 means that, on average, there are always processes waiting for CPU time.
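As a quick check, the raw load figures can be read from /proc/loadavg (the same source uptime uses) and normalized by the core count; a minimal sketch, assuming a Linux system with nproc and awk available:

```shell
# Read the 1-, 5-, and 15-minute load averages from /proc/loadavg
# and compare the 1-minute figure against the number of logical cores.
cores=$(nproc)
read -r one five fifteen rest < /proc/loadavg
# awk handles the floating-point division; a per-core value above 1.0
# means processes are, on average, waiting for CPU time.
per_core=$(awk -v l="$one" -v c="$cores" 'BEGIN { printf "%.2f", l / c }')
echo "1-min load $one over $cores cores = $per_core per core"
```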

In a server system with 32 logical cores, a load average between 0.0 and 32.0 indicates the system still has room to handle more jobs; a load average greater than 32.0 suggests a high-load problem. In general, if these numbers exceed the number of cores in your system, you have a high CPU load problem. In my tests for this blog, I used sysbench, a system benchmark tool, to generate an intensive workload and illustrate the results.
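For reference, the load in these examples can be reproduced with a sysbench CPU test roughly like the following. The exact flags are illustrative and vary by sysbench version (older releases use --test=cpu and --num-threads instead of the 1.0+ syntax shown here):

```shell
# Spawn 32 threads that each compute primes, saturating a 32-core box.
# Treat the flag names as an assumption; check `sysbench --help` first.
cmd="sysbench cpu --threads=32 --cpu-max-prime=20000 run"
echo "$cmd"   # run this on a test machine, not a production server
```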

#uptime
15:01:31 up 20 days, 6:33, 10 users, load average: 28.32, 11.27, 4.46

In the example above, the load averages over the last 1, 5, and 15 minutes are 28.32, 11.27, and 4.46 respectively. This system has 32 logical cores. Since 28.32 is close to 32 (the total number of cores), this indicates that the system may be experiencing a high-load issue.

Load averages are also shown by the command top, along with other useful information. The first line repeats the load average numbers shown by uptime. The second line displays the number of tasks in the system and their states. The third line, starting with Cpu(s), displays user CPU time (us), system CPU time (sy), low-priority (nice) user time (ni), idle time (id), and I/O wait time (wa), all as percentages. User time is the CPU time used by user processes, system time is the CPU time spent in the kernel, idle time is the time the CPU does nothing, and wait time is the time CPUs spend waiting on I/O activity. If the system has a high load (e.g., close to the number of cores) and the wait-time percentage is high, it is likely that you have an I/O-bound issue. If the user or system time percentage is very high, it is likely that you have a CPU-bound issue. Conversely, if idle time is high, it is likely you do not have a CPU-bound issue.
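These summary fields can also be captured non-interactively using top's batch mode (-b) with a single iteration (-n 1); a small sketch, assuming a procps-style top:

```shell
# Grab only the CPU summary line from one non-interactive top snapshot.
# High us/sy with low id indicates a CPU-bound system; high wa points to I/O.
cpu_line=$(top -b -n 1 | grep -i 'cpu(s)')
echo "$cpu_line"
```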

#top

top - 15:53:45 up 20 days,  7:25,  9 users,  load average: 31.93, 22.77, 13.37
Tasks: 761 total,   1 running, 760 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.2%us,  0.0%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si, 0.0%st
Mem:  49375504k total,  6675048k used, 42700456k free,   276332k buffers
Swap: 68157432k total,        0k used, 68157432k free,  5163564k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU  %MEM     TIME+  COMMAND
 89883 root      20   0 76528 2700 1752 S 3186.4 0.0 193:30.19 sysbench
 88310 root      20   0 15628 1812  960 S  0.7   0.0   2:01.82 top
 89939 root      20   0 15628 1788  944 R  0.3   0.0   0:00.04 top
     1 root      20   0 19396 1564 1256 S  0.0   0.0   0:03.24 init
     2 root      20   0     0    0    0 S  0.0   0.0   0:00.00 kthreadd
     3 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/0
     4 root      20   0     0    0    0 S  0.0   0.0   0:00.09 ksoftirqd/0
     5 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/0
     6 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 watchdog/0
     7 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/1
     8 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/1
     9 root      20   0     0    0    0 S  0.0   0.0   0:00.00 ksoftirqd/1
    10 root      RT   0     0    0    0 S  0.0   0.0   0:00.04 watchdog/1
    11 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/2
    12 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/2
    13 root      20   0     0    0    0 S  0.0   0.0   0:00.00 ksoftirqd/2
    14 root      RT   0     0    0    0 S  0.0   0.0   0:00.01 watchdog/2
    15 root      RT   0     0    0    0 S  0.0   0.0   0:00.88 migration/3
    16 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/3
    17 root      20   0     0    0    0 S  0.0   0.0   0:00.00 ksoftirqd/3
    18 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 watchdog/3
    19 root      RT   0     0    0    0 S  0.0   0.0   0:00.07 migration/4
    20 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/4

In the example above, the load averages over the last 1, 5, and 15 minutes are 31.93, 22.77, and 13.37 respectively. Combined with the high user-time percentage (99.2%), low idle time (0.7%), and no wait time (0.0%), we can say the system has a CPU-bound problem. Note that if idle time were high, meaning CPU resources are available, you may want to look more closely at optimizing the application instead.

Going further down to the process lines shown by top, the processes are sorted by CPU consumption. The most important fields are %CPU and %MEM. CPU usage per process is expressed as a percentage of total CPU time; memory usage per process is expressed as a percentage of available physical memory. In this example, the application sysbench gets most of the total CPU time (shown in %CPU). The %CPU of the sysbench process is 3,186.4 (note that the maximum is 3,200: 100% times 32 logical cores in this server system). This is the process that is CPU-bound.
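This ranked process view can also be scripted with top's batch mode and its -o sort option; a sketch, assuming a procps-ng top (older top versions may lack -o):

```shell
# One batch-mode snapshot sorted by CPU usage; head keeps the summary
# header plus the first few (busiest) process rows.
snapshot=$(top -b -n 1 -o %CPU | head -n 12)
echo "$snapshot"
```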

One equivalent command is ps with the aux options ("ps aux"). This command returns a long list of all current processes in the system.
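To get a CPU-ranked listing rather than the full dump, procps ps can sort its output directly; a sketch:

```shell
# BSD-style "ps aux" sorted by descending CPU usage; head keeps the
# header line plus the five busiest processes.
busiest=$(ps aux --sort=-%cpu | head -n 6)
echo "$busiest"
```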

vmstat is another useful command that reports overall CPU utilization, among other information. Under the cpu columns, the user CPU time (us) is about 98% of the total, and idle time (id) takes 2%. In practice, because of overhead, if processes consistently consume more than about 70% of total CPU, you should start thinking about upgrading the system.

#vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b swpd     free   buff  cache si   so   bi    bo       in   cs us sy id wa st
33  0   0 42701636 279708 5163400  0    0    0    15    31118 6333 98  0  2  0  0
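Note that vmstat's first report shows averages since boot; sampling at intervals (e.g., "vmstat 1 5") gives current figures. The underlying counters come from /proc/stat, from which a rough breakdown can be computed directly; a sketch that, for brevity, ignores the iowait/irq fields:

```shell
# Sample the aggregate "cpu" line of /proc/stat twice, one second apart,
# and compute rough busy/idle percentages from the jiffy deltas.
read -r _ u1 n1 s1 i1 rest < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 rest < /proc/stat
busy=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) ))
idle=$(( i2 - i1 ))
total=$(( busy + idle ))
echo "busy: $(( 100 * busy / total ))%  idle: $(( 100 * idle / total ))%"
```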

We can also inspect the CPU utilization of each individual core in the system using the command mpstat. Running mpstat with the option "-P ALL" displays CPU statistics for every core:

#mpstat -P ALL

Linux 2.6.32-220.el6.x86_64 (knightscorner5)  12/16/2013  _x86_64_ (32 CPU)

11:25:00 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest %idle
11:25:05 AM  all   97.24    0.00    0.01    0.02    0.00    0.00    0.00    0.00  2.74
11:25:05 AM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    6   99.80    0.00    0.20    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM    8   86.40    0.00    0.00    0.00    0.00    0.00    0.00    0.00 13.60
11:25:05 AM    9   83.77    0.00    0.00    0.00    0.00    0.00    0.00    0.00 16.23
11:25:05 AM   10   78.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00 22.00
11:25:05 AM   11  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   12   95.73    0.00    0.00    0.39    0.00    0.00    0.00    0.00  3.88
11:25:05 AM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   14  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   15   68.80    0.00    0.00    0.00    0.00    0.00    0.00    0.00 31.20
11:25:05 AM   16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   17  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   18  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   19  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   20  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   21  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   22  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   23  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   24  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   25  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   26  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   27  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   28  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   29  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   30  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00
11:25:05 AM   31  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00

In the example above, the command output shows the CPU utilization of every logical core in this system, 32 in all. The user CPU time on almost every core is close to 100%, which means all cores are busy.
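mpstat (part of the sysstat package) reads these per-core figures from /proc/stat and, like vmstat, is more useful with an interval, e.g. "mpstat -P ALL 1 5". The per-core lines can also be inspected directly; a sketch:

```shell
# Each "cpuN" line in /proc/stat is one logical core; counting them
# gives the core count that the load averages should be judged against.
ncores=$(grep -c '^cpu[0-9]' /proc/stat)
echo "$ncores logical cores"
```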

As a system administrator, if you observe a CPU-bound application, you can try several remedies: checking the system configuration, improving the performance of the affected applications, rebalancing workloads, and so on. If you still face a CPU-bound problem after these attempts, you may want to consider upgrading your server with a higher-performance processor, or augmenting your install base with additional servers. Besides offering high performance (higher frequency, more cores), the latest Intel® Xeon® processors provide efficient management, smarter data protection, and higher cache and memory bandwidth. Intel® processors also provide a platform for building the best data center based on energy-efficient performance, internet connectivity, and security. For more information on Intel processors, please refer to http://www.intel.com/content/www/us/en/servers/server-products.html

Finally, Intel provides a complete solution to maximize your data center performance by combining many leading industry standard products. To find out more, please refer to http://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/powerful-relief-for-data-center-pain-points-brief.pdf
