Detecting Network-bound Applications in Server Systems

Following my previous blogs, Detecting CPU-bound Applications in Server Systems and Detecting Disk I/O-bound Applications in Server Systems, this blog continues the discussion by showing how to detect a network-bound application.

Network I/O-intensive applications can consume almost all available network bandwidth, which can cause resource contention and degrade overall system performance, especially when multiple applications compete for that bandwidth at the same time.

I will use this blog to illustrate how to detect a network I/O issue on a server (using a synthetic workload) and what tools are available to find the applications that are causing that contention. Additionally, I will show, using the same synthetic workload, the capacities of newer 10 GbE interfaces relative to older 1 GbE interfaces.

For illustration purposes, I downloaded a network benchmark tool called netperf (http://www.netperf.org) to generate heavy network traffic and installed it on two servers running the Linux* operating system. Both machines are equipped with built-in 1 GbE Intel® Ethernet Gigabit Adapters which connect to a 1 GbE Ethernet switch.

We can get information about a network interface by using the command ethtool. For example, displaying the settings of the first Ethernet card (eth0) shows that its speed is 1 Gb/s (reported as 1000Mb/s).

# ethtool eth0 

Settings for eth0:
 Supported ports: [ TP ]
 Supported link modes:   10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Full 
 Supports auto-negotiation: Yes
 Advertised link modes:  10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Full 
 Advertised pause frame use: No
 Advertised auto-negotiation: Yes
 Speed: 1000Mb/s
 Duplex: Full
 Port: Twisted Pair
 PHYAD: 1
 Transceiver: internal
 Auto-negotiation: on
 MDI-X: Unknown
 Supports Wake-on: pumbg
 Wake-on: g
 Current message level: 0x00000003 (3)
          drv probe
 Link detected: yes
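
If you want to check the link speed from a script rather than by eye, the Speed line of the ethtool output can be extracted with a regular expression. The following is a minimal sketch, not part of my original experiment; the function name parse_link_speed_mbps is my own, and it assumes the output format shown above.

```python
import re
from typing import Optional


def parse_link_speed_mbps(ethtool_output: str) -> Optional[int]:
    """Return the link speed in Mb/s from `ethtool <iface>` output.

    Returns None when no numeric "Speed: <n>Mb/s" line is present.
    """
    match = re.search(r"^\s*Speed:\s*(\d+)Mb/s", ethtool_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# A shortened copy of the ethtool output shown above.
sample = """Settings for eth0:
 Speed: 1000Mb/s
 Duplex: Full
 Link detected: yes
"""

print(parse_link_speed_mbps(sample))  # 1000
```

The returned value can later serve as the denominator when estimating how much of the link is in use.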

To begin the netperf workload from the source machine (10.23.3.33) to the destination machine (10.23.3.34), I first started the netperf server, netserver, on the destination machine to handle traffic requests.

# netserver

On the source machine (10.23.3.33) I ran the benchmark tool netperf. The option -H specifies the destination host address; -t TCP_STREAM specifies the type of test; -D 1 updates the results every second; -l 20 runs the test for 20 seconds; and -f g displays the results in Gbits/s.

# netperf -H 10.23.3.34 -t TCP_STREAM -D 1 -l 20 -f g
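
When the same test has to be repeated, it can help to build the command line in a script. The sketch below is only an illustration and was not used for the results in this blog; the helper names build_netperf_cmd and run_netperf are hypothetical, and it uses only the netperf options described above. It assumes netperf is installed and netserver is already running on the destination host.

```python
import subprocess


def build_netperf_cmd(host, test="TCP_STREAM", duration_s=20,
                      interval_s=1, units="g"):
    """Build the netperf argument list using the options described above."""
    return [
        "netperf",
        "-H", host,             # destination host running netserver
        "-t", test,             # test type, e.g. TCP_STREAM or TCP_RR
        "-D", str(interval_s),  # interim result interval in seconds
        "-l", str(duration_s),  # test length in seconds
        "-f", units,            # output units (g = Gbits/s)
    ]


def run_netperf(host, **kwargs):
    """Run netperf and return its raw text output (requires netperf)."""
    result = subprocess.run(build_netperf_cmd(host, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; call run_netperf() to actually execute it.
    print(" ".join(build_netperf_cmd("10.23.3.34")))
```

Passing the arguments as a list avoids shell quoting problems and makes it easy to change the test type or duration between runs.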

While the traffic was being generated, I used the common command top to look for any abnormal activity. Normally I would take this step if I experienced performance issues, or if they were reported by users of the system. The top utility shows 0.0% I/O wait time (%wa), which implies that the CPUs are not waiting on disk I/O. The CPUs are also 99.7% idle. Among the listed processes, netperf is using 9% CPU.

# top
top - 12:20:44 up 35 days, 20:31,  8 users,  load average: 0.00, 0.01, 0.05
Tasks: 351 total,  1 running, 348 sleeping,  0 stopped,   2 zombie
Cpu(s):  0.1%us,  0.2%sy, 0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:     48220M total,    21068M used,   27151M free,      744M buffers
Swap:     2045M total,        0M used,    2045M free,    18886M cached

   PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND  
 75043 root      20   0 10952  944  756 S      9  0.0   0:00.43 netperf 
 40180 snaik5    20   0  374m  26m  18m S      3  0.1 108:10.47 micsmc-gui 
 38473 snaik5    20   0 1182m 251m  67m S      1  0.5  49:10.48 amplxe-gui
  5094 root      20   0  125m  17m 6116 S      0  0.0 254:44.35 X 
  5336 gdm       20   0  286m  61m  12m S      0  0.1 108:29.45 gdm-simple-gree 
 38385 snaik5    20   0  362m  19m  14m S      0  0.0   0:08.09 gnome-panel 
 75042 root      20   0  9048 1332  820 R      0  0.0   0:00.07 top 
     1 root      20   0 10528  808  676 S      0  0.0   0:24.12 init 
     2 root      20   0     0    0    0 S      0  0.0   0:00.68 kthreadd 
     3 root      20   0     0    0    0 S      0  0.0   0:11.41 ksoftirqd/0  
     4 root      20   0     0    0    0 S      0  0.0   0:00.07 kworker/0:0 
     6 root      RT   0     0    0    0 S      0  0.0   0:00.00 migration/0 
     7 root      RT   0     0    0    0 S      0  0.0   0:08.97 watchdog/0 
     8 root      RT   0     0    0    0 S      0  0.0   0:00.00 migration/1 
    10 root      20   0     0    0    0 S      0  0.0   0:14.15 ksoftirqd/1 
    12 root      RT   0     0    0    0 S      0  0.0   0:08.41 watchdog/1  
    13 root      RT   0     0    0    0 S      0  0.0   0:00.00 migration/2 
    15 root      20   0     0    0    0 S      0  0.0   0:21.60 ksoftirqd/2 
    16 root      RT   0     0    0    0 S      0  0.0   0:07.83 watchdog/2 
    17 root      RT   0     0    0    0 S      0  0.0   0:00.00 migration/3 
    18 root      20   0     0    0    0 S      0  0.0   0:00.00 kworker/3:0 
    19 root      20   0     0    0    0 S      0  0.0   0:10.72 ksoftirqd/3 
    20 root      RT   0     0    0    0 S      0  0.0   0:08.04 watchdog/3 
   <output truncated>

The command mpstat confirms that the CPUs are not waiting on disk I/O (%iowait is 0.00):
# mpstat 1
Linux 3.0.13-0.27-default (knightscorner1)  01/30/14  _x86_64_

15:36:10     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
15:36:11     all    0.33    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.67
15:36:12     all    0.08    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.90
15:36:13     all    0.06    0.00    0.15    0.00    0.00    0.00    0.00    0.00   99.80
15:36:14     all    0.33    0.00    0.53    0.00    0.00    0.00    0.00    0.00   99.14
15:36:15     all    0.21    0.00    0.62    0.00    0.00    0.00    0.00    0.00   99.18
15:36:16     all    0.09    0.00    0.18    0.00    0.00    0.00    0.00    0.00   99.72
15:36:17     all    0.11    0.00    0.16    0.00    0.00    0.02    0.00    0.00   99.71
15:36:18     all    0.16    0.00    0.55    0.00    0.00    0.05    0.00    0.00   99.24
15:36:19     all    0.23    0.00    0.68    0.00    0.00    0.00    0.00    0.00   99.09
15:36:20     all    0.14    0.00    0.23    0.00    0.00    0.00    0.00    0.00   99.62
15:36:21     all    0.08    0.00    0.15    0.00    0.00    0.00    0.00    0.00   99.77
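
Reading %iowait and %idle off a long mpstat run by eye is error-prone, so the columns can be averaged with a few lines of code. This is a rough sketch under the assumption that the output has the column layout shown above; the function name mean_mpstat_columns is my own, and the sample text is copied from the first three intervals of the output above.

```python
def mean_mpstat_columns(mpstat_output, columns=("%iowait", "%idle")):
    """Average the chosen columns over the per-interval 'all' rows."""
    header = None
    totals = {name: 0.0 for name in columns}
    samples = 0
    for line in mpstat_output.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            header = fields  # the column-name row
            continue
        if header is None or len(fields) != len(header):
            continue  # banner line, blank line, or unexpected layout
        if fields[0].startswith("Average"):
            continue  # skip mpstat's own summary row
        if fields[header.index("CPU")] != "all":
            continue  # keep only the all-CPU rows
        for name in columns:
            totals[name] += float(fields[header.index(name)])
        samples += 1
    if samples == 0:
        raise ValueError("no 'all' rows found in mpstat output")
    return {name: totals[name] / samples for name in columns}


sample = """Linux 3.0.13-0.27-default (knightscorner1)  01/30/14  _x86_64_

15:36:10     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
15:36:11     all    0.33    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.67
15:36:12     all    0.08    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.90
15:36:13     all    0.06    0.00    0.15    0.00    0.00    0.00    0.00    0.00   99.80
"""

means = mean_mpstat_columns(sample)
print(f"{means['%iowait']:.2f} {means['%idle']:.2f}")  # 0.00 99.79
```

A mean %iowait near zero together with a high %idle is the pattern seen here: neither the CPU nor the disk is the bottleneck.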

The two commands confirm that there is no CPU or disk I/O resource contention. As a next step, I checked for network activity. I used "netstat -i" to show the total packets received/sent at each interface.

# netstat -i

To show the packets sent/received per second at each interface, I used the command sar, sampling every second for 20 seconds:

# sar -n DEV 1 20

However, netstat -i reports only cumulative packet counts, and sar reports per-interface rates without showing which hosts are exchanging the traffic. To figure out whether the network bandwidth is fully utilized, and between which hosts, we can use the utility iftop. iftop listens to network traffic on an interface and displays the current bandwidth usage by source/destination host pair. In this example, when running iftop, I noticed that the link between the two machines was almost saturated (0.99 Gb/s).

# iftop

The above figure shows that there is heavy traffic from the knightscorner1 system (10.23.3.33) to the knightscorner2 system (10.23.3.34), and the usage is close to 1 Gb/s, which is the maximum bandwidth supported by the adapter.
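
The same saturation check can be approximated without iftop by sampling the kernel's per-interface byte counters in /proc/net/dev twice and dividing the difference by the link speed. The code below is a sketch of that idea, not the method used for the figure above; the function names are my own, and it assumes a Linux system and a link speed you already know (for example, from ethtool).

```python
import time


def read_byte_counters(proc_net_dev_text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip two header lines
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 16:
            # Field 0 is received bytes; field 8 is transmitted bytes.
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization_pct(bytes_before, bytes_after, interval_s, link_mbps):
    """Percentage of the link used in one direction over the interval."""
    bits_per_s = (bytes_after - bytes_before) * 8 / interval_s
    return 100.0 * bits_per_s / (link_mbps * 1_000_000)


def sample_interface(iface, link_mbps, interval_s=1.0, path="/proc/net/dev"):
    """Return (rx_pct, tx_pct) for iface measured over interval_s seconds."""
    with open(path) as f:
        before = read_byte_counters(f.read())[iface]
    time.sleep(interval_s)
    with open(path) as f:
        after = read_byte_counters(f.read())[iface]
    return tuple(utilization_pct(b, a, interval_s, link_mbps)
                 for b, a in zip(before, after))


# 123,750,000 bytes in one second on a 1000 Mb/s link is 0.99 Gb/s.
print(utilization_pct(0, 123_750_000, 1.0, 1000))  # 99.0
```

On the source machine, calling sample_interface("eth0", 1000) while netperf is running should report a transmit utilization close to 100% if the link is saturated. Unlike iftop, this shows only per-interface totals, not which remote host is involved.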

To figure out which application is causing such heavy traffic, I installed the nethogs tool from http://nethogs.sourceforge.net. nethogs is the equivalent of the top command for network bandwidth: it monitors the network bandwidth used by each process on the system and helps identify the application that consumes the most.

The above screenshot shows that the application (in my experiment, ‘netperf’) consumes the majority of the available network bandwidth.

Finally, to compare the performance of a 1 GbE adapter with a 10 GbE adapter, I ran the same tool on two newer systems that both have 10 GbE adapters. Both machines (in my example, 10.23.3.62 and 10.23.3.64) have built-in 10 GbE ports and connect to a 10 GbE switch. I ran the netperf utility with the following -t options:

  • TCP_STREAM to measure data transfer performance (throughput)

  • TCP_RR (request/response test) to measure the transaction rate

TCP_RR reports the transaction rate (transactions per second), which can be used to deduce latency. Latency here means the round-trip time (RTT): the time needed for a request to travel from one host to the other and for the response to return. RTT = 1 / transaction rate.
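
The conversion from transaction rate to round-trip time is simple arithmetic, sketched below. The function name rtt_usec is my own, and the rate of 10,000 transactions per second is a made-up input chosen for easy arithmetic, not a measured result.

```python
def rtt_usec(transactions_per_sec: float) -> float:
    """Round-trip time in microseconds, given a TCP_RR transaction rate."""
    if transactions_per_sec <= 0:
        raise ValueError("transaction rate must be positive")
    # Each transaction is one request plus one response, i.e. one round trip.
    return 1_000_000 / transactions_per_sec


# Made-up example rate: 10,000 transactions/s gives a 100 microsecond RTT.
print(rtt_usec(10_000))  # 100.0
```

A higher transaction rate therefore means a proportionally lower round-trip time.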

On the first 10 GbE host, 10.23.3.64, I ran these two tests:

# netperf -H 10.23.3.62 -t TCP_STREAM -D 1 -l 20 -f g

# netperf -H 10.23.3.62 -t TCP_RR

Similarly, on the 1 GbE host 10.23.3.33, I ran the same tests:

# netperf -H 10.23.3.34 -t TCP_STREAM -D 1 -l 20 -f g

# netperf -H 10.23.3.34 -t TCP_RR

Each test was repeated three times. I calculated the average throughput (in Gb/s) and latency (in microseconds) for the 1 GbE and 10 GbE adapters. The figures below summarize the results of my tests.
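
Averaging the repeated runs and comparing the two adapters can be done as in the sketch below. The function names are my own, and the numbers are placeholders chosen only to make the arithmetic obvious; they are not the measurements from my tests.

```python
from statistics import mean


def average_runs(runs):
    """Average a metric (throughput or latency) over repeated runs."""
    return mean(runs)


def ratio(new_value, old_value):
    """How many times larger new_value is than old_value."""
    return new_value / old_value


# Placeholder values only, NOT measured results.
runs_a = [1.0, 2.0, 3.0]
runs_b = [10.0, 20.0, 30.0]

avg_a = average_runs(runs_a)
avg_b = average_runs(runs_b)
print(avg_a, avg_b, ratio(avg_b, avg_a))  # 2.0 20.0 10.0
```

With only three runs per test, it is also worth looking at the spread between runs before drawing conclusions from the averages.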

 

Figure: Throughput measured for 1 GbE and 10 GbE adapters

Figure: Latency measured for 1 GbE and 10 GbE adapters

From what we can see here, the 10 GbE network interface is able to sustain far greater throughput, while minimizing latency. For these reasons, if you are experiencing system performance issues due to network I/O contention, it is worth considering upgrading to 10 GbE.

This blog shows a simple approach for determining whether your system is suffering performance issues due to network I/O contention. If a network-bound problem is identified and there is no other way to relieve it (for example, by distributing the network I/O workload across more systems, or by reducing the amount of data the application sends through compression), you may consider upgrading network interfaces with an Intel® Ethernet Gigabit Server Adapter to increase the available network bandwidth.

For more information on Intel® 10 GbE products, please refer to http://www.intel.com/content/www/us/en/network-adapters/gigabit-network-adapters/ethernet-server-adapters.html and http://www.intel.com/content/www/us/en/network-adapters/converged-network-adapters.html

 
