Considerations for Tuning Your Intel® Xeon Linux*/Apache* Server

In this blog, I discuss a list of useful Linux commands for tuning your system to run the Apache web service, as well as some Apache configuration changes that have been shown to improve performance in some of our benchmarks. In our tests, we used a software benchmark (SPECweb2009) to simulate load; however, the analysis and tuning process described here applies to anyone who is trying to get more performance out of their Linux web servers.

We ran our tests on an internally developed system; however, equivalent platforms can be found easily through your normal OEM providers. Note: Canoe Pass is the platform that we used internally, but there are equivalent server boards on the Intel(R) website.

Here is a sample of a system configuration:

Table 1

Below are some areas and the associated Linux commands to examine the system environment for tuning.  Tools that can assist in determining the right load for the benchmarks are:

On Linux, the “sar” and “top” commands were used to collect data. “kSar” is a Java application, licensed under the BSD license, that visualizes your “sar” data as easy-to-read graphs. You can download the “kSar” package here:

Here are some Linux commands:
1) CPU Usage
>sar -u -- Use option -u to display the CPU usage collected so far for the current day.
>sar -P ALL -- Use option -P ALL to display the usage of each individual CPU. If the value for %iowait (percentage of time the CPU is idle while waiting for I/O) is significantly higher than zero over a longer period of time, there is a bottleneck in the I/O system (network or hard disk). If the %idle value is zero over a longer period of time, your CPU(s) are working at full capacity.
>top (stands for table of processes) -- Use to display a list of processes that is refreshed every few seconds (the interval can be changed with the -d option).
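
As a rough illustration of the %iowait check described above, the following Python sketch scans the text output of “sar -u” and reports whether %iowait stayed above a threshold for several consecutive samples. The function name, the 5.0 threshold, the run length of 3, and the sample text are all assumptions made up for illustration; they are not values or measurements from our tests.

```python
def sustained_iowait(sar_text, threshold=5.0, min_samples=3):
    """Return True if %iowait exceeds `threshold` for `min_samples`
    consecutive samples in the text output of `sar -u`."""
    col = None   # position of %iowait, counted from the end of the line
    run = 0      # length of the current run of high-iowait samples
    for line in sar_text.splitlines():
        fields = line.split()
        if "%iowait" in fields:
            # Header line: remember the column as a negative index so the
            # time format (12h with AM/PM or 24h) does not matter.
            col = fields.index("%iowait") - len(fields)
            continue
        if col is None or len(fields) < -col or fields[0] == "Average:":
            continue
        try:
            value = float(fields[col])
        except ValueError:
            continue  # not a data line (for example, a restart marker)
        run = run + 1 if value > threshold else 0
        if run >= min_samples:
            return True
    return False


# Illustrative sample only (hypothetical numbers, not a real measurement).
SAMPLE = """\
12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all     10.00      0.00      2.00      8.50      0.00     79.50
12:20:01 AM     all     11.00      0.00      2.00      9.10      0.00     77.90
12:30:01 AM     all     12.00      0.00      2.00      7.20      0.00     78.80
Average:        all     11.00      0.00      2.00      8.27      0.00     78.73
"""

if __name__ == "__main__":
    print(sustained_iowait(SAMPLE))
```

What counts as “significantly higher than zero” depends on your workload, so treat the threshold as a starting point to adjust rather than a recommendation.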

2) Memory Usage -- tells how much memory is free, used, and committed during the measurement interval. Read it together with the paging statistics in item 3, which are the primary counters to watch for an indication of possibly insufficient RAM to meet your server's needs.
>sar -r -- Use option -r to generate an overall picture of the system memory (RAM). The kbcommit and %commit columns show an approximation of the total amount of memory (RAM plus swap) the current workload would need in the worst case (in kilobytes or percent, respectively).
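
To make the relationship between kbcommit and %commit concrete, here is a small Python sketch of the arithmetic, assuming %commit is kbcommit expressed as a percentage of RAM plus swap. The function name and the example sizes are hypothetical.

```python
def commit_percent(kbcommit, ram_kb, swap_kb):
    """Committed memory as a percentage of total memory (RAM plus swap)."""
    return 100.0 * kbcommit / (ram_kb + swap_kb)


if __name__ == "__main__":
    # Hypothetical machine: 64,000,000 KB RAM, 16,000,000 KB swap,
    # 24,000,000 KB committed by the current workload.
    print(commit_percent(24_000_000, 64_000_000, 16_000_000))
```

A result above 100 is possible because the kernel can overcommit memory; it means the worst-case demand exceeds RAM and swap combined.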

3) Kernel Paging Statistics
>sar -B -- Use option -B to display the kernel paging statistics. The majflt/s (major faults per second) column shows how many pages are loaded from disk (swap) into memory. A large number of major faults slows down the system and is an indication of insufficient main memory. The %vmeff column shows the number of pages scanned (pgscand/s) in relation to the ones being reused from the main memory cache or the swap cache (pgsteal/s). It is a measurement of the efficiency of page reclaim. Healthy values are either near 100 (every inactive page swapped out is being reused) or 0 (no pages have been scanned). The value should not drop below 30.
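
The %vmeff rule of thumb above can be written as a short Python sketch. It assumes, based on my reading of the sysstat documentation, that %vmeff is pgsteal/s divided by the total pages scanned (pgscank/s plus pgscand/s). The function names, the labels, and the use of 30 as a hard floor are illustrative choices.

```python
def vmeff(pgscank, pgscand, pgsteal):
    """Page reclaim efficiency in percent; 0.0 when nothing was scanned."""
    scanned = pgscank + pgscand
    if scanned == 0:
        return 0.0
    return 100.0 * pgsteal / scanned


def reclaim_health(value, floor=30.0):
    """Classify a %vmeff value using the rule of thumb from the text."""
    if value == 0.0:
        return "idle"   # no pages scanned, nothing to reclaim
    return "ok" if value >= floor else "low"


if __name__ == "__main__":
    sample = vmeff(pgscank=200.0, pgscand=0.0, pgsteal=190.0)
    print(sample, reclaim_health(sample))
```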

4) Disk I/O Statistics -- tells how busy your hard disks are during the measurement interval. If a disk is almost never idle, it's time to upgrade your hardware to use faster disks, add more disks, use a RAID controller, or scale out your application to better handle the load.
>sar -d -- Use option -d to display statistics for each block device (hard disk, optical drive, USB storage device, etc.). Make sure to use the additional option -p (pretty-print) to make the DEV column readable. If your machine uses multiple disks, you will get the best performance if I/O requests are spread evenly over all disks. Compare the Average values for tps, rd_sec/s, and wr_sec/s of all disks. Constantly high values in the svctm and %util columns indicate that the device is close to saturation; they could also be an indication that the amount of free space on the disk is insufficient.
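
One simple way to compare the Average tps values across disks is to look at the share of all transfers handled by the busiest device. The Python sketch below assumes you have already copied the per-device averages out of “sar -d -p” into a dictionary; the function name and the device numbers are hypothetical.

```python
def busiest_share(avg_tps):
    """Return (device, share) for the device handling the largest
    fraction of all transfers; share is between 0.0 and 1.0."""
    total = sum(avg_tps.values())
    if total == 0:
        return None, 0.0
    device = max(avg_tps, key=avg_tps.get)
    return device, avg_tps[device] / total


if __name__ == "__main__":
    # Hypothetical averages: with two disks, an even spread would be 0.5 each.
    print(busiest_share({"sda": 300.0, "sdb": 100.0}))
```

With N disks, a share well above 1/N for one device suggests the I/O load is not evenly spread.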

5) System Context Switching Statistics -- tells how frequently the processor switches from running one process or thread to running another. The heavier the workload, the higher this counter will generally be, but over the long term the value of this counter should remain fairly constant.
>sar -w -- Use option -w to display the number of context switches per second (cswch/s), along with the number of tasks created per second (proc/s).

6) Swap Space Usage
>sar -S -- Use option -S to report the swap statistics. If “kbswpused” and “%swpused” are at 0, then your system is not using any swap space. Here are some other options:
• Use “sar -R” to identify the number of memory pages freed, used, and cached per second by the system.
• Use “sar -H” to identify the huge pages (in KB) that are used and available.
• Use “sar -B” to generate paging statistics, i.e., the number of KB paged in (and out) from disk per second.
• Use “sar -W” to generate page swap statistics, i.e., the pages swapped in (and out) per second.

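7) Network Usage Statistics -- reports traffic and error counts per network interface and protocol. While looking at the network, also make sure that network interrupts are distributed evenly across all available cores (for example, by inspecting /proc/interrupts).
>sar -n KEYWORD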
KEYWORD can be one of the following:
• DEV – Displays vital statistics for network devices (eth0, eth1, etc.)
• EDEV – Displays network device failure statistics
• NFS – Displays NFS client activities
• NFSD – Displays NFS server activities
• SOCK – Displays sockets in use for IPv4
• IP – Displays IPv4 network traffic
• EIP – Displays IPv4 network errors
• ICMP – Displays ICMPv4 network traffic
• EICMP – Displays ICMPv4 network errors
• TCP – Displays TCPv4 network traffic
• ETCP – Displays TCPv4 network errors
• UDP – Displays UDPv4 network traffic
• SOCK6, IP6, EIP6, ICMP6, UDP6 – Display the IPv6 equivalents of the statistics above
• ALL – Displays all of the above information. The output will be very long.

Apache Tuning
For more details on the Apache tuning parameters used please see:

In our tests, one important Apache tunable that gave a performance boost was Keep-Alive. With Keep-Alive on, the same connection between the client and the web server is reused to transfer multiple files. This reduces the latency associated with HTTP transfers and lowers CPU utilization. However, Keep-Alive also increases memory usage on the server, because it has to keep connections open for new requests. We therefore recommend that you monitor your system's utilization after changing this parameter to make sure the change did not result in a performance regression for your clients.

“IfModule worker.c” is a section in the “httpd.conf” Apache configuration file. After multiple runs of synthetic workloads (such as the SPECweb2009 benchmark), we adjusted the settings to reach maximum performance. For our setup, here is an example of the settings for the “IfModule worker.c” section:

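<IfModule worker.c>
       # Process and thread limits for the worker MPM
       ServerLimit             1024
       StartServers            8
       # MaxClients is the pre-2.4 name of MaxRequestWorkers;
       # both lines set the same limit.
       MaxRequestWorkers       20000
       MaxClients              16384
       MinSpareThreads         64
       MaxSpareThreads         20000

       ThreadStackSize         128000
       ThreadLimit             64
       ThreadsPerChild         64
       # MaxRequestsPerChild (further down) is the pre-2.4 name of
       # MaxConnectionsPerChild; both lines set the same limit.
       MaxConnectionsPerChild  64

       Timeout                 3600
       MaxRequestsPerChild     0
       # Keep-Alive settings discussed above
       KeepAlive               on
       KeepAliveTimeout        30
       MaxKeepAliveRequests    0

       ListenBackLog           128
</IfModule>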
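
The worker MPM limits in this section depend on each other, so it can help to sanity-check a set of values before restarting Apache. The Python sketch below encodes three relationships as I understand them from the Apache documentation: ThreadsPerChild may not exceed ThreadLimit, MaxRequestWorkers may not exceed ServerLimit times ThreadsPerChild, and MaxRequestWorkers should be a multiple of ThreadsPerChild. The function name and messages are hypothetical, and the check is not a substitute for “apachectl configtest”.

```python
def check_worker_limits(server_limit, thread_limit,
                        threads_per_child, max_request_workers):
    """Return a list of problems found in a worker MPM configuration
    (an empty list means the checked relationships hold)."""
    problems = []
    if threads_per_child > thread_limit:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    ceiling = server_limit * threads_per_child
    if max_request_workers > ceiling:
        problems.append(
            f"MaxRequestWorkers exceeds ServerLimit * ThreadsPerChild ({ceiling})")
    if max_request_workers % threads_per_child != 0:
        problems.append("MaxRequestWorkers is not a multiple of ThreadsPerChild")
    return problems


if __name__ == "__main__":
    # Values taken from the section above.
    print(check_worker_limits(1024, 64, 64, 16384))   # MaxClients value
    print(check_worker_limits(1024, 64, 64, 20000))   # MaxRequestWorkers value
    print(16384 // 64, "child processes at full load")
```

With the values above, 16384 divides evenly into 256 child processes of 64 threads each, while 20000 is not a multiple of 64.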

Performance tuning is unique to each system and network environment’s demands; what I have provided in this blog is one method for tuning on the Intel® architecture.  If you have other methods, I would encourage you to share them.  

For more complete information about compiler optimizations, see our Optimization Notice.