Performance Testing and Tuning - Part II

by ChandraMohan Lingam


Capture good performance requirements when tuning an application.

In Part 1 of this series, we provided insight into designing good performance tests and strategies for identifying potential problems.

Now in Part 2, we examine techniques for capturing good performance requirements and discuss practical issues that developers face when performance-tuning an application. Performance testing is often done on the complete system; while this approach is comprehensive, it is expensive and allows issues to accumulate over time. Having a mechanism to quickly performance-test key components and the interactions among them can significantly reduce performance-related risks.

Practical Issues

Capturing Throughput Requirement

A good performance requirement forms the foundation of the performance testing and tuning activity. A clear, measurable performance requirement gives the team a clear picture of the overall goals and objectives. If the team is capturing performance requirements for the first time, issues such as identifying throughput and handling concurrent users can be challenging. However, with a systematic approach, building a reasonable estimate is a fairly straightforward exercise.

Throughput Calculation for a Use Case
Summary of Steps

To calculate the throughput requirements, you must first identify how an end user will use the system. Please note that the term user represents either a person or another system.

  1. For each use case, the user performs a series of steps to complete an operation.
  2. For each step you must:
    • Determine the maximum acceptable system response time.
    • If more than one call can be made to the system during the step, identify the average number of calls.
  3. Calculate the total number of requests for an operation.
  4. Calculate the total time for an operation.
  5. Identify the simultaneous users who will be using this use case.
  6. Extrapolate requests based on the number of simultaneous users. This step identifies the impact of concurrency on completion time, and hence on throughput. Several factors can affect throughput:
    • Shared access to the same data.
      For example, a row in a database or a method with static variables.
    • Shared hardware resources, like a CPU cache.
    • Resource constraints like memory and network bandwidth.
    • Nature of the application. The application can be processor intensive, I/O intensive, or a mixture of both. The latter type typically scales better with concurrency.
      Note: If the application deteriorates significantly with concurrent users, Intel® Thread Profiler tools are useful in pinpointing the bottleneck.
  7. Calculate the throughput in requests per second (a code sketch of this calculation follows this list).
  8. Define the performance requirements.
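
The arithmetic in steps 3 through 7 is simple enough to capture in a few lines of code. The sketch below (Java; the class name, inputs, and the 30% buffer value are illustrative assumptions rather than a prescribed method) computes the average rate under uniform traffic, plus a buffered target that covers whichever is larger: the steady average or a burst in which every simultaneous user submits a request in the same second. This is one way to read the buffered figures in the worked examples that follow.

// ThroughputEstimate: a hypothetical helper for steps 3 through 7 above.
final class ThroughputEstimate {

    // Steps 3-5: average requests/second, assuming uniformly distributed traffic.
    static double averageRps(double[] stepSeconds, int requestsPerOperation,
                             int simultaneousUsers) {
        double operationSeconds = 0;
        for (double s : stepSeconds) operationSeconds += s;  // step 4: time per operation
        double opsPerUserPerMinute = 60.0 / operationSeconds;
        return simultaneousUsers * opsPerUserPerMinute * requestsPerOperation / 60.0;
    }

    // Steps 6-7: buffer the larger of the steady average and the simultaneous burst.
    static double bufferedRps(double averageRps, int simultaneousUsers,
                              double bufferFactor) {  // e.g., 1.3 for a 30% buffer
        return Math.ceil(Math.max(averageRps, simultaneousUsers) * bufferFactor);
    }
}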


Throughput Calculation

Example 1:
In this example, imagine a retail store warehouse receiving a truckload of products. One of the use cases could be:

A store employee has to verify the product against the purchase order, visually inspect the boxes for damage, and record the receipt of the products in the system.

The steps the user goes through when receiving a product:

  1. The employee performs a quick visual inspection of the box for damage. – 5 seconds
  2. The employee scans the box and the system pulls the relevant purchase order details. – 2 seconds
  3. The employee verifies the information and confirms the receipt. – 3 seconds

Throughput calculation for this use case:

  1. The employee takes approximately 10 seconds to confirm one receipt (including time spent by the user and system).
  2. In one minute, an employee can receive 6 boxes.
  3. With 25 employees, the total number of boxes that can be received in one minute = 25 × 6 = 150 boxes/minute.
  4. Each receive operation takes two requests (retrieving the purchase order and confirming the receipt), so the total number of requests is 300/minute.
  5. Total Requests/Second:
    • 300/60 = 5 requests/second (with uniform distribution of traffic).
    • If all 25 employees submit requests at the same time, the total may be 25 requests/second (the two requests in a receive operation are separated by user think time, so they do not land in the same second).
    • With a 30% buffer, the total comes to 33 requests/second, as the sketch below confirms.

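These figures can be reproduced with the ThroughputEstimate sketch from the step list above (hypothetical names; the inputs are this example's numbers):

// Example 1: three steps totaling 10 seconds, 2 requests/operation, 25 users.
double avg    = ThroughputEstimate.averageRps(new double[] {5, 2, 3}, 2, 25); // 5.0
double target = ThroughputEstimate.bufferedRps(avg, 25, 1.3);                 // 33.0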

Now, we can capture the performance requirement for the receive product use case:

A store employee has to verify the product against the purchase order, visually inspect the boxes for damage, and record the receipt of the product in the system. The system should support 25 users simultaneously using this capability and respond to a receipt request in less than 1.5 seconds with a sustained throughput of 33 requests/second.

Example 2:
Let's take a look at a retail store checkout process use case.

A customer arrives at the checkout counter with items to purchase. A store employee records the items, collects a payment, and completes the transaction.

Here, the steps performed are:

  1. The store employee scans each product. – 0.3 seconds/product
  2. The customer selects a payment option and pays for the purchase. – 20 seconds
  3. The store employee confirms the payment and completes the transaction. – 2 seconds


In this example, several products are scanned for each sale. The traffic for scanning products and traffic for payment processing arrive at different rates.

Scan throughput:

  1. The employee takes approximately 0.3 seconds to scan each product (including time spent by the employee and system). If the average number of products per sale is 20 items, then total time for scanning 20 items is about 6 seconds, and in 1 minute, an employee can scan 200 items.
  2. If there are 10 checkout counters, a total of 2000 items can be scanned each minute.
  3. Average requests/second is 2000/60 = 33.33 requests/second.
  4. With a 30% buffer, the total comes to 44 requests/second.

Payment processing throughput:

  1. The customer selects a payment option and pays for the purchase. It takes 20 seconds to complete the payment. In 1 minute, we can receive a maximum of 3 payment requests per checkout counter.
  2. With 10 counters, total payment requests = 30 payments/minute
  3. Total Requests/Second:
    • 30/60 = 0.5 requests/second.
    • If 10 checkout counters process payment at the same time, total requests would be 10 requests/second.
    • With a 30% buffer, the total comes to 13 requests/second, as the sketch below confirms.

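The same ThroughputEstimate sketch reproduces both sets of figures (hypothetical names; note that only the 20-second payment step drives payment throughput):

// Scans: one request per 0.3-second scan, 10 counters working continuously.
double scanAvg    = ThroughputEstimate.averageRps(new double[] {0.3}, 1, 10); // 33.3
double scanTarget = ThroughputEstimate.bufferedRps(scanAvg, 10, 1.3);         // 44.0

// Payments: one request per 20-second payment, 10 counters.
double payAvg    = ThroughputEstimate.averageRps(new double[] {20}, 1, 10);   // 0.5
double payTarget = ThroughputEstimate.bufferedRps(payAvg, 10, 1.3);           // 13.0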

Now, we can capture the performance requirement for the checkout process use case:

The customer arrives at a checkout counter with items to purchase. A store employee records the items, collects a payment, and completes the transaction.

The system should respond to a scan request in less than 0.3 seconds with a sustained throughput of 44 requests/second while supporting 10 simultaneous users.

The system should respond to a payment processing request in less than 10 seconds with a sustained throughput of 13 requests/second while supporting 10 simultaneous users.

Retail Store System

Use Case | Priority | Simultaneous Users | Throughput (requests/second) | Response Time (seconds) | Transaction Mix
Employee Logon | High | 100 | 150 | 3 | Read, Update
Checkout Process - Product Scan Request | High | 10 | 44 | 0.3 | Read
Checkout Process - Payment Processing Request | High | 10 | 13 | 10 | Read (total cost), Insert (sale), Update (inventory)
Warehouse Receipt Request | High | 25 | 33 | 1.5 | Read (purchase order), Update (inventory), Insert (product receipt)


The benefits of this approach are:

  • Performance goals are measurable.
  • Baseline application performance can be measured, and areas where work is needed can be determined.
  • Teams can confidently communicate when the performance goals are met.
  • Capacity planning becomes easier.
  • Provides valuable metrics on application performance trends and on the overall impact of performance tuning activity.


Performance Test Early and Often

Often, performance testing is performed at major milestones in a project to verify the application's ability to meet the performance objectives. The problem with this sequential approach is the lack of performance-related feedback in between: performance issues can accumulate over time instead of being addressed early in development. Agile software development practices place a lot of emphasis on unit testing software early and often in the lifecycle. With automated testing tools like NUnit and JUnit, unit- and integration-level functional testing can be performed every night, which gives the team tremendous confidence in the overall quality of the product.

A similar approach should be used for performance testing. Instead of waiting for specific milestones to verify the scalability of the application, the team should adopt an aggressive approach by running the unit- and integration-level performance tests (see Part I) early and often. These tests verify the scalability of a component or a set of components. After a software change, developers can immediately check whether there are any new performance issues. This instant feedback mechanism allows developers to identify potential issues early and to communicate confidently about whether a change will impact scalability.
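
A unit-level throughput test can be as simple as driving the component under test from multiple threads and asserting against the requirement. The sketch below uses JUnit 4; confirmReceipt is a hypothetical placeholder for your own component's entry point, and the 33 requests/second target comes from the warehouse example above:

import static org.junit.Assert.assertTrue;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.junit.Test;

public class ReceiptThroughputTest {

    // Placeholder for the real component under test.
    private void confirmReceipt() {
        // e.g., call the receipt-confirmation component here.
    }

    @Test
    public void sustainsTargetThroughput() throws Exception {
        final int users = 25;            // simultaneous users from the requirement
        final int requestsPerUser = 40;  // enough iterations for a stable measurement
        ExecutorService pool = Executors.newFixedThreadPool(users);

        long start = System.nanoTime();
        for (int u = 0; u < users; u++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerUser; i++) {
                    confirmReceipt();
                }
            });
        }
        pool.shutdown();
        assertTrue("test run timed out", pool.awaitTermination(5, TimeUnit.MINUTES));

        double seconds = (System.nanoTime() - start) / 1e9;
        double throughput = (users * requestsPerUser) / seconds;
        assertTrue("measured " + throughput + " req/s; target is 33 req/s",
                throughput >= 33.0);
    }
}

Run nightly alongside the functional tests, a failing assertion flags a scalability regression the morning after the change that introduced it.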

System-level performance testing (see Part I) still needs to be performed at major milestones to ensure that the system as a whole meets its objectives.

While NUnit and JUnit are excellent tools for functional unit testing, they have limited capabilities for performance testing. Tools like Microsoft ACT* have more powerful performance testing capabilities, but they are limited to Web-layer testing or to testing components that support specific automation technologies like Microsoft COM. Microsoft ACT also allows you to measure various performance counters while the tests run and record them as part of the test results. The choice of tools will be dictated by the development language and environment. One tool may not meet all your requirements; use your creativity in selecting the right set of tools for your application.

For example, even though Microsoft ACT is designed for performance testing Web applications and Web services, we can take advantage of its COM automation capabilities to performance test the database layer. Using COM data access objects, it is fairly straightforward to invoke a SQL statement or stored procedure. This approach can be used to perform timing tests and concurrency tests and to measure throughput at the database layer. Because the tool supports scripting languages, several parameters can be changed at runtime to simulate real-world usage.

Sample code for performance testing a stored procedure within ACT:

Option Explicit
Dim g_oConnection
' Initialize the array of SQL statements to execute (parameter values are placeholders)
Dim g_aSQLStmt
g_aSQLStmt = Array( _
"exec spNameOfStoredProcedure @p1=value, @p2=value, @p3=value", _
"exec spNameOfStoredProcedure @p1=value, @p2=value, @p3=value", _
"exec spNameOfStoredProcedure @p1=value, @p2=value, @p3=value" )

' GetRandomElement
' Parameters:
' 	[in] aArray : Array to grab a random element from.
' Purpose:
' 	Returns a random element from the 'aArray'
Function GetRandomElement(aArray)
    GetRandomElement = aArray(Int((UBound(aArray) - LBound(aArray) + 1) * Rnd _
        + LBound(aArray)))
End Function

' Entry Point.
Sub Main()
    Dim a
    Dim rs
    Dim sql

    ' Open a connection to the database (supply your own connection details).
    Set a = CreateObject("ADODB.Connection")
    a.Open "provider=sqloledb;server=;uid=;pwd=;database="

    ' Pick a statement at random so repeated requests vary, as in real usage.
    sql = GetRandomElement(g_aSQLStmt)
    Set rs = a.Execute(sql)

    a.Close
End Sub

Randomize  ' Seed the random number generator before the run.
Main


New Development vs. Modifying an Existing Application

How often do you get an opportunity to build applications from scratch? When building from a clean slate, you have many opportunities to experiment as you design and develop the software. However, the majority of development is done on existing applications or code bases, and in these cases you have to work with performance issues in the legacy code. This should not stop you from following the proper performance-tuning process identified in Part I of this article. Prioritizing the use cases and identifying performance objectives will allow you to focus on specific areas of a legacy application.

Another issue that can make performance testing an existing code base challenging is the testability of the application. If the application functionality is divided into logical layers with well-defined interfaces, it is easy to test layer by layer. However, when there is no logical layering, or when an underlying layer is coded for a specific consumer type (a Web client or a Windows* client), presentation aspects can get mixed up with the business logic. In this case, it is better to refactor the code to make it more suitable for testing. Software with loose coupling and strong cohesion is much easier to understand, test, and maintain.


Coupling refers to the cross-dependencies that exist among components in an application. In the worst case, every component depends on every other component in the system. Any issue with one component then impacts all the other components; the system becomes very unstable and requires regression testing after even minor changes to verify there are no side effects.

For example, in the strong coupling diagram (Figure 1), if A changes, it impacts B, C, and D. If B changes, it impacts A, C, and D.

Figure 1. Strong Coupling

In the loose coupling diagram (Figure 2), components do not have cyclic dependencies: A depends on B and C, B depends on D and C, and D depends on C. C does not depend on A, B, or D.

If A's behavior is modified, it does not impact B, C, or D. If B's behavior is modified, it impacts only A.

Figure 2. Loose Coupling
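
The dependency structure of Figure 2 can be expressed directly as constructor parameters (a sketch; the single-letter class names simply mirror the figure):

// C depends on nothing; every other component depends only "downward".
class C { }
class D { D(C c) { } }          // D depends on C
class B { B(D d, C c) { } }     // B depends on D and C
class A { A(B b, C c) { } }     // A depends on B and C

Because no constructor refers back to A, changing A cannot ripple into B, C, or D; only a change to C, the most depended-upon component, calls for wider regression testing.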


Cohesion usually refers to the clarity of implementation. With strong cohesion, each function performs a well-defined task, and all logically related functions are grouped together.

For example, a routine ComputeAndDisplayFibonacci that both computes the nth Fibonacci number and presents it in the user interface has weak cohesion, because it is trying to do two unrelated tasks. If the nth Fibonacci number needs to be computed in another part of the application, the logic has to be duplicated.

A better approach is to implement the Fibonacci computation as a reusable routine, ComputeFibonacci, and delegate the responsibility for presentation to the caller. This allows ComputeFibonacci to be reused in several places without changes or duplication, as the sketch below illustrates.
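
A minimal sketch of that separation (the class and method names are illustrative):

final class FibonacciExample {

    // Reusable computation: no user-interface code in here.
    static long computeFibonacci(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // The caller owns presentation, so the routine can be reused anywhere.
    public static void main(String[] args) {
        System.out.println("fib(10) = " + computeFibonacci(10)); // prints fib(10) = 55
    }
}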

Applications with loose coupling and strong cohesion have well-defined interfaces and are logically layered. This in turn allows for more granular performance testing.

Data to Collect during Performance Testing

In a distributed application, performance issues can arise in any of the tiers: the Web tier, the middleware, the database tier, and so forth.

Potential areas where problems could occur:

Infrastructure and server level:
Insufficient network bandwidth, CPU, or memory; excessive disk I/O.

Application level:
Inefficient algorithms/queries, lack of indexes, resource contention, chatty inter-process or inter-machine communication.

Client level:
Slow client machine (CPU, memory, disk I/O), slow peripherals like printers, etc.

At a minimum, the following information needs to be captured to pinpoint the location of a bottleneck.

System Level – Sampled on All Servers

CPU Utilization: High CPU utilization indicates processor-intensive algorithms. It can also indicate that the CPU is wasting cycles filtering out unwanted data when a downstream system returns more data than necessary. Low CPU utilization usually indicates a network or disk I/O bottleneck, or a bottleneck in a downstream system.

Disk Queue: A lack of sufficient memory can cause disk thrashing, and a lack of indexes or inefficient queries can cause database table scans. Average disk seek time ranges anywhere from 4-10 ms, and the process is blocked until data is retrieved from the disk.

Network I/O: Helps identify bandwidth problems. If the application makes too many remote calls (chattiness), it can cause severe performance issues. If an application returns more data than necessary, transmitting the extra data adds significant overhead.

Memory Available: Available physical memory. Inefficient data structures or retrieving more information than necessary can cause high memory utilization; for example, keeping a large XML document in memory for parsing.

Memory Pages/sec: Indicates swapping of pages from memory to the page file due to a lack of physical memory.
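
CPU and memory figures can also be sampled from inside a Java process using the JDK's platform MXBeans, as the sketch below shows (it assumes a JVM that exposes com.sun.management.OperatingSystemMXBean; disk queue and network counters generally come from operating-system tools such as Windows perfmon instead):

import java.lang.management.ManagementFactory;

public class CounterSampler {
    public static void main(String[] args) throws InterruptedException {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        while (true) {
            // Sample CPU utilization and available physical memory every 5 seconds.
            System.out.printf("cpu=%.0f%% freePhysicalMB=%d%n",
                    os.getSystemCpuLoad() * 100,
                    os.getFreePhysicalMemorySize() / (1024 * 1024));
            Thread.sleep(5000);
        }
    }
}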


Web/Middleware/Database Tiers

Invoked Routine: The entry-point routine within a tier. This could be a Web or Web service request URI, a middle-tier component and function name, or a database stored procedure or SQL query.

Elapsed Duration: Total time to process the request.

Number of Bytes (Request): Number of bytes sent by the invoking tier.

Number of Bytes (Response): Number of bytes in the response.
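
Capturing these four data points is usually a matter of wrapping each tier's entry point. A minimal sketch follows (the handle signature and the byte-array request/response shape are assumptions for illustration; in a real system the log line would go to whatever trace store the performance test reads):

import java.util.function.Function;

public final class TimedInvocation {

    // Wraps a tier entry point and logs the four data points from the table above.
    public static byte[] handle(String routine, byte[] request,
                                Function<byte[], byte[]> process) {
        long start = System.nanoTime();
        byte[] response = process.apply(request);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf(
                "routine=%s elapsedMs=%d requestBytes=%d responseBytes=%d%n",
                routine, elapsedMs, request.length, response.length);
        return response;
    }
}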


By capturing the above information during the performance test, you will be able to identify the tier that is causing performance issues. Based on this information, you can use more sophisticated tools like the Intel® VTune™ Performance Analyzer, Microsoft .NET* Profiler, Microsoft SQL Server Profiler*, Query Analyzer*, Index Tuning Wizard*, and Compuware DevPartner* to pinpoint the bottleneck and tune the application.


This article covered techniques for capturing good performance requirements. In addition, we discussed how performance testing can be performed early and often at the unit, integration, and system levels; this approach significantly reduces performance risk by allowing issues to be identified early in the lifecycle. Finally, we reviewed some of the key parameters that need to be measured to characterize the application.




Special thanks to Patti Kolnik and Srinivasan Krishnamurthy for reviewing this article and helping improve its overall quality.
