Get insights into designing well-built performance tests and strategies to identify potential problems. Learn how to perform successful performance testing by designing tests that realistically reflect usage scenarios.
Application performance tuning is an interesting, challenging, and rewarding exercise. When performance is treated as a key requirement, applications are designed with performance and scalability in mind. In reality, however, performance often does not get enough priority until it is too late and customers are complaining. Application development techniques are gradually evolving toward a collaborative model in which applications are composed by leveraging existing services and software components. The users of an enterprise application can be located worldwide, and network connectivity needs to be considered in order to deliver a good end-user experience. All of this introduces external variables and factors that can cause performance issues.
The primary focus of this article is to provide some insight into designing well-built performance tests and strategies for identifying potential problems. Most of the discussion is generic in nature and can be applied to any technology and platform.
The goal behind performance testing is to ensure that the application can handle specific workloads while maintaining good response times. Just like any other software development activity, performance testing costs time and money; skipping it, however, can cost a company lost customer revenue and goodwill. Having an efficient process to address performance issues is crucial to ensuring quality while minimizing time and costs.
Some of the high-level objectives behind performance testing are:
- Verify performance requirements
- Characterize application performance
- Assess scalability of the application
In order to do so, we need a series of tests that are:
- Capable of identifying bottlenecks
- Comprehensive in detecting potential issues
- Economical, requiring minimal time to run
Depending on the complexity of the application, performance testing and tuning is often a difficult and time-consuming exercise. Adequate time and resources need to be budgeted for performance testing and tuning activities.
Performance Test Design
Types of Performance Testing
From an application quality perspective, there are several levels of testing:
- Unit Testing
Verify that the individual components within the system are working correctly
- Integration Testing
Verify that the components and the interfaces are interacting correctly
- System Testing
Verify the overall system functionality
Similar concepts can be adapted for performance testing. In QA testing, black box testing is the most widely used approach: the system is thoroughly tested from the end-user perspective by simulating various usage scenarios. From a performance testing perspective, white box testing often gives better results, because knowledge of the system's implementation is very useful for testing individual sections of the application and confirming their scalability.
In order to ensure that the application has good performance, a programmer needs to ensure that the individual components themselves perform well. If any component has performance problems, the effect will ripple across the entire application. In an execution path, performance is only as good as the weakest link, so ensuring that all important components perform well is probably the most critical step.
The next step is verifying the performance of the integrated components. This step is used to identify performance issues at the interface and component interaction levels. At this stage, running through all possible scenarios can be very expensive, as the number of potential test cases starts expanding; testing the most commonly used scenarios is a good way to focus. If the application depends on existing Web services or components, these should be independently tested to verify their scalability.
The final step is verifying the performance of the complete system. The primary goal is to mimic how the consumers will interact with the system. The complexity of testing depends on how the services are exposed to the consumers; remember, the number of potential usage scenarios can be very large. Running through all possible scenarios can consume time and resources and so must be dealt with carefully.
In the case of a non-GUI system, performance testing can be straightforward. However, if the application has several user interfaces, this testing could turn out to be the most complex type of performance testing. The user behavior needs to be mimicked using test code that is closely tied to the presentation aspects of the user interface: for example, selecting a product, adding it to the cart, entering credit card information, and placing an order.
Any change to the user interface impacts this layer of testing. Very quickly you can find yourself in a situation where the bulk of the test code deals with user interface (UI) issues rather than true performance testing.
A good strategy to use is to build the tests as reusable functions so that more complex test scenarios can be built using the basic tests. An example of this is a logon test function that can be used in routines that place orders.
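As a sketch of this idea, the hypothetical `logon`, `add_to_cart`, and `place_order` steps below (all invented for illustration; a real suite would issue requests against the application under test) can be composed into a larger order-placement scenario:

```python
import time

# Hypothetical basic test steps; in a real suite each would issue HTTP
# requests against the application under test and check the response.
def logon(user):
    return {"user": user, "session": f"session-{user}"}

def add_to_cart(session, product):
    session.setdefault("cart", []).append(product)
    return session

def place_order(session):
    return {"order_for": session["user"], "items": list(session["cart"])}

# Composite scenario built from the reusable basic steps above.
def place_order_scenario(user, products):
    start = time.monotonic()
    session = logon(user)
    for product in products:
        add_to_cart(session, product)
    order = place_order(session)
    elapsed = time.monotonic() - start
    return order, elapsed

order, elapsed = place_order_scenario("alice", ["book", "pen"])
print(order["items"])  # → ['book', 'pen']
```

Because each step is a plain function, more complex scenarios (browse, reorder, cancel) can reuse the same building blocks, and a UI change only requires fixing the affected step once.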
This is just the beginning; to really characterize the performance of an application, additional types of testing need to be performed. The unit, integration and system testing can be performed under normal, peak and extreme loads.
Load testing involves exercising several functionalities concurrently under normal and peak loads. The goal of this step is to identify concurrency issues that arise when several sections of the application are exercised simultaneously. Visualize all of these steps happening at once: users placing orders, customer service reps accessing order details, the order fulfillment workflow running, and so forth.
Stress testing exercises several functionalities under extreme loads. The goal is to identify application and infrastructure issues that appear when the system is pushed beyond its design limits.
If the application is hosted on a central server or end users are in multiple locations, then network testing is essential for verifying the usability of the application from various locations. It is used to measure the response times at the end user's machine. This step can be performed by walking through various scenarios and using a stopwatch to measure the time; more sophisticated load generation tools can be used if needed. In most cases, simple time measurements along with the number of bytes transferred, network bandwidth, and user distribution will be sufficient to extrapolate the performance under real-life conditions.
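Such an extrapolation can be sketched as a first-order estimate: transfer time derived from payload size and bandwidth, plus protocol round trips. The helper and the numbers below are illustrative assumptions only; real networks add TCP slow start, congestion, and server processing time.

```python
def estimated_response_time(payload_bytes, bandwidth_bps,
                            round_trip_latency_s, round_trips=1):
    """First-order estimate: transfer time plus protocol round trips.
    Assumes link bandwidth is the bottleneck (an assumption, not a rule)."""
    transfer = (payload_bytes * 8) / bandwidth_bps  # bytes -> bits
    return transfer + round_trips * round_trip_latency_s

# A 200 KB page over a 1 Mbps link with 80 ms round-trip latency,
# assuming 3 round trips for connection setup and the request itself.
t = estimated_response_time(200_000, 1_000_000, 0.080, round_trips=3)
print(f"{t:.2f} s")  # → 1.84 s
```

Running such an estimate for each user location, then spot-checking with real measurements, is usually enough to decide whether remote users will see acceptable response times.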
Tool selection is a key aspect of performance testing, because most testing requires tools to generate traffic. A large number of users may need to be simulated to test performance under heavy usage; doing this manually is likely to be insufficient, imprecise, and incorrect. Performance testing also requires a good distribution of data to ensure that the same or a similar set of data is not used for every request generated during a particular run. For example, you might want each test request to place orders for different products using different customer accounts.
To do all this, you need tools that are flexible enough to define your rules and requests. Tool vendors promise automated code generation by recording your interactions with the target application, so that the generated code can later be used to run your tests. In reality, this is a poor way to do performance testing: the scripts mimic your exact interaction with the system and capture the exact data you entered during the recording session. To make a script more dynamic (i.e., selecting different products, using a different user account, performing a different action), you would end up spending significant time cleaning up the scripts and adding your own custom code. Also, any change to the user interface will invalidate the scripts.
The tool should support a programming interface to customize the test routines. If you are able to specify the behavior using scripts, you have more flexibility to define exactly how the tests should behave. In addition, the tool should allow collection and reporting of various metrics, such as bytes transferred and received, response times, request URLs, and total requests generated. You will also need tools to monitor memory usage, CPU utilization, network I/O, disk I/O, and transaction time. Load testing tools generally have features to capture server metrics.
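A minimal sketch of what such a script-driven harness can look like, with the real HTTP call replaced by a simulated stand-in so the example is self-contained (names, timings, and product IDs are all invented):

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a call to the system under test; a real harness would use
# an HTTP client here and also record bytes sent and received.
def send_request(product_id):
    start = time.monotonic()
    time.sleep(random.uniform(0.001, 0.005))  # simulated server work
    return {"product": product_id, "elapsed": time.monotonic() - start}

def run_load(total_requests, concurrency, product_ids):
    # Each request picks a product at random, giving the data
    # distribution the text describes.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(
            send_request,
            (random.choice(product_ids) for _ in range(total_requests))))
    times = [r["elapsed"] for r in results]
    return {
        "requests": len(times),
        "mean_s": statistics.mean(times),
        "max_s": max(times),
    }

report = run_load(total_requests=50, concurrency=10,
                  product_ids=["P100", "P200", "P300"])
print(report["requests"], round(report["mean_s"], 4))
```

The point of the sketch is the shape, not the numbers: behavior is expressed in code you control, and the harness returns metrics you can aggregate across runs.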
There are several load generation tools available on the market: Microsoft ACT*, Mercury LoadRunner*, Rational Performance Tester*, and Empirix e-Test* are a few examples.
Test Results Verification
Even though performance testing is not focused on verifying functionality, a programmer still needs to verify the results of each request to make sure it was successfully processed. After all, who wants an application that is very responsive but gives the wrong results? After each request, there should be logic within the test to verify the results. Concurrency, timeout, deadlock, and other types of runtime errors will come back in the response to the request. The script should check each response to ensure that the data is in line with the expected results and, if not, log appropriate messages.
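A sketch of such a verification step is shown below. The error markers and expected text are purely hypothetical; what you actually check depends on what your application returns in its responses.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("perf-test")

# Hypothetical post-request check: look for known error markers, then
# confirm the expected content is present; log a message on failure.
def verify_response(response_text, expected_marker):
    lowered = response_text.lower()
    if "deadlock" in lowered or "timeout" in lowered:
        log.warning("Runtime error in response: %s", response_text[:80])
        return False
    if expected_marker not in response_text:
        log.warning("Expected %r not found in response", expected_marker)
        return False
    return True

print(verify_response("<html>Order #1234 confirmed</html>", "confirmed"))  # True
print(verify_response("Transaction deadlock detected", "confirmed"))       # False
```

Keeping the check cheap matters: it runs once per generated request, so heavy parsing here would distort the very response times you are measuring.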
Data Generation and Partitioning
As discussed in the earlier sections of this article, performance testing may have to be done in several stages to characterize the application performance. As large numbers of requests are generated during each run of the performance tests, you would need a good collection of data for testing. For example, in an online retail store, you would need a grouping of products and categories, sample customers, and different shipping carriers.
If the purpose of the test is to verify the backend fulfillment and work flow process, then a large number of orders should be available in the system. This would allow the automated tests to use orders and take them through the fulfillment process. Data here can be generated in a few different ways:
- Run the performance tests so that data creation is in itself a performance test. For example, order placement by a customer is an independent performance test. Once this test is run, a large number of orders will be available in the system. Subsequent tests can use these orders and take them through the workflow.
- Build an application to generate the data.
- Use existing production data.
The generated data should also be pseudo-random in nature so that it mimics the data in the real system. In our online store example, you would not want to create all orders under a single customer or use one product in all orders; rather, you would want to distribute the orders across multiple customers and include a variety of products.
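A minimal sketch of pseudo-random data generation along these lines (the customer and product identifiers, counts, and order shape are all invented for illustration):

```python
import random

random.seed(42)  # reproducible runs; drop the seed for varied data

customers = [f"CUST-{i:04d}" for i in range(500)]
products = [f"SKU-{i:05d}" for i in range(2000)]

def generate_order():
    # Spread orders across many customers and give each order
    # a varying mix of one to five distinct products.
    return {
        "customer": random.choice(customers),
        "items": random.sample(products, k=random.randint(1, 5)),
    }

orders = [generate_order() for _ in range(10_000)]
distinct_customers = len({o["customer"] for o in orders})
print(distinct_customers)  # nearly all of the 500 customers should appear
```

Seeding the generator is a deliberate trade-off: it keeps runs repeatable for comparison while still producing the customer and product spread that real traffic would have.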
If the application is a database driven application, then you would want to make sure there is a good volume of data in the system. If the system needs to handle 10,000 orders a day, then in a month the system would have more than 300,000 orders, and in a year it would have more than 3.6 million orders in the database. How is the application going to work with progressively increasing data? To assess the performance over time, a large amount of data (around a year’s worth should be a good place to start) would need to be generated. Most of the performance issues will start showing up when there is a good collection of data available.
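The sizing arithmetic behind those numbers, extended with an assumed average number of line items per order (an assumption, not a figure from the text), fits in a few lines:

```python
orders_per_day = 10_000
avg_lines_per_order = 3  # assumed average items per order

monthly_orders = orders_per_day * 30
yearly_orders = orders_per_day * 365
# Each order contributes one header row plus its line-item rows.
yearly_rows = yearly_orders * (1 + avg_lines_per_order)

print(f"{monthly_orders:,} orders/month, {yearly_orders:,} orders/year")
print(f"~{yearly_rows:,} rows/year across the order tables")
```

Even this rough estimate shows why queries that look fast against a near-empty test database need to be validated against a year's worth of generated data.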
Repeatability and Maintainability
As there are several steps in successful performance tuning, you will have to run the tests several times under varying factors. Once again, the tool you select should let you change the test parameters and data quickly so that you can re-run the tests. By keeping this overhead to a minimum, you can study the application under several conditions and thereby get a better understanding of the problems.
Maintainability refers to the ability to modify the test code as the application changes. When the UI or business functionality changes, you will need to modify the tests to validate that the application performance is still acceptable. To do so, the test scripts should be easy to modify, and a version control system will be needed to keep track of changes.
One common pitfall when doing performance testing is testing only simple scenarios. Testing simple scenarios can be very inadequate and give a false sense of security. Testing complex scenarios requires more design work and code to properly simulate the use case. Using complex scenarios early in the testing life cycle would help uncover performance issues that would otherwise go undetected. If complexity prevents you from executing full system testing, try breaking the tests into smaller unit or integration level tests.
Performance Testing Infrastructure
In order to perform realistic testing, you need to generate a large number of requests by simulating many concurrent users. Generating such a load requires fairly powerful systems: a less powerful machine can become a bottleneck on the load generation side, potentially skewing the results and leading to incorrect conclusions. Powerful traffic generation servers allow you to stress the application up to your identified goals.
The test infrastructure should be as close to the production infrastructure as possible. For example, if the application is going to be load balanced across multiple Web and application servers, a similar setup in the performance testing infrastructure is required to accurately verify the behavior of the application. This allows for verification of the scalability of the load balancers, simulation of server failures, and the ability to redirect traffic away from failed servers to working ones.
Performance Testing Process
The entire performance tuning process can be broadly classified into the following steps:
Prioritization of Use Cases
Participants: Domain Experts, Customers, Developers
Outcome: List of use cases and Priority (High, Medium, Low or numeric ranking)
Identify areas of functionality that are critical to the business and set performance goals. From a performance perspective, not all use cases are of the same priority; focusing on important and high-risk use cases is crucial for successful performance testing. The key idea behind this effort is to set goals for what needs to be achieved.
During requirements gathering, if use cases were prioritized based on business needs, the same use cases could be used for performance prioritization. Use cases need not be elaborate; a paragraph of descriptive text is often sufficient.
Defining what each of these priorities mean would help in categorizing usage scenarios correctly. A good way to prioritize is to assess the dollar impact or productivity impact:
- High – Use cases are crucial for running the business. Lack of availability or degradation in performance could impact a large number of users and cause potential loss of business.
- Medium – The business can still run successfully; slowness in some functionality could potentially impact a small subset of users.
- Low – Rarely used functionality, or an alternate mechanism exists to perform a similar function.
Workload and Performance Objectives
Participants: Domain Experts, Developers
Outcome: For each prioritized use case, identify the number of users, the worldwide user distribution, the transaction mix (read, write, update, delete), the number of transactions per hour, and the expected response time.
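One way to turn these workload numbers into a load-test concurrency target is Little's Law: concurrent users ≈ arrival rate × the time each user occupies the system (response time plus think time). A small sketch with purely illustrative numbers:

```python
def required_concurrency(transactions_per_hour, avg_response_s, think_time_s=0.0):
    """Little's Law: concurrency = arrival rate x time each user
    occupies the system (response time plus think time)."""
    rate_per_s = transactions_per_hour / 3600
    return rate_per_s * (avg_response_s + think_time_s)

# 18,000 transactions/hour, 2 s responses, 10 s of user think time
print(round(required_concurrency(18_000, 2.0, 10.0)))  # → 60
```

An estimate like this helps size both the simulated user population and the load generation hardware before any test code is written.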
Design and Development of Automated Tests
Participants: Developers and some input from Domain Experts
Outcome: Repeatable automated tests
This is one area where agile management and development techniques can be heavily utilized. By going through the prioritization and workload estimation phases, the team will have a good understanding of the areas where performance is critical. As you work through each use case, it is often beneficial to complete the end-to-end optimization: go through the entire process from test creation, testing, and measurement to tuning, and finally refine the process based on what you learn.
Measuring Application Performance
Participants: Developers and Server Administrators
Outcome: Response time under varying loads; data transferred in bytes; sampling of server metrics such as CPU, memory, disk I/O, and context switches.
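Raw response-time measurements are most useful when summarized as percentiles rather than averages alone, since a good mean can hide a slow tail. A small sketch using Python's standard library (the sample values are invented):

```python
import statistics

# Hypothetical response-time samples (seconds) collected during a run.
samples = [0.21, 0.25, 0.22, 0.95, 0.24, 0.23, 1.40, 0.26, 0.22, 0.27]

mean = statistics.mean(samples)
# 'inclusive' interpolates within the observed range of the data.
p95 = statistics.quantiles(samples, n=100, method="inclusive")[94]
print(f"mean={mean:.2f}s p95={p95:.2f}s max={max(samples):.2f}s")
```

Here the mean looks healthy while the 95th percentile exposes the two slow outliers, which is exactly the kind of signal tuning should target.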
Tuning Application Performance
Participants: Developers and Server Administrators
Outcome: Based on the measurements and the performance objectives, optimizations can be performed at the application, database, or infrastructure level.
Conclusion
This article introduces some of the most basic issues that you need to consider when doing performance testing and tuning. Performance testing and tuning requires knowledgeable people who have a good understanding of software programming and the protocols used in an application. The keys to successful performance testing are understanding the architecture of the application, identifying performance requirements, and designing tests to realistically reflect the usage scenarios.
- Performance Testing and Tuning, Part II
Special thanks to the following reviewers: Vladislav Rudkovski, Mark Olson and Ram Dharmarajan.