Java* is increasingly popular as a programming language in the enterprise IT environment. Many companies have invested substantial time and money in terms of hardware, application server software, and databases. Yet most of these companies are not getting full use of the equipment and software that they have purchased or the full use of the software they have developed for those products. This is because all of these components usually require some amount of tuning before they communicate efficiently, and tuning several interdependent components is difficult and time consuming without a proper approach. This article is the first of two that present a systematic technique for optimizing Java code, specifically addressing a three-tier architecture that is extensible to any multi-tier environment. Read this article first, then move to J2EE Application Tier Tuning.
To better illustrate the discussion, we'll use a fictitious online pet store as an example. All implementation details refer to a Windows* 2000 environment, but should be applicable with few or no changes to a Windows NT or XP environment. All concepts will apply equally to a Linux* environment, but no specific Linux resources will be suggested.
This document is intended for software developers who have at least one year of Java development experience, are familiar with developing test workloads, and have used performance and monitoring tools. Your environment, however, may require advanced techniques not covered in this document and may call for professional services, such as Intel® Solution Services, which has extensive experience assisting independent software vendors on all Intel® based platforms.
Before You Begin
Before beginning there are several points to consider. First of all, undertaking a proper tuning project is not a small task. At a minimum, the tuning project will likely occupy two engineers full time for two months. Additional engineers will be required at various points when their area of the product falls under the microscope. That being said, once the initial investment is made, subsequent tuning projects can be accomplished more swiftly.
You must also determine the right time to start the tuning process. Preparation for tuning will probably take you about one month, possibly more depending on the complexity of the product. The tuning itself will take at least one month for a team that is inexperienced in performance tuning. This must all be timed so that the tuning can take place on code that is near ready for release or code that has recently been released but can be easily patched for your customers. Don't invest resources tuning code that will be replaced before it is ever used in production.
Define the Scope
As with any project, the first step is to define the problem. What performance problem are you trying to solve? For most applications the goal will be to tune performance for the common usage case. It is not a good idea to try to tune the application for every usage scenario. In our pet store example, users are more likely to buy puppies than they are fish, so it makes sense not to spend as much time optimizing the nuances of fish purchases. With your priorities clearly in mind you can more easily manage the scope of the project and get better results for your efforts.
Define the Workload
Now that we have limited the general scope of the activity, we need to take the definition a step further. In order to tune the application we must see how it behaves when stressed. The mechanism to stress an application is called a workload. Creating the workload is the single most important part of the tuning project. How you tune your application will depend entirely on how it is stressed. A proper workload must be:
- Representative
- Measurable
- Static
- Reproducible
In the case of a Web application, the concept of the workload is straightforward, but there are still some pitfalls. The easiest way to make a workload is to select a set of URLs. This can be done by identifying common user patterns. For example:
- A user accesses the home page
- Surfs through the available pets
- Adds one to the shopping cart
- Goes to checkout
- Arrives at the order confirmation page
Another way of creating a workload is to record a portion of a day's activity. The HTTP requests can then simply be played back. Whichever method you use, you will need a harness to submit the requests, maintain sessions, simulate think time, and so forth. There are several commercial load generation tools available that supply the necessary functionality. Which tool you select will depend on your budget (some products are licensed at a cost per virtual user and per day) and the amount of load that you need to generate.
Consider these factors when creating a workload manually:
- You may want to create weighted URLs. In step 2 of the user scenario described previously, 50 percent of the users may surf through puppies, 40 percent through kittens, and 10 percent through your rare turtle collection.
- Not all of your users will go all the way from start to finish. That is, not all users will actually buy a pet. You may find that many users never complete the checkout stage.
- Actual users don't all choose the same item to buy. If every user in your workload tries to update the same record in the database, you will not get an accurate picture of your database query efficiency.
In short, there are many considerations to creating a representative workload.
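As an illustration of the weighted URLs mentioned above, here is a minimal Java sketch that picks a page according to percentage weights. The class name, the URL paths, and the seed are hypothetical and are not part of any load generation tool; a commercial harness would normally provide its own mechanism for this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Picks a URL at random according to integer weights (illustrative only). */
class WeightedUrlPicker {
    private final String[] urls;
    private final int[] cumulative; // running totals of the weights
    private final int total;
    private final Random random;

    WeightedUrlPicker(Map<String, Integer> weights, long seed) {
        urls = new String[weights.size()];
        cumulative = new int[weights.size()];
        int sum = 0;
        int i = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            if (entry.getValue() <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            sum += entry.getValue();
            urls[i] = entry.getKey();
            cumulative[i] = sum;
            i++;
        }
        total = sum;
        random = new Random(seed); // a fixed seed keeps the sequence repeatable
    }

    /** Returns the next URL; heavier weights are chosen proportionally more often. */
    String next() {
        int roll = random.nextInt(total); // 0 <= roll < total
        for (int i = 0; i < cumulative.length; i++) {
            if (roll < cumulative[i]) {
                return urls[i];
            }
        }
        throw new IllegalStateException("unreachable when all weights are positive");
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("/pets/puppies", 50); // hypothetical paths and weights
        weights.put("/pets/kittens", 40);
        weights.put("/pets/turtles", 10);
        WeightedUrlPicker picker = new WeightedUrlPicker(weights, 42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(picker.next());
        }
    }
}
```

Seeding the generator is a deliberate choice here: the mix of pages stays representative, yet every run requests the same sequence, which helps keep results comparable between runs.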
In addition to being representative, workloads must also be measurable. They must have a key metric that can be used to gauge the performance. In the case of a Web site, hits per second, average response time, maximum response time, and transactions per second are all common metrics. These metrics are also all easily measurable in most load generators. In fact, most load generation tools will supply you with several of these metrics and more. It is important to select just one of these, though, as your main metric.
You can collect data on as many metrics as you like, and they may all be helpful in determining the speed of the Web site; however, if you target more than one metric, you will have difficulty measuring the magnitude of any change. If, for example, you monitor average response time and maximum response time, then you make a change to the system that reduces the maximum response time but increases the average response time, was that change successful? That all depends on what your goal is.
If the goal of the project is to ensure that no users, including those shopping for yaks, have to wait more than two seconds for a Web page, then your change was successful in helping move toward that goal. If, on the other hand, your goal was to minimize how long the average user spent waiting for a Web page, then your change had a negative impact. Consequently, you must select a metric that matches the project's aims and consistently track that metric.
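The trade-off between the two metrics can be made concrete with a small Java sketch. The response times below are made-up samples in milliseconds, and the class and method names are illustrative only; most load generators report these figures for you.

```java
import java.util.Arrays;

/** Computes two common workload metrics from response-time samples (illustrative only). */
class ResponseTimeMetrics {
    /** Average response time in milliseconds, or 0 for an empty sample set. */
    static double average(long[] millis) {
        return Arrays.stream(millis).average().orElse(0.0);
    }

    /** Maximum response time in milliseconds, or 0 for an empty sample set. */
    static long max(long[] millis) {
        return Arrays.stream(millis).max().orElse(0L);
    }

    public static void main(String[] args) {
        long[] before = {400, 500, 600, 2500};  // hypothetical samples before a change
        long[] after = {900, 1000, 1100, 1800}; // hypothetical samples after the change
        System.out.println("before: avg=" + average(before) + " max=" + max(before));
        System.out.println("after:  avg=" + average(after) + " max=" + max(after));
    }
}
```

With these samples the maximum falls from 2500 ms to 1800 ms while the average rises from 1000 ms to 1200 ms, so whether the change counts as an improvement depends entirely on which metric you chose as your target.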
Your efforts to analyze the application's environment will be hampered if your workload does not deliver static results. Static refers to the workload's ability to provide consistent results over the course of a run. A workload that changes behavior over time will make analysis difficult, as the system characteristics will not be meaningful. There are two common situations where workloads are not static. The first is the case of a warm-up period. After a reboot, an application will often run slower because its caches have been flushed and the server has not reached a state of equilibrium. This is fine so long as the warm-up period is fixed. If, as the workload is run, performance continues to change for better or worse, it will be difficult to measure the overall performance of the run.
Likewise, if the workload is created such that all users request the same pages at the same times, you will likely see swings in the performance during the course of the run. Often workloads are essentially synchronized so that all the users hit the home page, and then all the users hit the login page, and so forth. If your workload is designed this way you will end up seeing lots of sharp increases and decreases in activity that will stress the application in an incorrect fashion.
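One common way to avoid this lockstep behavior is to give each virtual user a random start offset and a random think time between pages. The Java sketch below shows the idea under stated assumptions: the class name, the 30-second ramp-up window, and the 2 to 8 second think-time range are all hypothetical values chosen for illustration.

```java
import java.util.Random;

/** Staggers virtual users so they do not request the same pages in lockstep. */
class UserPacing {
    /** Returns a delay drawn uniformly from [minMillis, maxMillis]. */
    static long randomDelay(Random random, long minMillis, long maxMillis) {
        if (maxMillis < minMillis) {
            throw new IllegalArgumentException("maxMillis must be >= minMillis");
        }
        long span = maxMillis - minMillis + 1;
        return minMillis + (long) (random.nextDouble() * span);
    }

    public static void main(String[] args) {
        Random random = new Random(7L); // fixed seed so reruns use the same schedule
        for (int user = 1; user <= 3; user++) {
            long startOffset = randomDelay(random, 0, 30_000);  // hypothetical ramp-up window
            long thinkTime = randomDelay(random, 2_000, 8_000); // hypothetical pause between pages
            System.out.println("user " + user + ": start after " + startOffset
                    + " ms, first think time " + thinkTime + " ms");
        }
    }
}
```

A real harness would sleep for these delays before issuing each request; the sketch only prints the schedule so that it can be run without a server.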
In addition to not changing over the course of the run, a workload should also not produce different results between runs. Without consistent results you will not be able to evaluate the success of an experiment. Suppose there is a 20 percent variance between runs in a static environment. If you change the Web server's cache size and the following run is 20 percent faster, how will you know whether the performance change is a fluke or a valid result? Ideally a workload should not have more than a 5 percent variance between runs.
If you are unable to remove the variance in a workload, you can regain precision at the expense of time. Either lengthen the duration of the test, or run the test repeatedly for each experiment and take the average. The downside is a longer test schedule.
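To check run-to-run consistency, you need a working definition of variance. The Java sketch below assumes one possible definition, the spread between the best and worst run expressed as a percentage of the mean; this definition, the class name, and the sample figures are assumptions made for illustration, and you may prefer a different statistic such as the standard deviation.

```java
/** Summarizes repeated runs of the same workload (illustrative only). */
class RunVariance {
    /** Mean of the key metric across runs. */
    static double mean(double[] runs) {
        double sum = 0.0;
        for (double run : runs) {
            sum += run;
        }
        return sum / runs.length;
    }

    /** Spread between the best and worst run, as a percentage of the mean. */
    static double spreadPercent(double[] runs) {
        double min = runs[0];
        double max = runs[0];
        for (double run : runs) {
            min = Math.min(min, run);
            max = Math.max(max, run);
        }
        return (max - min) / mean(runs) * 100.0;
    }

    /** True when the spread stays within the given limit, for example 5 percent. */
    static boolean isStable(double[] runs, double limitPercent) {
        return spreadPercent(runs) <= limitPercent;
    }

    public static void main(String[] args) {
        // Hypothetical transactions per second from five identical runs.
        double[] runs = {98, 100, 102, 101, 99};
        System.out.println("mean=" + mean(runs)
                + " spread%=" + spreadPercent(runs)
                + " stable=" + isStable(runs, 5.0));
    }
}
```

When the spread exceeds your limit, the mean of several runs is the figure to compare between experiments rather than any single run.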
Apply the Testing Strategy
Once you have developed your workload and found it to meet the four essential elements, it is time to start testing. The first step is to establish a baseline. Then it is a simple circular process of analysis and experiment. How you approach the analysis is the key factor. The trick with performance tuning is to envision your environment like plumbing. If one part of the pipeline is clogged, water can only trickle through the rest of the pipes. The slowest part of your environment, be it the Web server or the homemade cross-over cable connecting the application server to the database server, limits the performance of the rest of the environment.
Figure 1. The Data Pipeline.
Once you identify the slowest component in your system, drill into it and identify the specific bottleneck. Eliminate that problem and repeat the process from the top. You may find that you never actually have to tune the source code of your application, because the other components of your system are always holding it back.
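The plumbing analogy reduces to a simple rule: end-to-end throughput cannot exceed the capacity of the slowest stage. The Java sketch below illustrates that rule with hypothetical stage names and capacity figures; in practice you would obtain such figures from measurement, not from a table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Models the environment as a pipeline whose slowest stage limits throughput. */
class DataPipeline {
    /** Returns the name of the stage with the lowest capacity, or null if there are no stages. */
    static String slowestStage(Map<String, Double> requestsPerSecond) {
        String slowest = null;
        double lowest = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> stage : requestsPerSecond.entrySet()) {
            if (stage.getValue() < lowest) {
                lowest = stage.getValue();
                slowest = stage.getKey();
            }
        }
        return slowest;
    }

    /** End-to-end throughput is capped by the slowest stage. */
    static double throughput(Map<String, Double> requestsPerSecond) {
        String slowest = slowestStage(requestsPerSecond);
        return slowest == null ? 0.0 : requestsPerSecond.get(slowest);
    }

    public static void main(String[] args) {
        Map<String, Double> stages = new LinkedHashMap<>();
        stages.put("web server", 900.0);         // hypothetical capacities
        stages.put("application server", 400.0);
        stages.put("database server", 650.0);
        System.out.println("bottleneck: " + slowestStage(stages));
        System.out.println("throughput: " + throughput(stages) + " requests/sec");
    }
}
```

Raising the capacity of any stage other than the slowest one leaves the result unchanged, which is why the process always starts by finding the bottleneck.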
The first step is to establish a baseline measurement. This measurement is your starting point for evaluating the performance experiments. Run the workload without any profiling tools attached, collect the target metric, and repeat. Repeat the test at least five times or until you are certain of its reproducibility. Vary how you run the test by stopping and starting different elements. For instance, record the impact of rebooting the servers, not rebooting the servers, stopping and starting the application server processes, and so forth. This will give you important information to refer to down the road. Inevitably, a test run will provide strange results and you will wonder whether or not it was due to your experiment or because you rebooted a server. The better you understand the workload's behavior, the less you will have to rerun tests to validate results.
System Level Tuning
Once the baseline is established you are ready to start testing in earnest. Start out with the big picture. How well are your servers communicating, which server is stressed the most, and so on? In order to determine this information you need a tool to monitor the servers. If you are running in a Windows environment, you have a great tool already on your servers. Microsoft Performance Monitor* (Perfmon.exe, see Figure 2) has substantial functionality. It allows you to collect statistics using a wide range of server counters including CPU utilization, disk activity, and system calls per second. There are hundreds of counters available to you. Perfmon also provides two other indispensable features: logging and remote collection.
Figure 2. Microsoft Performance Monitor.
Logging gives you the ability to look back to earlier tests to evaluate counters you may not have cared about at that point. Of course, for the counters to be available you must have selected them. For that reason, it is always best to select more counters than you think you need. You do not have to display all the counters when you review the log, but it's good to have them in case you need them. This is another tip that will save you from rolling back changes and re-running tests.
With remote collection you can use one machine to control the tests and look at the state of all the servers in your environment. Another benefit to collecting all the data together is that it is already combined so you can see how the application server responds when the Web server's load spikes. You must make a few tweaks for Perfmon to collect data from the other machines. Some of these settings will create an insecure environment, but your test network should be isolated from the rest of your company and the Internet anyway. Don't let extraneous network traffic confuse your experiments.
Configuring Perfmon for Remote Collection
- Make sure the system time is the same on every machine.
- If the times are different, you can still view data in real time but your logs will all be corrupted.
- Set the same logon account for the performance log service on every machine.
- Open Services from the Management Console (Control Panel | Administrative Tools | Services)
- Open the properties page for "Performance Logs and Alerts"
- Select the Logon tab
- Select the "This account" radio button.
- Set the account to Administrator.
- If you are on a domain, use Administrator with a standardized password.
- If you are on a workgroup, just use Administrator and set the password to blank on all the machines.
- Set the Windows registry to enable autologin for this service.
- HKEY_LOCAL_MACHINE | SOFTWARE | Microsoft | Windows NT | CurrentVersion | Winlogon
- Select "Edit" from the file menu and then select "Add Value."
- A dialog box appears. Enter "AutoAdminLogon". This name is case sensitive. Leave the data type as REG_SZ (text) and press Enter.
- A dialog box appears. Enter the value of 1. The only other value recognized will be 0. This will turn on or off the auto logon feature respectively and will work only if there is a "DefaultPassword" set in the next step.
- Type in the valid password for the domain account. One caveat: if the domain is not available or the password provided is not valid, the "AutoAdminLogon" value is automatically set to 0 (disabled).
See the Perfmon documentation to learn how to create a log file populated with the counters of your choice. You should start with at least the following counters. (Note that Perfmon counter names tend to change between versions of Windows. The counter names may differ on your platform.)
|Object|Instance|Counter|
|---|---|---|
|System|Total, 0, 1, ..., N|% Processor Time|
|System|Total|% Privileged Time|
|PhysicalDisk|Total, 0, 1, ..., N|Disk Transfers/sec|
|PhysicalDisk|Total, 0, 1, ..., N|Avg. Disk Sec/Transfer|
Figure 3. Suggested Performance Counters.
While Perfmon provides counters to look at various network statistics, those statistics are not always collected accurately. Moreover, collecting network info with Perfmon tends to steeply increase the overhead on the system. Network analysis can be accomplished much better using a sniffer or other network tools, or both. Intel® switches come with a product called Device View* that lets you get into the switch and monitor for errors and bandwidth consumption. It is particularly useful if you are stacking switches (linking switches with stack interface modules).
Put it All Together
You now have all the tools, tests, and methodology in place to tune your system. Begin by ramping up the load on the environment. Monitor the change in stress on each tier. Remember, you are trying to find where the pipe is clogged. You will likely see the stress on one of the tiers grow more quickly than that of the other tiers. CPU utilization (%Processor Time in Perfmon) is the most common indicator of stress, but do not forget to look at the other areas suggested previously (see Figure 3). The process will be to repeatedly find a bottleneck and eliminate it. Follow the methodology in Figure 4 as you go.
Figure 4. Closed Loop Methodology
You're likely to encounter these scenarios:
1. The stress on one of the tiers grows more quickly than the others.
This is the cut-and-dried case. The tier that stands out is your first bottleneck. Follow the test methodology.
2. No tier shows an increase in stress as the load is increased.
You will probably also find that the response time is increasing linearly with the load. This is an indication that the bottleneck is on the front end of the system. Possibly the Web server is not configured to accept enough connections or another piece of hardware up front, such as a load balancer, is improperly configured.
3. The stress on the application tier grows more quickly than the others.
This advantageous scenario indicates that the pipes on the front and the back of the application server are wide enough to keep it busy.
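The scenarios above can be told apart mechanically once you have CPU utilization samples for each tier at increasing load levels. The Java sketch below is one simple way to do that, assuming that growth is measured as the difference between the last and first sample; the class name, the tier names, the sample values, and the 10-point threshold are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Finds the tier whose CPU utilization grows fastest as load ramps up (illustrative only). */
class BottleneckFinder {
    /**
     * Returns the tier with the largest rise in utilization between the first
     * and last load step, or "none" if no tier rises by at least minGrowth points.
     */
    static String fastestGrowingTier(Map<String, double[]> cpuByTier, double minGrowth) {
        String fastest = null;
        double largest = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> tier : cpuByTier.entrySet()) {
            double[] samples = tier.getValue();
            double growth = samples[samples.length - 1] - samples[0];
            if (growth > largest) {
                largest = growth;
                fastest = tier.getKey();
            }
        }
        if (fastest == null || largest < minGrowth) {
            return "none"; // scenario 2: suspect the front end of the system
        }
        return fastest;
    }

    public static void main(String[] args) {
        // Hypothetical % Processor Time samples at three increasing load levels.
        Map<String, double[]> cpu = new LinkedHashMap<>();
        cpu.put("web", new double[] {10, 15, 20});
        cpu.put("application", new double[] {20, 50, 85});
        cpu.put("database", new double[] {15, 20, 30});
        System.out.println("fastest growing tier: " + fastestGrowingTier(cpu, 10.0));
    }
}
```

A result of "none" corresponds to the second scenario, where no tier shows increasing stress and the front end deserves a closer look. CPU utilization is only one indicator, so the same comparison can be repeated for the disk counters.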
When am I Done?
You are done with the first round when you have eliminated the bottlenecks in front of and behind the application server(s). Then you are free to optimize the application tier. Of course, once you have made a significant improvement to the application tier, you will have to re-evaluate the other tiers to ensure that they are capable of keeping the application tier busy at its new capacity.
About the Author
Dan Middleton has worked in the computer industry for the past eight years. He has been employed on a number of development projects ranging from medical imaging software to enterprise business applications. For the last two years he has been with Intel Corporation, where he works with independent software vendors to optimize their products and evaluate their products' scalability. In addition to his work with Java enterprise applications, Dan also specializes in 3D graphics programming.