Supercharge Java* Applications on Multi-Core Servers


Enterprise Java applications may take advantage of multi-core processors without changing a single line of source code. In this paper, we examine a performance speedup of about 1.90x for a J2EE application when moving from single core to dual core Intel processors. We discuss the performance characteristics of J2EE applications and why they are already well suited to dual core processors. We also discuss some new features of dual core processors that help existing applications perform better.


The Java programming language, unlike other traditional programming languages such as C, provides built-in mechanisms for concurrency including support for threads and locks [Java programming language specification, Gosling, Joy, Steele, Bracha, 2000].  Creating an application thread in Java can be as simple as defining a class that extends java.lang.Thread, creating an instance of that class, and calling its start method.[1]  Thread operations such as starting, interrupting, and joining are also as simple as invoking a method on a thread object (the older suspend and resume methods are deprecated and should be avoided).  In addition, most Java-based application frameworks provide an environment for easy thread management.  This is especially true for J2EE application servers such as BEA WebLogic Server and IBM WebSphere Application Server, which have sophisticated built-in thread management systems that take full advantage of the Java programming language’s concurrency features.  Application developers may simply rely on a J2EE application server’s thread management and not worry about explicit thread management.  For example, a J2EE application server will manage threads using an internal thread pool to handle different Enterprise Java Beans (EJBs) in an EJB container.  The use of threads and their smart management is important in J2EE applications, which must quickly service numerous simultaneous requests from multiple clients.
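As a minimal sketch of the thread creation described above, the following example defines a class that extends java.lang.Thread, starts several instances, and waits for them with join.  The class names SquareWorker and ThreadDemo and the squaring workload are hypothetical and chosen only for illustration; they do not come from the application measured in this paper.

```java
// Hypothetical illustration: SquareWorker and ThreadDemo are not from the paper.
class SquareWorker extends Thread {
    private final int input;
    private int result; // written by this thread, read by the caller after join()

    SquareWorker(int input) {
        this.input = input;
    }

    @Override
    public void run() {
        // This method body executes on the newly started thread.
        result = input * input;
    }

    int getResult() {
        return result;
    }
}

class ThreadDemo {
    // Starts one thread per input, waits for all of them, and sums their results.
    static int sumOfSquares(int... inputs) {
        SquareWorker[] workers = new SquareWorker[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            workers[i] = new SquareWorker(inputs[i]);
            workers[i].start(); // schedules run() on a new thread
        }
        int sum = 0;
        for (SquareWorker worker : workers) {
            try {
                worker.join(); // wait for completion; makes result visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            sum += worker.getResult();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1, 2, 3, 4)); // prints 30
    }
}
```

In a J2EE application this kind of explicit thread handling is normally left to the application server, which is why such applications can use additional cores without source changes.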


While Java language features and the sophisticated thread handling of J2EE application servers make managing concurrency in J2EE applications easier and more efficient, these applications ultimately have to execute on a processor.

Intel’s new dual core processors have two fully functional processing cores packaged into a single chip.  Dual core processors have double the decoder and execution resources, double TLB and first level caches for both instruction and data, and double second level cache resources with a significant front-side bus speed increase when compared to a single core processor.  All of these features increase performance of multithreaded applications significantly as more processor resources are available to the application. 


To take full advantage of dual core Intel processors, single-threaded applications have to be rewritten and recompiled, but since most J2EE application servers are already heavily threaded, the performance gain from moving to dual core Intel processors comes at little or no cost.  That is, existing J2EE applications should be able to take full advantage of the move from a single core processor to a dual core Intel processor without a single line of source code being rewritten.

In the following section, we examine the performance of a multi-tier J2EE application running on single core processor and dual core processor systems.


Performance Comparison of a J2EE Application

We examine and compare the performance of a multi-tier J2EE application running on single core processors and on dual core processors.  The application exercises many J2EE technologies, including the web container, the Enterprise Java Bean (EJB) container, Java Message Service (JMS), transaction management, and database connectivity.  It makes use of a number of J2EE services, among them distributed transactions, object persistence, dynamic web page generation, messaging and asynchronous task management, and transactional components.


The application server we used to run this J2EE application reads its configuration files to decide how many threads to create for servicing different types of client requests, such as remote method invocation (RMI).  Since dual core processors are capable of handling a larger number of threads more efficiently, we specified different numbers of threads for the single core and dual core processor systems to obtain the best performance on each.  However, some Java application servers can already adjust the number of threads adaptively based on server load and response times, so this step may not be required, depending on which Java application server is being used.
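The idea of giving a system with more cores a larger thread count can be sketched with the standard java.util.concurrent API.  This is only an illustration under stated assumptions: the class PoolSizingDemo, the threadsPerCore tuning knob, and the trivial simulated request are hypothetical, and a real J2EE application server sets its pool sizes through its own configuration rather than through code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: not the configuration mechanism of any real server.
class PoolSizingDemo {
    // Assumed sizing rule: a fixed number of threads per core, never fewer than one.
    static int poolSizeFor(int cores, int threadsPerCore) {
        return Math.max(1, cores * threadsPerCore);
    }

    // Runs the given number of simulated requests on a fixed-size pool
    // and returns how many of them completed.
    static int serveRequests(int requests, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Callable<Integer> request = () -> 1; // stand-in for real request handling
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(request));
            }
            int completed = 0;
            for (Future<Integer> future : futures) {
                completed += future.get(); // blocks until that request finishes
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // availableProcessors() reports the logical processors visible to the JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        int poolSize = poolSizeFor(cores, 4); // 4 threads per core is an arbitrary example
        System.out.println("pool size: " + poolSize);
        System.out.println("completed: " + serveRequests(100, poolSize));
    }
}
```

The best ratio of threads to cores depends on how often requests block on I/O or locks, so it should be chosen by measurement rather than taken from this sketch.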


Moreover, because the higher throughput of a dual core system may expose performance bottlenecks that are not observed on a single core system, we analyzed the performance of our dual core setup and removed any bottlenecks (e.g., network) that had been introduced.


This J2EE application is a three-tier application consisting of a client tier (drivers), an application server tier, and a database tier.  Figure 1 illustrates an example of how this J2EE application may be configured and deployed.


Figure 1.  A multi-tier J2EE application


In this figure, a multi-node application server cluster is used to service the requests from the clients (drivers).  This is a scalable architecture, where the total performance depends on the number of application servers and the performance of each application server. In the following table, we compare the performance on a single application server node. Two Intel servers are compared: one server with two single core processors and another server with two dual core processors. Table 1 shows detailed system configuration used in this comparison.


Table 1.  Detailed system configuration


Single core application server: 2P Intel® Xeon™ processor running at 3.6GHz with 2M L2 cache per processor package

Dual core application server: 2P Intel® Core™ 2 Duo processor running at 3.0GHz

Operating System (OS), both configurations: RedHat* Enterprise Linux AS 4.0 Update 2

Database Processor, both configurations: 4P Intel® Itanium® 2 processor 1.6GHz with 9M L3 cache



As can be seen from Table 1, both the single core and dual core configurations use the same database backend.  Moreover, both the single core application server and the dual core application server run the same software stack, including the operating system, Java virtual machine, J2EE application server, and application.  The only component that varies in the comparison is the hardware used in the application server node, which includes the processor and the type of memory[2].


Figure 2.  Performance comparison between single core and dual core application server


Figure 2 shows the performance of our J2EE application running on a single core processor system and on a dual core processor system.  Performance is measured in terms of throughput: the number of transactions that the application server completed successfully while meeting certain response time criteria.  As Figure 2 shows, the performance of our J2EE application running on the same software stack increased by a factor of about 1.90 when moving from a single core processor server to a dual core processor server.  It is worthwhile to note that our J2EE application is relatively free of contended locks and that further tuning may be needed for J2EE applications that suffer from other performance bottlenecks.  We address the issue of performance analysis in the following section.
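To put the 1.90 figure in context, the speedup can be compared with the increase in core count.  The symbols below are introduced here for illustration only: T denotes measured throughput, N denotes the number of cores in the application server (two for the 2P single core server, four for the 2P dual core server), and E is a rough scaling efficiency.

```latex
\[
S = \frac{T_{\text{dual}}}{T_{\text{single}}} \approx 1.90,
\qquad
E = \frac{S}{N_{\text{dual}} / N_{\text{single}}} = \frac{1.90}{4 / 2} = 0.95
\]
```

This is only an approximation of core scaling, since the two servers also differ in processor type, clock frequency, and memory type.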


Performance Analysis Methodology

Not all applications benefit equally from moving to dual core processors, because of runtime characteristics such as the lock contention mentioned above.  Moreover, applications may develop new performance bottlenecks when moving to dual core processors.  This is especially true for J2EE applications because they usually involve multiple computers, such as a database machine, linked via a network.  To address performance bottlenecks that may have been introduced, one has to look at each system in isolation to address intra-system performance issues such as disk bottlenecks, and at the systems together to address inter-system performance issues such as network bottlenecks.

The performance analysis methodology we recommend is a top-down, data-driven, and iterative approach, which is explained in detail in “Enterprise Java Performance: Best Practices” by Chow et al. [Intel Technology Journal, Vol 7, Issue 1, February 2003].  In it, the authors recommend systematically examining performance data, from higher level system data down to lower level micro-architectural data, to identify performance issues.  Figure 3 illustrates this top-down approach.


Figure 3.  Top-down performance analysis approach


Once performance issues have been identified using the top-down approach, one needs to take a “data-driven” and “iterative” approach to address them.  “Data-driven” means that one must measure performance data to guide the next steps, and “iterative” means that the process must be repeated until the desired performance level is achieved.  Figure 4 illustrates this iterative approach.


Figure 4.  Iterative performance analysis approach
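The measure, compare, and repeat cycle can be sketched in code for a single tuning knob.  This is a deliberately narrow illustration, not the methodology of Chow et al.: the class IterativeTuning, the doubling search over thread counts, and the synthetic throughput function used in main are all hypothetical, and a real analysis would measure the running system and consider many more factors than thread count.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration of a data-driven, iterative tuning loop.
class IterativeTuning {
    // Doubles the thread count while each new measurement improves on the best
    // throughput seen so far, and returns the best thread count found.
    static int tuneThreadCount(IntToDoubleFunction measureThroughput,
                               int startThreads, int maxThreads) {
        if (startThreads < 1) {
            throw new IllegalArgumentException("startThreads must be at least 1");
        }
        int best = startThreads;
        double bestThroughput = measureThroughput.applyAsDouble(best); // baseline measurement
        for (int candidate = best * 2; candidate <= maxThreads; candidate *= 2) {
            double throughput = measureThroughput.applyAsDouble(candidate); // measure
            if (throughput <= bestThroughput) {
                break; // the data shows no further gain, so stop iterating
            }
            best = candidate; // keep the change and repeat
            bestThroughput = throughput;
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a real benchmark run: throughput grows until
        // 8 threads, after which extra threads only add overhead.
        IntToDoubleFunction synthetic = threads -> Math.min(threads, 8) * 100.0 - threads;
        System.out.println(tuneThreadCount(synthetic, 1, 64)); // prints 8
    }
}
```

The stopping rule here is simply "the last change did not help"; in practice the loop ends when the desired performance level is reached.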


By using a top-down, data-driven, and iterative performance analysis approach, one can take full advantage of moving to dual core processor systems.

Making the Switch

We have shown in the above example that one of the simplest ways of boosting the performance of some threaded J2EE applications is to switch to a dual core processor.  Intel’s new dual core processors double the decoder and execution resources, the TLBs, and the instruction and data caches, which results in increased performance.  In addition, because many J2EE applications are multithreaded by nature and most J2EE application servers already have sophisticated threading capabilities built in, J2EE application developers may not have to change a single line of source code to take full advantage of Intel’s dual core processors.  A top-down, data-driven, and iterative performance analysis approach helps ensure that J2EE applications take full advantage of dual core processor systems.

[1] This is, of course, a rather simplified view of Java threads.  There are other ways of creating application threads in Java and many operations that can be performed on them.

[2] The size of memory remains the same across the two servers.
