This paper showcases the scalability and performance of IBM* DB2 9.5 on the newest Quad-Core Intel® Xeon® processor, compared to IBM DB2 9 on a Dual-Core Intel® Xeon® MP system. It shows how the new DB2 9.5 pureXML features, in combination with the Quad-Core Intel® Xeon® processor, deliver excellent performance and real, measurable benefits. We review the new IBM DB2 9.5 pureXML features, the Intel® Core™ microarchitecture on which the new processor is based, and the resulting scalability of a native XML application on the latest Quad-Core Intel® Xeon® processor. To show the real benefits of the Quad-Core Intel® Xeon® processor 7300 Series with IBM DB2 9.5, we compare this combination to the Dual-Core Intel® Xeon® processor 7100 Series running DB2 9.
Databases are an inherent part of a complex IT infrastructure, and these databases can include both modern and legacy systems. Exchanging data among these systems and adapting to rapidly changing business conditions are critical to the success of today’s organizations. With an increasing number of firms turning to XML to implement their infrastructure, many savvy IT leaders are looking for ways to effectively share, search, and manage the wealth of XML documents and messages their firms are accumulating. With the recent launch of the next generation Quad-Core Intel® Xeon® processor 7300 Series and IBM DB2 9.5, businesses can now take advantage of the processing power and performance benefits these two products offer.
This paper shows the combined performance of the Quad-Core Intel® Xeon® processor 7300 Series and the IBM DB2 9.5 pureXML technology enhancements. To illustrate the processing power and capabilities of these products, it presents several measurements comparing the Quad-Core Intel® Xeon® processor 7300 Series running DB2 9.5 with the Dual-Core Intel® Xeon® processor 7100 Series running DB2 9. It is a follow-on to the initial results we published last year for the Dual-Core Intel® Xeon® processor 7100 Series with DB2 9 pureXML technology.
The test scenario is the same as in the earlier paper: a data-centric financial brokerage scenario simulated with the TPoX benchmark. First, baseline performance metrics are presented for the query-only workload, which is based on seven read-only queries. The scalability and functionality of DB2 9.5 are then demonstrated by applying XML inlining and compression when creating the database. Finally, the efficiency of the Quad-Core Intel® Xeon® processor 7300 Series is demonstrated by comparing it to the previous generation Dual-Core Intel® Xeon® processor 7100 Series and by studying the behavior of the XML workload as the memory and storage in the test configuration are increased.
Intel Xeon MP Server Processors
As computing demands grow, there is an increasing need for an improved platform that offers peak performance and reliability. Data demands also drive the need for server consolidation and increased parallelism. Concurrency within an application’s architecture is readily and efficiently harnessed by multi-core processors equipped with high-capacity caches. Intel has been deploying multi-core processors, in which two or more “execution cores” are placed within a single physical processor, across its entire product line, from notebooks to high-end servers. This expansion of parallel processing capabilities makes Intel® Xeon® server processors a natural fit for applications that inherently take advantage of such parallelism, such as DB2. Please see  for more details on the Quad-Core Intel® Xeon® processor 7300 Series.
Quad-Core Intel® Xeon® processor 7300 Series
This performance study uses the newly released Quad-Core Intel® Xeon® processor 7300 Series for MP server systems. Based on the Intel® Core™ microarchitecture, the Xeon® 7300 Series processor has numerous key features aimed at providing high performance and reliability.
Quad-Core Intel® Xeon® processor 7300 Series features:
- 65nm Intel® Core™ microarchitecture
- Quad-core processor
- Intel® 64 bit Architecture
- Large L2 cache: up to 8MB per processor (2 x 4MB)
- 1066 MT/s Dedicated High-speed Interconnects
- Intel® Virtualization Technology (Intel® VT)
- Demand-Based Switching (DBS) with Enhanced Intel SpeedStep® Technology
The Quad-Core Intel® Xeon® processor 7300 Series is built for virtualization and data-demanding applications.
Virtualization
For maximum flexibility, IT managers can now build one compatible group of platforms for live migration across all of their Intel® Core™ microarchitecture-based servers, including single-processor, dual-processor, and the new four-processor Quad-Core Intel® Xeon® processor 7300 Series-based servers. Live VM migration offers tremendous flexibility for fail-over, load balancing, disaster recovery, and real-time server maintenance. With a new feature called Intel® Virtualization Technology FlexMigration (Intel® VT FlexMigration), IT will also be able to add future Intel® Xeon® processor-based systems to the same resource pool when using future versions of virtualization software. This lets IT choose the server platform that best balances performance, cost, power, and reliability. These processor enhancements are further supplemented by Intel® networking technology improvements such as Virtual Machine Device Queues (VMDq), which sort network data into multiple queues in the silicon for more efficient network processing.
Quad-Core Intel® Xeon® processor 7300 Series-based platforms run data- and transaction-intensive applications faster. This allows applications to know more, know it faster, and respond more quickly. With tremendous 64-bit performance and broad 32-bit application support, data centers can become more efficient and more responsive to business needs. Better performance for applications that require reliable, large-scale computing lets organizations deploy increasingly powerful business tools to track the marketplace and identify previously hidden opportunities.
Servers based on the Quad-Core Intel® Xeon® processor 7300 Series contain 2x the cores, 4x the memory capacity, and up to 8 MB of on-die L2 cache as compared to the Intel® Xeon® 7100 Series. Together with dedicated high-speed interconnects (DHSI) and the performance-enhancing and energy-efficient technologies of the Intel Core microarchitecture, these servers help well-threaded, data-demanding applications perform at their peak.
The new features in the Quad-Core Intel® Xeon® processor 7300 Series allow it to outperform the Intel® Xeon® 7100 Series processor by up to 78% on business processing (ERP, SCM, CRM), up to 53% on database applications, and more than 100% on e-commerce applications. These performance gains come with lower power consumption, so the processor delivers up to 3x the performance per watt of the previous generation.
Intel® 7300 Chipset
Intel® has surrounded the Quad-Core Intel® Xeon® processor 7300 Series with a platform designed for balanced computing. The Intel® 7300 Chipset, with its data traffic optimizations, improves data movement in Intel® Xeon® processor 7300 Series-based servers by increasing interconnect bandwidth, optimizing system bandwidth, increasing memory capacity, and improving network traffic processing, while reducing I/O latency compared to previous platforms. All these platform advancements contribute to the improved performance of the Quad-Core Intel® Xeon® processor 7300 Series. The Intel 7300 Chipset provides 28 lanes of PCI Express* and supports third-party expanders for additional I/O.
DB2 9.5 Data Server
The DB2 9.5 data server, with its data automation and performance enhancements, is designed to reduce the cost of deploying and managing data. Key enhancements include workload management to reduce the administrative burden on database administrators, and integrated automated failover and backup to improve high availability and minimize downtime. In addition, the DB2 9.5 data server introduces functionality that makes XML handling more powerful and efficient. The following functionality is supported by DB2 9.5:
- Support for pureXML in Non-Unicode databases
- Sub-document level XML updates (XQuery Update Facility)
- XML base-table storage and compression
- Compatible XML schema evolution
- Schema validation triggers
- XML replication
- XML federation
- XML load
- Easier parameter passing and optional syntax simplifications for various XML query functions
- Enhanced built-in XSLT functionality
- XML index and performance enhancements
- DB2 Data Web services
Some of the key features relevant to this article are described below.
Sub-document level XML updates
DB2 9.5 allows a user to update a portion of an XML document stored in the database without replacing the entire document. It introduces four updating expressions: insert, delete, replace, and rename of XML elements and attributes (nodes). These expressions comply with the XQuery Update Facility, which is currently being standardized by the W3C.
The insert expression is used to insert new XML nodes into an existing XML document. You can specify the position of the insertion within the XML document. The replace expression updates the value of a particular node, or replaces a node with a new node. The delete expression deletes a particular node from the XML document. The rename expression changes the name of an attribute or element node.
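To make the four updating expressions concrete, the sketch below applies their effect to a small in-memory XML document using Python’s standard xml.etree.ElementTree module. It only models what each expression does to the node tree; it is not DB2’s API or XQuery Update syntax, and the Customer, Name, Phone, and Email element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer document; element names are illustrative only.
DOC = "<Customer id='1001'><Name>Jane</Name><Phone>555-0100</Phone></Customer>"


def apply_updates(xml_text: str) -> str:
    """Model the effect of the four XQuery Update expressions on one document.

    This is plain ElementTree manipulation, not DB2's update syntax.
    """
    root = ET.fromstring(xml_text)

    # insert: add a new node at a chosen position (here, right after <Name>).
    email = ET.Element("Email")
    email.text = "jane@example.com"
    root.insert(1, email)

    # replace: update the value of an existing node.
    root.find("Name").text = "Jane Doe"

    # delete: remove a node from the document.
    root.remove(root.find("Phone"))

    # rename: change the name of an element node.
    root.find("Email").tag = "EmailAddress"

    return ET.tostring(root, encoding="unicode")


print(apply_updates(DOC))
# <Customer id="1001"><Name>Jane Doe</Name><EmailAddress>jane@example.com</EmailAddress></Customer>
```

Only the touched nodes change; the id attribute and the rest of the document are carried through untouched, which is the point of sub-document updates.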
Base-table row storage/compression (XML Inlining)
In DB2 9, XML data is stored in a storage location separate from the relational data, called the XML data area (XDA). DB2 9 stores all XML documents in the XDA, so accessing XML values together with relational data requires additional I/O.
DB2 9.5 introduces base-table row storage of XML data: XML data can be stored on the same physical page as the relational data of its row, provided the combined size of the relational and XML data in that row does not exceed the page size. Otherwise, the XML data is stored in the XDA as usual. Because the maximum page size in DB2 is 32 KB, the inline length of an XML value is limited to 32 KB. Documents whose internal tree representation is smaller than the specified inline length are stored inline.
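The placement rule described above can be sketched as a small decision function. This is a simplified model under stated assumptions: the inline length is the per-column setting (the column’s INLINE LENGTH in DB2 9.5), and page and row overhead are ignored, so real thresholds are somewhat lower than the raw page size. The names Placement and place_xml are hypothetical, not DB2 interfaces.

```python
from enum import Enum


class Placement(Enum):
    INLINE = "base table row"
    XDA = "XML data area"


MAX_PAGE_SIZE = 32 * 1024  # largest DB2 page size, in bytes


def place_xml(tree_size: int, relational_row_size: int,
              inline_length: int, page_size: int = MAX_PAGE_SIZE) -> Placement:
    """Simplified model of where one XML value is stored (all sizes in bytes).

    Assumption: page and row overhead are ignored, so real limits are lower.
    """
    if inline_length > page_size:
        raise ValueError("inline length cannot exceed the page size")
    fits_inline_length = tree_size < inline_length
    fits_on_page = relational_row_size + tree_size <= page_size
    if fits_inline_length and fits_on_page:
        return Placement.INLINE
    return Placement.XDA


print(place_xml(2_000, 300, inline_length=4_000))       # Placement.INLINE
print(place_xml(10_000, 300, inline_length=4_000))      # Placement.XDA
print(place_xml(32_000, 1_000, inline_length=32_768))   # Placement.XDA: row exceeds page
```

Both conditions must hold: a small document goes to the XDA anyway if the relational columns of its row leave too little room on the page.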
One of the main benefits of inlining is compression. Because inlined XML data is stored with the relational data, it can be compressed with the regular row compression technology introduced in DB2 9. Compression ratios of 60% to 70% are possible, which means that far less I/O is needed to read the same amount of information from disk. This can significantly improve performance, especially on I/O-bound systems.
Inlining XML data in the base table also allows more direct access to the XML. This benefits queries on tables where the XML column is accessed as often as the relational columns.
DB2 9 supported two ways of populating a table with XML values: the INSERT statement, to insert XML values into the table, and the import utility, to import bulk XML data into the table. In DB2 9.5, the load utility also supports XML data. From the user’s perspective, loading XML data is very similar to importing it, with many of the same command options. However, the load utility automatically parallelizes the work, which allows drastically higher performance. DB2 chooses a suitable degree of parallelism autonomically, but it can also be set manually.
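The benefit of automatic parallelism during load can be illustrated with a generic sketch: documents are spread across workers that parse them concurrently, with the degree of parallelism defaulting to the machine’s CPU count unless it is set explicitly. This is only an analogy written in Python, not how DB2’s load utility is implemented; parallel_load and its degree_of_parallelism parameter are hypothetical (DB2 exposes its own controls through LOAD options such as CPU_PARALLELISM).

```python
import os
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional


def parse_document(text: str) -> ET.Element:
    """Parse one XML document (stands in for the per-document load work)."""
    return ET.fromstring(text)


def parallel_load(docs: Iterable[str],
                  degree_of_parallelism: Optional[int] = None) -> List[ET.Element]:
    """Parse documents concurrently.

    If degree_of_parallelism is None, it defaults to the CPU count,
    loosely analogous to DB2 choosing a value autonomically.
    """
    workers = degree_of_parallelism or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the input documents.
        return list(pool.map(parse_document, docs))


docs = [f"<Order id='{i}'><Qty>{i * 10}</Qty></Order>" for i in range(1, 6)]
loaded = parallel_load(docs)                            # automatic degree
manual = parallel_load(docs, degree_of_parallelism=2)   # manual degree
print(len(loaded), loaded[2].get("id"), manual[4].find("Qty").text)  # 5 3 50
```

In CPython, threads mainly help when the per-document work waits on I/O; CPU-bound parsing would need a process pool for a real speedup. The sketch is meant to show the structure (partitioned work, automatic or manual degree), not to benchmark it.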
TPoX Benchmark Overview
Transaction Processing over XML (TPoX) is an open-source, application-level XML database benchmark based on a financial application scenario. It is used to evaluate the performance of XML database systems, focusing on XQuery, SQL/XML, XML storage, XML indexing, XML Schema support, XML inserts, updates and deletes, logging, concurrency, and other database aspects. TPoX simulates a security trading scenario and uses a real-world XML Schema (FIXML) to model some of its data. The benefits of the TPoX benchmark are as follows:
- It aims to evaluate a database system that contains XML data
- It is intended to represent all relevant characteristics of a real-world XML application
- It exercises all aspects of XML databases, including storage, indexing, logging, transaction processing, and concurrency control
- It offers a high level of flexibility, allowing users to configure and run different settings and phases (inserts, query-only, and mixed workload) independently.
The TPoX benchmark is available as open source at http://sourceforge.net/projects/tpox. It is a flexible benchmark that can be adapted to run against any XML database system.
Test and comparison details
The TPoX benchmark models an online brokerage system that aims to be representative of a real brokerage in terms of XML document structures, transactions, and XML schemas. The main logical data entities in this scenario are:
- Customer: A single customer can have one or more accounts.
- Account: Each account contains one or more holdings.
- Holding: A number of shares of a security.
- Security: A stock, bond, or mutual fund.
- Order: Each order buys or sells shares of exactly one security for exactly one account.
Figure 1 below shows the relationship between the entities.
Figure 1: TPoX Entities
For each customer, there is a CUSTACC document that contains all customer, account, and holding information for that customer. Orders are represented using FIXML 4.4, an industry-standard XML schema for trade-related messages such as buy or sell orders (www.fixprotocol.org). ORDER documents have many attributes and a high ratio of nodes to data. Each SECURITY document describes one security; together, the SECURITY documents cover the majority of US-traded stocks and mutual funds, using actual security symbols and names.
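The entity relationships above can be summarized as a small data model. The sketch below uses Python dataclasses; all class and field names are hypothetical simplifications, not the TPoX XML schemas (real CUSTACC and FIXML ORDER documents carry far more fields). It shows the key design point: accounts and holdings are nested inside one CUSTACC document per customer, while each ORDER refers to exactly one account and one security.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Security:
    symbol: str          # e.g. a stock or mutual fund ticker
    name: str


@dataclass
class Holding:
    symbol: str          # refers to a Security
    quantity: float      # number of shares held


@dataclass
class Account:
    account_id: str
    holdings: List[Holding] = field(default_factory=list)


@dataclass
class Customer:
    """One CUSTACC document: customer, accounts, and holdings nested together."""
    customer_id: str
    accounts: List[Account] = field(default_factory=list)


@dataclass
class Order:
    """One ORDER document: buys or sells one security for one account."""
    order_id: str
    account_id: str
    symbol: str
    side: str            # "buy" or "sell"
    quantity: float


def validate_order(order: Order, customers: List[Customer],
                   securities: Dict[str, Security]) -> bool:
    """Check that the order's account and security both exist."""
    account_ids = {a.account_id for c in customers for a in c.accounts}
    return order.account_id in account_ids and order.symbol in securities


securities = {"IBM": Security("IBM", "International Business Machines")}
cust = Customer("C1", [Account("A1", [Holding("IBM", 100)])])
print(validate_order(Order("O1", "A1", "IBM", "buy", 50), [cust], securities))  # True
```

Nesting accounts and holdings in one customer object mirrors why a single CUSTACC document can answer most customer-centric queries, while orders stay separate, high-volume documents.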
The TPoX benchmark is driven by a Java workload driver that runs the workloads and collects performance measurements. The TPoX benchmark defines an XML insert workload, a query-only workload, and a mixed read/write workload. This paper focuses on the query workload.
In this study, the TPoX query workload was run for 30 minutes on the system under test (SUT) in a two-tier setup, with the workload driver running on a client machine that stressed the SUT. The database was loaded with 50 GB of raw data, corresponding to 3 million CUSTACC documents, 15 million orders, and 20,833 securities. For this paper, inserts were performed without XML schema validation, and insert performance was not measured. During the test, 7 different read-only queries were executed in random order against the data. The number of concurrent users driving the workload was varied to achieve optimal throughput at peak CPU utilization. The TPoX performance metric for the query workload is TQPS (TPoX Queries Per Second), which the workload driver reports as an average over the 30-minute test period.
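The measurement loop described above can be sketched as follows: several concurrent simulated users each pick one of the seven read-only queries at random, and throughput is reported as completed queries divided by elapsed seconds, the same form as TQPS. This is a hypothetical, scaled-down stand-in for the Java TPoX driver, not its code: the queries are dummy functions that sleep briefly, the run lasts a fraction of a second instead of 30 minutes, and names such as run_workload are illustrative.

```python
import random
import threading
import time
from typing import Callable, List


def make_dummy_queries(n: int = 7) -> List[Callable[[], None]]:
    """Stand-ins for the seven read-only TPoX queries (each just sleeps briefly)."""
    return [lambda: time.sleep(0.001) for _ in range(n)]


def run_workload(queries: List[Callable[[], None]], users: int,
                 duration_s: float, seed: int = 42) -> float:
    """Run `users` concurrent clients for `duration_s`; return queries per second."""
    completed = 0
    lock = threading.Lock()
    start = time.monotonic()
    deadline = start + duration_s

    def user(user_id: int) -> None:
        nonlocal completed
        rng = random.Random(seed + user_id)   # per-user RNG for a reproducible query mix
        while time.monotonic() < deadline:
            rng.choice(queries)()             # execute a randomly chosen query
            with lock:
                completed += 1

    threads = [threading.Thread(target=user, args=(i,)) for i in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    # Average over the whole run, analogous to TQPS over the 30-minute test period.
    return completed / elapsed


queries = make_dummy_queries()
for users in (1, 4, 16):
    print(users, "users:", round(run_workload(queries, users, duration_s=0.2), 1), "queries/s")
```

Sweeping the user count, as in the loop at the end, mirrors how the study varied concurrency to find the throughput peak; on a real SUT the curve flattens once CPU or storage saturates.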
The test environment consisted of DB2® 9 running on a Dual-Core Intel® Xeon® processor 7100 Series-based server (“Tulsa”) and DB2® 9.5 running on a Quad-Core Intel® Xeon® processor 7300 Series-based server (“Tigerton”). A separate Intel® Xeon® 7000 Series processor-based server was configured as the client machine to drive the workload. The same DS4800 storage system, with 78 disks configured as RAID 5, was attached to both the Tulsa and the Tigerton servers. The operating system on both the client and the data servers was Novell* SUSE Linux Enterprise Server 10, 64-bit, Service Pack 1.
Intel® Xeon® MP Server
Quad-Core Intel® Xeon® processor 7300 Series based platform
- 4 x 2.93 GHz Intel® Xeon® 7300 processor (“Tigerton”) with 2 x 4MB L2
- Intel® 7300 chipset (1066 MHz FSB)
- 16GB DDR2-400 memory initially, later upgraded to 32GB
Dual-Core Intel® Xeon® processor 7100 Series based platform
- 4 x 3.4 GHz Intel® Xeon® 7140M processor (“Tulsa”) with 1MB L2/16MB L3
- Intel® E8501 (“Twin Castles”) chipset (dual 800 MHz FSB)
- 16GB DDR2-400 memory
DB2® 9 and 9.5 Setup
To allow a fair comparison, the same database configuration was used for DB2 9 and DB2 9.5. The database was created using automatic storage on 6 logical volumes on the DS4800 storage system, each consisting of 13 physical disk drives. Six table spaces were created and configured: 3 for the CUSTACC, ORDER, and SECURITY tables, and 3 for the indexes on each table. For the tests with 16GB of main memory, four buffer pools with an aggregate size of 9.5 GB were defined for the CUSTACC table, the ORDER table, and their indexes. When the Intel Xeon 7300 system was upgraded to 32GB, the aggregate buffer pool size was increased to 21.2 GB. DB2’s self-tuning memory manager (STMM) was enabled but effectively inactive, because all of the memory consumers that STMM tunes (sort heap, lock list, package cache, etc.) were assigned manual values. This ensured that all tests on both systems (DB2 9 and DB2 9.5) used the same memory configuration.
The client machine ran SLES 10 SP1 with 16 GB of memory. The TPoX workload driver consumed only a small portion of the client’s CPU and I/O capacity, and the client was connected to the servers through a dedicated 1 Gbit/s network.
Performance Comparison Results
In the first part of the experiment, Intel Tulsa (Xeon 7100) running DB2 9 was compared directly with Intel Tigerton (Xeon 7300) running the latest DB2 9.5 pureXML functionality (Figure 2). The Tigerton system quickly became storage-bound: the maximum IOPS capacity of the storage system was reached. Even so, DB2 9.5 on Tigerton delivered 1.3x the throughput of DB2 9 on Tulsa at half the processor utilization. In addition, DB2 9.5 reduced XML database storage consumption by 30% compared to the same data set in DB2 9. This reduction comes without inlining or compression, purely from the more efficient XML storage in DB2 9.5.
Figure 2: Baseline setup – Compare Xeon 7300 with DB2 9.5 vs. Xeon 7100 with DB2 9, with 16GB memory
Next, XML inlining and compression in DB2 9.5 were applied when creating the database. This reduced the database size drastically: by about 52% compared to uncompressed DB2 9.5, and by about 67% compared to DB2 9. Compression, in turn, lowered the IOPS (I/O operations per second) required by the workload, relieving the storage bottleneck and allowing higher CPU utilization and system throughput. Figure 3 shows that system throughput after compression is 60% higher than before compression, and 90% higher than that of DB2 9 on Tulsa.
Figure 3: Apply deep compression and Inlining to Xeon 7300 with DB2 9.5 vs. Xeon 7100 with DB2 9 and 16GB memory
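The storage reductions reported in this section are consistent with each other, which a short calculation confirms. Taking the DB2 9 database size as 1, DB2 9.5 without compression is about 30% smaller, and inlining plus compression removes a further 52% of that. The helper name remaining_fraction is illustrative only.

```python
def remaining_fraction(*reductions: float) -> float:
    """Fraction of the original size left after successive fractional reductions."""
    size = 1.0
    for r in reductions:
        size *= 1.0 - r
    return size


db2_95_uncompressed = remaining_fraction(0.30)       # ~0.70 of the DB2 9 size
db2_95_compressed = remaining_fraction(0.30, 0.52)   # ~0.336 of the DB2 9 size
print(f"Reduction vs. DB2 9: {1 - db2_95_compressed:.0%}")  # Reduction vs. DB2 9: 66%
```

The result, roughly 66%, matches the "about 67%" figure within the rounding of the input percentages. Reductions compound multiplicatively, so they cannot simply be added (30% + 52% would wrongly suggest 82%).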
Finally, after DB2 compression and inlining had been applied, the memory of the Xeon 7300 system was doubled from 16 GB to 32 GB. Because the Xeon 7300 system has twice as many cores as the Xeon 7100 system, doubling the memory keeps the memory per core equal to that of the Xeon 7100 system. As a result, CPU utilization was nearly maximized (90%), and throughput was more than double that of the previous generation processor running DB2 9, as shown in Figure 4.
Figure 4: Double the memory of Xeon 7300 to 32GB with DB2 9.5 vs. Xeon 7100 (16GB) with DB2 9
In summary, performance doubled and the database size was reduced by 67% (see Figure 5). However, because of the limitations of the storage system used in these experiments, the processing power of the Xeon 7300 system was not fully exercised.
Figure 5: New Generation DB2 on New Quad-Core Intel Xeon Processor 7300 Series Results
To maximize CPU utilization, additional experiments were conducted with a different storage subsystem on several Intel platforms: the Intel Xeon 5300 Series (code-named Clovertown), the Intel Xeon 7100 Series (code-named Tulsa), and the Intel Xeon 7300 Series (code-named Tigerton) (Figure 6). The Tulsa and Clovertown machines ran the previous generation DB2 9, while Tigerton ran DB2 9.5. Tulsa was configured with 16GB of RAM and 96 disks, while Clovertown used the same amount of RAM and disks as Tigerton (32GB and 112 data disks).
Combined with the latest DB2 9.5, Tigerton’s advantage over Clovertown and Tulsa is striking. Part of it comes from the DB2 engine itself: DB2 9.5 is thread-based, whereas the DB2 9 engine was process-based. The threaded architecture creates performance opportunities such as less expensive context switches, because threads share more context than separate processes do. Thus, the 2.68X advantage of Tigerton over Clovertown comes mostly from the much improved DB2 9.5 thread-based software architecture, the doubling of cores (16 in Tigerton vs. 8 in Clovertown), and Tigerton’s 1.1X CPU frequency advantage over Clovertown. The dramatic 2.84X advantage of Tigerton over Tulsa, on the other hand, comes in part from the doubling of cores and the extra 16GB of RAM in Tigerton, but more importantly from the improved software efficiency of DB2 and the much improved microarchitecture of the Tigerton cores.
Figure 6: Intel Xeon 7300 Series vs. Prior Generation Intel Processors XML Workload Performance Comparison Results
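To gauge how much of the 2.68X and 2.84X advantages the raw hardware counts alone could explain, one can divide each observed ratio by the core-count and clock-frequency ratios. This is a deliberately naive sketch: it assumes throughput scales linearly with cores and frequency, which real systems rarely achieve, so the residual is only a rough indicator of everything else combined (DB2 9.5, microarchitecture, and, for Tulsa, the extra memory and disks). The clock speeds come from the configuration lists (2.93 GHz Tigerton, 3.4 GHz Tulsa), the Clovertown ratio is the 1.1X stated in the text, and residual_speedup is a hypothetical helper.

```python
def residual_speedup(observed: float, core_ratio: float, freq_ratio: float) -> float:
    """Speedup left over after naive linear scaling by cores and clock frequency."""
    return observed / (core_ratio * freq_ratio)


# Tigerton vs. Clovertown: 16 vs. 8 cores, ~1.1x clock advantage.
vs_clovertown = residual_speedup(2.68, 16 / 8, 1.1)
# Tigerton vs. Tulsa: 16 vs. 8 cores, 2.93 GHz vs. 3.4 GHz (Tulsa clocks higher).
vs_tulsa = residual_speedup(2.84, 16 / 8, 2.93 / 3.4)

print(f"Residual vs. Clovertown: {vs_clovertown:.2f}x")  # Residual vs. Clovertown: 1.22x
print(f"Residual vs. Tulsa:      {vs_tulsa:.2f}x")       # Residual vs. Tulsa:      1.65x
```

Under this idealized model, the Tulsa residual is noticeably larger, which fits the text’s point that in that comparison the software and microarchitecture improvements (together with the extra memory and disks) matter more than the core count.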
Additional information on the TPoX benchmark can be found at:
DB2* 9 pureXML* Scalability on Intel® Xeon® MP Platforms Using IBM N Series* Storage:
Additional performance data can be found at:
For more information on the Quad-Core Intel® Xeon® processor 7300 Series, visit:
For more information on Intel® Virtualization Technology (Intel® VT), visit:
For more information on the Intel® 7300 Chipset, visit:
For more information on DB2 9.5 pureXML features, visit:
This paper shows that the advances in next generation IBM DB2 and Intel multi-core architecture result in more than double the XML data management performance for enterprise applications. These benefits are the combined result of hardware and software advances: the doubling of cores, advances in Intel microarchitecture, and performance improvements in DB2 9.5, along with deep compression and XML inlining. Finally, this paper demonstrates that for data-intensive XML workloads, the advances in Intel multi-core architecture make the Quad-Core Intel® Xeon® processor 7300 Series “Best in Class” among MP server processors.
About the Authors
Miso Cilimdzic has been part of the IBM DB2 Performance group since 2000. He has worked on numerous performance projects showcasing DB2 as the best-performing database, such as record-breaking TPC benchmarks, pureXML performance, and customer engagements. He represented IBM on a TPC committee from 2003 to 2006. His current focus is on various performance areas, with emphasis on customer solutions. Miso has a B.Sc. in Software Engineering from the University of Saskatchewan.
Rekha Raghu has been part of the Intel Enterprise Platform Enabling team since 2005. Before that, she worked for over 10 years at Motorola designing software for next-generation two-way radio dispatch systems. At Intel, she works on enabling IBM DB2 products to run best on Intel platforms. Over the past year, she has been working with IBM on optimizing the TPoX benchmark and helping make it public. Her current focus is on optimizing XML workloads. Rekha has an M.S. in Computer Science and Engineering from the University of Illinois at Chicago.
The authors would like to thank the following people for their help with this work: Agustin Gonzalez, Paul Gryskiewicz, Roger Herrick Jr., Sunil Kamath, Anju Kapur, Irina Kogan, Matthias Nicola, Lakshmi Talluru, and Kevin Xie.