Dedicated users of the previous generation’s VTune™ Performance Analyzer remember that the tool supported Java application profiling. Over time, this feature disappeared from the radar, but since then customers have clamored for Java support in the current VTune Amplifier XE. Profiling pure Java applications and, more importantly, mixed Java and native C/C++ applications is becoming necessary again. In response to these requests, Java profiling has been added to the new Intel(R) VTune™ Amplifier XE 2013, in addition to the existing support for profiling JITed applications.
Why does someone need Java application profiling? The main purpose of performance profiling is to identify the functions or code locations that take up most of the CPU’s time and to find out how effectively they use this computing resource. Even though Java code execution is handled by a Managed Runtime Environment, it can be just as inefficient in terms of data management as programs written in native languages. For example, if you are concerned about the performance of your data mining Java application, you need to take into consideration your target platform’s memory architecture, cache hierarchy and the access latency of each memory level. From the platform microarchitecture point of view, profiling Java applications is similar to profiling native applications, with one major difference: since users want to see timing metrics against their program source code, the profiling tool must be able to map performance metrics of the binary code, whether compiled or interpreted by the JVM, back to the original source code in Java or C/C++.
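To make the data management point concrete, here is a minimal sketch of how the same Java computation can treat the cache hierarchy well or badly. The class and method names (TraversalOrder, sumRowMajor, sumColumnMajor, fill) are invented for this illustration and assume a rectangular array. Both methods return the same sum, but on typical hardware the column-by-column walk is likely to be slower because consecutive reads land in different inner arrays.

```java
class TraversalOrder {
    // Row by row: reads each inner array sequentially, which is cache friendly.
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    // Column by column: every step jumps to a different inner array.
    // Assumes a rectangular array (all rows have the same length).
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        int cols = (m.length == 0) ? 0 : m[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < m.length; r++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    // Build a rows x cols array where element (r, c) holds r + c.
    static int[][] fill(int rows, int cols) {
        int[][] m = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                m[r][c] = r + c;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        int[][] m = fill(2048, 2048);

        long t0 = System.nanoTime();
        long rowSum = sumRowMajor(m);
        long t1 = System.nanoTime();
        long colSum = sumColumnMajor(m);
        long t2 = System.nanoTime();

        System.out.println("row-major sum    " + rowSum + " in " + (t1 - t0) / 1000 + " us");
        System.out.println("column-major sum " + colSum + " in " + (t2 - t1) / 1000 + " us");
    }
}
```

A single timed pass like this is only a rough indication, since JIT warm-up affects both measurements; a profiler is the better way to see where the time actually goes.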
With VTune Amplifier XE Hotspots analysis you get a list of the hottest methods along with their timing metrics and call stacks. The distribution of the workload over threads is also displayed in the timeline view of the results. Thread naming helps to identify where exactly the most resource-consuming code was executed.
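Since thread names are what make the timeline readable, it helps to name worker threads explicitly instead of relying on defaults such as "Thread-0". The sketch below uses only the standard java.lang.Thread constructor that takes a name; the class name NamedWorkers, the "matrix-worker-" prefix and the burn workload are placeholders invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NamedWorkers {
    // Written by every worker so the JIT cannot discard the computation.
    static volatile double sink;

    // Burn some CPU so the thread is visible as a hotspot in a profile.
    static double burn(int iterations) {
        double acc = 0.0;
        for (int i = 1; i <= iterations; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    // Start `count` workers, each with an explicit, descriptive name.
    static List<Thread> startWorkers(int count, final int iterations) {
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    sink = burn(iterations);
                }
            }, "matrix-worker-" + i);
            threads.add(t);
            t.start();
        }
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        for (Thread t : startWorkers(4, 50000000)) {
            t.join();
            System.out.println("finished " + t.getName());
        }
    }
}
```

The anonymous Runnable is used instead of a lambda so that the sketch also compiles on the Java 6 and 7 runtimes discussed in this article.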
Those who are pursuing maximum performance on a platform may apply tricks such as writing and compiling the performance-critical modules of their Java project in native languages like C or even assembly. This way of programming helps to employ powerful CPU resources like vector computing (implemented through SIMD units and instruction sets). In this case, the heavy calculating functions become hotspots in the profiling results, which is expected as they do most of the job. However, you might be interested not only in the hotspot functions, but also in identifying the locations in the Java code from which those functions were called through the JNI interface. Tracing such cross-runtime calls in mixed-language algorithm implementations can be a challenge.
To help with the analysis of mixed-code profiling results, VTune Amplifier XE “stitches” the Java call stack with the subsequent native call stack of C/C++ functions. Stitching of call stacks in the reverse direction works as well.
java.exe -Xcomp -Djava.library.path=mixed_dll\ia32 -cp C:\Design\Java\mixed_stacks MixedStacksTest 3 2
amplxe-cl -collect hotspots -- run.bat
amplxe-cl -collect hotspots -- java.exe -Xcomp -Djava.library.path=mixed_dll\ia32 -cp C:\Design\Java\mixed_stacks MixedStacksTest 3 2
- It is difficult to support every Java Runtime Environment (JRE) available on the market, so at the moment we support Oracle* Java 6 and 7.
- Java application profiling is supported for Hotspots analysis and Hardware Event-based analysis (e.g. Lightweight Hotspots), but Concurrency analysis is limited as some embedded Java synchronization primitives (which do not call operating system synchronization objects) cannot be recognized by the tool. As a result, some of the timing metrics may be distorted for Concurrency as well as for Locks & Waits analysis.
- The tool cannot attach to a Java process on Linux. At the moment, attaching is supported on Windows only.
- There are no dedicated libraries supplying a user API for collection control in the Java source code. However, you may want to try applying the native API by wrapping the __itt calls with JNI calls.
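As a rough illustration of the JNI wrapping idea in the last item, the Java side could look like the sketch below. Everything here is an assumption made for illustration: the class name IttCollectionControl, the native library name "itt_jni" and the native method names are hypothetical, and the matching C stubs (which would call __itt_pause() and __itt_resume() from ittnotify.h) still have to be written and built separately. If the library cannot be loaded, the calls fall back to no-ops so the application keeps running outside the profiler.

```java
class IttCollectionControl {
    // "itt_jni" is a hypothetical name for a user-built JNI library.
    private static final boolean AVAILABLE = tryLoad("itt_jni");

    private static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library not found: run without collection control
        }
    }

    // Expected to be implemented in C, for example (hypothetical stub):
    //   JNIEXPORT void JNICALL
    //   Java_IttCollectionControl_nativePause(JNIEnv *env, jclass cls) { __itt_pause(); }
    private static native void nativePause();
    private static native void nativeResume();

    static boolean isAvailable() {
        return AVAILABLE;
    }

    // Ask the collector to stop recording data, if the wrapper is present.
    static void pause() {
        if (AVAILABLE) {
            nativePause();
        }
    }

    // Ask the collector to resume recording data, if the wrapper is present.
    static void resume() {
        if (AVAILABLE) {
            nativeResume();
        }
    }

    public static void main(String[] args) {
        pause();   // skip start-up and warm-up
        // ... initialization code would go here ...
        resume();  // record only the region of interest
        // ... code to be profiled would go here ...
        pause();
        System.out.println("ITT wrapper available: " + isAvailable());
    }
}
```

Guarding every call with the AVAILABLE flag is a design choice of this sketch: it avoids an UnsatisfiedLinkError at the first native call when the library is missing.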
Additional Oracle JDK Java VM command line options that change the behavior of the Java VM
- On Linux x86, use the client Oracle JDK Java VM instead of the server Java VM, i.e. either explicitly specify “-client” or simply do not specify “-server” as an Oracle JDK Java VM command line option.
- On Linux x64, try specifying the ‘-XX:-UseLoopCounter’ command line option, which switches off on-the-fly substitution of the interpreted method with the compiled version.
- On Windows, try specifying '-Xcomp', which forces JIT compilation and so improves the quality of stack walking.
Note: when you force the JVM to compile functions that would initially be interpreted, the timing of your application may change, and for small, rarely called functions compilation may cost more than interpretation would.
- Click the New Analysis button in the VTune Amplifier XE tool bar.
- Right-click the ‘Hotspots’ analysis type.
- Select ‘Copy from current’ in the context menu.
- In the ‘Custom Analysis’ dialog that opens, select ‘After collection’ in the ‘Stack unwinding mode’ drop-down list and press the ‘OK’ button.
- Start collection using this new analysis type.