Profiling Julia code with Intel® VTune™ Amplifier

[2013 Oct 17: Blog updated to split patch into two patches, one for Intel® VTune™ Amplifier changes and one for MKL/ifort changes.]

[2013 Oct 22: Support for Intel® VTune™ Amplifier became part of Julia master sources.  Look for USE_INTEL_JITEVENTS in Julia/Make.inc for how to enable Amplifier support.]

[2014 June 26: Added patch as attachment.  The patch works around a problem where line numbers reported by Intel® VTune™ Amplifier are off by one.  It compensates by shifting the reported line numbers by one in the opposite direction.  Apply it to julia/deps/llvm-3.3/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp before building Julia.  Do not apply the patch if you are using the latest version of LLVM, since the bug was fixed in the LLVM sources on 2014-12-14.]

[2014 July 16: Removed patch for building Julia with Intel ifort and MKL, since now there is built-in support for doing so.]

I found out how to profile Julia code with Intel® VTune™ Amplifier, and was pleased with the good results for minor effort.  This blog summarizes how to modify the Julia sources to do it.  Be warned that it requires building Julia from source.  I've only lightly tested it on Linux* and Windows* 7.  I'm a newbie at modifying the Julia sources, so apply one of the attached patches at your own risk and only after reading it carefully!

Background: I've been poking around with Julia and was curious about the code being generated.  Leah Hanson's blog post "Julia introspects" is an excellent introduction to how to look at the code at various levels.  Alas, the "code_native" interface in Julia currently often shows code much less efficient than what actually executes, along with a warning:


Warning: Returned code may not match what actually runs.

So I decided to try to use Intel® VTune™ Amplifier, which needs some notifications from the JIT about when and where code is generated.  Fortunately, the necessary support comes with LLVM.  I just had to enable it where Julia builds its execution engine by adding six lines of C++ and making a small change to a Makefile.  [2014 July 16: These changes are now part of the Julia source distribution, so I've deleted the patches.]

If you are applying the patch after having built LLVM, remember to rebuild LLVM using the new flags.  I found that "make clean-llvm" in the julia/deps/ directory was not enough.  I resorted to removing the entire subdirectory julia/deps/llvm-3.3/build_Release.  To verify that "Intel JIT events" support is really enabled, after the build search julia/deps/llvm-3.3/build_Release/config.log for the string "LLVM_USE_INTEL_JITEVENTS".  You should see many occurrences of | #define LLVM_USE_INTEL_JITEVENTS 1.
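
The config.log check described above can also be scripted.  Here is a minimal Python sketch of that check; it is not part of the Julia or LLVM build.  The function names are my own invention for illustration, and the path assumes the llvm-3.3 layout mentioned above and that the script is run from the directory containing julia/.

```python
from pathlib import Path

# The line that config.log should contain when Intel JIT events are enabled.
MACRO = "#define LLVM_USE_INTEL_JITEVENTS 1"


def count_jitevents_defines(text: str) -> int:
    """Count the lines of config.log text that contain the enabling #define."""
    return sum(1 for line in text.splitlines() if MACRO in line)


def jitevents_enabled(config_log: Path) -> bool:
    """Return True if the given config.log shows the macro defined to 1."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    text = config_log.read_text(errors="replace")
    return count_jitevents_defines(text) > 0


if __name__ == "__main__":
    # Assumed location, taken from the paragraph above; adjust for your tree.
    log = Path("julia/deps/llvm-3.3/build_Release/config.log")
    if not log.exists():
        print(f"{log} not found; has LLVM been built yet?")
    elif jitevents_enabled(log):
        print("Intel JIT events support appears to be enabled.")
    else:
        print("LLVM_USE_INTEL_JITEVENTS not found; LLVM may need a full rebuild.")
```

If the script reports that the macro is missing, the likely cause is a stale LLVM build directory, as described above.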

The attached file julia-bubblesort.png is a sample screenshot, with both Julia source and assembly code views turned on.  I chose bubble sort for my initial experiments because I know it's a horribly inefficient way to sort random values, and so it would show up prominently in a profile.  The code shown by Intel® VTune™ Amplifier is far from optimal, but pretty good for a JIT and definitely much better than what Julia's code_native interface displayed.
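
To make concrete why bubble sort dominates a profile, here is a small Python sketch that counts comparisons.  It is only a stand-in for illustration: it is not the Julia code from the screenshot, and the function name and input size are assumptions of mine.

```python
import random


def bubble_sort(values):
    """Sort the list in place and return the number of comparisons made."""
    comparisons = 0
    n = len(values)
    # After each outer pass the largest remaining element sits at index `end`.
    for end in range(n - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
    return comparisons


if __name__ == "__main__":
    data = [random.random() for _ in range(1000)]
    count = bubble_sort(data)
    # This plain variant always performs n*(n-1)/2 comparisons.
    print(f"sorted 1000 values using {count} comparisons")
```

With no early exit, the comparison count grows quadratically with the input size, which is why nearly all of the profiled time lands in the inner loop.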

Acknowledgements

Kirill Uhanov, Daniel Malea, and Alexei Alexandrov implemented the support for Intel® VTune™ Amplifier in LLVM.  Andrew Kaylor explained what I had to do to enable it.  Thanks to the developers of LLVM for building a wonderfully modular compiler.

For details on compiler optimization, see our Optimization Notice.