[2013 Oct 17: Blog updated to split patch into two patches, one for Intel® VTune™ Amplifier changes and one for MKL/ifort changes.]
[2013 Oct 22: Support for Intel® VTune™ Amplifier became part of Julia master sources. Look for USE_INTEL_JITEVENTS in Julia/Make.inc for how to enable Amplifier support.]
I found out how to profile Julia code with Intel® VTune™ Amplifier, and was pleased with the good results for minor effort. This blog summarizes how to modify the Julia sources to do it. Be warned that it requires building Julia from source. I've only lightly tested it on Linux* and Windows* 7. I'm a newbie at modifying the Julia sources, so apply one of the attached patches at your own risk and only after reading it carefully!
Background: I've been poking around with Julia and was curious about the code being generated. Leah Hanson's blog Julia introspects is an excellent introduction to how to look at the code at various levels. Alas the "code_native" interface in Julia currently often shows code much less efficient than what actually executes, along with a warning:
Warning: Returned code may not match what actually runs.
So I decided to try to use Intel® VTune™ Amplifier, which needs some notifications from the JIT about when and where code is generated. Fortunately, the necessary support comes with LLVM. I just had to enable it where Julia builds its execution engine by adding six lines of C++ and making a small change to a Makefile.
Attached are two patches with the changes:
- A patch that enables use of Intel® VTune™ Amplifier. This patch works for Linux and Windows. Be warned that on Windows, your build may fail with a message like:
.../julia/deps/llvm-3.3/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp:32:33: fatal error: EventListenerCommon.h: No such file or directory
Just restart the build (run make again) and the build should complete. I do not know why it fails the first time but succeeds the second time.
- A patch that enables use of the Intel® Math Kernel Library and the Intel® Fortran compiler. This patch works only on Linux so far, due to an MKL issue. (Specifics: Julia looks for "libmkl_rt", but it's "mkl_rt" on Windows. That's a fix for another day.) I include the patch so that readers can reproduce exactly the modifications that I've been using on Linux.
Be warned that the patches are only lightly tested, and that I created them while still finding my way around the Julia source base. The patches are short, so it should be fairly obvious what they are doing. The patch to enable Intel® VTune™ Amplifier does the following:
- Adds six lines to the Julia interface to LLVM, where it builds its execution engine. To minimize overhead when not in use, notification support is enabled only when LLVM is compiled with --with-intel-jitevents and ENABLE_JITPROFILING has a nonzero value. The environment variable ENABLE_JITPROFILING is set by Intel® VTune™ Amplifier.
- Compiles LLVM with support for the "Intel JIT events" that Intel® VTune™ Amplifier needs. The change affects the use of two LLVM flags, for three possible cases, as shown in the table below:
| Platform | LLVM Build Options | Rationale |
|---|---|---|
| Linux x86 | --with-intel-jitevents | LLVM support requires linking with the pthreads library, so leave off --disable-threads. |
| Windows | --with-intel-jitevents --disable-threads | LLVM support uses Windows threads. Use --disable-threads to avoid bringing in pthreads. |
| Other platforms | --disable-threads | No LLVM support available, so keep the same as unmodified Julia. |
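As a rough illustration of the gating described for the six added lines, here is a minimal sketch rather than the patch itself. The helper names jit_profiling_requested and should_register_intel_listener are hypothetical, and I am assuming that the LLVM_USE_INTEL_JITEVENTS define from LLVM's configuration is visible to the code being compiled. The LLVM calls appear only in a comment because they need the LLVM headers.

```cpp
#include <cstdlib>  // std::getenv, std::atoi

// Hypothetical helper (not part of Julia or LLVM): true when the
// ENABLE_JITPROFILING environment variable holds a nonzero integer.
bool jit_profiling_requested() {
    const char* value = std::getenv("ENABLE_JITPROFILING");
    return value != nullptr && std::atoi(value) != 0;
}

// Hypothetical helper: combines the compile-time and run-time conditions.
// Assumption: LLVM_USE_INTEL_JITEVENTS is defined to 1 when LLVM was
// configured with --with-intel-jitevents.
bool should_register_intel_listener() {
#if defined(LLVM_USE_INTEL_JITEVENTS) && LLVM_USE_INTEL_JITEVENTS
    return jit_profiling_requested();
#else
    return false;  // No Intel JIT events support compiled in.
#endif
}

// Where the execution engine is built, the registration would then look
// roughly like this (needs LLVM headers, so shown as a comment only):
//
//   if (should_register_intel_listener())
//       engine->RegisterJITEventListener(
//           llvm::JITEventListener::createIntelJITEventListener());
```

With this shape, a run outside the profiler pays only for one environment lookup at startup, and a build without the LLVM support compiles the check away.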
If you are applying the patch after having built LLVM, remember to rebuild LLVM using the new flags. I found that "make clean-llvm" in the julia/deps/ directory was not enough. I resorted to removing the entire subdirectory julia/deps/llvm-3.3/build_Release. To verify that "Intel JIT events" support is really enabled, after the build, search julia/deps/llvm-3.3/build_Release/config.log for the string "LLVM_USE_INTEL_JITEVENTS". You should see many occurrences of | #define LLVM_USE_INTEL_JITEVENTS 1.
The attached file julia-bubblesort.png is a sample screenshot, with both Julia source and assembly code views turned on. I chose bubble sort for my initial experiments because I know it's a horribly inefficient way to sort random values, and so it would show up prominently in a profile. The code shown by Intel® VTune™ Amplifier is far from optimal, but pretty good for a JIT and definitely much better than what Julia's code_native interface displayed.
Kirill Uhanov, Daniel Malea, and Alexei Alexandrov implemented the support for Intel® VTune™ Amplifier in LLVM. Andrew Kaylor explained what I had to do to enable it. Thanks to the developers of LLVM for building a wonderfully modular compiler.