Initial release feedback on CL 2013

Hi there,

First of all, thanks for releasing a new version of the Intel OpenCL toolkit! I've downloaded the new version and have found two issues:

  • First, the compiler is much slower than it was in the 2012 release. That is not bad in itself. What makes it bad is that building from a binary appears to take the same amount of time as building from source, so caching binaries gains nothing and developers are stuck waiting for kernels to compile on every run. This gets old quickly...
  • Second, the PyOpenCL test suite (in git, http://github.com/inducer/pyopencl ) fails in the segmented scan test. Since this code runs successfully on AMD (CPU, APU, GPU), Nvidia, and Intel 2012, I am currently leaning towards there being a correctness issue in 2013. I'll continue to investigate, though.
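The from-source versus from-binary comparison in the first point can be measured roughly as follows. This is only a minimal sketch, not the reproducer discussed later in the thread: the kernel `twice`, the helper `timed`, and the function `compare_compile_times` are made up for illustration, and passing `cache_dir=False` is assumed to bypass PyOpenCL's own on-disk binary cache so that the first build really starts from source.

```python
import time


def timed(fn):
    """Run fn() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Placeholder kernel for illustration only; any kernel source can be used.
KERNEL_SRC = """
__kernel void twice(__global float *a)
{
    a[get_global_id(0)] *= 2.0f;
}
"""


def compare_compile_times(src=KERNEL_SRC):
    # Imported here so the timing helper stays usable without PyOpenCL.
    import pyopencl as cl

    ctx = cl.create_some_context()

    # Assumption: cache_dir=False disables PyOpenCL's binary cache,
    # so this times a genuine from-source build.
    prg, t_source = timed(lambda: cl.Program(ctx, src).build(cache_dir=False))

    # Fetch the per-device binaries and time a rebuild from them.
    binaries = prg.get_info(cl.program_info.BINARIES)
    _, t_binary = timed(lambda: cl.Program(ctx, ctx.devices, binaries).build())

    print("from-source compile took %g s" % t_source)
    print("from-binary compile took %g s" % t_binary)
    return t_source, t_binary
```

Calling `compare_compile_times()` on a machine with PyOpenCL and an OpenCL runtime installed prints both timings. If binary caching is effective, the second number should be clearly smaller than the first.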

I'd appreciate your feedback on these issues.

Thanks!

Andreas

Quote:

inducer wrote:

  • First, the compiler is much slower than it was in the 2012 release. That is not bad in itself. What makes it bad is that building from a binary appears to take the same amount of time as building from source, so caching binaries gains nothing and developers are stuck waiting for kernels to compile on every run. This gets old quickly...
  • Second, the PyOpenCL test suite (in git, http://github.com/inducer/pyopencl ) fails in the segmented scan test. Since this code runs successfully on AMD (CPU, APU, GPU), Nvidia, and Intel 2012, I am currently leaning towards there being a correctness issue in 2013. I'll continue to investigate, though.

Hi Andreas,

For the first issue, was this on the CPU device or the GPU? Can you send us a reproducer? As for the second issue, I haven't used PyOpenCL. What are the steps to reproduce the failure? Again, is this on the CPU or the GPU?

Thanks,
Raghu

Hi Raghu,

Here's a reproducer for the compile speed issue. Btw, I'm on Linux, and all my complaints pertain to the CPU backend. On my i7 2620 (SNB), these are the numbers I get for the attached code:

Intel CL 2013:

from-source compile took 3.5429 s
from-binary compile took 3.71068 s

Intel CL 2012:

from-source compile took 0.197583 s
from-binary compile took 0.123464 s

As you can see, the 2013 times are worse by more than a factor of 10 (roughly 18x from source and 30x from binary), and on 2013 building from binary is no faster than building from source. To reproduce this, simply run the attached file "compile-times.py" using PyOpenCL. If you'd like to reproduce this independently of PyOpenCL, you'll also need the header "pyopencl-ranluxcl.cl", which I've also included.
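For reference, the slowdown factors follow directly from the four timings quoted above. The short calculation below only restates that arithmetic; the dictionary layout and the helper name `slowdown` are chosen here for illustration.

```python
# Timings in seconds, copied from the numbers reported in this post.
timings = {
    "from-source": {"2013": 3.5429, "2012": 0.197583},
    "from-binary": {"2013": 3.71068, "2012": 0.123464},
}


def slowdown(entry):
    """Ratio of the 2013 build time to the 2012 build time."""
    return entry["2013"] / entry["2012"]


for mode, entry in timings.items():
    print("%s: %.1fx slower on 2013" % (mode, slowdown(entry)))
```

Both ratios come out well above 10, and the 2013 from-binary time is slightly larger than the 2013 from-source time.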

Thanks!

Andreas

Bump?

Hi Andreas,

Regarding compilation times.
I get similar values on the 2013 release (~3.8 s), but the latest internal versions are a lot faster (~0.4 s). I haven't reproduced the 2012 values, but I think the difference might be explained by various compiler changes, optimizations, etc.

I will look into the segmented scan test failures later.

Thanks,
Yuri

Great, thanks. Let me know if I can help somehow.

Hi, this feedback is very important for improving our apps.
