Excessively slow binary load times...

Hi!

I have checked the recent update of the OpenCL drivers (v1.5), and the loading times of compiled binaries have not improved. My application takes almost 20 seconds to start when it loads its binary program with the Intel drivers, but it starts almost instantly with the Nvidia and AMD drivers. With Intel OpenCL there is only a small difference between the time required to compile the code and the time needed to load precompiled binaries.

Are there any plans to improve on this?

Thanks!
Atmapuri

Hi Atmapuri,

Could you please clarify your situation in more detail? What is the compilation and execution flow you're implementing?

Can you reproduce this problem for a simple application and attach it?
Thanks in advance

Dear Eli,

My application works like this:

1.) Check if compiled binaries exist.
2.) If not, load the source code and compile the source and save the compiled binaries to disk.
3.) If compiled binaries do exist, load the binaries and continue execution.

It is step 3 that takes 20 seconds, as measured. The time is spent within this call:

Status = clBuildProgram(clProgram, 1, DeviceList, cFlags, NULL, NULL);

The clProgram is created with a call to clCreateProgramWithBinary, which returns immediately. The 20-second delay does not happen with other vendors' drivers. My code may also be unusual in its kernel count: it has about 500 kernels.
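
For readers following along, the disk side of the caching flow in steps 1 to 3 could be sketched roughly as below. This is only an illustration, not the poster's code: the helper names `load_program_binary` and `save_program_binary` are hypothetical, and the OpenCL calls are left out so the sketch builds without an OpenCL SDK. In a real application the saved bytes would typically come from clGetProgramInfo with CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES, and the loaded bytes would be passed to clCreateProgramWithBinary.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a cached binary into a malloc'd buffer.
   Returns NULL if the file is missing or unreadable, in which case the
   caller would fall back to compiling from source (step 2). */
unsigned char *load_program_binary(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return NULL; }

    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size_out = (size_t)len;
    return buf;
}

/* Hypothetical helper: write a program binary to disk.
   Returns 0 on success, -1 on failure. */
int save_program_binary(const char *path, const unsigned char *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, size, f);
    int close_rc = fclose(f);
    return (written == size && close_rc == 0) ? 0 : -1;
}
```

A NULL result from `load_program_binary` corresponds to step 2 (compile and save); a non-NULL result corresponds to step 3. The caller owns the returned buffer and should free it once the program has been created.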

Thanks!
Atmapuri

Atmapuri,
Thanks. We'll check this issue and come back to you when we have more information.
Eli

Hi Atmapuri,

The binaries which are returned are not executables but rather an intermediate form. This means that when you build the program from these binaries, we have to compile them the rest of the way to device executables.

To validate my "theory", and to make sure there isn't another issue which needs further investigation, I would like to kindly ask you to do another measurement. The measurement should include the time it takes you to compile the sources initially (step 2 in your scenario, where the binaries don't exist yet). Make sure you measure only the program build and not the I/O of saving to disk.
If I am correct, the results should be >= 20 seconds.
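
One way to time only the build step, and not the disk I/O around it, is to wrap just that call between two monotonic clock reads. The sketch below assumes a POSIX system with clock_gettime; the names `build_fn`, `time_build_seconds`, and `example_build` are hypothetical, and `example_build` is only a busy-loop stand-in for a function that would call clBuildProgram.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C modes */
#endif
#include <time.h>

/* Hypothetical callback type: wraps the single call being measured. */
typedef int (*build_fn)(void *user_data);

/* Runs fn once and returns the elapsed wall-clock time in seconds.
   The callback's return value is stored in *status_out when it is non-NULL. */
double time_build_seconds(build_fn fn, void *user_data, int *status_out)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    int status = fn(user_data);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (status_out) *status_out = status;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Placeholder workload (an assumption for this sketch). A real version
   would call clBuildProgram here and return its status code. */
int example_build(void *user_data)
{
    (void)user_data;
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; ++i) sink += i;
    return 0;
}
```

Wall-clock time is used rather than clock(), because clock() reports CPU time, which would overstate the duration of a build that runs on more than one core.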

Please let me know what the results are so that we can proceed with the investigation.

Thanks,
Boaz

Dear Boaz,

Here are some compile times for my sources:

1.) Nvidia OpenCL: 1 second
2.) AMD HD5770: 20 seconds
3.) AMD CPU: 35 seconds
4.) Intel CPU: 60 seconds

Binary load times:

1.) Nvidia OpenCL: 1 second
2.) AMD HD5770: 1 second
3.) AMD CPU: 1 second
4.) Intel CPU: 20 seconds

So you are correct that the binaries are actually loaded and used, but the binary load times are by far the worst in the industry (as are the compile times). If the kernels are independent of each other, it would also be possible to run the compilation in parallel on all available cores. Currently I see Intel using 2 full cores during compilation, and the other vendors using only 1.
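
As a rough illustration of the parallel idea, independent kernels could first be divided into near-equal groups, one per worker, with each group then built as its own program. The sketch below covers only that partitioning step. The names `kernel_range` and `kernel_chunk` are hypothetical, and whether building several smaller programs concurrently actually speeds things up on a given driver is an assumption that the thread does not confirm.

```c
#include <stddef.h>

/* Half-open range [begin, end) of kernel indices assigned to one worker. */
typedef struct {
    size_t begin;
    size_t end;
} kernel_range;

/* Split `total` kernels across `workers` groups as evenly as possible.
   The first (total % workers) groups receive one extra kernel.
   Returns an empty range for an invalid worker index or zero workers. */
kernel_range kernel_chunk(size_t total, size_t workers, size_t index)
{
    kernel_range r = {0, 0};
    if (workers == 0 || index >= workers) return r;

    size_t base = total / workers;
    size_t extra = total % workers;

    r.begin = index * base + (index < extra ? index : extra);
    r.end = r.begin + base + (index < extra ? 1 : 0);
    return r;
}
```

With the roughly 500 kernels mentioned earlier and 8 workers, for example, four groups get 63 kernels and four get 62.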

Thanks!
Atmapuri

Hi Atmapuri,

Thanks for the feedback. We will need to work on this and improve our compilation times.
One more question: will improving our binary load times to 1 second resolve your issue?

Thanks,
Boaz

By all means : )

Thanks!
Atmapuri
