Max size error on Creating Buffer using Alloc_host_ptr

Hello all,

The max mem alloc size of my CPU device (i5-3470) is 4266006528 bytes (just under 4 GB), and that of my GPU (HD 2500) is 425721856 bytes (under 512 MB).

Now I am creating a simple buffer: clInput = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(type) * elements, NULL, &err);

But I am getting CL_INVALID_BUFFER_SIZE for the GPU when the size reaches 512 MB. This would have made sense if I were allocating the buffer in GPU memory. The only purpose of using the CL_MEM_ALLOC_HOST_PTR flag was to get the CPU's max mem alloc size, which is about 4 GB. Am I doing something wrong, or is it a bug?

I was able to reproduce the error. I will debug this and get back to you. Thanks for reporting.

Raghu

Sorry. I checked my test program again. My CPU device reports CL_DEVICE_MAX_MEM_ALLOC_SIZE as 536838144, which is slightly less than 512 MB, to be precise. When I pass this exact number as the size of my buffer, I don't get the error. Can you use gpucapsviewer to see what your CPU device reports for this value? Then try creating a buffer whose size is less than or equal to that number. Let me know if you are still getting the error.
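The suggestion above (keep the requested size at or below the reported limit) can be sketched as a small host-side check. This is only an illustration, not code from the thread: `fits_max_alloc` is a hypothetical helper name, and `uint64_t` stands in for `cl_ulong` so the snippet compiles without the OpenCL headers. In a real program, `max_alloc` would be the value returned by clGetDeviceInfo for CL_DEVICE_MAX_MEM_ALLOC_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when elem_size * elements is non-zero,
 * does not overflow 64 bits, and does not exceed max_alloc.
 * max_alloc is assumed to hold the device's CL_DEVICE_MAX_MEM_ALLOC_SIZE
 * (uint64_t is used here in place of cl_ulong). */
bool fits_max_alloc(uint64_t elem_size, uint64_t elements, uint64_t max_alloc)
{
    if (elem_size == 0 || elements == 0)
        return false;                      /* a size of 0 is not a valid buffer size */
    if (elements > UINT64_MAX / elem_size)
        return false;                      /* the multiplication would overflow */
    return elem_size * elements <= max_alloc;
}
```

With the numbers reported in this thread, a 512 MB request (536870912 bytes) fails the check against the GPU limit of 425721856 bytes and passes it against the CPU limit of 4266006528 bytes.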

I am curious why you got 4GB and I am getting 512MB.

Raghu

Hi, thanks for the prompt response. I checked CL_DEVICE_MAX_MEM_ALLOC_SIZE for the CPU again and found the same 4266006528 bytes. I am querying this number in my OpenCL host code using clGetDeviceInfo(device_id[d], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(tmpLong), &tmpLong, NULL) for the CPU. I am even able to create 2 GB buffers on the CPU (OpenCL).

I am not familiar with gpucapsviewer. I tried typing it at the command prompt but got an unrecognized-command error.

Quote:

If 09 wrote:
the only purpose of using alloc_host_ptr flag was to use the max_mem_alloc size of cpu which is 4GB.

Hi,

The ALLOC_HOST_PTR flag just tells the runtime to mirror the memory allocation on the host. The actual buffer is being created on the device.

With ALLOC_HOST_PTR you are taking advantage of the upfront allocation: when you call clEnqueueMapBuffer, the mapped memory is already allocated, so you just get the pointer, which is fast. In contrast, if you call clEnqueueMapBuffer on a regular buffer, the runtime needs to allocate the mapped memory first. Finally, without mapping you would need to allocate the memory yourself and use clEnqueueWriteBuffer.

If you are not going to populate and/or read the buffer from the host code, you don't need any flags. Otherwise, the best way to avoid copying from the CL buffer into your internal structures (and back) is actually to use USE_HOST_PTR.

Please correct me if I am wrong. According to my experimental results and my understanding, ALLOC_HOST_PTR allocates the buffer on the host, so the kernel accesses the buffer from host memory, and this access is very slow. On the other hand, USE_HOST_PTR allocates the buffer on the device, so kernel execution was faster than when the buffer was created using ALLOC_HOST_PTR.
