I am using Intel's VME extension to calculate motion estimation (ME), and the time it takes to pass the image to the GPU is very long: about 1 ms.
I have tried two methods:
#1 Map/Unmap - about 0.4 ms for a 1280*720 image
queue.enqueueMapImage(*pRefImage,CL_TRUE,CL_MAP_WRITE_INVALIDATE_REGION,origin, region, &row_pitch,NULL,NULL,NULL);
memcpy(prefImageMemory,pRefBuf,arraySizeImageBytes); // Memory use HOST memory
#2 enqueueWriteImage - about 0.7 ms for a 1280*720 image
queue.enqueueWriteImage(srcImage, CL_TRUE, origin, region, currImage->PitchY, 0, currImage->Y);
Why does it take so long?
How can I improve this?
Can I call map once and then unmap after each change to the image memory, to save the "map" time?