I am benchmarking the Intel HD 4000 GPU integrated in the Ivy Bridge GT2 Core i5-3320M, using OpenCL.
The test kernel samples a single 1024x1250 CL_R, CL_FLOAT image2D 262144x1250 times, using a sampler with CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR.
The sampling coordinates follow a curve. Computing the curve requires only a few add and multiply operations. The inner loop is unrolled and contains 4 read_imagef() calls.
The kernel writes 262144 floats as its result into a global buffer.
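For clarity, here is a hypothetical reconstruction of the kernel structure described above; the curve function, constants, and indexing are placeholders of my own, not the actual code:

```c
// OpenCL C sketch (assumed structure, not the original kernel).
__constant sampler_t smp = CLK_NORMALIZED_COORDS_TRUE |
                           CLK_ADDRESS_CLAMP_TO_EDGE |
                           CLK_FILTER_LINEAR;

__kernel void sample_curve(__read_only image2d_t img,  // 1024x1250 CL_R / CL_FLOAT
                           __global float *out)        // 262144 result floats
{
    int gid = get_global_id(0);                // 0 .. 262143
    float acc = 0.0f;
    float2 p = (float2)(gid * 0.001f, 0.0f);   // placeholder curve start

    // 1250 samples per work-item, inner loop unrolled by 4:
    for (int i = 0; i < 1250; i += 4) {
        // each "curve" step is just a few mul/add ops (placeholder math)
        acc += read_imagef(img, smp, p).x; p = p * 1.001f + 0.0005f;
        acc += read_imagef(img, smp, p).x; p = p * 1.001f + 0.0005f;
        acc += read_imagef(img, smp, p).x; p = p * 1.001f + 0.0005f;
        acc += read_imagef(img, smp, p).x; p = p * 1.001f + 0.0005f;
    }
    out[gid] = acc;
}
```

With 262144 work-items and 1250 reads per work-item, this issues the 262144x1250 bilinear samples mentioned above.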
The kernel is called several times in a test loop; every call works on different input data (different image2D contents).
GPU-Z 0.6.8 reports a texture fillrate of 2.6 GTexels/s for the HD 4000.
According to GPU-Z, the GPU frequency has ramped up to 1200 MHz by the 10th kernel call.
(a) If the image2D becomes wider than 1536 pixels, the measured rate drops significantly below 2.6 GTexels/s.
(b) With the image2D dimensions given above, the measured rate exceeds 2.6 GTexels/s.
I could not find an official Intel specification for the HD 4000; I only found http://www.realworldtech.com/ivy-bridge-gpu/1/
Could someone explain how to calculate the maximum GTexels/s for the image2D format specified above?
From my understanding, (b) indicates a higher texel rate than GPU-Z reports. Under which conditions is that possible?
Any hint on how I could avoid the drop in GTexels/s seen in (a)?