It is not unusual to allocate contiguous memory for an N-dimensional array. Take a FORTRAN 2D array, Array(1:nX, 1:nY), as an example. A reference to Array(1:nX, y) is a contiguous block of memory, whereas a reference to Array(x, 1:nY) produces an array descriptor whose stride is not one element but nX elements (nX * element size in bytes). C/C++ can provide an analogous capability using a class.
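As a rough illustration of the two kinds of reference, here is a minimal C sketch of such a descriptor. The names StridedView, column_view, row_view and view_get are made up for this example, and the array is assumed to be stored column-major (FORTRAN order) with 0-based indices.

```c
#include <stddef.h>

/* Hypothetical descriptor for a 1-D view into a larger array:
   base pointer, element count, and stride measured in elements. */
typedef struct {
    float    *base;
    size_t    count;
    ptrdiff_t stride;
} StridedView;

/* Analogue of Array(1:nX, y): the nX elements are adjacent, stride 1. */
StridedView column_view(float *a, size_t nX, size_t y)
{
    StridedView v = { a + y * nX, nX, 1 };
    return v;
}

/* Analogue of Array(x, 1:nY): consecutive elements are nX floats apart. */
StridedView row_view(float *a, size_t nX, size_t nY, size_t x)
{
    StridedView v = { a + x, nY, (ptrdiff_t)nX };
    return v;
}

/* Read element i of a view by stepping i strides from the base. */
float view_get(StridedView v, size_t i)
{
    return v.base[(ptrdiff_t)i * v.stride];
}
```

With nX = 4 and nY = 3, row_view(a, 4, 3, 2) touches a[2], a[6] and a[10], which is the access pattern a strided gather would serve.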
Could you consider extending the AVX instruction set to have an alternate form of scatter/gather that takes a stride as opposed to the current table of indices?
Current form:
FOR j = 0 to 7
    i = j * 32;
    IF MASK[31+i] THEN
        MASK[i+31:i] = 0xFFFFFFFF; // extend from most significant bit
    ELSE
        MASK[i+31:i] = 0;
    FI;
ENDFOR
FOR j = 0 to 7
    i = j * 32;
    DATA_ADDR = BASE_ADDR + (SignExtend(VINDEX1[i+31:i])*SCALE) + DISP;
    IF MASK[31+i] THEN
        DEST[i+31:i] = FETCH_32BITS(DATA_ADDR); // a fault exits the loop
    FI;
    MASK[i+31:i] = 0;
ENDFOR
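For readers who prefer C, the current form can be modelled in scalar code roughly as follows. This is only a sketch of the pseudocode above, not the hardware behaviour: faults are not modelled, only the top bit of each mask lane is tested, and the names gather_indexed and make_stride_indices are invented for this example. The second function shows the extra step needed today to express a regular stride: an index table has to be built first.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the 8-lane, 32-bit gather with a table of indices. */
void gather_indexed(float dest[8], uint32_t mask[8], const void *base_addr,
                    const int32_t vindex[8], int scale, int disp)
{
    for (int j = 0; j < 8; ++j) {
        if (mask[j] & 0x80000000u) {          /* lane enabled by its top bit */
            const char *p = (const char *)base_addr
                          + (int64_t)vindex[j] * scale + disp;
            memcpy(&dest[j], p, sizeof(float));
        }
        mask[j] = 0;                          /* mask lane is cleared either way */
    }
}

/* Expressing a constant stride with the indexed form: fill the table
   with 0, stride, 2*stride, ... before issuing the gather. */
void make_stride_indices(int32_t vindex[8], int32_t stride)
{
    for (int j = 0; j < 8; ++j)
        vindex[j] = j * stride;
}
```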
Proposed alternate form:
FOR j = 0 to 7
    i = j * 32;
    IF MASK[31+i] THEN
        MASK[i+31:i] = 0xFFFFFFFF; // extend from most significant bit
    ELSE
        MASK[i+31:i] = 0;
    FI;
ENDFOR
FOR j = 0 to 7
    i = j * 32;
    DATA_ADDR = BASE_ADDR + (SignExtend(VINDEX1 * j)*SCALE) + DISP;
    IF MASK[31+i] THEN
        DEST[i+31:i] = FETCH_32BITS(DATA_ADDR); // a fault exits the loop
    FI;
    MASK[i+31:i] = 0;
ENDFOR
Here VINDEX1 contains the stride (as opposed to a table of indices).
It is the programmer's or compiler's responsibility to load BASE_ADDR with the base address of the small vector, in other words the address of the 0th element of the 8 floats.
This would allow more efficient scatter/gather code for regularly strided data, because no index table would have to be built or loaded first.
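To make the proposed semantics concrete, here is a scalar C sketch of what such an instruction would compute. It is an illustration of the proposal only, not an existing instruction or intrinsic; gather_strided is a made-up name, faults are not modelled, and only the top bit of each mask lane is tested.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the proposed 8-lane, 32-bit strided gather.
   'stride' plays the role of VINDEX1 in the proposed form. */
void gather_strided(float dest[8], uint32_t mask[8], const void *base_addr,
                    int32_t stride, int scale, int disp)
{
    for (int j = 0; j < 8; ++j) {
        if (mask[j] & 0x80000000u) {          /* lane enabled by its top bit */
            const char *p = (const char *)base_addr
                          + (int64_t)stride * j * scale + disp;
            memcpy(&dest[j], p, sizeof(float));
        }
        mask[j] = 0;                          /* mask lane is cleared either way */
    }
}
```

For an 8x8 column-major array of floats, calling gather_strided with base_addr pointing at element 2, stride 8 and scale 4 would fetch elements 2, 10, 18, ... 58, that is, one row, without an index table.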
Jim Dempsey




