"Privileged instruction" exception using SSE2

I'm getting an "unhandled exception: privileged instruction" error at runtime. I'm trying to use the SSE2 intrinsics to do some image analysis. The image is stored as 8-bit grayscale values in an aligned buffer. Unfortunately, the width of each scanline is 532 pixels. To make it easier on myself I'm omitting the last 4 pixels of each scanline (to get an integer multiple of 16 for SSE2 usage, i.e. 528 pixels/scanline) and loading the grayscale values using _mm_load_si128(...). Then I skip the last 4 pixels and begin to load the next scanline. My test program crashes after skipping the last 4 pixels, when it loads from the next scanline:

//test code follows:

//allocate memory aligned on 16 byte-boundary:
unsigned char *Image1;
Image1 = (unsigned char*) _mm_malloc(sizeof(unsigned char)*(500*532), 16);

__declspec(align(16)) unsigned char* I1ptr;
__m128i Image1Pixels;
int Shift;
//Loop through image, skipping last 4 pixels of scanlines:
for(int row=0; row<500; row++)
{
    Shift = 532*row;
    I1ptr = (Image1 + Shift);//move to scanline beginning.

    for( int col=0; col<528; c+=16)
    {
        //Load 16 grayscales via SSE2:
        Image1Pixels = _mm_load_si128( (__m128i*)(I1ptr+col) );
    }//End of col.

}//End of row.

I get the "privileged instruction" exception when I call "_mm_load_si128(...)" for the first time during the second iteration of the row loop (i.e. after skipping the last 4 pixels and moving to the beginning of the 2nd scanline). I just can't figure this one out. Anyone know the answer here? Thank you!

I'm trying to understand this. It looks like the 2nd value of Shift (532) is misaligned (not a multiple of 16), where maybe you are leaving space for the odd values you said you were skipping. Did you mean col += 16 where you put c+=16?
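To make the misalignment concrete, here is a small sketch of the arithmetic. The helper name row_misalignment is made up for illustration, and it assumes the buffer base itself is 16-byte aligned (as with the _mm_malloc call in the question).

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of the start of `row` past the previous
   16-byte boundary, assuming the buffer base is 16-byte aligned and rows
   are packed back to back with `width` bytes each. */
static size_t row_misalignment(size_t width, size_t row)
{
    return (width * row) % 16;
}
```

With width = 532, row 0 starts on a 16-byte boundary, row 1 starts 4 bytes past one, row 2 starts 8 bytes past, row 3 starts 12 bytes past, and only every 4th row lines up again.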

Yes that was a typo. Sorry. Should be col+=16. The main problem is that the width of the image is not a multiple of 16.

I'm currently getting around this by working on the entire image in one go (width x height = 532*500, which is a multiple of 16), but it's less than ideal since I would prefer to load the first row (scanline), do some processing on it, load the second row, etc.

This line of code is problematic:
Image1Pixels = _mm_load_si128( (__m128i*)(I1ptr+col) );

If your lines are not 16-byte aligned this will crash.

You should use _mm_loadu_si128( (__m128i*)(I1ptr+col) ) - note the U for Un-aligned.
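As a rough sketch of the unaligned-load approach, the function below walks one scanline with _mm_loadu_si128 and handles the leftover pixels (4 of them for a 532-pixel row) in a scalar tail, so nothing has to be skipped. The function name sum_row_sse2 and the choice of "sum the pixels" as the per-row work are assumptions made for illustration only; they are not from the original post.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum the 8-bit grayscale values of one scanline.
   `row` may point anywhere; no 16-byte alignment is required. */
static uint64_t sum_row_sse2(const unsigned char *row, size_t width)
{
    assert(row != NULL);

    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    size_t col = 0;

    /* Full 16-pixel chunks, loaded with the unaligned variant. */
    for (; col + 16 <= width; col += 16) {
        __m128i px = _mm_loadu_si128((const __m128i *)(row + col));
        /* SAD against zero adds up each group of 8 bytes into a 64-bit lane. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(px, zero));
    }

    /* Combine the two 64-bit lanes. */
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint64_t total = lanes[0] + lanes[1];

    /* Scalar tail for the pixels that do not fill a 16-byte chunk. */
    for (; col < width; ++col)
        total += row[col];

    return total;
}
```

Called as sum_row_sse2(Image1 + 532*row, 532) for each row, this works on any scanline start address, at the cost of using unaligned loads throughout.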


The row size (532) in your code is not a multiple of 16,
so after the first row iteration (row = 1) the data I1ptr points to is no longer 16-byte aligned.
You should either use a row size of 528 or 544, or use
an unaligned (slower) load.
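For the padded-row option, a minimal sketch might look like the following. It copies a tightly packed image into a buffer whose row stride is rounded up to a multiple of 16 (532 becomes 544), so every scanline starts on a 16-byte boundary and _mm_load_si128 is safe. The helper names (padded_stride, make_padded_copy, sum_padded_row) and the zero-filled padding are assumptions for illustration, not something stated in the thread.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_malloc/_mm_free on common compilers */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a row width up to the next multiple of 16 bytes (532 -> 544). */
static size_t padded_stride(size_t width)
{
    return (width + 15) & ~(size_t)15;
}

/* Copy a tightly packed image into a 16-byte-aligned buffer with padded rows.
   Padding bytes are zeroed. The caller releases the result with _mm_free. */
static unsigned char *make_padded_copy(const unsigned char *src,
                                       size_t width, size_t height)
{
    size_t stride = padded_stride(width);
    unsigned char *dst = (unsigned char *)_mm_malloc(stride * height, 16);
    if (dst == NULL)
        return NULL;
    memset(dst, 0, stride * height);
    for (size_t row = 0; row < height; ++row)
        memcpy(dst + row * stride, src + row * width, width);
    return dst;
}

/* Sum one padded row using aligned loads; zero padding does not change the sum. */
static uint64_t sum_padded_row(const unsigned char *row, size_t stride)
{
    assert(((uintptr_t)row & 15) == 0 && stride % 16 == 0);
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    for (size_t col = 0; col < stride; col += 16) {
        __m128i px = _mm_load_si128((const __m128i *)(row + col));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(px, zero));
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1];
}
```

The trade-off is 12 extra bytes per row (6000 bytes for a 500-row image) and one copy up front, in exchange for aligned loads over all 532 pixels of every scanline.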

