Invalid memory access with ippiTranspose_8u_C3R and ippiTranspose_8u_C4R

Hi,

Intel Inspector XE 2011 reports invalid memory accesses in the ippiTranspose_8u_C3R and ippiTranspose_8u_C4R functions when they are called with negative strides. See the code below for an example.

This sometimes leads to access violations in our application.

I am running the latest IPP 7.0 Update 7 on 64-bit Windows, on an Intel Core i7-2720QM.

Can this be reproduced in your test environment?

Best regards,

Jurrien

#include <stdlib.h>
#include <ipp.h>

int transpose_test(void)
{
	char *src, *dst, *src_end, *dst_end;
	int w = 1792;
	int h = 2560;
	int img_size8  = w*h*1;
	int img_size24 = w*h*3;
	int img_size32 = w*h*4;
	IppiSize sz;
	sz.height = h;
	sz.width  = w;

	//-- 1-Byte --//
	src = (char*) calloc(img_size8, sizeof(char));
	dst = (char*) calloc(img_size8, sizeof(char));
	// Top Down, ok
	ippiTranspose_8u_C1R((Ipp8u *)src, w, (Ipp8u *)dst, h, sz);
	// Bottom up, ok
	src_end = src + w*(h-1);
	dst_end = dst + h*(w-1);
	ippiTranspose_8u_C1R((Ipp8u *)src_end, -w, (Ipp8u *)dst_end, -h, sz);
	free(src);
	free(dst);

	//-- 3-Byte --//
	src = (char*) calloc(img_size24, sizeof(char));
	dst = (char*) calloc(img_size24, sizeof(char));
	// Top Down, ok
	ippiTranspose_8u_C3R((Ipp8u *)src, w*3, (Ipp8u *)dst, h*3, sz);
	src_end = src + w*(h-1)*3;
	dst_end = dst + h*(w-1)*3;
	// Bottom Up, gives invalid partial memory access in Inspector
	ippiTranspose_8u_C3R((Ipp8u *)src_end, -w*3, (Ipp8u *)dst_end, -h*3, sz);
	free(src);
	free(dst);

	//-- 4-Byte --//
	src = (char*) calloc(img_size32, sizeof(char));
	dst = (char*) calloc(img_size32, sizeof(char));
	// Top Down, ok
	ippiTranspose_8u_C4R((Ipp8u *)src, w*4, (Ipp8u *)dst, h*4, sz);
	src_end = src + w*(h-1)*4;
	dst_end = dst + h*(w-1)*4;
	// Bottom Up, gives invalid memory access in Inspector
	ippiTranspose_8u_C4R((Ipp8u *)src_end, -w*4, (Ipp8u *)dst_end, -h*4, sz);
	free(src);
	free(dst);
	return 0;
}
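The bottom-up pointers in this test can be sanity-checked by computing which bytes an ROI spans for a given start offset and signed step. The following is a minimal sketch in plain C that does not call IPP; the helper name roi_span is hypothetical. It assumes each row is touched only within its first row_bytes bytes, which is what the function contract implies, not necessarily what an optimized kernel actually reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte range [*lo, *hi), relative to the buffer start,
 * covered by an ROI of 'rows' rows of 'row_bytes' bytes each. The first ROI
 * row starts at offset 'start' and each following row is 'step' bytes further
 * on (step < 0 means bottom-up). */
static void roi_span(ptrdiff_t start, ptrdiff_t step, ptrdiff_t row_bytes,
                     ptrdiff_t rows, ptrdiff_t *lo, ptrdiff_t *hi)
{
    assert(rows > 0 && row_bytes > 0);
    ptrdiff_t last = start + step * (rows - 1);    /* first byte of last ROI row */
    ptrdiff_t low_row = step >= 0 ? start : last;  /* row at the lowest address */
    ptrdiff_t high_row = step >= 0 ? last : start; /* row at the highest address */
    *lo = low_row;
    *hi = high_row + row_bytes;
}
```

With w = 1792 and h = 2560, the C3 source (start w*(h-1)*3, step -w*3, row_bytes w*3, rows h) and the C3 destination (start h*(w-1)*3, step -h*3, row_bytes h*3, rows w) both span exactly [0, w*h*3), so the bottom-up pointers in the test stay inside the allocated blocks.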


I am not sure whether negative strides are supported here; I need to check the documentation.

Sorry, my previous message was incorrect: negative strides are supported by the transpose functions. We will check on our side how it works.

I replaced your calls to calloc and free with the IPP memory functions ippMalloc and ippFree and didn't have any problems. Here are the changes I made:

#include "stdafx.h"
#include <ipp.h>

int _tmain(int argc, _TCHAR* argv[])
{
	char *src, *dst, *src_end, *dst_end;
	int w = 1792;
	int h = 2560;
	int img_size8 = w*h*1;
	int img_size24 = w*h*3;
	int img_size32 = w*h*4;

	IppiSize sz;
	sz.height = h;
	sz.width = w;

	//-- 1-Byte --//
	//src = (char*) calloc(img_size8, sizeof(char));
	//dst = (char*) calloc(img_size8, sizeof(char));
	src = (char*) ippMalloc(img_size8);
	dst = (char*) ippMalloc(img_size8);

	// Top Down, ok
	ippiTranspose_8u_C1R((Ipp8u *)src, w, (Ipp8u *)dst, h, sz);

	// Bottom up, ok
	src_end = src + w*(h-1);
	dst_end = dst + h*(w-1);

	ippiTranspose_8u_C1R((Ipp8u *)src_end, -w, (Ipp8u *)dst_end, -h, sz);

	//free(src);
	//free(dst);
	ippFree( (void *)src );
	ippFree( (void *)dst );

	//-- 3-Byte --//
	//src = (char*) calloc(img_size24, sizeof(char));
	//dst = (char*) calloc(img_size24, sizeof(char));
	src = (char*) ippMalloc(img_size24);
	dst = (char*) ippMalloc(img_size24);

	// Top Down, ok
	ippiTranspose_8u_C3R((Ipp8u *)src, w*3, (Ipp8u *)dst, h*3, sz);

	src_end = src + w*(h-1)*3;
	dst_end = dst + h*(w-1)*3;

	// Bottom Up, gives invalid Partial memory access in Inspector
	ippiTranspose_8u_C3R((Ipp8u *)src_end, -w*3, (Ipp8u *)dst_end, -h*3, sz);

	//free(src);
	//free(dst);
	ippFree( (void *)src );
	ippFree( (void *)dst );

	//-- 4-Byte --//
	//src = (char*) calloc(img_size32, sizeof(char));
	//dst = (char*) calloc(img_size32, sizeof(char));
	src = (char*) ippMalloc(img_size32);
	dst = (char*) ippMalloc(img_size32);

	// Top Down, ok
	ippiTranspose_8u_C4R((Ipp8u *)src, w*4, (Ipp8u *)dst, h*4, sz);
	src_end = src + w*(h-1)*4;
	dst_end = dst + h*(w-1)*4;

	// Bottom Up, gives invalid memory access in Inspector
	ippiTranspose_8u_C4R((Ipp8u *)src_end, -w*4, (Ipp8u *)dst_end, -h*4, sz);

	//free(src);
	//free(dst);
	ippFree( (void *)src );
	ippFree( (void *)dst );

	return 0;
}

So I think you were probably running into an alignment issue.

It probably is an alignment issue, but a memory block that is not 32-byte aligned is not supposed to cause access violations in an application, just a decrease in speed.

In my application I cannot control how the source memory block is allocated.

If you can also reproduce the invalid memory access with the Inspector, I would consider this a bug that needs to be fixed in an upcoming update.
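Whether a pointer meets a given alignment is easy to check, and an explicitly aligned block can be requested from the C runtime without IPP. A minimal sketch; is_aligned, alloc_aligned and free_aligned are hypothetical helpers built on the MSVC CRT functions _aligned_malloc/_aligned_free and, elsewhere, the C11 function aligned_alloc. The value 32 is simply the alignment discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _MSC_VER
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Nonzero if p lies on an 'align'-byte boundary (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: allocate 'size' bytes starting on an 'align' boundary. */
static void *alloc_aligned(size_t size, size_t align)
{
#ifdef _MSC_VER
    return _aligned_malloc(size, align);
#else
    /* C11 aligned_alloc expects size to be a multiple of align. */
    size_t rounded = (size + align - 1) / align * align;
    return aligned_alloc(align, rounded);
#endif
}

/* Blocks from alloc_aligned must be released here (plain free is wrong on MSVC). */
static void free_aligned(void *p)
{
#ifdef _MSC_VER
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Calling is_aligned(p, 32) on a pointer returned by calloc shows what the CRT provides on a particular platform; the result varies, so no output is given here.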

Quoting Jurrien De Knecht: "It probably is an alignment issue, but a memory block that is not 32-byte aligned is not supposed to cause access violations in an application, just a decrease in speed..."

If you have Microsoft Visual Studio 20xx Professional Edition, please look at the source code of the CRT function 'calloc'.
You will see that 'calloc' uses the Win32 API function 'HeapAlloc'. It is hard to believe that Microsoft developers
missed an alignment issue.

Also, the CRT functions 'calloc' and 'malloc' differ by nature; take a look:

'malloc'
Allocates a memory block ( not initialized )
Declaration: void * malloc( size_t size )
Where, 'size' is a number of bytes to allocate

'calloc'
Allocates an array in memory with elements initialized to 0
Declaration: void * calloc( size_t num, size_t size )
Where, 'num' is a number of elements, and 'size' is a length in bytes of each element

The IPP function 'ippiMalloc' is similar to the CRT function 'malloc'.
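A practical consequence of this comparison is that malloc-style allocators, including ippMalloc and ippiMalloc, return uninitialized memory, so code that switches away from calloc has to zero the block itself. A minimal sketch of what calloc(num, size) does, expressed with malloc; zeroed_alloc is a hypothetical name, and the overflow check reflects what calloc implementations do when num * size does not fit in size_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: calloc(num, size) expressed with malloc + memset. */
static void *zeroed_alloc(size_t num, size_t size)
{
    if (size != 0 && num > SIZE_MAX / size)
        return NULL;               /* num * size would overflow */
    void *p = malloc(num * size);  /* malloc leaves the block uninitialized */
    if (p != NULL)
        memset(p, 0, num * size);
    return p;
}
```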

>>...is not supposed to give access violations...

Some SSE instructions and intrinsic functions require aligned memory blocks, and if the blocks are
not aligned, an Access Violation exception is thrown.

Best regards,
Sergey
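The SSE2 intrinsics show the difference Sergey describes: _mm_load_si128 requires a 16-byte aligned address and faults otherwise (an access violation on Windows), while _mm_loadu_si128 accepts any address. A minimal sketch of a routine that checks the pointer before choosing the load, i.e. the kind of handling Jurrien expects from a library; sum_bytes_sse2 is a hypothetical example for x86/x86-64 only and says nothing about how IPP itself dispatches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics (x86/x86-64 only) */

/* Sums n bytes, 16 at a time. The aligned load is used only when p is on a
 * 16-byte boundary; otherwise the unaligned load is used, which never faults. */
static uint64_t sum_bytes_sse2(const uint8_t *p, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();
    int aligned = ((uintptr_t)p & 15) == 0;
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = aligned ? _mm_load_si128((const __m128i *)(p + i))
                            : _mm_loadu_si128((const __m128i *)(p + i));
        /* _mm_sad_epu8 against zero sums each 8-byte half into a 64-bit lane */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint64_t total = lanes[0] + lanes[1];
    for (; i < n; i++)  /* scalar tail for the last n % 16 bytes */
        total += p[i];
    return total;
}
```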

The issue in ippiTranspose_8u_C3R remains, even after I applied your suggested changes. See the code below. To remove some 'uninitialized memory access' warnings from the Inspector output, I added explicit memory initialization.

So with the code below I still get 'Uninitialized partial memory access' in ippiTranspose_8u_C3R with the Inspector, running the 32-bit IPP 7.0 Update 7 on Windows 7 64-bit. This is suspicious, as it leads to crashes in our application.

Can Intel verify that this is an issue in the implementation of ippiTranspose_8u_C3R?

#include "stdafx.h"
#include <ipp.h>

void InitMemory(char *p, int sz)
{
	// Init memory
	for (int i = 0; i < sz; i++) {
		*p++ = 0;
	}
}

int _tmain(int argc, _TCHAR* argv[])
{
	char *src, *dst, *src_end, *dst_end;
	int w = 1792;
	int h = 2560;
	int img_size24 = w*h*3;
	IppiSize sz;
	sz.height = h;
	sz.width  = w;

	//-- 3-Byte --//
	src = (char*) ippMalloc(img_size24);
	dst = (char*) ippMalloc(img_size24);
	InitMemory(src, img_size24);
	InitMemory(dst, img_size24);
	src_end = src + w*(h-1)*3;
	dst_end = dst + h*(w-1)*3;
	// Bottom Up, gives invalid Partial memory access in Inspector
	ippiTranspose_8u_C3R((Ipp8u *)src_end, -w*3, (Ipp8u *)dst_end, -h*3, sz);
	ippFree( (void *)src );
	ippFree( (void *)dst );
	return 0;
}

I appreciate your explanation of the differences between calloc and malloc, but my statement remains the same: a memory block that is not 32-byte aligned shall not lead to invalid memory accesses when using IPP.

What is your position on this?

I would expect the library to detect a pointer to a memory block that is not 32-byte aligned and handle it appropriately, probably at the cost of some speed.

Best regards,

Jurrien
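The InitMemory loop in the test above does the same thing as a single call to the standard memset, which is typically well optimized. A minimal sketch; init_memory is a hypothetical drop-in replacement that keeps the original signature.

```c
#include <assert.h>
#include <string.h>

/* Drop-in replacement for the thread's InitMemory(p, sz): zero sz bytes at p. */
static void init_memory(char *p, int sz)
{
    assert(p != NULL && sz >= 0);
    memset(p, 0, (size_t)sz);
}
```

With ippiMalloc buffers, passing h * step rather than w * h * 3 also initializes any row padding, as the later version of the test in this thread does.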

Quoting Jurrien De Knecht: "I appreciate your explanation of the differences between calloc and malloc, but my statement remains the same: a memory block that is not 32-byte aligned shall not lead to invalid memory accesses when using IPP.

What is your position on this?

I would expect the library to detect a pointer to a memory block that is not 32-byte aligned and handle it appropriately, probably at the cost of some speed."

[SergeyK] Hi Jurrien, Intel software engineers could have a different point of view regarding the use
of pointers that are not 32-byte aligned. You're right that there are some problems with processing speed
when non-aligned pointers are used.
Best regards,
Sergey

I get the impression that my reply #6 to #3 has gone unnoticed. Could you please try the updated example? There is still an issue in ippiTranspose_8u_C3R, even with aligned pointers.

Best regards,
Jurrien

Since all ippi functions actually work with a step (the scanline stride in bytes), I suggest that you rewrite your test to use ippiMalloc instead of ippMalloc, e.g. src = ippiMalloc_8u_C3(w, h, &srcstep). This also means that src_end = src + srcstep*(h-1), and that you call ippiTranspose_8u_C3R with ((Ipp8u*)src_end, -srcstep, (Ipp8u*)dst_end, -dststep, sz).

There have been many support topics in this forum about using w*pixsize instead of the step, and about border issues, etc.

So, think step, not w*pixsz.

With a bit of luck your problem might go away.
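The difference between "step" and "w*pixsize" comes from row padding. A minimal sketch, assuming an allocator that pads every row up to a multiple of a power-of-two alignment; padded_step and last_row are hypothetical helpers, and for real IPP buffers the step returned through ippiMalloc's step argument is authoritative, not this formula.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row size in bytes when each row of width_px pixels with
 * 'channels' bytes per pixel is rounded up to a multiple of 'align' (a power of two). */
static int padded_step(int width_px, int channels, int align)
{
    int row_bytes = width_px * channels;
    return (row_bytes + align - 1) & ~(align - 1);
}

/* First byte of the last row: the start pointer for a bottom-up pass
 * (negative step) over an image with 'height' rows of 'step' bytes. */
static unsigned char *last_row(unsigned char *base, int step, int height)
{
    return base + (ptrdiff_t)step * (height - 1);
}
```

For the sizes used in this thread, 1792*3 = 5376 and 2560*3 = 7680 are both multiples of 32 and of 64, so under this rounding rule the padded step would equal the packed row size.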

Thomas, thanks for your reply.

I took your suggestion and used ippiMalloc and the stride in bytes. Unfortunately the issue remains in ippiTranspose_8u_C3R. The Inspector still gives me an 'Uninitialized partial memory access'.

Is this an issue with the function, or am I doing something wrong? See the code below for the example.

#include "stdafx.h"
#include <ipp.h>

void InitMemory(char *p, int sz)
{
	// Init memory
	for (int i = 0; i < sz; i++) {
		*p++ = 0;
	}
}

int _tmain(int argc, _TCHAR* argv[])
{
	char *src, *dst, *src_end, *dst_end;
	int w = 1792;
	int h = 2560;
	int src_step;
	int dst_step;
	IppiSize sz;
	sz.height = h;
	sz.width  = w;

	// Version 3: using ippiMalloc and step size
	//-- 3-Byte --//
	src = (char*) ippiMalloc_8u_C3(w, h, &src_step);
	dst = (char*) ippiMalloc_8u_C3(h, w, &dst_step);
	InitMemory(src, h*src_step);
	InitMemory(dst, w*dst_step);
	src_end = src + src_step*(h-1);
	dst_end = dst + dst_step*(w-1);
	// Bottom Up, gives invalid Partial memory access in Inspector
	ippiTranspose_8u_C3R((Ipp8u *)src_end, -src_step, (Ipp8u *)dst_end, -dst_step, sz);
	ippiFree( (void *)src );
	ippiFree( (void *)dst );
	return 0;
}

Your code looks fine.

What happens if you give src a positive step, and what happens if you give dst a positive step?
(Of course, also change end to begin in that case.)
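One way to answer this is to run all four sign combinations and compare IPP's output byte for byte against a plain scalar transpose that touches only the ROI bytes. A minimal sketch of such a reference for 8-bit, 3-channel data; transpose_c3_ref is hypothetical and is not an IPP function. roi_w and roi_h are the source width and height, matching the IppiSize passed in the thread's calls.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference transpose for 8-bit 3-channel images. src points at the
 * first row to read and dst at the first row to write; either step may be
 * negative (bottom-up). Source pixel (x, y) goes to destination pixel (y, x). */
static void transpose_c3_ref(const unsigned char *src, int src_step,
                             unsigned char *dst, int dst_step,
                             int roi_w, int roi_h)
{
    for (int y = 0; y < roi_h; y++) {
        const unsigned char *s = src + (ptrdiff_t)y * src_step;
        for (int x = 0; x < roi_w; x++) {
            unsigned char *d = dst + (ptrdiff_t)x * dst_step + 3 * y;
            d[0] = s[3 * x + 0];
            d[1] = s[3 * x + 1];
            d[2] = s[3 * x + 2];
        }
    }
}
```

Because every access stays inside the ROI, any extra bytes IPP reads or writes in the same configuration can then be attributed to the library call rather than to the test setup.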

Could you provide some details on the function InitMemory(...)? It is not clear what it does internally.

I can answer that: InitMemory is a local function declared just before main.
(It's right in front of you!)

Quoting Thomas Jensen: "I can answer that: InitMemory is a local function declared just before main.
(It's right in front of you!)"

Thanks, Thomas! I didn't notice it...

The InitMemory function is needed; otherwise the Inspector will claim that you are accessing uninitialized memory, which is correct.

@Thomas, when the direction is top-down (and src_step is positive) there is no issue. It only happens with the negative step, which is also the case where I see access violations in my application. I strongly suspect ippiTranspose_8u_C3R of causing this behaviour; running with the Inspector hints that something bad is happening in this function.

Can you or someone at Intel reproduce this? I am running the latest 32-bit IPP on Windows 7 64-bit on a Sandy Bridge CPU.

Cheers, Jurrien

Okay, but what about my question about src_step and dst_step: which one is causing the problem?

I'm also curious about your Inspector: how can it detect reads of uninitialized memory?
(Less important.)

It could be that IPP reads beyond the end of the scanline because it uses SSE, but then the question is whether reading outside the image should be considered OK as long as all writes stay inside.
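To put a number on this hypothesis: if a kernel issued a full 16-byte load starting at the last 3-byte pixel of a row (an assumption for illustration; how IPP's kernels actually load data is not known from this thread), the load would reach 13 bytes past the end of the row. A minimal sketch; overread_past_row is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes past the end of a row reached by a 'vec'-byte
 * load that starts at the row's last pixel. row_bytes is width * channels and
 * pix is bytes per pixel. Returns 0 if the load fits inside the row. */
static size_t overread_past_row(size_t row_bytes, size_t pix, size_t vec)
{
    assert(pix > 0 && pix <= row_bytes);
    size_t last_px = row_bytes - pix;  /* offset of the last pixel */
    size_t end = last_px + vec;        /* one past the last loaded byte */
    return end > row_bytes ? end - row_bytes : 0;
}
```

Such bytes only leave the allocation when that row is the one at the highest address of the buffer, which is the last row processed with a positive step and the first row processed with a negative step.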

You also wrote that it crashes, which is more than a warning from the Inspector.
The crash could give you a hint if you look at the CPU view: the faulting address should then lie just before or after your src or dst buffer.
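This check can be made concrete by comparing the faulting address from the debugger with the bounds of src and dst. A minimal sketch; locate_address is a hypothetical helper that works on integer addresses (uintptr_t), so comparing against a buffer the address may not belong to is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: position of 'addr' relative to [base, base + size).
 * Returns -1 if before, 0 if inside, 1 if after, and stores in *dist how many
 * bytes outside the buffer it lies (1 = adjacent byte; 0 when inside). */
static int locate_address(uintptr_t addr, uintptr_t base, size_t size,
                          size_t *dist)
{
    if (addr < base) {
        *dist = (size_t)(base - addr);
        return -1;
    }
    if (addr - base >= size) {
        *dist = (size_t)(addr - base - size) + 1;
        return 1;
    }
    *dist = 0;
    return 0;
}
```

Running it against both src and dst tells which buffer was overrun, on which side, and by how many bytes.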

Hello Jurrien, have you ever solved this problem?

I had a similar problem in my source code a few days ago. The program terminated with the message "Aborted (core dump)".

I checked my code and found that I was accessing memory beyond the length I had allocated.

If the image before transposition has L pixels, you should allocate L*Nc bytes for the output of the transposition, where Nc is the number of channels (for instance, Nc = 3 if ippiTranspose_xx_C3R is used).

Any method of memory allocation is OK.  

Something like: Ipp8u *out = (Ipp8u*)malloc(L*Nc*sizeof(Ipp8u));
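This sizing rule can be written as a small, overflow-checked computation. A minimal sketch; image_bytes is a hypothetical helper that derives L from width and height and returns 0 instead of silently wrapping around if the product does not fit in size_t. The transposed image holds the same number of bytes as the source; only its row length changes from w*Nc to h*Nc bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes for a packed w x h image with nc 8-bit channels,
 * or 0 if w * h * nc overflows size_t. */
static size_t image_bytes(size_t w, size_t h, size_t nc)
{
    if (w != 0 && h > SIZE_MAX / w)
        return 0;
    size_t pixels = w * h;              /* L in the description above */
    if (nc != 0 && pixels > SIZE_MAX / nc)
        return 0;
    return pixels * nc;                 /* L * Nc */
}
```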

malloc already returns memory that is suitably aligned for any built-in type, and address alignment may not be the main cause of this problem anyway.

Hope this helps.

 

BG

Charls
