using non-inplace functions as inplace

Picture of dj_alek

In-place functions have been declared deprecated since IPP 7.1 and may be removed.

So, can we use non-in-place functions with pSrc==pDst? Or must we use an intermediate buffer (and incur extra copying)?

Picture of Jeffrey Mcallister (Intel)

Yes, you may use pSrc==pDst. One of the main reasons for deprecating the in-place functions is that they can be viewed as a "restricted" form (that is, only allowing 1 source and destination buffer) where the non-inplace functions are more general and can be used with pSrc==pDst or different source and destination buffers, depending on what makes most sense in the application.

The suggested replacements in the warning messages (also listed here: http://software.intel.com/en-us/articles/functions-deprecated-in-ipp-71/) are fully validated with pSrc==pDst. Extra buffers/copies are not required.

Picture of dj_alek

Can I consider this as an official answer from Intel and safely replace all inplace functions with their non-inplace equivalent with pSrc==pDst?
I think it would be useful to add pSrc==pDst permission to the IPP documentation.

Picture of Chuck De Sylva (Intel)

Yes, that should be fine. We will update the documentation accordingly to address this.

Picture of dj_alek

Ok. Thank you!

Picture of Sergey Kostrov

Hi Jeffrey, I'd like to get some additional information.

>>... where the non-inplace functions are more general and can be used with pSrc==pDst...

Does it mean that some non-in-place IPP function could create an additional/temporary buffer to process the data in 'pSrc' and then copy the results back to 'pSrc' (since pSrc == pDst)?

It is clear that an additional buffer is not needed when some value, for example 1, has to be added to all elements of an array. I would consider that case a truly memory-efficient approach because it needs no stack or heap memory for temporary buffers.

I would simply like to understand whether using non-in-place functions with pSrc==pDst on large data sets (arrays or images larger than 128 MB) provides no advantage, since in some cases a temporary buffer will be created anyway (!).

Thanks in advance.

Picture of Jeffrey Mcallister (Intel)

There are no additional copies when using pSrc==pDst when moving to the non-inplace replacements for deprecated in-place functions. A lot of thought goes into providing good performance with IPP, which includes avoiding extra copies wherever possible. If one is found, especially as a performance regression when transitioning from duplicate in-place functions, this will be treated as a bug. However, pSrc==pDst replacements have been thoroughly reviewed and validated. As Chuck mentioned, the replacements suggested in the deprecation warnings for the in-place functions are safe from a functionality *and* performance perspective.

On a related note, IPP is moving away from internal allocations as an overall strategy. This is why so many initAlloc and free functions are being replaced by getsize. New functions added to IPP will also use external allocations. Adding internal intermediate buffers for pSrc==pDst would go against the general direction of giving users control and flexibility.

Thanks for your feedback on this. We want to be sure that the documentation for this topic covers your questions and concerns.

Picture of Sergey Kostrov

>>...
>>On a related note, IPP is moving away from internal allocations as an overall strategy. This is why so many initAlloc and free
>>functions are being replaced by getsize. New functions added to IPP will also use external allocations. Adding internal intermediate
>>buffers for pSrc==pDst would go against the general direction of giving users control and flexibility.

This is exactly what I wanted to understand. Thank you, Jeffrey!

Picture of David J.

How can we know whether a function is safe to use with src==dst? I've encountered some functions where doing this led to unpredictable results. Is this documented somewhere? Is it only safe when src and dst have the same data type? For example, what if src is complex and dst is real?

For example, I was getting strange errors with ippsMagnitude_32f when using one of the input vectors as the output vector.

Picture of Sergey Kostrov

>>... Is it only safe when src and dst have the same data type?

Yes, and the suffix in a function's name describes which data type must be used.

>>... I was getting strange errors with ippsMagnitude_32f when using one of the input vectors as the output vector...

Could you post an example of these errors?

Picture of David J.

I may be wrong on which function it was. There were two I was getting an error with, and it was only for certain input signals. Once I changed them to use a temporary buffer instead of dst=src, the errors went away. I'll see if I can determine which one it was, but it's one of these:

ippsWinHann_32f, ippsFFTFwd_RToPerm_32f, ippsMulPerm_32f,  ippsMagnitude_32f.

I wish I could be more specific, but unfortunately I didn't write it down. I'll try to remember, but it's in a large program, and I fixed it several days ago.

The error was simply that the output value was just slightly off of what it should be, but enough that I knew something was wrong. 

I just need to know exactly when it's safe to reuse memory and when it's not.

Thanks.

Picture of Sergey Kostrov

David, These are statements you've written:

>>...I wish I could be more specific...
>>...I didn't write it down...
>>...I'll try to remember, but it's in a large program...
>>...The error was...slightly off...but enough that I knew something was wrong...

How could anybody investigate your problem? There is no clear picture of what is going on, so please try to be as specific as possible.

Picture of David J.

Yes, it was very ambiguous. But I'm not really looking for help debugging my code. I fixed these errors several weeks ago by using temporary buffers instead of in-place ops.

What I need is some clear documentation on when in-place ops can be done, and when they can't. This may be in the reference manuals, but I've been unable to find it.

My point was that I have experienced bugs as a result of using in-place ops with IPP, and they were difficult to diagnose. Because these bugs were only evident with certain input parameters, they could easily have gone unnoticed and ended up in the production code. So it's just not worth the risk of a "trial and error" approach.

I should have taken better notes about the bugs, and how they were resolved, but unfortunately I didn't. Even then, I'm restricted on how much information I can give in a public forum. 

So I'm hoping someone can point me to some definitive documentation. Otherwise, I'll play it safe for now and use temporary buffers.

Picture of David J.

If I recall correctly, the error was with using the real input to ippsMagnitude_32f as the real output.

There was another function where the input was real, the output was complex, and the same memory space was being used for both. I think that one's pretty obvious, though.

And I believe I was also getting errors with src=dst in ippsWinHann_32f.

I changed to using temporary buffers in ippsFFTFwd_RToPerm_32f, ippsMulPerm_32f, after discovering the other bugs, and there's nothing in the documentation indicating if in-place ops are supported for these functions. Ultimately, I removed all in-place ops with IPP. Fortunately, we use IPP sparingly in our code. 

Another function I was concerned about was the Haar transform. It would work if the algorithm was carried out in the most intuitive way, because writes would progress in increments of one, while the reads would have increments of two. However, I didn't want to chance making assumptions about IPP internal workings, which could change in subsequent releases. 
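To make the read/write pattern concrete, here is a minimal sketch of a one-level Haar split in plain C. It is a hypothetical stand-in (`haar_fwd` is not an IPP function, and the 0.5 scaling and sign convention are illustrative assumptions), so it says nothing about how IPP actually implements its Haar transform. It only shows why the "intuitive" loop order tolerates `low == src`: output `i` is written only after inputs `2*i` and `2*i + 1` have been read.

```c
#include <stddef.h>

/* Hypothetical stand-in, NOT IPP code: one-level Haar split.
   'low' may alias 'src' only because this particular loop reads
   src[2*i] and src[2*i+1] before it writes low[i], and later
   iterations read indices >= 2*(i+1) > i. 'high' must not alias 'src'. */
static void haar_fwd(const float *src, size_t len, float *low, float *high)
{
    size_t half = len / 2;
    for (size_t i = 0; i < half; ++i) {
        float a = src[2 * i];      /* read both inputs first */
        float b = src[2 * i + 1];
        low[i]  = 0.5f * (a + b);  /* write index i <= 2*i */
        high[i] = 0.5f * (a - b);
    }
}
```

The safety here depends entirely on the traversal order. An implementation that runs the loop backwards or processes blocks in a different order could break it, which is exactly the kind of internal detail one should not assume about a library.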

So there are so many different scenarios, and it really needs to be clearly documented for each function.

Picture of Sergey Kostrov

>>...What I need is some clear documentation on when in-place ops can be done, and when they can't. This may be in the reference
>>manuals, but I've been unable to find it...

I agree that there are some white spots ( not too many, however ) in IPP documentation. I follow a combined approach, that is, online docs, pdf docs for the current version and pdf docs for some older versions ( down to version 3.0 ). I could tell you that in older pdf documents I was able to find some technical details which are not described in online docs.

Picture of David J.

To get more specific, can ippsMagnitude safely do this:

ippsMagnitude_32f (realp, imagp, realp, num_pts)

Where the real part buffer is used for both pSrcRe and pDst.

Picture of Sergey Kostrov

This is what I see in ipps.h header file:

...
// Names: ippsMagnitude
// Purpose: compute magnitude of every complex element of the source
// Parameters:
// pSrcDst pointer to the source/destination vector
// pSrc pointer to the source vector
// pDst pointer to the destination vector
// len length of the vector(s), number of items
// scaleFactor scale factor value
// Return:
// ippStsNullPtrErr pointer(s) to data vector is NULL
// ippStsSizeErr length of a vector is less or equal 0
// ippStsNoErr otherwise
// Notes:
// dst = sqrt( src.re^2 + src.im^2 )
*/
IPPAPI( IppStatus, ippsMagnitude_32fc, (const Ipp32fc *pSrc, Ipp32f *pDst, int len ) )
...

and as you can see pSrcDst is Not used.

Picture of David J.

What you're showing is for _32fc rather than _32f, but it still doesn't use pSrcDst.

I can't see any reason why it wouldn't work, as long as the nth element of pDst is not written to until after the magnitude is computed, i.e. realp[n] isn't changed until after its value is used to compute the result. However, I don't know that I want to risk that, and I doubt it's much faster than a simple for loop.

I hope this hole in the documentation is filled in the near future, as non-in-place operations can get very expensive for large data buffers. I'd like to see every function be documented as to which buffers can be reused for src/dst.

Picture of Sergey Kostrov

>>...I can't see any reason why it wouldn't work, as long as the nth element of pDst is not written to until after the
>>magnitude is computed, i.e. realp[n] isn't changed until after its value is used to compute the result...

You could test it as follows:

- Do processing with one thread
- Initialize data in memory blocks pSrc1 and pSrc2
- In a 1st sub-test use pSrc1 != pDst1, and
- In a 2nd sub-test use pSrc2 = pDst2, and
- Then compare content of memory blocks pDst1 and pSrc2 (=pDst2)
- If data are the same then in-place processing could be done
- Repeat all of the above ( except for 1st item ) with more threads (!)
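The single-threaded part of the procedure above can be sketched in plain C as follows. Everything here is hypothetical scaffolding: `binary_op`, `inplace_matches`, and the two sample operations are illustrative names, not IPP APIs. To test a real IPP function you would wrap it in a function with the `binary_op` signature. A passing comparison on one data set is evidence, not proof, that aliasing is safe.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for an operation with two inputs and one output. */
typedef void (*binary_op)(const float *a, const float *b,
                          float *dst, size_t len);

/* Returns 1 if running 'op' with dst == a gives bit-identical output to
   running it with a separate dst, 0 if the outputs differ, and -1 if the
   scratch buffers could not be allocated. */
static int inplace_matches(binary_op op, const float *a, const float *b,
                           size_t len)
{
    float *ref  = malloc(len * sizeof *ref);
    float *work = malloc(len * sizeof *work);
    int result = -1;
    if (ref && work) {
        op(a, b, ref, len);                    /* sub-test 1: pSrc != pDst */
        memcpy(work, a, len * sizeof *work);
        op(work, b, work, len);                /* sub-test 2: pSrc == pDst */
        result = (memcmp(ref, work, len * sizeof *ref) == 0);
    }
    free(ref);
    free(work);
    return result;
}

/* Sample op that is safe in place: each output uses only its own index. */
static void sum_of_squares(const float *a, const float *b,
                           float *dst, size_t len)
{
    for (size_t n = 0; n < len; ++n)
        dst[n] = a[n] * a[n] + b[n] * b[n];
}

/* Sample op that is NOT safe in place: output n also reads a[n-1],
   which has already been overwritten when dst == a. */
static void add_previous(const float *a, const float *b,
                         float *dst, size_t len)
{
    for (size_t n = 0; n < len; ++n)
        dst[n] = a[n] + (n > 0 ? a[n - 1] : 0.0f) + b[n];
}
```

With nonzero input, `inplace_matches(sum_of_squares, a, b, len)` reports a match while `inplace_matches(add_previous, a, b, len)` reports a mismatch. For the multi-threaded repeat, the same comparison would be rerun with the library's threading enabled.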

I did a verification and the following functions are threaded:
...
ippsMagnitude_32f
ippsMagnitude_32fc
ippsMagnitude_32sc_Sfs
ippsMagnitude_64f
ippsMagnitude_64fc
...
and that is why you need to verify both cases, that is, single- and multi-threaded processing.

>>...I hope this hole in the documentation is filled in the near future...

Please follow: http://www.intel.com/software/products/softwaredocs_feedback if you think it is Not clearly explained and needs to be improved.

Picture of David J.

Using ippsFIR64f_32f with pSrc==pDst, instead of ippsFIR64f_32f_I, produces corrupt data. Granted, I am using 7.0. Is this only safe to do for versions 7.1+?
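As general background on why filters are a sensitive case for aliasing, here is a minimal sketch of a direct-form FIR in plain C. Both functions are hypothetical and are not how ippsFIR64f_32f is implemented (the thread does not say what IPP 7.0 does internally). The point is only that each FIR output depends on earlier input samples, so whether `dst == src` works depends on the order in which an implementation reads and writes.

```c
#include <stddef.h>

/* Hypothetical stand-in, NOT IPP code: forward direct-form FIR with
   zero initial history. With dst == src, src[n - k] for k >= 1 has
   already been overwritten by earlier outputs, so the result is wrong. */
static void fir_forward(const float *src, float *dst, size_t len,
                        const float *taps, size_t ntaps)
{
    for (size_t n = 0; n < len; ++n) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps && k <= n; ++k)
            acc += taps[k] * src[n - k];
        dst[n] = acc;
    }
}

/* Same filter computed from the last sample to the first. Output n only
   reads src[n], src[n-1], ..., none of which has been overwritten yet,
   so dst == src gives the same result as a separate dst. */
static void fir_reverse(const float *src, float *dst, size_t len,
                        const float *taps, size_t ntaps)
{
    for (size_t n = len; n-- > 0; ) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps && k <= n; ++k)
            acc += taps[k] * src[n - k];
        dst[n] = acc;
    }
}
```

With taps {1, 1} and input {1, 1, 1, 1}, the correct output is {1, 2, 2, 2}. Calling `fir_forward` in place yields {1, 2, 3, 4}, while `fir_reverse` in place yields the correct values.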