IPP with AMD CPUs....

Guys, you are doing a great job with the IPP library. The only doubt I have is about future support for AMD processors.

I remember that an older version of IPL detected the AMD Athlon as a Pentium III and used PIII optimizations. The latest release of IPP detects exactly the same Athlon as a generic Pentium and basically disables all optimizations on this CPU. As a workaround, one can currently get away with forcing PIII optimizations on the Athlon. It works fine for me...

Will I be able to use IPP on AMD processors? Do you have any plans to support Opteron and the AMD64 architecture natively? Will you NOT implement "special measures" to prevent the use of your library on AMD processors?

I know these are all politically charged questions... But still...

Message Edited by ZXS on 05-12-2004 08:12 PM

Message Edited by ZXS on 05-12-2004 08:13 PM

Opteron will use 80286 optimizations, and future AMD processors will be detected as a plain 8051 ...

Hi,

Thank you for your high opinion of the IPP libraries. We are glad to get such pleasant feedback :)

As for AMD processor support, it is not only a political question. It is also quite a technical one. We can't guarantee the performance benefits of code that was tightly optimized for Intel architecture when it runs on another architecture. And of course, we do not guarantee the compatibility of third-party processors with Intel architecture. What we do guarantee is that the "generic" code of IPP will work correctly on processors which are 100% compatible with Intel.

Regards,
Vladimir

Using generic code is safer. Generic code was developed for processors which are 100% compatible with Intel architecture.

Vladimir

"Generic" code has no SIMD optimizations. Such code cannot be used as a foundation for a competitive high-performance application because it does not take full advantage of the capabilities of a modern CPU (not even close), despite all its "safety". Let us be honest, a "generic" performance library makes no sense whatsoever...

How does the IPP library select the version of optimizations (PX, A6, W7, T7) to be loaded? Is it documented somewhere?

I hope it is not something like this: IF CPUID() = "AuthenticAMD" THEN px()

It would be hard to expect IPP to support all existing CPU microarchitectures. But at least, if some third-party CPU supports a compatible SIMD instruction set (MMX, SSE, SSE2, SSE3), then it is reasonable to load the "right" DLLs, i.e. DLLs designed for this instruction set.
Because Intel has a dominant position in the CPU market, other vendors mimic the behaviour of Intel CPUs for software compatibility (though the internal design of "other" CPUs can be quite different), so machine code optimized for Intel CPUs is also pretty efficient on other processors.
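
The selection rule proposed in this post, choosing the library variant from the SIMD features a CPU reports rather than from its vendor string, could be sketched roughly as below. This is only an illustration of the poster's suggestion, not how IPP dispatches: CpuFeatures and selectVariant are hypothetical names, the flags would have to be filled in from CPUID by the caller, and the pairing of A6/W7/T7 with SSE/SSE2/SSE3 is an assumption based on the processor generations those codes target.

```cpp
#include <cassert>
#include <string>

// Hypothetical feature flags; a real program would fill these from CPUID.
struct CpuFeatures {
    bool sse;
    bool sse2;
    bool sse3;
};

// Assumed mapping (not taken from IPP documentation):
//   T7 needs SSE3, W7 needs SSE2, A6 needs SSE, PX is the generic fallback.
// Each level also requires the levels below it, so an inconsistent
// feature report falls back to the safest variant it fully satisfies.
std::string selectVariant(const CpuFeatures& f) {
    if (f.sse && f.sse2 && f.sse3) return "T7";
    if (f.sse && f.sse2)           return "W7";
    if (f.sse)                     return "A6";
    return "PX";
}
```

With this rule, a CPU reporting SSE and SSE2 but not SSE3 would get the W7 code regardless of who manufactured it.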

Is it possible to add a key to the Windows registry which would allow IPP to be configured to load a user-selected version of the library, or to set it to AUTO-DETECT (by default)?

Let us be honest, a very fast program doing wrong calculations makes no sense to anyone.

Yes, IPP does contain code optimized for Intel microarchitecture. It means exactly what it says: for Intel microarchitecture, not for others.

Yes, "generic" code can't provide you the best performance, and it was not intended to. It provides you a working solution which gives the same results in terms of accuracy on non-Intel but 100% compatible architectures.

Of course, the code dispatching is done with a different algorithm. It is like this:

IF detected_cpu() == "GenuineIntel"
    select "specific" code
ELSE
    select "generic" code
ENDIF

I see I have to repeat it again. We do very fine-grained optimization for Intel architecture, and we can't guarantee any results for this optimized code on other architectures. We have "generic" code, which should work on 100% compatible architectures.

Vladimir

No, there is no way to dispatch "Intel cpu specific" code on non-Intel processors.

One note about the "right" DLL.
In IPP, the "right" DLL is the DLL which was designed for the appropriate processor. Not only the instruction set is taken into account here. Other features of the architecture are also important, like cache features, branch prediction features and so on.

Vladimir

YES, other features such as cache properties, branch prediction, etc. can be important. But usually they are less important than the ability to process several values at once (SIMD). I believe that the impact of differences in these features is not so critical. It is like a second-order optimization. You can live without it if you still have the first-order optimization, i.e. SIMD. Also, it is really hard to make sure that such fine-grained optimizations are doing more good than harm, because the answer may depend on things not known at compile time, like patterns of input, CPU load (other apps running in the background and polluting the cache), L2 cache size, type and speed of main RAM, etc.

As a matter of fact, IPL version 3.0 with Pentium III optimizations is running perfectly on my Athlon. Overall speed is not worse than on a comparable Pentium III. Based on this practical experience, it is hard to accept switching to PX.

BTW, it would be interesting to take a look at the non-kosher MMX instruction producing wrong results on Athlon.

You can try to link to the static libraries and use ippStaticInitCpu() ...

Yes, you can, but we do not provide support for such use of IPP on non-Intel processors.

Regards,
Vladimir

Dear zxs,

As Vladimir replied earlier, Intel IPP will run on processors that are 100% compatible with Intel Architecture. It's the PX code that should be dispatched on all non-Intel processor-based systems.

Thanks,
Ying S
Intel IPP

Hello all

This post was very interesting for me, but I still have one more question ...

I know you don't provide any support when linking statically and "forcing" a processor type using ippStaticInitCpu, but I would like to know if it's possible to do so while linking dynamically ...

Regards

Marc Baillavoine

Hi,

No, there is no way to force the use of CPU-specific code for DLLs. These functions (ippStaticInit, ippStaticInitCpu, ippStaticFree) do nothing when they are called from DLLs.

Regards,
Vladimir

:/

All right, thanks for this.

Marc

I can recommend the following workaround.

Build a custom DLL with all IPP functions.
The custom DLL example is provided in the IPP distro.
Include all IPP functions in this DLL.

Since you have complete control over this DLL, you can write your own functions to select the best optimizations (W7, T7, A6, PX) depending on cpuid() or a configuration file.

Moreover, you can select optimizations on a per-function basis.

Big surprise: there are functions for which the A6 versions run faster on a Pentium 4 than the P4's native T7/W7 versions. You can load them. Also, you can plug in your own implementations. My practice shows that it's not impossible to beat IPP.

You will probably need a small script that parses the IPP headers and creates an all-inclusive header file and export.def for your library. The resulting library will be a bit bulky (around 40MB in my case), but you will have the benefit of worrying about just a single file.
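
A generator along those lines might look like the sketch below. It assumes that the header declarations have the form IPPAPI(return_type, function_name, (arguments)) with no comma inside the return type; extractIppApiNames and makeExportDef are names made up for this example, and a real script would also have to deal with comments and preprocessor conditionals.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Collect the second macro argument (the function name) of every
// "IPPAPI(type, name, (args))" occurrence in the header text.
// Lines starting with "#define" are skipped so that the macro's own
// definition is not mistaken for a declaration.
std::vector<std::string> extractIppApiNames(const std::string& header) {
    std::vector<std::string> names;
    const std::string marker = "IPPAPI(";
    std::size_t pos = 0;
    while ((pos = header.find(marker, pos)) != std::string::npos) {
        std::size_t lineStart = header.rfind('\n', pos);
        lineStart = (lineStart == std::string::npos) ? 0 : lineStart + 1;
        const bool isDefine = header.compare(lineStart, 7, "#define") == 0;
        pos += marker.size();
        if (isDefine) continue;

        const std::size_t firstComma = header.find(',', pos);
        if (firstComma == std::string::npos) break;
        const std::size_t secondComma = header.find(',', firstComma + 1);
        if (secondComma == std::string::npos) break;

        // The name sits between the first and second comma; trim whitespace.
        const std::string raw =
            header.substr(firstComma + 1, secondComma - firstComma - 1);
        const std::size_t b = raw.find_first_not_of(" \t\r\n");
        const std::size_t e = raw.find_last_not_of(" \t\r\n");
        if (b != std::string::npos) names.push_back(raw.substr(b, e - b + 1));
        pos = secondComma;
    }
    return names;
}

// Produce the text of a module-definition (.def) file exporting the names.
std::string makeExportDef(const std::string& libraryName,
                          const std::vector<std::string>& names) {
    std::ostringstream out;
    out << "LIBRARY " << libraryName << "\nEXPORTS\n";
    for (const std::string& n : names) out << "    " << n << "\n";
    return out.str();
}
```

Feeding the concatenated header text to extractIppApiNames and writing the result of makeExportDef to export.def would give the linker the full export list in one step.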

It will take a while to link such a DLL. But it needs to be done just once.

I prefer to work with a DLL rather than with a static lib because of better compatibility (among other reasons). For example, Borland C++ Builder can't use IPP's static libs.

Message Edited by ZXS on 06-04-2004 04:41 PM

Dear ZXS,

Thanks for your help. The point is that I also have size constraints, i.e. a DLL of more than a few hundred kilobytes is not acceptable to me.

If I build my own custom DLL (that's what I'm doing for the moment), can I directly call functions targeted at PIII or PIV (e.g. instead of calling ippiCopyBlock_H263_8u, calling a6_ippiCopyBlock_H263_8u or something like that)? Of course I have to declare and export them, but do you think it will work?

> The point is that I also have size constraints,
> i.e a dll more than a few hundreds kilobytes is not acceptable to me.

No prob. In this case you can include only the functions you use. As usual.

> ... calling a6_ippiCopyBlock_H263_8u or something like that ?).

Why not? If you export a6_ippiCopyBlock_H263_8u, it should work.
You can even tweak the custom DLL to export all possible functions:
1. ippiCopyBlock_H263_8u - the one with auto dispatching
2. px_ippiCopyBlock_H263_8u
3. a6_ippiCopyBlock_H263_8u
4. w7_ippiCopyBlock_H263_8u
5. t7_ippiCopyBlock_H263_8u
Just follow the custom DLL example: check how they redefine the IPPAPI macro to achieve different effects.

It should hardly increase the size of the DLL.

Alternatively, you can implement and export a function which would allow the user to specify the version of a function to be called.

(pseudo code follows)

typedef void (*FUNC_PTR)(void);
enum CPU_VERSION { PX, A6, W7, T7 };
extern "C" {
    // search the function table for funcPtr and modify
    // the table's entry according to cpuVersion
    __declspec(dllexport) void myIppSetFuncOptimizations(FUNC_PTR funcPtr, CPU_VERSION cpuVersion);
}

The client will call it like this:
myIppSetFuncOptimizations((FUNC_PTR)ippiCopyBlock_H263_8u, A6);
This call will search the DLL's function table for ippiCopyBlock_H263_8u and modify the corresponding entry so that all ippiCopyBlock_H263_8u calls are redirected to a6_ippiCopyBlock_H263_8u.

One thing I forgot to mention. There are certain limitations on the functions that can be called inside a DLL initialization routine (e.g. you can't call Advapi's Registry functions - quite disappointing, isn't it?). For more info, see MSDN on DLL_PROCESS_ATTACH.

So you will probably need an additional function myIppInit(CPU_VERSION) that has to be called by all users at their initialization to initialize the entire function dispatch table of your DLL. In this way there will be no such annoying limitations.

You will need to implement some synchronization mechanism for accesses to and modifications of the DLL's function table.
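
Put together, the dispatch table, the explicit init call and the locking might be sketched as follows. This is a toy under stated assumptions, not the custom DLL example from the IPP distro: demoFunc and its px_/a6_/w7_/t7_ variants are hypothetical stand-ins that return their own version tag only so the redirection can be observed, the __declspec(dllexport) decoration is left out to keep the sketch portable, and myIppSetFuncOptimizations returns bool here (instead of void as in the pseudo code) to report an unknown function.

```cpp
#include <cassert>
#include <mutex>

enum CPU_VERSION { PX = 0, A6, W7, T7, CPU_VERSION_COUNT };

typedef void (*FUNC_PTR)(void);
typedef int (*DEMO_FUNC)();

// Hypothetical per-CPU variants of one library function.
static int px_demoFunc() { return PX; }
static int a6_demoFunc() { return A6; }
static int w7_demoFunc() { return W7; }
static int t7_demoFunc() { return T7; }

int demoFunc();  // public entry point, defined below

struct DispatchEntry {
    FUNC_PTR  publicEntry;                  // key: address of the public function
    DEMO_FUNC variants[CPU_VERSION_COUNT];  // one implementation per version
    DEMO_FUNC current;                      // implementation currently in use
};

static std::mutex g_tableMutex;  // guards every read and write of g_table
static DispatchEntry g_table[] = {
    { (FUNC_PTR)demoFunc,
      { px_demoFunc, a6_demoFunc, w7_demoFunc, t7_demoFunc },
      px_demoFunc },  // start with the generic code
};

// Public entry point: forwards to whichever variant is selected.
int demoFunc() {
    DEMO_FUNC target;
    {
        std::lock_guard<std::mutex> lock(g_tableMutex);
        target = g_table[0].current;
    }
    return target();
}

// Redirect one public function to the given variant.
// Returns false if the function is not in the table or the version is invalid.
bool myIppSetFuncOptimizations(FUNC_PTR funcPtr, CPU_VERSION version) {
    if (static_cast<int>(version) < 0 || version >= CPU_VERSION_COUNT) return false;
    std::lock_guard<std::mutex> lock(g_tableMutex);
    for (DispatchEntry& e : g_table) {
        if (e.publicEntry == funcPtr) {
            e.current = e.variants[version];
            return true;
        }
    }
    return false;
}

// Explicit initialization, called by the client at startup instead of
// doing the work inside the DLL's DLL_PROCESS_ATTACH handler.
void myIppInit(CPU_VERSION version) {
    assert(static_cast<int>(version) >= 0 && version < CPU_VERSION_COUNT);
    std::lock_guard<std::mutex> lock(g_tableMutex);
    for (DispatchEntry& e : g_table) e.current = e.variants[version];
}
```

Taking a mutex on every call is the simplest correct choice but adds overhead to each dispatched function; storing the current pointer in a std::atomic would likely be cheaper if the table is changed rarely.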

ZXS,

Thanks for all your tricks. I tried everything and it's working fine, even if I'm not that satisfied with that method (calling IPP that way is fully unsupported, I suppose!).

By the way, thank you very much for taking the time to answer me, that's great!

Marc
