STTNI instructions in SSE4.2

I have found only the basic information on the internet.

Can someone tell me more about it, in a less formal way?



You can find all the SSE4.2 instructions in the Intel documentation. If you want the short version, take a look at [link]. All the instructions are described there.
Now, if you want to use it, you can include the intrinsics header (nmmintrin.h for SSE4.2) and call, for example, _mm_cmpistri(xmm1, xmm2, mode) if you want to compare two strings; mode is a compile-time constant that selects the kind of comparison. Before using the 128-bit variables (xmm1 and xmm2), you must load their values as follows:

__m128i xmm1 = _mm_loadu_si128((__m128i *) p1); // where p1 is a pointer to your C string

There is also another way to use it, which is more fun: direct asm calls. You can use the __asm__() construct, which is documented in the GCC manual.
Have fun :)
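The load-then-compare pattern described in this post can be sketched in C roughly as follows. This is only a minimal sketch: the helper names load_padded16 and first_mismatch16 are made up for illustration, only the first 16 bytes of each string are looked at, and the strings are copied into zero-padded buffers so the 16-byte load never reads past the end of a short string. The target attribute is a GCC/Clang extension, used here on the assumption that the file might be built without -msse4.2; with that flag it is not needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics, including _mm_cmpistri */

/* Hypothetical helper: copy at most 16 bytes of a C string into a
   zero-padded 16-byte block and load it into an XMM value. */
static __attribute__((target("sse4.2")))
__m128i load_padded16(const char *s)
{
    char buf[16] = {0};
    size_t n;
    assert(s != NULL);
    n = strlen(s);
    memcpy(buf, s, n < 16 ? n : 16);
    return _mm_loadu_si128((const __m128i *)buf);
}

/* Hypothetical helper: index of the first byte at which the two strings
   differ within their first 16 bytes, or 16 if no difference is found. */
static __attribute__((target("sse4.2")))
int first_mismatch16(const char *p1, const char *p2)
{
    __m128i xmm1 = load_padded16(p1);
    __m128i xmm2 = load_padded16(p2);
    /* EQUAL_EACH compares byte i with byte i; NEGATIVE_POLARITY flips the
       result so that set bits mark mismatches; LEAST_SIGNIFICANT returns
       the lowest such index. */
    return _mm_cmpistri(xmm1, xmm2,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                        _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT);
}
```

For example, first_mismatch16("hello", "help!") should report index 3, and two identical short strings should give 16. A real string compare would loop over 16-byte blocks and stop at the first block that reports a mismatch or contains the terminator.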

Besides MSDN, there is not much documentation about SSE4.2. The best way to learn is to test it. Personally, I use some SSE2 and SSE4.2 instructions, but I didn't gain a lot of performance with SSE4.2. My advice is to start with SSE2 and, if you have time, try SSE4.2.

It's a bummer for me, as I don't have an SSE4.2 processor available to play with.
Isn't there some, I don't know, virtualization gimmick I could use to test against?

I found this; it might be helpful.

Really interesting, thank you shakal187 :-)

Hi Shakal187, you can post your link here so that it can be found easily.

Could anybody use it?

Does it work? I still can't compile any of this stuff.
icc doesn't work, but can I use STTNI with g++? I've read that I could, but it doesn't work.

any help is always welcome :)

best regards

You can use STTNI with GCC: look at nmmintrin.h, and don't forget to add the flag -msse4.2 (which is not enabled by default).
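When it is unclear whether the flag actually took effect, a small check like the one below may help. It is a sketch that assumes GCC or Clang on x86: the __SSE4_2__ macro is predefined by those compilers when SSE4.2 code generation is enabled, and __builtin_cpu_supports is a compiler builtin that queries the CPU at run time. The function names are hypothetical.

```c
#include <assert.h>

/* 1 if this translation unit was compiled with SSE4.2 enabled
   (for example with -msse4.2), 0 otherwise. */
static int built_with_sse42(void)
{
#ifdef __SSE4_2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the CPU running the program reports SSE4.2 support, 0 otherwise.
   Relies on the GCC/Clang builtin __builtin_cpu_supports. */
static int cpu_has_sse42(void)
{
    return __builtin_cpu_supports("sse4.2") != 0;
}

/* Hypothetical guard: stop early in debug builds on a CPU without SSE4.2. */
static inline void require_sse42(void)
{
    assert(cpu_has_sse42());
}
```

Calling require_sse42() before any STTNI code turns an illegal-instruction crash on an older CPU into a clearer assertion failure.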

@Fuchs icpc or g++ -msse4.2 are enough from the compile point of view; the header and functions need to be in place. Most likely you will need to do conversions to __m128i types and back.
@flesti regarding a previous post, I don't know your code, but SSE4.2 should bring a significant improvement over SSE2 in this specific case.
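As an illustration of the "convert to __m128i and back" round trip, here is a rough sketch that marks which of the first 16 bytes of a string are decimal digits. The function name mark_digits16 is invented for the example, and the target attribute is a GCC/Clang extension assumed here so the snippet also builds without -msse4.2. The char data is loaded into __m128i values, _mm_cmpistrm does the range test, and the resulting mask is stored back into an ordinary byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics, including _mm_cmpistrm */

/* Hypothetical helper: out[i] becomes 0xFF if byte i of s is '0'..'9',
   and 0x00 otherwise (including positions past the end of s). */
static __attribute__((target("sse4.2")))
void mark_digits16(const char *s, unsigned char out[16])
{
    char data[16] = {0};
    char range[16] = {0};
    size_t n;
    __m128i vrange, vdata, vmask;

    assert(s != NULL && out != NULL);
    n = strlen(s);
    memcpy(data, s, n < 16 ? n : 16);

    /* One (low, high) pair describing the range '0'..'9'. */
    range[0] = '0';
    range[1] = '9';

    /* char arrays -> __m128i */
    vrange = _mm_loadu_si128((const __m128i *)range);
    vdata  = _mm_loadu_si128((const __m128i *)data);

    /* RANGES tests each data byte against the pairs in vrange;
       UNIT_MASK asks for one 0x00/0xFF byte per input byte. */
    vmask = _mm_cmpistrm(vrange, vdata,
                         _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_UNIT_MASK);

    /* __m128i -> byte array */
    _mm_storeu_si128((__m128i *)out, vmask);
}
```

For the input "a1b22", the bytes at positions 1, 3 and 4 of the output should be 0xFF and the rest 0x00.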

I already have it, thanks.
I forgot

#include <xmmintrin.h>

It still doesn't work... but I'll keep trying :)

@Griore I've tested my code with and without SSE4.2 (replacing each SSE4.2 instruction with SSE2 instructions plus some extra code), and it runs only 5% faster with SSE4.2.

Hi. I just want to use a 128-bit integer instead of a 64-bit one. Could you give me some help with how it works?

__m128i z;

How can I handle it in my normal program?
I have to initialize it and I have to compare it, but I fear it is too hard.


I think you'll have to do it all through assembly or intrinsics. There are many different useful instructions. Take a look at the Intel Intrinsics Guide (Java app) at the bottom of this page; it might help you a bit:
If this is your first time, it might take a while to get the hang of it; it does for me :)

Best regards,
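To make the "through intrinsics" route a little more concrete, here is a minimal sketch of initializing and comparing __m128i values with SSE2 intrinsics. Note that __m128i is a vector type (a pack of smaller integers), not a scalar 128-bit integer, so operators such as == or + are not something to rely on portably. The helper names make128, equal128 and split128 are made up for this example, and an x86-64 target (where SSE2 is always available) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Build a 128-bit value from its high and low 64-bit halves. */
static __m128i make128(uint64_t hi, uint64_t lo)
{
    return _mm_set_epi64x((long long)hi, (long long)lo);
}

/* 1 if all 128 bits are equal, 0 otherwise. The byte-wise compare yields
   0xFF per equal byte; movemask collects one bit per byte (16 bits). */
static int equal128(__m128i a, __m128i b)
{
    __m128i eq = _mm_cmpeq_epi8(a, b);
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Copy the value back out into two ordinary 64-bit integers. */
static void split128(__m128i v, uint64_t *hi, uint64_t *lo)
{
    uint64_t out[2];
    assert(hi != NULL && lo != NULL);
    _mm_storeu_si128((__m128i *)out, v);
    *lo = out[0];  /* low half is stored first (little-endian x86) */
    *hi = out[1];
}
```

With these, `__m128i z = make128(0, 42);` initializes z, and `equal128(z, make128(0, 42))` compares it. Arithmetic across the full 128 bits (for example an add with carry between the halves) would still have to be written by hand.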

With the Intel compiler you can use classes to manage vector operations; take a look at the first optimization manual, "Optimizing software in C++" by Agner Fog (122, "Using vector classes"). Unfortunately, these classes are not defined with GCC, so you have to define them yourself.

Is the use of intrinsics the same on Linux and Windows? And are the machines on which the people from Intel test our code AVX-enabled?
