strlen with SSE4.2 instructions


wmula

Hi all, this is my first post

The paper "Inside Intel Next Generation Nehalem Architecture" by Ronak Singhal (SP08_NGMS001_100r_eng.pdf) contains a comparison of strlen implemented with a PCMPxSTRy instruction and with ordinary x86 code. The SSE4.2 code looks very nice, but what is the approximate speedup?
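For readers without the paper at hand, here is a minimal sketch of what a PCMPISTRI-based strlen can look like in C intrinsics. It is not the code from the paper: the function name `strlen_sse42` is made up for illustration, and the sketch assumes GCC or Clang on x86 (for the `target` attribute) and a CPU that actually supports SSE4.2 at run time. It relies on the "equal each" mode with an all-zero first operand, so the returned index is the position of the first NUL byte in the chunk, or 16 if there is none.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h>  /* SSE4.2 intrinsics */

/* Hypothetical name; an illustration, not the code from the paper. */
__attribute__((target("sse4.2")))
size_t strlen_sse42(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Scalar prologue: advance until p is 16-byte aligned, so the
       aligned loads below never cross a page boundary. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)p);
        /* PCMPISTRI, "equal each": the zero operand has implicit
           length 0, so result bits are set only at and after the
           NUL in chunk. idx is the first such position, or 16. */
        int idx = _mm_cmpistri(zero, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16)
            return (size_t)(p - s) + (size_t)idx;
        p += 16;
    }
}
#else
/* Plain fallback so the file still builds on other targets. */
size_t strlen_sse42(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}
#endif
```

Calling `strlen_sse42("hello")` should give the same result as the library strlen. Note that the aligned loads read up to 15 bytes past the terminator inside the same 16-byte block, which is the usual trade-off of this kind of routine.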

And why was scalar x86 code used for the comparison? strlen can also be coded with SSE2 instructions; here is my implementation: http://wmula.republika.pl/proj/sse2string/src/strlen.S. I'm wondering how much faster the SSE4.2 code is.
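To make the SSE2 idea concrete, here is a rough C-intrinsics sketch of the usual approach: compare 16 bytes at a time against zero with PCMPEQB, collect the result with PMOVMSKB, and find the first set bit. It is not a transcription of the linked strlen.S; the name `strlen_sse2` is hypothetical, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) && defined(__GNUC__)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical name; an illustration, not the code from strlen.S. */
size_t strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();

    /* Round the pointer down to a 16-byte boundary: an aligned load
       stays within one page, so it cannot fault past the string. */
    uintptr_t addr = (uintptr_t)s;
    const char *p = (const char *)(addr & ~(uintptr_t)15);
    unsigned offset = (unsigned)(addr & 15);

    /* First block: one mask bit per byte equal to zero, then shift
       out the bits belonging to bytes that precede s. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask >>= offset;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Remaining blocks are fully inside or after the string start. */
    for (;;) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
    }
}
#else
/* Plain fallback so the file still builds on non-SSE2 targets. */
size_t strlen_sse2(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}
#endif
```

Like most vectorized strlen routines, this reads a few bytes outside the string within the same aligned 16-byte block, so memory checkers may complain even though the loads cannot fault.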

BTW, what is the latency/throughput of the PCMPxSTRy instructions? Does the latency depend on the input data, or is it constant? I didn't find answers in the recent manuals.

w.

Shih Kuo (Intel)

PCMPxSTRy offers a rich set of capabilities. There is ongoing work on developing more tutorial materials. You can expect more information to roll out in the Fall IDF time frame.

Shih Kuo (Intel)

For software developers who might be interested in attending Fall IDF (8/19-8/21): there will be sessions on Intel AVX on Wednesday (8/20). On Thursday afternoon, there is an in-depth session on SSE4.2. Additionally, SSE4.2 will be demoed in the advanced technology zone on all three days.
