strlen with SSE4.2 instructions

Hi all, this is my first post

The paper "Inside Intel Next Generation Nehalem Architecture" by Ronak Singhal (SP08_NGMS001_100r_eng.pdf) compares a strlen that uses a PCMPxSTRy instruction with ordinary x86 code. The SSE4.2 code looks very nice, but what is the approximate speedup?
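For readers without the slides at hand, here is a rough sketch of what a PCMPISTRI-based strlen can look like with compiler intrinsics. It is not the code from the paper: the function name `strlen_sse42` is made up for illustration, and the `target` attribute assumes GCC or Clang on x86-64. The scalar prologue keeps every 16-byte load aligned, so no load can cross a page boundary.

```c
#include <nmmintrin.h>  /* SSE4.2 intrinsics: _mm_cmpistri and _SIDD_* flags */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the routine from the paper.
   Assumes GCC/Clang; the attribute enables SSE4.2 for this function only. */
__attribute__((target("sse4.2")))
size_t strlen_sse42(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Scalar prologue: walk byte by byte up to a 16-byte boundary. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)p);

        /* "Equal each" against an empty first operand: every position at or
           after the terminator of chunk compares true, so the least
           significant index is the offset of the first zero byte, or 16
           when the chunk holds no zero byte. */
        int idx = _mm_cmpistri(zero, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16)
            return (size_t)(p - s) + (size_t)idx;
        p += 16;
    }
}
```

Calling `strlen_sse42("SSE4.2")` should give the same result as the library `strlen`. It says nothing about speed, which is exactly the open question here.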

And why was scalar x86 code used for the comparison? strlen can also be coded with SSE2 instructions; here is my implementation: I'm wondering how much faster the SSE4.2 code is.
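The poster's own SSE2 code is not reproduced in this thread. As a stand-in, here is a minimal sketch of one common SSE2 approach (PCMPEQB plus PMOVMSKB). The name `strlen_sse2` is hypothetical, and `__builtin_ctz` assumes GCC or Clang. Like most vectorized strlen routines, it reads the whole aligned 16-byte block around the start of the string. That is safe with respect to page faults, but it goes beyond what the C standard strictly allows.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the poster's implementation. */
size_t strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();

    /* Round the pointer down to a 16-byte boundary so every load is aligned
       and stays inside a single page. */
    const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
    unsigned skip = (unsigned)(s - p);

    /* First block: compare all 16 bytes with zero, then clear the mask bits
       that belong to bytes located before the start of the string. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask &= 0xFFFFu << skip;

    /* Remaining blocks: 16 bytes per iteration until a zero byte shows up. */
    while (mask == 0) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    }

    /* The lowest set bit of the mask is the offset of the terminator within
       the current block (GCC/Clang builtin, an assumption of this sketch). */
    return (size_t)((p + __builtin_ctz(mask)) - s);
}
```

A fair comparison with the SSE4.2 version would time both on the same strings and alignments, since the per-block work differs only in how the zero byte is detected.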

BTW, what are the latency and throughput of the PCMPxSTRy instructions? Does the latency depend on the input data, or is it constant? I didn't find answers in the recent manuals.



PCMPxSTRy offers a rich set of capabilities. There is ongoing work on developing more tutorial materials. You can expect more information to roll out in the Fall IDF time frame.

Software developers who are interested may want to attend Fall IDF (8/19-8/21). There will be sessions on Intel AVX on Wednesday (8/20), and on Thursday afternoon there is an in-depth session on SSE4.2. Additionally, SSE4.2 will be demoed in the advanced technology zone on all three days.
