strlen with SSE4.2 instructions

Hi all, this is my first post

The paper "Inside Intel Next Generation Nehalem Architecture" by Ronak Singhal (SP08_NGMS001_100r_eng.pdf) compares a strlen that uses a PCMPxSTRy instruction with ordinary scalar x86 code. The SSE4.2 code looks very nice, but what is the approximate speedup?
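
To make the question concrete, here is a minimal sketch of a strlen built on PCMPISTRI through the `_mm_cmpistri` intrinsic. It is not the code from the paper; the function name `strlen_sse42` and the byte-wise alignment prologue are assumptions made for illustration, and the `target` attribute assumes GCC or Clang (compiling with `-msse4.2` works as well).

```c
#include <nmmintrin.h>  /* SSE4.2 intrinsics (_mm_cmpistri) */
#include <stddef.h>
#include <stdint.h>

/* Unsigned bytes, pairwise equality, report the lowest matching index. */
#define STRLEN_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_LEAST_SIGNIFICANT)

/* Hypothetical name; a sketch, not a tuned implementation. */
__attribute__((target("sse4.2")))
size_t strlen_sse42(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Walk byte by byte up to a 16-byte boundary, so that the aligned
       16-byte loads below never cross into another page. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)p);
        /* With an all-zero (empty) first operand, the returned index is the
           position of the first null byte in chunk, or 16 if there is none. */
        int idx = _mm_cmpistri(zero, chunk, STRLEN_MODE);
        if (idx < 16)
            return (size_t)(p - s) + (size_t)idx;
        p += 16;
    }
}
```

The aligned load may read bytes past the terminator inside the same 16-byte block, which is the usual trade-off for vectorized string scans.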

And why was scalar x86 code used for the comparison? strlen can also be coded with SSE2 instructions; here is my implementation: I'm wondering how much faster the SSE4.2 code is.
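
As an illustration of the SSE2 approach, a strlen can be sketched with PCMPEQB and PMOVMSKB as below. This is a minimal sketch and not necessarily the implementation referred to above; the name `strlen_sse2` is hypothetical, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch, not a tuned implementation. */
size_t strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();

    /* Round the pointer down to a 16-byte boundary; aligned loads stay
       within one page, so the scan cannot fault past the string. */
    const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
    unsigned offset = (unsigned)((uintptr_t)s & 15);

    /* First block: compare all 16 bytes with zero, then shift out the
       mask bits that belong to bytes located before s. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask >>= offset;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Remaining blocks: the lowest set bit marks the first null byte. */
    for (;;) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
    }
}
```

Each iteration costs one load, one compare, one mask extraction and a branch, which is the baseline an SSE4.2 version would have to beat.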

BTW, what are the latency and throughput of the PCMPxSTRy instructions? Does the latency depend on the input data, or is it constant? I didn't find answers in the recent manuals.


PCMPxSTRy offers a rich set of capabilities. There is ongoing work on developing more tutorial material, and you can expect more information to roll out in the Fall IDF time frame.

Software developers might be interested in attending Fall IDF (8/19-8/21). There will be sessions on Intel AVX on Wednesday (8/20), and on Thursday afternoon there is an in-depth session on SSE4.2. Additionally, SSE4.2 will be demoed in the advanced technology zone on all three days.
