westmere
May 1, 2009 5:03 PM PDT
sse4.2 instructions
If I have existing c/c++ source code, do I need to modify the code before compiling with an appropriate compiler to get the benefits of the sse4.2 instructions, or will the new compilers automagically use the sse4.2 instructions for string comparisons?

I've read all the white papers and web pages I could find that I thought would be relevant, but have yet to find a definitive answer. The best I've found is that "All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4." But there is no mention that the existing software will be able to take advantage of the sse4.2 instructions without modification.

Thanks in advance.
tim18
May 1, 2009 6:20 PM PDT
#1
Current Intel and Sun compilers have an explicit SSE4.2 compile option, but I haven't seen a case showing that code such as in
http://software.intel.com/en-us/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4/
can be generated by auto-vectorization without explicitly writing the SSE4.2 intrinsics into the source. Using the intrinsics requires a compiler with an up-to-date include file and (on Linux) a binutils 2.9.xx.
There is currently no mechanism to take advantage of SSE4.2 instructions without recompilation, although there appear to be several research projects on binary translation.
Existing code that uses parallel move instructions, for example, automatically benefits from the improved support for varying alignments in SSE4.2 (and recent AMD) CPUs.
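
To make the "explicit intrinsics plus compile option" point concrete, here is a minimal sketch rather than code from the linked article. The helper name find_first_of is hypothetical, and the sketch assumes a compiler that defines the __SSE4_2__ macro when its SSE4.2 option is on (gcc does with -msse4.2). Without that option the same source compiles to the plain scalar loop, which is the situation unmodified code is in.

```c
#include <stddef.h>
#include <string.h>

#ifdef __SSE4_2__
#include <nmmintrin.h>  /* SSE4.2 intrinsics: _mm_cmpestri and the _SIDD_* flags */
#endif

/* Hypothetical helper: index of the first byte in s[0..len) that equals any
 * byte in set[0..set_len), or len if there is none. */
size_t find_first_of(const char *s, size_t len, const char *set, size_t set_len)
{
    size_t i = 0;
#ifdef __SSE4_2__
    /* PCMPESTRI compares up to 16 "needle" bytes against 16 input bytes. */
    if (set_len <= 16) {
        char set_buf[16] = {0};
        memcpy(set_buf, set, set_len);
        const __m128i needles = _mm_loadu_si128((const __m128i *)set_buf);
        /* Only full 16-byte chunks are loaded, so no read goes past s + len. */
        for (; i + 16 <= len; i += 16) {
            const __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
            int idx = _mm_cmpestri(needles, (int)set_len, chunk, 16,
                                   _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY);
            if (idx < 16)               /* 16 means "no match in this chunk" */
                return i + (size_t)idx;
        }
    }
#endif
    /* Scalar loop: the whole input without SSE4.2, only the tail with it. */
    for (; i < len; i++)
        if (memchr(set, s[i], set_len) != NULL)
            return i;
    return len;
}
```

Calling find_first_of(text, strlen(text), "<>", 2) gives the same index as strcspn(text, "<>") on either path; only the build with the SSE4.2 option actually emits the string-compare instruction.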


Shih Kuo (Intel)
May 2, 2009 11:21 AM PDT
#2
Many string functions in the runtime library can be sped up using SSE4.2 instructions, and some of them can be sped up using SSE2 as well. Various compilers are exploring drop-in replacement of runtime library functions with versions that use newer instruction sets. I would keep my fingers crossed that this will happen in the near future.



westmere
May 4, 2009 11:55 AM PDT
#3 Reply to #1
Quoting - tim18
I haven't seen a case to show that code such as in
http://software.intel.com/en-us/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4/
might be generated by auto-vectorization without explicitly writing the SSE4.2 intrinsics into the code.
Thanks for the response tim18. Any idea if auto-vectorization is planned for future compiler releases, or if gcc -ftree-vectorize -msse4.2 might already do this?


tim18
May 4, 2009 1:03 PM PDT
#4 Reply to #3
Quoting - westmere
Thanks for the response tim18. Any idea if auto-vectorization is planned for future compiler releases, or if gcc -ftree-vectorize -msse4.2 might already do this?
I do have an example where ifort -xsse4.2 uses the horizontal dot product, but only in a remainder loop, so it's not significant for performance. I would expect the horizontal dot product to be useful only in limited situations, such as a fixed dot product length of 4. It may be that the code would be optimized automatically in that situation.
The same examples, built with g++ or gfortran 4.5, generate identical code with the sse4.1 and sse4.2 options. While the sse4 code generated by gcc/g++/gfortran shows some consistent performance gains over sse3, gcc and the Intel compilers don't use sse4.1 in the same ways in my code samples. The exception is _mm_set_ps, where both compilers shift to sse4.1 code (so it's not necessary to change the source to the corresponding sse4.1 intrinsic). g++ 4.5 has more effective auto-vectorization than previous g++ versions.
I haven't found any use of sse4 code by the Sun compilers, but they frequently vectorize effectively for sse4.2 CPUs using sse instructions, even in a few situations where the others don't.
The marketing people usually miss several points: the few situations where new instructions are beneficial are far outnumbered by those where the old instructions can be optimized better for the new CPUs. There isn't sufficient incentive to make applications incompatible with older CPUs when the AVX instruction set will offer real gains in a year or two.
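
For the fixed-length-4 case mentioned above, a hand-written version might look like the sketch below. It is only an illustration, not what ifort or g++ emits. The function name dot4 is made up, and the sketch assumes the compiler defines __SSE4_1__ when SSE4.1 or SSE4.2 code generation is enabled (the dot product instruction DPPS belongs to SSE4.1, which the sse4.2 options include).

```c
#ifdef __SSE4_1__
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_dp_ps */
#endif

/* Hypothetical helper: dot product of two vectors of exactly 4 floats. */
float dot4(const float a[4], const float b[4])
{
#ifdef __SSE4_1__
    const __m128 va = _mm_loadu_ps(a);   /* unaligned loads, no alignment assumed */
    const __m128 vb = _mm_loadu_ps(b);
    /* Mask 0xF1: multiply all four lanes (high nibble F) and write the
     * horizontal sum into lane 0 only (low nibble 1). */
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
#else
    /* Portable fallback used when SSE4.1 is not enabled at compile time. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

For longer vectors an ordinary vertical multiply-add loop with one horizontal reduction at the end is the usual choice, which fits the observation that the horizontal instruction only showed up in a remainder loop.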


westmere
May 4, 2009 4:40 PM PDT
#5 Reply to #4
Thanks for the help tim18.


Shih Kuo (Intel)
October 15, 2009 10:48 PM PDT
#6
You might want to check out the alpha code of the Glibc 2.11 string and memory functions. It includes multi-arch support, so the library can be configured and built to recognize which ISA is available. Your existing code that calls the Glibc 2.11 string and memory functions will then execute SIMD code on Nehalem-, Penryn-, and Merom-based processors.

You might also be interested in an SSE4.2 example that speeds up a string-to-integer conversion function. One such example is shown in the latest Optimization manual.
http://www.intel.com/products/processor/manuals/index.htm
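
The idea behind that kind of multi-arch support can be sketched in user code as follows. This is not Glibc's actual mechanism or code, just an illustration of picking an implementation at run time. It assumes gcc or clang on x86 (for __builtin_cpu_supports and the target attribute), and every function name in it is hypothetical.

```c
#include <stddef.h>

typedef size_t (*find_byte_fn)(const unsigned char *buf, size_t n, unsigned char c);

/* Baseline: index of the first byte equal to c in buf[0..n), or n if absent. */
static size_t find_byte_scalar(const unsigned char *buf, size_t n, unsigned char c)
{
    size_t i = 0;
    while (i < n && buf[i] != c)
        i++;
    return i;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#include <nmmintrin.h>

/* SSE2 variant: byte-wise compare, then locate the first set mask bit. */
__attribute__((target("sse2")))
static size_t find_byte_sse2(const unsigned char *buf, size_t n, unsigned char c)
{
    const __m128i needle = _mm_set1_epi8((char)c);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        const __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask != 0)
            return i + (size_t)__builtin_ctz((unsigned)mask);
    }
    return i + find_byte_scalar(buf + i, n - i, c);  /* fewer than 16 bytes left */
}

/* SSE4.2 variant: PCMPESTRI with a one-byte "set" of characters. */
__attribute__((target("sse4.2")))
static size_t find_byte_sse42(const unsigned char *buf, size_t n, unsigned char c)
{
    const __m128i needle = _mm_cvtsi32_si128(c);     /* byte 0 holds c */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        const __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int idx = _mm_cmpestri(needle, 1, chunk, 16,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY);
        if (idx < 16)                                /* 16 means no match */
            return i + (size_t)idx;
    }
    return i + find_byte_scalar(buf + i, n - i, c);
}
#endif

/* Pick the best variant the running CPU supports. */
static find_byte_fn select_find_byte(void)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse4.2"))
        return find_byte_sse42;
    if (__builtin_cpu_supports("sse2"))
        return find_byte_sse2;
#endif
    return find_byte_scalar;
}

/* Public entry point: callers never see which variant runs. */
size_t find_byte(const unsigned char *buf, size_t n, unsigned char c)
{
    static find_byte_fn impl;   /* resolved on first call; not thread-safe */
    if (!impl)
        impl = select_find_byte();
    return impl(buf, n, c);
}
```

The caller's source stays the same on every CPU, which is the point of the library approach: one binary carries several variants, and the choice is made when the program runs rather than when it is compiled.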

Shih Kuo (Intel)
November 2, 2009 9:54 AM PST
#7 Reply to #6


Latest news from the Glibc front.
http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html


