-xSSE4.1 fatal error on E5450

Hello,
When I try to run libquantum (as included in SPEC CPU2006), previously compiled with the -xSSE4.1 flag, I receive the following error message:

Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel processors with Swing New Instructions support.

But the CPU is an E5450, which should support SSE4.1.
icc: v11 build 081
OS: openSUSE10.3

Could you please help me?

Simone

Hi Simone,
The message basically means that the code generated for SSE4.1 is apparently not finding SSE4.1 instructions on the processor you're running it on. The link http://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations/ gives some detail on the options you can use and what each one generates for SSE. I don't know exactly what the E5450 supports, but can you do a quick dump of the CPU info and what it supports in terms of SSE? You can then use the appropriate option to generate code for the SSE level supported on your processor, and it should work.
Can you attach the SSE supported info of your CPU info dump? In the meantime, I'll look into this info offline also, just FYI
-regards,
Kittur

cpuinfo doesn't show anything about SSE4 on these CPUs, although the E5450 clearly supports SSE4.1 in hardware (and when running a supported OS). The quoted message is identical to what I get with the same OS running on Core 2 Duo if I compile main() with -xSSE4.1, so it seems that the icc run-time library function isn't distinguishing this CPU from Core 2 Duo, as it might not if it drops back to using only /proc/cpuinfo for identification.
I don't see anything in cpuinfo, except for cache size, to distinguish this CPU from Core 2 Duo, as there is no model number string on the Penryn CPUs to which I have access.
It's unusual for -xSSE4.1 to produce useful optimization for these CPUs, beyond what -mSSE3 provides. Does -msse4 allow for generation of SSE4.1 code without CPU identity check? If the gcc is new enough to include the -msse4 option, does that work on this CPU?

Hi you all,

The case expressed by Simone is mine, too (icc: v11 build 081 ; OS: openSUSE10.3).
According to the page you linked, E5450 supports SSE4.1. However, /proc/cpuinfo does not mention this flag (an excerpt follows). Any idea about the reason why?

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel Xeon CPU E5450 @ 3.00GHz
stepping : 10
cpu MHz : 2992.498
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni tm2 ssse3 lahf_lm
bogomips : 5985.16
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

Hi,
I tried on a Core 2 duo system that has sse4.1 support and it works fine. Also, when I dump the cpuinfo I do get the following:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 sse4_2 popcnt lahf_lm ida

Also, when I run that executable on a system that has no SSE4.1 support, I do get the message that I need to run on the right processor. So I need to check with our developers on why you're getting that error on your system even though you say the E5450 does support it. I'll update you as soon as I have some info, just FYI.

-regards,
Kittur

BTW, I just found out that the check boils down to the CPUID instruction and doesn't rely on /proc/cpuinfo, so that's ruled out. So it looks like either that processor doesn't support SSE4.1, or it could be a bug in the compiler (maybe it's checking the wrong bit?).

One thing you can quickly do to find out whether your processor does support SSE4.1 is to add an SSE4.1 intrinsic to your test code and compile it with gcc. If running it fails with an illegal instruction, then obviously the processor doesn't support SSE4.1.
Just a thought...

-regards,
Kittur

Likewise, if you compile with -msse4 rather than -xsse4, you won't get the unsupported CPU message; you will simply attempt to execute SSE4.1 instructions, and see the following message if it is not an sse4.1 CPU:
forrtl: severe (168): Program Exception - illegal instruction

Hi Simone and all,
I see this in the exchange:
model name : Intel Xeon CPU E5450 @ 3.00GHz
stepping : 10
cpu MHz : 2992.498
cache size : 6144 KB

I'm curious. I'm not Intel technical support, but I think the E5450 has 12 MB of cache, not 6144 KB?
You can verify here:
http://www.intel.com/pressroom/kits/quickreffam.htm

Use the commands
dmidecode
cat /proc/cpuinfo
to verify that your service processor firmware (or the BIOS, if it is a self-assembled machine) is up to date.

Which kernel version are you running on SUSE 10.3 (see /lib/modules/*)?

(Just cut and paste this entire command line into your root shell and press Enter.)

dmidecode > info_myxeon.txt && cat /proc/cpuinfo >> info_myxeon.txt && ls /lib/modules >> info_myxeon.txt && vi info_myxeon.txt

(To exit vi: press ESC, then type :x or :q and press Enter.)
You will now have the file info_myxeon.txt in the current directory.
Best regards

This issue seems to be related to the OS release:
I did a clean reinstall of the system using openSUSE 11.1 (kernel 2.6.27.7-9.1) and the sse4_1 flag is correctly detected, while openSUSE 10.3 (kernel 2.6.22.5-31) detects only up to ssse3.
Does anybody know how to make openSUSE 10.3 detect the sse4_1 feature?

Thanks a lot for your support.

I think we can explain this 6MB by noting that the E5450 has 12MB of cache per chip, divided into two 6MB blocks, each shared by two cores (please refer to Intel engineers for detailed information on the E54xx architecture).

Regards,

Yes, the 6MB cache size is shown in /proc/cpuinfo on the E5450 chip installations accessible to me, and that is useful evidence. I haven't seen one which had the E5450 designation visible. A non-SSE4.1 chip would show at most 4MB cache. As Kittur explained, the icc run-time library is not using cpuinfo or the parameters displayed there to determine whether you have an SSSE3 or SSE4.1 CPU.
So far, there has been no explanation of why -msse4, -mSSE4.1, -mSSSE3, or other options could not be tried, in case one of those might be a satisfactory solution, given that the chosen operating system will not get full support from icc.

Hi,
According to the link I gave, the cache is 12 MB. If you only see half of that, you have faulty machine firmware, a faulty processor, or an inappropriate kernel.
A correctly working computer must report exactly what is supposed to be there.
I agree with you about the flags; I think the machine can work very well all the same.

This is not acceptable for Linux users:
(given that the chosen operating system will not get full support from icc.)

If that is true, I think the compiler should be upgraded, or the documentation should state this possibility clearly.
I don't agree with you. I think this compiler can deliver exactly the same performance benefit on Linux as on Microsoft operating systems, with probably the same number of faults.
All problems can, or must, be resolved (correct flags and cache size).
It is not acceptable to hide behind vague answers; that can drive away potential new Linux users for nothing.
To be clear: I think that with an unmodified Linux kernel, default parameters and the ksoftirqd process, you have very little chance of using more than about a quarter of the processors' full cache; you would also need very extensive programming experience to control affinity exactly.

Best regards

Added:
Hi Simone,
About the cache: if you think it is 6 x 2, then cat /proc/cpuinfo should show 2 processors with 2 cores each:

Processor 0 has 6 MB and 2 cores (not the quad-core stated)
Processor 1 has 6 MB and 2 cores (not the quad-core stated)

That is not what was posted in this exchange.

I have read the information for this processor, and I think a problem probably exists.

I don't know whether it can resolve the problem, but I have seen a package called microcode.ctl (Debian 4), described as dynamic microcode for Intel processors.
I have installed it on a machine that already worked without problems, so it is difficult to evaluate any change.

Best regards
Lastly:
To verify, I ran the command remotely over SSH on a friend's server (Debian) that I installed a short while ago (I upgraded this machine to the latest firmware).
The same cache discrepancy appears there, but times 8.
Better that I don't tell my friend. I hope that you are right and I am wrong.
If you see 6, I don't know how you can drive or use 12 (and also core id 0,1,2,3 twice?).
There is probably a patch in the latest kernel to resolve this, if it really is a problem.
If it really is one, it is not the same problem as yours; it could decrease server performance, although the server works very well using GCC, even without ICC.

I think it is easy to verify; see the EXAMPLE section at the end of the document linked below.
I cannot do it, as it is not my machine, and it is not easy over SSH access (I set the system up with several security measures).

Link:
http://www.linux.gr/cgi-bin/man/man2html?cpuset+7

Or an engineer could provide an intelligent piece of code showing that the 2 x 6 MB and the 4 cores are managed by the system (same kernel), without wading through exhaustive, incomprehensible literature.

If not, no problem: it is not possible to buy the same processor at a lower price with only 6 MB of cache.
But at least we would know, and the problem could be studied so that the kernel can eventually be corrected.

I believe that sometimes, if you do not want to lose money on the task you were asked to do and there is no problem, it is preferable to use the defaults without asking too many questions that will probably go unanswered.

hidexxxxxx:~# cat /proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel Xeon CPU E5450 @ 3.00GHz
stepping : 6
cpu MHz : 2992.161
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est
tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm tpr_shadow vnmi flexpriority
bogomips : 5984.32
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 1
etc ....

Kittur, that doesn't seem like a Core 2 Duo to me, I have yet to see one with SSE4.2, DCA and POPCNT support. You either have some Core 2 Duo engineering sample or a Nehalem chip there.

As for the problem at hand:

OS kernel detection of CPU features and cache size doesn't have anything to do with compiler detection. As long as the SSE2/SSE3 code works the OS has configured the CPU correctly to support SIMD.

That said, E5450 supports SSE4.1, so it must be an error in the compiler detection code somehow.

Regards,
Igor Levicki

The processor has 12MB cache divided into two 6MB L2 caches. The CPUID when issued on any hardware thread will return the available cache for that hardware thread. In this case the L2 cache will show a size of 6MB. Different pages of the CPUID will provide information for you to determine the number of such L2 caches (as well as L1, and for later processors L3).

As for using all of your cache (12MB) in your application, consider pinning software threads to hardware threads and determining which software threads use which L2 cache. Then intelligently divide your work into two pieces, placing half of your L2-cache-sensitive work on the one or two threads sharing one of the L2 caches, and the other half on the one or two threads sharing the second L2 cache. This will require a little more programming effort, but if your data and work can benefit from division in two and segregation, then the results will be well worth the effort.

Consider these techniques:

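// The paste lost everything between a '<' and the next '>'. The loop bounds,
// nObjects and the threadNum field below are reconstructed assumptions.
#pragma omp parallel
{
  // Objects carry a designated thread number
  int myThreadNum = omp_get_thread_num();
  int i;
  for(i=0; i < nObjects; ++i)
    if(Object[i].threadNum == myThreadNum)
      DoWork(Object[i]);
}
// or
#pragma omp parallel
{
  // contiguous slice of Objects per thread; remainder goes to the first threads
  int myThreadNum = omp_get_thread_num();
  int numThreads = omp_get_num_threads();
  int chunk = nObjects / numThreads;
  int extra = nObjects % numThreads;
  int Begin = myThreadNum * chunk + ((extra > myThreadNum) ? myThreadNum : extra);
  int End = Begin + chunk + ((extra > myThreadNum) ? 1 : 0);
  int i;
  for(i=Begin; i < End; ++i)
    DoWork(Object[i]);
}
// or
#pragma omp parallel
{
  // modulus of Objects dedicated to thread number
  int myThreadNum = omp_get_thread_num();
  int numThreads = omp_get_num_threads();
  int i;
  for(i=myThreadNum; i < nObjects; i += numThreads)
    DoWork(Object[i]);
}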
(paste of code snippets seems to have problems)

Jim Dempsey
www.quickthreadprogramming.com

Hi Igor,
Yes, you're correct (that system has a Nehalem chip). Anyway, I tried on a few systems that support SSE4.1 and the compiler detects it fine. It looks like we really need to try it on an E5450 system (we don't have one in our lab, though) to see if the problem can be reproduced. I'll touch base with some developers on this again in the meantime, FYI.
-regards,
Kittur

Seems that I can smell Nehalem by its feature flags from 7,500 miles away :-)
Looking forward to learn what causes the problem.

Regards,
Igor Levicki

Hi Simone,
Could you please run the attached program (d.c) on that system and tell us what output you got? Appreciate much.
-regards,
Kittur

Attachment: d.c (text/x-csrc, 2.5 KB)

Hi all,
I ran a test program on two different machines (IBM xSeries 445, HP Server DL580 G2) with recent Linux kernels.
No problem: the 6 MB shown by cpuinfo is correct. I had misunderstood; the 6 MB x 2 shared layout matches exactly, and a shared cache can even give better results sometimes.
I put both machines under extreme load for half an hour (a situation that would not occur in reality), and they worked perfectly. I am happy that I don't have to reinstall the two servers.
I did not run the code (d.c); the quality control engineers are here with me, and I fear the result could create a new problem.
Jim, thanks for your sample source.
Best regards
