Dear all!
I have a problem with ifort, which is installed on a large cluster:
Linux access1 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat Nov 12 15:11:58 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
Intel Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.0.3.174 Build 20110309
Here is the cpuinfo output:
Intel Xeon CPU X5670
===== Processor composition =====
Processors(CPUs) : 24
Packages(sockets) : 2
Cores per package : 6
Threads per core : 2
===== Processor identification =====
Processor    Thread Id.    Core Id.    Package Id.
0            0             0           0
1            0             0           1
2            0             1           0
3            0             1           1
4            0             2           0
5            0             2           1
6            0             8           0
7            0             8           1
8            0             9           0
9            0             9           1
10           0             10          0
11           0             10          1
12           1             0           0
13           1             0           1
14           1             1           0
15           1             1           1
16           1             2           0
17           1             2           1
18           1             8           0
19           1             8           1
20           1             9           0
21           1             9           1
22           1             10          0
23           1             10          1
===== Placement on packages =====
Package Id.    Core Id.        Processors
0              0,1,2,8,9,10    (0,12)(2,14)(4,16)(6,18)(8,20)(10,22)
1              0,1,2,8,9,10    (1,13)(3,15)(5,17)(7,19)(9,21)(11,23)
===== Cache sharing =====
Cache    Size      Processors
L1       32 KB     (0,12)(1,13)(2,14)(3,15)(4,16)(5,17)(6,18)(7,19)(8,20)(9,21)(10,22)(11,23)
L2       256 KB    (0,12)(1,13)(2,14)(3,15)(4,16)(5,17)(6,18)(7,19)(8,20)(9,21)(10,22)(11,23)
L3       12 MB     (0,2,4,6,8,10,12,14,16,18,20,22)(1,3,5,7,9,11,13,15,17,19,21,23)
What really bothers me is that I cannot use any optimization level higher than O1. With O2 or O3, my program crashes with a segfault.
The program itself is perfectly reliable: it runs at any optimization level on another cluster:
Linux t60-2.parallel.ru 2.6.18-skif-rhel-alt13.M41.3 #1 SMP Tue Feb 2 12:09:59 MSK 2010 x86_64 GNU/Linux
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20091012 Package ID: l_cprof_p_11.1.059
Intel Xeon Processor (Intel64 Harpertown)
===== Processor composition =====
Processors(CPUs) : 8
Packages(sockets) : 2
Cores per package : 4
Threads per core : 1
===== Processor identification =====
Processor    Thread Id.    Core Id.    Package Id.
0            0             0           0
1            0             0           1
2            0             1           0
3            0             1           1
4            0             2           0
5            0             2           1
6            0             3           0
7            0             3           1
===== Placement on packages =====
Package Id.    Core Id.    Processors
0              0,1,2,3     0,2,4,6
1              0,1,2,3     1,3,5,7
===== Cache sharing =====
Cache    Size     Processors
L1       32 KB    no sharing
L2       6 MB     (0,2)(1,3)(4,6)(5,7)
I can also use any optimization level on an i7 processor with the latest Composer 2011.
So, my question to the experts is:
What could the problem be?
Is it another issue in the compiler, or is it something else?
Thanks in advance,
S.Savinov
P.S.
I must admit that the program runs 2.5 times faster on the X5670 with O1.





