svyatoslav.korneev
July 6, 2009 4:24 AM PDT

Cluster 2D FFT very slow, why?

Hello. I have a problem. The Intel Cluster FFT example (/opt/intel/Compiler/11.0/083/mkl/examples/cdftf) runs very slowly on my cluster, and when I increase the number of processes the execution time increases instead of decreasing.

Below are the execution times for "STATUS = DftiComputeForwardDM(DESC,LOCAL)" on a 512*512 field. In each table, Rank is the MPI rank and Time is the execution time in seconds.

Descriptor settings:

- DFTI_FORWARD_DOMAIN = DFTI_COMPLEX
- DFTI_PRECISION = DFTI_DOUBLE
- DFTI_DIMENSION = 2
- DFTI_LENGTHS = (512,512)
- DFTI_FORWARD_SCALE = 1.0
- DFTI_BACKWARD_SCALE = 1.0/(M*N)
- CREATE= 0

8 processes:

| Rank | Time (s) | Rank | Time (s) | Rank | Time (s) | Rank | Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | 0.2209660 | 1 | 0.2229670 | 2 | 0.2229670 | 3 | 0.2229670 |
| 4 | 0.2229670 | 5 | 0.2219670 | 6 | 0.2209670 | 7 | 0.2209670 |

16 processes:

| Rank | Time (s) | Rank | Time (s) | Rank | Time (s) | Rank | Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | 0.2129680 | 1 | 0.2129680 | 2 | 0.2129680 | 3 | 0.2129680 |
| 4 | 0.2129680 | 5 | 0.2129670 | 6 | 0.2129680 | 7 | 0.2129670 |
| 8 | 0.2389630 | 9 | 0.2389640 | 10 | 0.2389640 | 11 | 0.2389630 |
| 12 | 0.2389640 | 13 | 0.2389640 | 14 | 0.2389630 | 15 | 0.2389640 |

32 processes:

| Rank | Time (s) | Rank | Time (s) | Rank | Time (s) | Rank | Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | 0.5439169 | 1 | 0.5519161 | 2 | 0.5539160 | 3 | 0.5519159 |
| 4 | 0.5529160 | 5 | 0.5519149 | 6 | 0.5019231 | 7 | 0.5519171 |
| 8 | 0.5529151 | 9 | 0.5529160 | 10 | 0.5509150 | 11 | 0.5509150 |
| 12 | 0.5499170 | 13 | 0.5509160 | 14 | 0.5509160 | 15 | 0.5509162 |
| 16 | 0.2789580 | 17 | 0.2789590 | 18 | 0.2789580 | 19 | 0.2789580 |
| 20 | 0.2789570 | 21 | 0.2789570 | 22 | 0.2789570 | 23 | 0.2789580 |
| 24 | 0.3739420 | 25 | 0.3739430 | 26 | 0.3739430 | 27 | 0.3739440 |
| 28 | 0.3739430 | 29 | 0.3739440 | 30 | 0.3739430 | 31 | 0.3739430 |

64 processes:

| Rank | Time (s) | Rank | Time (s) | Rank | Time (s) | Rank | Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | 0.3339500 | 1 | 1.024845 | 2 | 1.021845 | 3 | 1.022845 |
| 4 | 1.027844 | 5 | 1.026844 | 6 | 1.031843 | 7 | 1.024844 |
| 8 | 1.020845 | 9 | 1.027843 | 10 | 1.019846 | 11 | 1.020844 |
| 12 | 1.018845 | 13 | 1.020845 | 14 | 1.018845 | 15 | 1.024845 |
| 16 | 1.016845 | 17 | 1.013845 | 18 | 1.014845 | 19 | 1.010846 |
| 20 | 1.015846 | 21 | 1.008847 | 22 | 1.010846 | 23 | 1.010846 |
| 24 | 1.027843 | 25 | 1.027844 | 26 | 1.031843 | 27 | 1.030843 |
| 28 | 1.016845 | 29 | 1.026844 | 30 | 1.019846 | 31 | 1.023844 |
| 32 | 1.012846 | 33 | 0.6359040 | 34 | 0.4559309 | 35 | 0.4559300 |
| 36 | 0.4479311 | 37 | 0.3579450 | 38 | 0.3569450 | 39 | 0.3559461 |
| 40 | 0.3539469 | 41 | 0.3529470 | 42 | 0.3519461 | 43 | 0.3509469 |
| 44 | 0.3499467 | 45 | 0.3499470 | 46 | 0.3489470 | 47 | 0.3479462 |
| 48 | 0.3469470 | 49 | 0.3459470 | 50 | 0.3459470 | 51 | 0.3449471 |
| 52 | 0.3439469 | 53 | 0.3429482 | 54 | 0.3419471 | 55 | 0.3409491 |
| 56 | 0.2829571 | 57 | 0.3369482 | 58 | 0.3379490 | 59 | 0.3379490 |
| 60 | 0.3379490 | 61 | 0.3359480 | 62 | 0.3349490 | 63 | 0.3339500 |

Configuration of one cluster module. It reports four processors (0 to 3); the following fields are identical for all four:

- vendor_id : GenuineIntel
- cpu family : 6
- model : 15
- model name : Intel(R) Xeon(R) CPU 5140 @ 2.33GHz
- stepping : 6
- cpu MHz : 2333.423
- cache size : 4096 KB
- siblings : 2
- cpu cores : 2
- fpu : yes
- fpu_exception : yes
- cpuid level : 10
- wp : yes
- flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
- clflush size : 64
- cache_alignment : 64
- address sizes : 36 bits physical, 48 bits virtual
- power management: (empty)

The fields that differ between processors:

| processor | physical id | core id | bogomips |
|---|---|---|---|
| 0 | 0 | 0 | 4670.17 |
| 1 | 3 | 0 | 4666.87 |
| 2 | 0 | 1 | 4666.79 |
| 3 | 3 | 1 | 4666.78 |

The cluster has 990 such modules and starts one process per core. I built the example with: make libem64t mpi=mpich interface=ilp64

Please help me understand why it is so slow.

Svyatoslav
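To make the trend in these measurements easier to see, here is a minimal Python sketch that reduces each run to its fastest and slowest rank. It assumes that the slowest rank determines how long the distributed transform takes overall, which is an assumption about how to read the numbers and not something the post states. The names `TIMES` and `summarize` are made up for this illustration; the values are the per-rank times from the post, listed in rank order.

```python
# Per-rank times in seconds for DftiComputeForwardDM, copied from the post
# and listed in rank order (rank 0 first). TIMES is an illustrative name.
TIMES = {
    8: """0.2209660 0.2229670 0.2229670 0.2229670 0.2229670 0.2219670 0.2209670 0.2209670""",
    16: """0.2129680 0.2129680 0.2129680 0.2129680 0.2129680 0.2129670 0.2129680 0.2129670
           0.2389630 0.2389640 0.2389640 0.2389630 0.2389640 0.2389640 0.2389630 0.2389640""",
    32: """0.5439169 0.5519161 0.5539160 0.5519159 0.5529160 0.5519149 0.5019231 0.5519171
           0.5529151 0.5529160 0.5509150 0.5509150 0.5499170 0.5509160 0.5509160 0.5509162
           0.2789580 0.2789590 0.2789580 0.2789580 0.2789570 0.2789570 0.2789570 0.2789580
           0.3739420 0.3739430 0.3739430 0.3739440 0.3739430 0.3739440 0.3739430 0.3739430""",
    64: """0.3339500 1.024845 1.021845 1.022845 1.027844 1.026844 1.031843 1.024844
           1.020845 1.027843 1.019846 1.020844 1.018845 1.020845 1.018845 1.024845
           1.016845 1.013845 1.014845 1.010846 1.015846 1.008847 1.010846 1.010846
           1.027843 1.027844 1.031843 1.030843 1.016845 1.026844 1.019846 1.023844
           1.012846 0.6359040 0.4559309 0.4559300 0.4479311 0.3579450 0.3569450 0.3559461
           0.3539469 0.3529470 0.3519461 0.3509469 0.3499467 0.3499470 0.3489470 0.3479462
           0.3469470 0.3459470 0.3459470 0.3449471 0.3439469 0.3429482 0.3419471 0.3409491
           0.2829571 0.3369482 0.3379490 0.3379490 0.3379490 0.3359480 0.3349490 0.3339500""",
}


def summarize(times):
    """Return (process count, fastest rank time, slowest rank time) per run."""
    rows = []
    for nproc in sorted(times):
        values = [float(v) for v in times[nproc].split()]
        # Guard against a copy error: each run must have one time per rank.
        assert len(values) == nproc
        rows.append((nproc, min(values), max(values)))
    return rows


if __name__ == "__main__":
    rows = summarize(TIMES)
    baseline = rows[0][2]  # slowest rank of the 8-process run
    for nproc, fastest, slowest in rows:
        print(f"{nproc:3d} processes: fastest {fastest:.4f} s, "
              f"slowest {slowest:.4f} s, "
              f"slowest relative to 8 processes: {slowest / baseline:.2f}x")
```

Read this way, the slowest rank takes longer in every larger run, which matches the complaint that adding processes does not speed up the 512*512 transform.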