Pardiso does not want to utilize dual core

Hello dear community members,
I am trying to get pardiso_unsym_c.c running on my dual-core machine under Ubuntu 9.10 so that it uses both cores.

I have tried:

export MKL_NUM_THREADS=2

and changed:

iparm[2] = 2;

Then I run make, which does:

icc -xK -w -I/opt/intel/Compiler/11.1/069/mkl/include source/pardiso_unsym_c.c -L"/opt/intel/Compiler/11.1/069/mkl/lib/32" "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_solver.a "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_intel.a -Wl,--start-group "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_intel_thread.a "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_core.a -Wl,--end-group -L"/opt/intel/Compiler/11.1/069/mkl/lib/32" -liomp5 -lpthread -lm -o _results/intel_parallel_32_lib/pardiso_unsym_c.out

export LD_LIBRARY_PATH="/opt/intel/Compiler/11.1/069/mkl/lib/32":/opt/intel/Compiler/11.1/069/lib/ia32:/opt/intel/Compiler/11.1/069/ipp/ia32/sharedlib:/opt/intel/Compiler/11.1/069/mkl/lib/32:/opt/intel/Compiler/11.1/069/tbb/ia32/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.1/069/mkl/lib/32; _results/intel_parallel_32_lib/pardiso_unsym_c.out >_results/intel_parallel_32_lib/pardiso_unsym_c.res

But I am getting:

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1

Does anybody know why that may be happening and what I can do to fix it?

Here is cat /proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel Core2 Duo CPU T5550 @ 1.83GHz
stepping : 13
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm
bogomips : 3657.02
clflush size : 64
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel Core2 Duo CPU T5550 @ 1.83GHz
stepping : 13
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm
bogomips : 3657.51
clflush size : 64
power management:

Thanks in advance for any help.


Hello Slava,
Do you mean pardiso_unsym_c.c from the solver's examples? \examples\solver\source\
--Gennady

Quoting Gennady Fedorov (Intel)
Hello Slava,
Do you mean pardiso_unsym_c.c from the solver's examples? \examples\solver\source\
--Gennady

Hi Gennady,
yes, I run it from there.

Slava, the input data of this example (5x5, nnz == 13) is far too small to show the multithreaded advantages of PARDISO. For such small inputs, PARDISO will always run in serial mode.
--Gennady

Thanks for your reply.
I have modified the code of the example to try at least 50x50 or 400x400, but now I get a segmentation fault. This is how I fill in the input matrix:

MKL_INT n = 50; /* 5 */
MKL_INT i,j,z;
double b[n], x[n], a[n*n], o;
z=0;
for(i=0; i<n; i++) {
  for(j=0; j<n; j++) {
    o = (i+j)/0.89;
    if(i==j-1) o = 10;
    if(i==j+1) o = 2;
    a[z] = o;
    z++;
  }
}
for(i=0; i
MKL_INT ia[n+1];
for(i=0;i
ia[n]=n*n+1;
MKL_INT ja[n*n];
for(z=0;z<(n*n);z++) for(i=0;i

Maybe because it does not have zeroes?

Please check the CSR format first of all, i.e. verify the sparse matrix representation.

iparm(27) is the matrix checker. Please refer to the MKL manual for details.

--Gennady
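For reference, here is a minimal sketch of what a consistent CSR index structure looks like for a fully dense n x n matrix like the one in the post above. It assumes the one-based indexing used by the solver examples, and that column indices must increase within each row. The function names `dense_to_csr_index` and `csr_is_valid` are made up for illustration and are not part of MKL, and plain `int` stands in for `MKL_INT`.

```c
/* Fill ia/ja for a dense n x n matrix stored row by row, using
   one-based indices. ia must hold n+1 entries, ja must hold n*n.
   Hypothetical helper, not an MKL routine. */
void dense_to_csr_index(int n, int *ia, int *ja)
{
    int i, j;
    for (i = 0; i < n; i++) {
        ia[i] = i * n + 1;              /* first entry of row i (one-based) */
        for (j = 0; j < n; j++)
            ja[i * n + j] = j + 1;      /* column index (one-based) */
    }
    ia[n] = n * n + 1;                  /* one past the last stored entry */
}

/* Return 1 if ia/ja form a plausible one-based CSR structure, else 0.
   Hypothetical helper; the solver's own matrix checker is more thorough. */
int csr_is_valid(int n, const int *ia, const int *ja)
{
    int i, k;
    if (ia[0] != 1)
        return 0;
    for (i = 0; i < n; i++) {
        if (ia[i + 1] < ia[i])
            return 0;                   /* row pointers must not decrease */
        for (k = ia[i] - 1; k < ia[i + 1] - 1; k++) {
            if (ja[k] < 1 || ja[k] > n)
                return 0;               /* column index out of range */
            if (k > ia[i] - 1 && ja[k] <= ja[k - 1])
                return 0;               /* columns must increase within a row */
        }
    }
    return 1;
}
```

Note that the manual numbers the iparm entries from 1, so iparm(27) corresponds to iparm[26] in the C example.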

Thanks Gennady, it was an issue with input data.
Now I am running performance tests to compare one core against two cores, and I must admit I am a little puzzled. I am generating a 4000 x 4000 system, and this is how long it takes to solve it:

Times:
======
Time fulladj: 0.650926 s
Time reorder: 1.434606 s
Time symbfct: 0.380290 s
Time malloc : 202.740895 s
Time total : 206.280643 s
total - sum: 1.073926 s

As you can see, Time malloc takes almost all of the computing time. I wonder why?

The only difference between 1 and 2 cores is in this part. With 1 core:

Summary PARDISO: ( factorize to factorize )
================
Times:
======
Time A to LU: 0.000000 s
Time numfct : 9.179849 s
Time malloc : 0.000039 s
Time total : 9.179889 s
total - sum: 0.000001 s
gflop/s for the numerical factorization: 4.647834

And with 2 cores involved it is:

Summary PARDISO: ( factorize to factorize )
================
Times:
======
Time A to LU: 0.000000 s
Time numfct : 5.419863 s
Time malloc : 0.000037 s
Time total : 5.419902 s
total - sum: 0.000001 s
gflop/s for the numerical factorization: 7.872242

But the overall time barely changes:

Times:
======
Time fulladj: 0.649804 s
Time reorder: 1.430529 s
Time symbfct: 0.333696 s
Time parlist: 0.000537 s
Time malloc : 202.318808 s
Time total : 205.811701 s
total - sum: 1.078326 s

Does anybody know why?

Hi Slava,
I am a little puzzled by such results as well (Time malloc : 202.740895 s) -:). Your input is pretty small for sparse solvers. As an example, in this thread Time malloc : 0.825073 s for the allocation of ~1.5*10^9 nnz.
I have no guesses yet as to why it happens; I need to do some experiments. Can you give more details about your system? CPU type, RAM, 32 or 64 bit...
--Gennady

Hi Gennady,
thanks for trying to help :)
The CPU is an Intel Core2 Duo CPU T5550 @ 1.83GHz; you can see this info earlier in the thread. The system installed is Ubuntu 9.10, 32-bit version, compiling with lib32. This machine has 3 GB of RAM.

I think it will help if I post the source code here. This is just the modified file from the examples folder. Change the variable n to set the matrix dimension.

http://software.intel.com/file/26550

I don't understand this statement:
for(z=0;z<(n*n);z++) for(i=0;i

Sorry Gennady, I probably attached the wrong file; this line should look like:

for(z=0;z
I have re-uploaded the file.

Attachment: pardiso_unsym_c.c (15 KB)
Best Reply

Slava, you wrote:

Time malloc : 202.740895 s
Time total : 206.280643 s
total - sum: 1.073926 s

Could you please switch off the matching mechanism (set iparm[12] = 0) and see the results?
--Gennady
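A minimal sketch of that change, assuming the iparm array has already been filled in as in the example file. The helper name `tune_iparm` is hypothetical, and plain `int` stands in for `MKL_INT`. The manual numbers iparm entries from 1, so the C index is one lower than the manual's number.

```c
/* Hypothetical helper: adjust two entries of an already initialised
   PARDISO iparm array (zero-based C indexing). */
void tune_iparm(int *iparm)
{
    iparm[12] = 0;  /* iparm(13) in the manual: weighted matching switched off */
    iparm[26] = 1;  /* iparm(27) in the manual: matrix checker switched on */
}
```

In the example this would be called, or the two assignments written inline, before the first call to the solver, so that the reordering phase sees the new settings.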

Gennady, this has made a huge difference. Thank you!
Now 5000 x 5000 runs a lot faster. I have a question however; these are the times:

Summary PARDISO: ( reorder to reorder )
================
Times:
======
Time fulladj: 0.999903 s
Time reorder: 2.324973 s
Time symbfct: 0.587419 s
Time parlist: 0.000738 s
Time malloc : 0.553460 s
Time total : 5.853783 s
total - sum: 1.387291 s

Summary PARDISO: ( factorize to factorize )
================
Times:
======
Time A to LU: 0.000000 s
Time numfct : 9.328830 s
Time malloc : 0.000039 s
Time total : 9.328870 s
total - sum: 0.000001 s

Summary PARDISO: ( solve to solve )
================
Times:
======
Time solve : 0.076216 s
Time total : 0.367927 s
total - sum: 0.291711 s

So the total execution time is the sum of the total times above?

Yes, it should be the total execution time.
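As a quick check of that arithmetic, here is a small sketch that adds up the "Time total" values of the three phase summaries. The function name `sum_phase_totals` is made up for illustration; the numbers to feed it are the ones printed in the post above.

```c
/* Sum the "Time total" values reported for each PARDISO phase.
   Hypothetical helper, not part of MKL. */
double sum_phase_totals(const double *totals, int count)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < count; i++)
        sum += totals[i];
    return sum;
}
```

With the three totals from the 5000 x 5000 run (5.853783, 9.328870 and 0.367927 seconds) this gives roughly 15.55 seconds.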

Thank you Gennady, once again, your support is priceless. I am back with another problem however :)
http://software.intel.com/en-us/forums/showthread.php?t=73238&p=2#117309
