Intel compiler on a Tesla machine


ahmediiit's picture

Hello sir,
I want to ask whether the Intel Fortran compiler for Linux
can be installed on an NVIDIA Tesla machine.

Gennady Fedorov (Intel)'s picture

Hello. Please look at the Intel Fortran Compiler Release Notes to find the applicable system requirements.

Tim Prince's picture

As you must be aware, Fortran compilers for Tesla run on a host machine and support off-loading CUDA library code to the Tesla using syntax resembling OpenMP. No compiler installs on the Tesla itself, nor has any decision been made about an Intel compiler supporting Tesla. Intel Fortran could be installed on a host machine for a Tesla, but it would not use the Tesla unless you wrote your own interface to the CUDA host tools.

ahmediiit's picture

Hello sir,

Does this mean that if I install the Intel Fortran compiler on the host machine, it will not utilise
the multiple cores of the Tesla? Is there any tool to make it compatible with CUDA so that
MKL PARDISO can be used on the Tesla?

At present I am using the IVF compiler with MKL to solve linear equations (PARDISO).
My system is an Intel Xeon processor (E5520) with 8 cores.
I need to solve large sparse matrices, of size around 50,00000, for many iterations.

The system is taking a lot of time.
Please give some suggestions on how to increase the speed, or on changing the processor.
Is there any processor on which PARDISO is efficient?
Is there any other solver faster than PARDISO?
Or can we attach one more processor to the present system?
Does PARDISO work on a cluster?

Gennady Fedorov (Intel)'s picture

Hello Ahmed, you wrote: "I need to solve large sparse matrices, of size around 50,00000, for many iterations." Do you mean the input matrix size is 5 000 000? Which mode of PARDISO (in-core, out-of-core, hybrid) are you using? Which MKL version? --Gennady

ahmediiit's picture

Hello sir,

50,00000 is the size of the matrix, not the number of nonzero elements.
The number of nonzeros is around 1500,00000.
The MKL version is 10.2.3.029.
I am running in-core.
RAM is around 24 GB.

Gennady Fedorov (Intel)'s picture

Yes, this is a very good size :). Are you sure the calculation is not swapping? The input task alone will require at least ~2 GB of RAM (nnz * sizeof(double) + ja * sizeof(int) ~ 2 GB). The factorization stage may then require more than 10 times the memory of the original matrix, in which case you will swap. Can you check this with iparm(18)? The solver will then report the number of nonzero elements in the factors. --Gennady
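The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 8-byte double values and 4-byte integer indices; the function name `csr_input_bytes` and the row-pointer term are illustrative additions, and the factor of 10 is only the rule of thumb quoted above, not a guarantee.

```python
# Rough memory estimate for a sparse matrix stored in CSR form.
# Assumed sizes: 8-byte double values, 4-byte integer indices.
SIZEOF_DOUBLE = 8
SIZEOF_INT = 4


def csr_input_bytes(n, nnz):
    """Bytes for the values, the column indices (ja) and the row pointers (ia)."""
    values = nnz * SIZEOF_DOUBLE
    col_idx = nnz * SIZEOF_INT
    row_ptr = (n + 1) * SIZEOF_INT  # small next to the other two terms
    return values + col_idx + row_ptr


n = 5_000_000        # matrix size reported in the thread
nnz = 150_000_000    # nonzeros reported in the thread

input_bytes = csr_input_bytes(n, nnz)
factor_bytes_guess = 10 * input_bytes  # rule of thumb only
ram_bytes = 24e9                       # 24 GB of RAM, as reported

print(f"input matrix : {input_bytes / 1e9:.2f} GB")
print(f"factors (10x): {factor_bytes_guess / 1e9:.2f} GB")
print(f"fits in RAM  : {input_bytes + factor_bytes_guess < ram_bytes}")
```

With these numbers the input alone is close to 2 GB, and a tenfold growth in the factors would already take most of the 24 GB, which is why swapping is a concern.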

ahmediiit's picture

Hello sir,
I am sending the output for a matrix of size 12 lakh (1200000).
For 50 lakh there is a problem with my code; we will look at it later.
For now, please tell me how to make this faster.

At the start of the program the PF usage is around 3.5 GB,
and when it enters the PARDISO subroutine it increases to 22.5 GB.

nonzero= 15346680
solution start

== PARDISO is running in In-Core mode, because iparam(60)=0 ===

=============== PARDISO: solving a symmetric indef. system ===========

Summary PARDISO: ( reorder to reorder )
===============

Times:
=====
Time fulladj: 0.098160 s
Time reorder: 7.023050 s
Time symbfct: 8.933176 s
Time parlist: 0.275664 s
Time malloc : 0.825073 s
Time total : 19.331269 s total - sum: 2.176147 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 %
10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 %
20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 %
30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 %
40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 %
50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 %
60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 %
70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 %
80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 %
90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 %
100 %

=============== PARDISO: solving a symmetric indef. system ===========

Summary PARDISO: ( factorize to factorize )
===============

Times:
=====
Time A to LU: 0.000000 s
Time numfct : 776.535926 s
Time malloc : 0.048327 s
Time total : 776.586925 s total - sum: 0.002672 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
gflop for the numerical factorization: 22853.815471

gflop/s for the numerical factorization: 29.430468

=============== PARDISO: solving a symmetric indef. system ===========

Summary PARDISO: ( solve to solve )
===============

Times:
=====
Time solve : 9.663265 s
Time total : 29.857200 s total - sum: 20.193935 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
gflop for the numerical factorization: 22853.815471

gflop/s for the numerical factorization: 29.430468

solution end
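
As a rough cross-check of the log above, the factor size, the reported rate and the share of time spent in each phase can be recomputed from the printed numbers. This is a minimal sketch, assuming 8 bytes per stored factor entry (double precision, ignoring index and workspace overhead); the variable names are illustrative.

```python
# Values copied from the PARDISO log for the 1200000-equation run.
nnz_factors = 2_403_511_055   # "number of nonzeros in L+U"
gflop = 22853.815471          # "gflop for the numerical factorization"
numfct_seconds = 776.535926   # "Time numfct"

# "Time total" of the three phases: reorder, factorize, solve.
phase_totals = {
    "reorder": 19.331269,
    "factorize": 776.586925,
    "solve": 29.857200,
}

# Assumption: each factor entry is one 8-byte double.
factor_bytes = nnz_factors * 8
rate = gflop / numfct_seconds
factor_share = phase_totals["factorize"] / sum(phase_totals.values())

print(f"factor storage : {factor_bytes / 1e9:.1f} GB")
print(f"gflop/s        : {rate:.2f}")
print(f"factorize share: {factor_share:.0%} of the total time")
```

Under that assumption the factors alone need about 19 GB, which is roughly in line with the jump in PF usage from 3.5 GB to 22.5 GB. The recomputed rate agrees with the logged gflop/s value, and the numerical factorization accounts for most of the run time.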

Gennady Fedorov (Intel)'s picture

Could you please check the scalability of the solution by linking your application with the serial libraries and comparing the timings with the threaded run? --Gennady
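
One way to read the result of such a scalability check is to turn the two timings into a speedup and a parallel efficiency. This is a minimal sketch; the helper `speedup_and_efficiency` is illustrative, and the serial time below is a hypothetical placeholder, since no serial measurement appears in this thread.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_threads):
    """Return (speedup, efficiency) for a serial and a threaded timing."""
    speedup = t_serial / t_parallel
    efficiency = speedup / n_threads
    return speedup, efficiency


# Threaded factorization time taken from the log ("Time numfct", 8 processors).
t_parallel = 776.535926
n_threads = 8

# HYPOTHETICAL serial timing: replace with the time measured after
# relinking against the serial libraries.
t_serial = 4000.0

speedup, efficiency = speedup_and_efficiency(t_serial, t_parallel, n_threads)
print(f"speedup   : {speedup:.2f}x")
print(f"efficiency: {efficiency:.0%}")
```

An efficiency far below 100% would suggest the factorization is limited by something other than core count, for example memory bandwidth or swapping.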

ahmediiit's picture

Hello sir,
How do I link with the serial libraries?

Gennady Fedorov (Intel)'s picture

Please use the Intel MKL Link Line Advisor to get the appropriate link line.
