Recent posts
https://software.intel.com/en-us/recent/251059
Lots of Fortran code! How do I call Fortran code from C/C++, and vice versa?
https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/276769
<p>There is a lot of Fortran code out there! How do I call Fortran code from C/C++?<br />
As a C/C++ programmer, I see a few reasons to use Fortran:</p>
<p>(1) Fortran is very similar to Matlab, so Matlab code is easy to port;<br />
(2) Fortran has built-in support for complex numbers and array operations, and<br />
its operations are naturally element-wise, operating on a<br />
whole vector;<br />
(3) Many scientific codes are written in Fortran.</p>
<p>-------------</p>
<p>So how do I call a Fortran routine from my C program?</p>
<p>I am using MS Visual Studio .NET 2003 with Visual C++ and Intel C++, and also<br />
Intel Visual Fortran.</p>
<p>As an example, I want to call a Fortran routine from C++. In its bare form<br />
it is a function that evaluates something and<br />
passes the results back. I also want to call C++ from Fortran: for<br />
example, many good numerical integration codes are written in Fortran, but<br />
I want to provide my integrand function in C++.</p>
<p>How do I set up these interfaces? Pointers and reading suggestions are appreciated!<br />
Thanks for your help! </p>
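<p>A minimal sketch of what the C++ side of such an interface might look like. It relies on two things: Fortran passes arguments by reference by default, so the C++ declarations use pointers, and <code>extern "C"</code> switches off C++ name mangling. Every name here is hypothetical: <code>my_integrand</code>, <code>integrand_fn</code>, <code>integrate</code>, and <code>trapz_stub</code>, which is a C++ stand-in written only so that the sketch runs by itself. In a real project the Fortran routine would be declared with a matching prototype and linked in, and the external symbol name it needs (case, trailing underscore) depends on the Fortran compiler and its settings.</p>

```cpp
#include <cassert>
#include <cmath>

extern "C" {

// Callback type as a Fortran-style caller sees it: the argument arrives by
// reference, so the C++ side receives a pointer.
typedef double (*integrand_fn)(const double* x);

// Hypothetical integrand supplied from C++: f(x) = x^2.
double my_integrand(const double* x) {
    return (*x) * (*x);
}

// Stand-in for a Fortran integrator (assumption: NOT a real library routine).
// Composite trapezoid rule on [a, b] with n panels; every argument is passed
// by reference, as a Fortran subroutine would expect.
void trapz_stub(integrand_fn f, const double* a, const double* b,
                const int* n, double* result) {
    const double h = (*b - *a) / *n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < *n; ++i) {
        const double x = *a + i * h;
        sum += f(&x);
    }
    *result = sum * h;
}

}  // extern "C"

// Thin C++ wrapper that hides the by-reference calling convention.
double integrate(integrand_fn f, double a, double b, int n) {
    double result = 0.0;
    trapz_stub(f, &a, &b, &n, &result);
    return result;
}
```

<p>With this layout, <code>integrate(my_integrand, 0.0, 1.0, 1000)</code> approximates the integral of x^2 over [0, 1]. Replacing <code>trapz_stub</code> with a declaration of an actual Fortran routine is the part that needs the compiler's mixed-language documentation.</p>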
Thu, 23 Aug 07 19:09:25 -0700 losemind 276769
Wrappers for MKL Vector Math Library?
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/304716
<p>Hi all,</p>
<p>I want to use the Intel MKL VML, but it only supports vectorized math functions; there is no support for common vector operations such as the dot product, scalar-vector product, element-wise vector-vector product/division, real-scalar times complex-vector product, etc. </p>
<p>It feels highly unnatural to write a lot of loops myself for these operations. For example, when preparing to use the complex vectorized functions, I have to build the arguments first: I start from real constants and scalars, and only after several operations are they ready to feed into the vectorized functions. This style of programming is unnatural and feels much like assembly language... </p>
<p>Are there any wrappers that provide such vector operations in addition to the vector functions?</p>
<p>Thanks!</p>
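<p>To make the kind of wrapper I mean concrete, here is a minimal sketch in plain standard C++, assuming <code>std::vector</code> and <code>std::complex</code> as the containers. The names <code>dot</code>, <code>scale</code>, <code>elem_mul</code>, and <code>elem_div</code> are hypothetical and are not part of MKL; the loops are hidden inside the helpers so that the calling code reads like vector arithmetic.</p>

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cplx;
typedef std::vector<double> dvec;
typedef std::vector<cplx> cvec;

// Dot product of two real vectors of equal length.
double dot(const dvec& a, const dvec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Real scalar times complex vector.
cvec scale(double s, const cvec& v) {
    cvec out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = s * v[i];
    return out;
}

// Element-wise product of two complex vectors of equal length.
cvec elem_mul(const cvec& a, const cvec& b) {
    assert(a.size() == b.size());
    cvec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
    return out;
}

// Element-wise quotient of two complex vectors of equal length.
cvec elem_div(const cvec& a, const cvec& b) {
    assert(a.size() == b.size());
    cvec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] / b[i];
    return out;
}
```

<p>Each helper returns a new vector, which keeps the calling code short at the cost of a temporary per operation. Writing into a caller-supplied output buffer would avoid those allocations if they show up in a profile.</p>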
Tue, 21 Aug 07 18:02:30 -0700 losemind 304716
Which Intel Mobile Processor is best for floating point computational use?
https://software.intel.com/en-us/forums/intel-c-compiler/topic/304739
<p>Hi all,</p>
<p>My laptop is broken and I am seriously considering buying a new one. I mainly use the laptop for all kinds of programming and scientific/numerical computation, including Matlab, C/C++, Fortran, Maple, Mathematica, etc. Can anybody share their experience with the speed of Maple on Core Duo dual-core processors? Are computations on these machines really faster? </p>
<p>A few months ago I tested a Monte Carlo program in Matlab on my friend's new Intel quad-core machine (4 CPUs). He paid $6000 for that desktop. The speed was no better than on a single CPU.</p>
<p>Which mobile processor should I buy for my applications? Does it support SSE3, native complex numbers, vectorized computations, and other advanced features for floating-point and scientific computation?</p>
<p>Thanks a lot!</p>
Fri, 17 Aug 07 20:24:33 -0700 losemind 304739
I cannot seem to speed-up further -- could you please criticize my code?
https://software.intel.com/en-us/forums/intel-c-compiler/topic/304777
<p>Hi, </p>
<p>I tried the following compiler options ... and then I spent a few days tweaking the code for further speedup. No matter how I experimented with the settings, I couldn't get any further speedup. I decided to step outside my old paradigm and learn how other people program for high performance. Could anybody please criticize my code and give me some suggestions? The more you criticize, the more I can learn. Thank you very much for your help!</p>
<p>The build log is very confusing. For example, see the following excerpt: is the loop parallelized or not at the end of the day? I have two Xeon 2.4GHz CPUs supporting SSE2. Thanks!</p>
<p>.\try2.cpp(100): (col. 2) remark: loop was not parallelized: insufficient computational work.<br />
.\try2.cpp(108): (col. 2) remark: LOOP WAS AUTO-PARALLELIZED.<br />
.\try2.cpp(110): (col. 3) remark: loop was not parallelized: insufficient inner loop.</p>
<p>----------------------------------------</p>
<p>#include <br />#include "imsl.h"<br />#include "mex.h"</p>
<p>#define MaxNumEqns 1000000</p>
<p>//Intel C++ compiler options: /O3 /Qopt-report /Qopt-report-phase hlo /QxW /Qopt-report3 /Qparallel /Qpar-report3</p>
<p>#pragma comment(linker,"/NODEFAULTLIB:libc.lib")</p>
<p>static double kappa, theta, sigma2, f, v, abserr, x0, T0;<br />static double *pps, *pys, *pV, *pT;<br />static int nps, nys, nV, nT;<br />static double *res_Re, *res_Im;<br />static double y[MaxNumEqns];</p>
<p>void fcn (int neq, double t, double y[], double yprime[])<br />{<br /> //y: 0 -- alpha_real, <br /> // 1 -- alpha_imag,<br /> // 2 -- beta_real,<br /> // 3 -- beta_imag.</p>
<p> for (int i=0; i {<br /> double sI=0, sR=0;</p>
<p> for (int k=0; k {<br /> double tmp=1+y[i*4+2]*(-pys[k]);<br /> double tmp2=tmp*tmp;<br /> double tmp3=y[i*4+3]*(-pys[k]);<br /> double tmp4=tmp3*tmp3;<br /> sR=sR+pps[k]*tmp/(tmp2+tmp4);<br /> sI=sI+pps[k]*y[i*4+3]*(-pys[k])/(tmp2+tmp4);<br /> }</p>
<p> yprime[i*4+0] = kappa*theta*y[i*4+2]-f+f*sR;<br /> yprime[i*4+1] = kappa*theta*y[i*4+3]-f*sI;<br /> yprime[i*4+2] = -kappa*y[i*4+2]+0.5*sigma2*(y[i*4+2]*y[i*4+2]-y[i*4+3]*y[i*4+3]);<br /> yprime[i*4+3] = pV[i] - kappa*y[i*4+3]+sigma2*y[i*4+2]*y[i*4+3];</p>
<p> }<br />}</p>
<p>void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray*prhs[] ) <br />{ <br /> //v, T, kappa, theta, sigma2, f, ys, ps, x0, T0, abserr</p>
<p> //v and T can be vectors, ys and ps are vectors.</p>
<p> kappa=*(mxGetPr(prhs[2])); <br /> theta=*(mxGetPr(prhs[3])); <br /> sigma2=*(mxGetPr(prhs[4])); <br /> f=*(mxGetPr(prhs[5]));</p>
<p> nV=mxGetNumberOfElements(prhs[0]); //Number of "v"'s</p>
<p> pV=mxGetPr(prhs[0]);</p>
<p> nT=mxGetNumberOfElements(prhs[1]); //Number of "T"'s</p>
<p> pT=mxGetPr(prhs[1]);</p>
<p> nps=mxGetNumberOfElements(prhs[7]); //Number of "p"'s</p>
<p> pps=mxGetPr(prhs[7]);</p>
<p> nys=mxGetNumberOfElements(prhs[6]); //Number of "y"'s<br /> //08/10/2007: Note: due to a typo, the y's have to be negated before passing into this function. <br /> //08/11/2007: Note: the above restriction has been removed and the negation is no longer needed.</p>
<p> pys=mxGetPr(prhs[6]);</p>
<p> x0=*(mxGetPr(prhs[8]));</p>
<p> T0=*(mxGetPr(prhs[9]));</p>
<p> abserr=*(mxGetPr(prhs[10]));</p>
<p> plhs[0]=mxCreateDoubleMatrix(nV, nT, mxCOMPLEX); //The solutions of A, "v" is the row, "T" is the col.</p>
<p> plhs[1]=mxCreateDoubleMatrix(nV, nT, mxCOMPLEX); //The solutions of B, "v" is the row, "T" is the col.</p>
<p> res_Re=mxGetPr(plhs[0]); //These are the outputs to be passed back to Matlab</p>
<p> res_Im=mxGetPi(plhs[0]);</p>
<p> int nstep;<br /> char *state;</p>
<p> double t = 0.0; /* Initial time */</p>
<p> for (int i=0; i<br /> imsl_d_ode_runge_kutta_mgr(IMSL_ODE_INITIALIZE, &state, IMSL_TOL, abserr, 0);</p>
<p> for (int j=0; j<br /> imsl_d_ode_runge_kutta_mgr(IMSL_ODE_RESET, &state, 0);</p>
<p> for (int j=0; j&lt;nT; j++) {<br /> for (int i=0; i&lt;nV; i++) {<br /> res_Re[j*nV+i]=exp(y[i*4+0]+y[i*4+2]*x0)*cos(y[i*4+1]+y[i*4+3]*x0+T0*pV[i]);<br /> res_Im[j*nV+i]=exp(y[i*4+0]+y[i*4+2]*x0)*sin(y[i*4+1]+y[i*4+3]*x0+T0*pV[i]);<br /> }<br /> }</p>
<p> return;<br />}</p>
<p>Build log:<br />---------------------------------------</p>
<p>------ Rebuild All started: Project: try2, Configuration: Release Win32 ------</p>
<p>Deleting intermediate files and output files for project 'try2', configuration 'Release|Win32'.<br />
Compiling with Intel C++ 10.0.025 [IA-32]... (Intel C++ Environment)<br />
icl: command line warning #10120: overriding '/O2' with '/O3'<br />
icl: command line warning #10121: overriding '/Qvc8' with '/Qvc7.1'<br />
stdafx.cpp<br />
DllMain.cpp<br />
 procedure: DllMain<br />
 procedure: DllMain<br />
try2.cpp<br />
.\try2.cpp(95): warning #177: variable "nstep" was declared but never referenced<br />
 int nstep;<br />
 ^</p>
<p>.\try2.cpp(12): warning #177: variable "v" was declared but never referenced<br />
 static double kappa, theta, sigma2, f, v, abserr, x0, T0;<br />
&nbsp; ^<br />
 procedure: mexFunction<br />
 procedure: mexFunction</p>
<p>HLO REPORT LOG OPENED ON Sat Aug 11 16:45:23 2007</p>
<p><.\try2.cpp;-1:-1;hlo;_mexFunction;0><br />
High Level Optimizer Report (_mexFunction)<br />
Adjacent Loops: 3 at line 100<br />
Unknown loop at line #104<br />
QLOOPS 4/4 ENODE LOOPS 4 unknown 1 multi_exit_do 0 do 3 linear_do 3<br />
LINEAR HLO EXPRESSIONS: 38 / 61 <br />
------------------------------------------------------------------------------</p>
<p>.\try2.cpp(100): (col. 2) remark: loop was not parallelized: insufficient computational work.<br />
.\try2.cpp(108): (col. 2) remark: LOOP WAS AUTO-PARALLELIZED.<br />
.\try2.cpp(110): (col. 3) remark: loop was not parallelized: insufficient inner loop.<br />
.\try2.cpp(100): (col. 2) remark: LOOP WAS VECTORIZED.<br />
.\try2.cpp(110): (col. 3) remark: LOOP WAS VECTORIZED.<br />
.\try2.cpp(110): (col. 3) remark: LOOP WAS VECTORIZED<br />
<.\try2.cpp;104:104;hlo_scalar_replacement;in _mexFunction;0><br />
#of Array Refs Scalar Replaced in _mexFunction at line 104=1</p>
<p><.\try2.cpp;108:108;hlo_linear_trans;_mexFunction;0><br />
Loop Interchange not done due to: User Function Inside Loop Nest<br />
Advice: Loop Interchange, if possible, might help Loopnest at lines: 108 110 <br />
 : Suggested Permutation: (1 2 ) --> ( 2 1 )<br />
 procedure: fcn<br />
 procedure: fcn</p>
<p><.\try2.cpp;-1:-1;hlo;?fcn@@YAXHNQAN0@Z;0><br />
High Level Optimizer Report (?fcn@@YAXHNQAN0@Z)<br />
QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2<br />
LINEAR HLO EXPRESSIONS: 69 / 101 <br />
------------------------------------------------------------------------------<br />
.<br />
.\try2.cpp(26): (col. 2) remark: loop was not parallelized: existence of parallel dependence.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 43, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 43, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 43, and yprime line 41.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and (unknown) line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between (unknown) line 32, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between y line 32, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and (unknown) line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between (unknown) line 34, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between y line 34, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and (unknown) line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between (unknown) line 37, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between y line 37, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 40, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 40, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 40, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 40, and yprime line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 42, and y line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 40, and yprime line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 43, and y line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 41, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 42, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 43, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 43, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven ANTI dependence between y line 43, and yprime line 40.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between (unknown) line 37, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and (unknown) line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between (unknown) line 37, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and (unknown) line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between (unknown) line 37, and yprime line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 42, and (unknown) line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between (unknown) line 37, and yprime line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 43, and (unknown) line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between y line 37, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between y line 37, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between y line 37, and yprime line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 42, and y line 37.<br />
.\try2.cpp(37): (col. 4) remark: parallel dependence: proven ANTI dependence between y line 37, and yprime line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 43, and y line 37.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between (unknown) line 34, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and (unknown) line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between (unknown) line 34, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and (unknown) line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between (unknown) line 34, and yprime line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 42, and (unknown) line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between (unknown) line 34, and yprime line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 43, and (unknown) line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between y line 34, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between y line 34, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between y line 34, and yprime line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 42, and y line 34.<br />
.\try2.cpp(34): (col. 27) remark: parallel dependence: proven ANTI dependence between y line 34, and yprime line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 43, and y line 34.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between (unknown) line 32, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and (unknown) line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between (unknown) line 32, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and (unknown) line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between (unknown) line 32, and yprime line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 42, and (unknown) line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between (unknown) line 32, and yprime line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 43, and (unknown) line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between y line 32, and yprime line 40.<br />
.\try2.cpp(40): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 40, and y line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between y line 32, and yprime line 41.<br />
.\try2.cpp(41): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 41, and y line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between y line 32, and yprime line 42.<br />
.\try2.cpp(42): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 42, and y line 32.<br />
.\try2.cpp(32): (col. 28) remark: parallel dependence: proven ANTI dependence between y line 32, and yprime line 43.<br />
.\try2.cpp(43): (col. 3) remark: parallel dependence: proven FLOW dependence between yprime line 43, and y line 32.<br />
.\try2.cpp(30): (col. 3) remark: loop was not parallelized: insufficient computational work.<br />
.\try2.cpp(30): (col. 3) remark: LOOP WAS VECTORIZED</p>
<p><.\try2.cpp;26:26;hlo_linear_trans;?fcn@@YAXHNQAN0@Z;0><br />
Loop Interchange not done due to: Imperfect Loop Nest (Either at Source or due to other Compiler Transformations)<br />
Advice: Loop Interchange, if possible, might help Loopnest at lines: 26 30 <br />
 : Suggested Permutation: (1 2 ) --> ( 2 1 )</p>
<p>...</p>
<p>---------------------- Done ----------------------</p>
<p> Rebuild All: 1 succeeded, 0 failed, 0 skipped</p>
<p></p>
Sat, 11 Aug 07 13:48:07 -0700 losemind 304777
Is there an easy way to make my Fortran program parallel?
https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/276710
<p>Hi Folks,</p>
<p>Here is a numerical integration program:</p>
<p><a href="http://people.scs.fsu.edu/~burkardt/f_src/quadpack/quadpack.f90">http://people.scs.fsu.edu/~burkardt/f_src/quadpack/quadpack.f90</a></p>
<p>When evaluating the integrand, it would be good to do two function evaluations simultaneously.</p>
<p>I just found out that I have two Pentium Xeon 2.4GHz CPUs, supporting only up to SSE2.</p>
<p>Although the CPUs are slow, it would be great if I could use both of them to parallelize the numerical integration!</p>
<p>But I am not a Fortran expert, nor am I a CS engineer...</p>
<p>Could anybody give some easy-to-follow advice on how to make that code parallel on two CPUs?</p>
<p>Thanks a lot!</p>
Tue, 31 Jul 07 16:01:48 -0700 losemind 276710
No matter how I adjust the optimization settings, it doesn't speed up at all.
https://software.intel.com/en-us/forums/intel-c-compiler/topic/304864
<p>Hi all,</p>
<p>No matter how I adjust the optimization settings, I am unable to see any speed improvement. I am optimizing for ultimate speed. The PC I am working on has an Intel Xeon 2.4GHz (I tested it, and it only supports up to SSE2). I am writing a numerically intensive numerical integration program. What can I do?</p>
<p>Here is the original command line from the C++ property panel in the project properties of MS Visual Studio .NET 2003:</p>
<p>/c /O2 /I "C:\Program Files\MATLAB\R2007a\extern\include\win32" /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_USRDLL" /D "_MBCS" /D "try2_EXPORTS" /D "_WINDLL" /FD /EHsc /MT /Fo"Release/" /W3 /nologo /Wp64 /Zi /Gd /O3 /Ot /Og /QaxN /QParallel /QxN</p>
<p>----------</p>
<p>I've overridden it with:</p>
<p>/O3 /Ot /Og /QaxN /QParallel /QxN</p>
<p>And also tried </p>
<p>/QaxP /QParallel /QxP,</p>
<p>--------------</p>
<p>But with no speedup at all.</p>
<p>I am also looking for a good cookbook/reference on speed optimization with the Intel C++ compiler, but I couldn't find one for this version, 10.1. In our applications we want ultimate speed for numerically intensive computations, mostly numerical integration. </p>
<p>I am not sure how to use PGO and other high-level speed optimization techniques...</p>
<p>Thanks for your help!</p>
<p></p>
Mon, 30 Jul 07 10:30:43 -0700 losemind 304864
Help! How to use Intel Vtune to profile a MEX DLL in Matlab?
https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe/topic/304873
<p>Help! How to use Intel Vtune to profile a MEX DLL in Matlab?</p>
<p>Hi all,</p>
<p>I am using Intel VTune to profile a MEX DLL in Matlab,</p>
<p>but after setting up the VTune side I got the following error when I tried to launch the DLL inside Matlab:</p>
<p>----------</p>
<p>??? Invalid MEX-file 'C:\MyProjects\try2.dll': Invalid access to memory location.</p>
<p>----------</p>
<p>I believe this is a side effect of Intel VTune instrumenting my try2.dll. This did not happen before I set up Intel VTune, i.e., try2.dll ran successfully before. </p>
<p>What can I do now? There must be some experts here who have used Intel VTune with a Matlab MEX DLL. </p>
<p>Thanks a lot!</p>
Sun, 29 Jul 07 15:06:30 -0700 losemind 304873
C/C++ speed optimization bible/resources/pointers needed!
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/304884
<p>C/C++ speed optimization bible/resources/pointers needed!</p>
<p>Hi all,</p>
<p>I am in the middle of programming a solution to an engineering problem where speed is a huge concern. The project involves lots of numerical integration, and there are several loops/levels of optimization on top of the function evaluation engine. As you probably know, the key to a successful optimization is a fast underlying objective function evaluator. The faster it is, the more promising the optimization result (perhaps the global optimum). However, our project requires many numerical integrations, which prevents us from making it super fast. At the heart of the numerical integration are a smart integrator and a super-fast integrand function evaluator. Even worse, our function evaluation is in the complex domain. So the key point is how to arrange our C/C++ code to make it highly efficient in every aspect.</p>
<p>Could anybody give some advice/pointers on how to improve the speed of a C/C++ program? How should I arrange the code? How do I make it highly efficient and super fast? What options do I have if I don't have the luxury of multi-threaded, multi-core, or distributed computing? I do have at least a P4. Please recommend some good bibles and resources! What can Intel's libraries do specifically for my problem? Thank you!</p>
<p></p>
Thu, 26 Jul 07 22:09:11 -0700 losemind 304884
Seeking a highly efficient caching scheme for demanding engineering computing?
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/304885
<p>Seeking a highly efficient caching scheme for demanding engineering computing?</p>
<p>Hi all,</p>
<p>To save the time spent on costly function evaluations, I want to explore the possibility of caching.</p>
<p>Suppose that among millions of calls to the function evaluation there are some computations in which only a portion of the parameters change, while the others remain the same as the parameters used in some earlier function evaluation. To save time, I can group the varying parameters into sub-groups and cache the results for later use. This needs a highly efficient cache with efficient organization and look-up. Also, the decision of whether a group of parameters has been seen before should be based on the data chunk in binary form rather than in decimal format, so that I can compare chunks of memory directly using memory comparison.</p>
<p>Does anybody have good suggestions and pointers on this approach? What can Intel's libraries help me do?</p>
<p>Thanks!</p>
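<p>To illustrate the idea, here is a minimal sketch in standard C++, assuming each parameter group is a <code>std::vector&lt;double&gt;</code>. The cache is a <code>std::map</code> whose ordering compares the raw bytes of the parameters with <code>std::memcmp</code>, so a group counts as "seen before" only when its binary representation matches exactly. The names <code>Params</code>, <code>BytewiseLess</code>, <code>MemoCache</code>, and <code>expensive_eval</code> are hypothetical; <code>expensive_eval</code> is only a stand-in for the real costly function.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

typedef std::vector<double> Params;

// Orders parameter groups by length first, then by their raw bytes.
struct BytewiseLess {
    bool operator()(const Params& a, const Params& b) const {
        if (a.size() != b.size()) return a.size() < b.size();
        if (a.empty()) return false;
        return std::memcmp(&a[0], &b[0], a.size() * sizeof(double)) < 0;
    }
};

// Hypothetical stand-in for the costly function evaluation: sum of squares.
double expensive_eval(const Params& p) {
    double s = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) s += p[i] * p[i];
    return s;
}

// Memoizing wrapper: evaluates f only for parameter groups not seen before.
class MemoCache {
public:
    typedef double (*Fn)(const Params&);
    explicit MemoCache(Fn f) : f_(f), misses_(0) {}

    double operator()(const Params& p) {
        Table::iterator it = table_.find(p);
        if (it != table_.end()) return it->second;  // cache hit
        ++misses_;                                  // cache miss: evaluate
        const double value = f_(p);
        table_.insert(std::make_pair(p, value));
        return value;
    }

    // Number of times the underlying function was actually called.
    std::size_t misses() const { return misses_; }

private:
    typedef std::map<Params, double, BytewiseLess> Table;
    Fn f_;
    std::size_t misses_;
    Table table_;
};
```

<p>Two consequences of comparing bytes rather than values: <code>0.0</code> and <code>-0.0</code> become different keys, and parameters that differ only in the last bit are treated as unseen. The tree-based map gives logarithmic look-up; a hash table keyed on the same bytes is the usual alternative when the cache gets large.</p>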
Thu, 26 Jul 07 22:07:55 -0700 losemind 304885
Matlab and ICC 9.1: how to compile a c program into MEX using Intel Compiler 9.1?
https://software.intel.com/en-us/forums/intel-c-compiler/topic/306021
<p>How do I compile a C program into a Matlab MEX file using Intel Compiler 9.1?<br />
I searched a lot on Google but could not find a good solution. Could anybody give me some pointers to a working solution? Thanks a lot</p>
Sun, 07 Jan 07 23:07:02 -0800 losemind 306021