Recent posts
https://software.intel.com/en-us/recent/242867
2nd-level cache misses
https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe/topic/306318
<p>I am analysing my program to find the biggest bottleneck.</p>
<p>When sampling with VTune for 2nd-level cache load misses I see a certain number that I want to relate to clock cycles, to see how much time the processor spends waiting for data. Is there any way to find out how many penalty cycles should be charged for such an L2 cache miss?</p>
<p>I am using a Pentium 4 with a 512 KB L2 cache and 512 MB of DDR memory.</p>
Fri, 20 Oct 06 12:57:43 -0700 w.a.wiggers@student.utwente.nl 306318
Comparison Pentium 4 and Woodcrest
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/306334
<p>For our research we want to compare the Intel MKL sparse matrix-vector multiplication "<em>mkl_dcsrmv</em>" on a Pentium 4 at 2.4 GHz and on one core of the Woodcrest processor. </p>
<p>I read in the documentation that MKL uses OpenMP for threading; however, even when I set OMP_NUM_THREADS to 1 and MKL_SERIAL=yes, I still see in VTune that this function spreads across multiple logical processors on the Woodcrest processor. </p>
<p>The function "<em>mkl_dcsrmv</em>" translates for the Pentium 4 to "<em>mkl_spblas_p4_dcsrmmsysm</em>" and for the Woodcrest to "<em>L__mlk_spblas_p4m_dcsrmmsym_304__par_loop0</em>". </p>
<p>Even though this last function is called from a single thread, why does it spread out to multiple processors? Can I deactivate this? </p>
Wed, 18 Oct 06 14:59:55 -0700 w.a.wiggers@student.utwente.nl 306334
Threading
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/306370
<p>I am a computer science student and I am interested to see the performance speedup and scalability of Intel MKL for our algorithm. We are using Intel's Woodcrest processor with the corresponding Bensley platform, thus having 4 cores available.</p>
<p>The algorithm we are looking at is CG; it needs to solve Ax = b for 500 b's. An obvious way to parallelize over multiple cores is letting each core solve one Ax = b. I've seen that a CG framework is provided by MKL and that we just need to fill in the operations.</p>
<p>Probably the most important operation in CG is the sparse matrix-vector multiplication. I've read in the Intel MKL documentation that sparse BLAS level 2 also uses OpenMP for threading, and I am thus wondering how this is implemented. Does this function spread the matrix over different cores and then multiply it row-wise? I know it is possible to thread in the way I want with OpenMP, but I am interested to know how Intel did this.</p>
Tue, 10 Oct 06 21:24:08 -0700 w.a.wiggers@student.utwente.nl 306370