https://software.intel.com/pt-br/forums/topic/342666/feed
Quote: Zhang Z (Intel) wrote:
https://software.intel.com/pt-br/comment/1717719#comment-1717719
<a id="comment-1717719"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p><strong class="quote-header">Quote:</strong><blockquote class="quote-msg quote-nest-1 odd"><div class="quote-author"><em class="placeholder">Zhang Z (Intel)</em> wrote:</div>
<p><strong>Quote:</strong></p>
<blockquote><p><em>Jean-françois D.</em> wrote:</p>
<p><strong>Quote:</strong></p>
<blockquote><p><em>Zhang Z (Intel)</em> wrote:</p>
<p>Jean-François,</p>
<p>Would you please share how you get the theoretical peak performance (GFLOPS) of your processor? It seems you assume one multiplication and one addition can be done in one cycle. This is true ONLY if the processor is capable of FMA instructions (fused multiply-add). For Intel processors, FMA will be introduced in the upcoming Haswell microarchitecture in 2013. What processor do you have?</p>
<p>Ilya, please comment on the computational cost (in terms of the number of floating-point operations) of covariance matrix estimation. I think it should be on par with the cost of matrix multiplication.</p>
</blockquote>
<p>No, I don't. I found approximations of the theoretical peak here, for example: <a href="http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_PressBriefing.pdf">http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_P...</a></p>
<p>That is around 80 GFlops for my X5570.</p>
<p>For the estimation of the covariance matrix, it is pretty simple: if I use zgemm, it is 8*N²*K floating-point operations. Can I compare this number (divided by the time the calculation takes) to the theoretical peak to get an idea of whether this is running fast enough?</p>
<p>When programming on GPUs, which I also use for bigger matrices, FMA is supported as far as I know, so I guess I can divide the 8*N²*M floating-point operations by 2, since one mul-add is done in one cycle.</p>
</p>
</blockquote>
<p>I cannot comment on GPU peak performance. But for the Intel Xeon X5570, my calculation gives a theoretical peak of 94 GFlops for double-precision floating-point operations and 188 GFlops for single-precision floating-point operations. This is based on a 2.93 GHz CPU frequency, 2 sockets, and 4 cores per socket, and assumes all operations are vectorized. You can use this information to compute an upper limit on the speed of your operations.</p>
<p></blockquote></p>
<p>Thank you, that really helps! Indeed, I have two X5570s clocked at 2.93 GHz!</p>
</div></div></div>
Thu, 06 Dec 2012 10:39:15 +0000 | JeanFrancois | comment 1717719 at https://software.intel.com
Quote: Jean-françois D. wrote:
https://software.intel.com/pt-br/comment/1717631#comment-1717631
<a id="comment-1717631"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p><strong class="quote-header">Quote:</strong><blockquote class="quote-msg quote-nest-1 odd"><div class="quote-author"><em class="placeholder">Jean-françois D.</em> wrote:</div>
<p><strong>Quote:</strong></p>
<blockquote><p><em>Zhang Z (Intel)</em> wrote:</p>
<p>Jean-François,</p>
<p>Would you please share how you get the theoretical peak performance (GFLOPS) of your processor? It seems you assume one multiplication and one addition can be done in one cycle. This is true ONLY if the processor is capable of FMA instructions (fused multiply-add). For Intel processors, FMA will be introduced in the upcoming Haswell microarchitecture in 2013. What processor do you have?</p>
<p>Ilya, please comment on the computational cost (in terms of the number of floating-point operations) of covariance matrix estimation. I think it should be on par with the cost of matrix multiplication.</p>
</p>
</blockquote>
<p>No, I don't. I found approximations of the theoretical peak here, for example: <a href="http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_PressBriefing.pdf">http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_P...</a></p>
<p>That is around 80 GFlops for my X5570.</p>
<p>For the estimation of the covariance matrix, it is pretty simple: if I use zgemm, it is 8*N²*K floating-point operations. Can I compare this number (divided by the time the calculation takes) to the theoretical peak to get an idea of whether this is running fast enough?</p>
<p>When programming on GPUs, which I also use for bigger matrices, FMA is supported as far as I know, so I guess I can divide the 8*N²*M floating-point operations by 2, since one mul-add is done in one cycle.</p>
<p></blockquote></p>
<p>I cannot comment on GPU peak performance. But for the Intel Xeon X5570, my calculation gives a theoretical peak of 94 GFlops for double-precision floating-point operations and 188 GFlops for single-precision floating-point operations. This is based on a 2.93 GHz CPU frequency, 2 sockets, and 4 cores per socket, and assumes all operations are vectorized. You can use this information to compute an upper limit on the speed of your operations.</p>
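<p>For readers who want to reproduce these figures, here is a minimal sketch of the arithmetic. The flops-per-cycle values are an assumption not stated in the post: 128-bit SSE units that can issue one packed add and one packed multiply per cycle, giving 4 double-precision or 8 single-precision flops per core per cycle. The function name <code>peak_gflops</code> is made up for illustration.</p>

```python
def peak_gflops(ghz, sockets, cores_per_socket, flops_per_cycle):
    """Theoretical peak in GFLOPS: clock rate x total cores x flops per core per cycle."""
    return ghz * sockets * cores_per_socket * flops_per_cycle


# Assumed per-core throughput for the X5570 (128-bit SSE, no FMA):
#   double precision: 2-wide add + 2-wide multiply per cycle = 4 flops
#   single precision: 4-wide add + 4-wide multiply per cycle = 8 flops
dp_peak = peak_gflops(2.93, sockets=2, cores_per_socket=4, flops_per_cycle=4)
sp_peak = peak_gflops(2.93, sockets=2, cores_per_socket=4, flops_per_cycle=8)

print(f"double precision: {dp_peak:.2f} GFLOPS")
print(f"single precision: {sp_peak:.2f} GFLOPS")
```

<p>Under those assumptions the results round to the 94 and 188 GFlops quoted above. Real code only approaches this limit when it is fully vectorized and keeps both the add and multiply units busy.</p>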
</div></div></div>
Wed, 05 Dec 2012 19:22:25 +0000 | mad\zzhan68 | comment 1717631 at https://software.intel.com
Quote: Zhang Z (Intel) wrote:
https://software.intel.com/pt-br/comment/1717353#comment-1717353
<a id="comment-1717353"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p><strong class="quote-header">Quote:</strong><blockquote class="quote-msg quote-nest-1 odd"><div class="quote-author"><em class="placeholder">Zhang Z (Intel)</em> wrote:</div>
<p>Jean-François,</p>
<p>Would you please share how you get the theoretical peak performance (GFLOPS) of your processor? It seems you assume one multiplication and one addition can be done in one cycle. This is true ONLY if the processor is capable of FMA instructions (fused multiply-add). For Intel processors, FMA will be introduced in the upcoming Haswell microarchitecture in 2013. What processor do you have?</p>
<p>Ilya, please comment on the computational cost (in terms of the number of floating-point operations) of covariance matrix estimation. I think it should be on par with the cost of matrix multiplication.</p>
<p></blockquote></p>
<p>No, I don't. I found approximations of the theoretical peak here, for example: <a href="http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_PressBriefing.pdf">http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_P...</a></p>
<p>That is around 80 GFlops for my X5570.</p>
<p>For the estimation of the covariance matrix, it is pretty simple: if I use zgemm, it is 8*N²*K floating-point operations. Can I compare this number (divided by the time the calculation takes) to the theoretical peak to get an idea of whether this is running fast enough?</p>
<p>When programming on GPUs, which I also use for bigger matrices, FMA is supported as far as I know, so I guess I can divide the 8*N²*M floating-point operations by 2, since one mul-add is done in one cycle.</p>
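<p>The comparison described here can be sketched as follows. The matrix sizes and the timing are hypothetical placeholders, the function names are made up for illustration, and the 94 GFLOPS peak is the double-precision figure given elsewhere in this thread for two X5570 sockets. The count of 8 real flops per complex multiply-add (6 for the multiply, 2 for the add) is the usual convention behind the 8*N²*K estimate.</p>

```python
def zgemm_flops(n, k):
    """Real flops for an n x n complex result with inner dimension k.

    Each complex multiply-add is counted as 8 real flops
    (6 for the complex multiply, 2 for the complex add).
    """
    return 8 * n * n * k


def efficiency(flops, seconds, peak_gflops):
    """Return (achieved GFLOPS, fraction of theoretical peak)."""
    achieved = flops / seconds / 1e9
    return achieved, achieved / peak_gflops


# Hypothetical problem size and measured wall-clock time.
n, k, seconds = 512, 100_000, 3.0
achieved, fraction = efficiency(zgemm_flops(n, k), seconds, peak_gflops=94.0)

print(f"achieved: {achieved:.1f} GFLOPS ({fraction:.0%} of peak)")
```

<p>A fraction well below 1 does not by itself mean the code is slow: the peak assumes perfectly vectorized, compute-bound work, so it is only an upper bound.</p>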
</div></div></div>
Tue, 04 Dec 2012 08:52:00 +0000 | JeanFrancois | comment 1717353 at https://software.intel.com
Computational cost depends on
https://software.intel.com/pt-br/comment/1717350#comment-1717350
<a id="comment-1717350"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Computational cost depends on the method used and on whether weights are required. Practical efficiency also depends significantly on the data format (column- or row-major) and on the actual task size. In the case of VSL_SS_METHOD_FAST, with no weights and a dimension much smaller than the number of observations, the cost will be dominated by the ssyrk/dsyrk matrix operation.</p>
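<p>As a rough illustration of what "dominated by syrk" implies for the flop count, the sketch below compares the standard textbook operation counts for a real symmetric rank-k update and a general matrix multiply of the same shape. This assumes those textbook counts; it says nothing about how MKL implements the summary statistics routines internally. Here <code>p</code> is the dimension and <code>n</code> the number of observations, both hypothetical values.</p>

```python
def gemm_flops(p, n):
    """Real flops for a general p x p product with inner dimension n."""
    return 2 * p * p * n


def syrk_flops(p, n):
    """Real flops for a symmetric rank-n update of a p x p matrix.

    Only one triangle is computed: p*(p+1)/2 entries,
    each costing n multiplies and n adds.
    """
    return p * (p + 1) * n


# Hypothetical task: 100 variables, 10,000 observations.
p, n = 100, 10_000
ratio = syrk_flops(p, n) / gemm_flops(p, n)

print(f"gemm: {gemm_flops(p, n):,} flops")
print(f"syrk: {syrk_flops(p, n):,} flops ({ratio:.3f} of gemm)")
```

<p>Because the covariance matrix is symmetric, the symmetric update needs roughly half the arithmetic of a full matrix multiplication of the same size.</p>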
</div></div></div>
Tue, 04 Dec 2012 08:11:12 +0000 | mad\iburylov | comment 1717350 at https://software.intel.com
Jean-François,
https://software.intel.com/pt-br/comment/1717235#comment-1717235
<a id="comment-1717235"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Jean-François,</p>
<p>Would you please share how you get the theoretical peak performance (GFLOPS) of your processor? It seems you assume one multiplication and one addition can be done in one cycle. This is true ONLY if the processor is capable of FMA instructions (fused multiply-add). For Intel processors, FMA will be introduced in the upcoming Haswell microarchitecture in 2013. What processor do you have?</p>
<p>Ilya, please comment on the computational cost (in terms of the number of floating-point operations) of covariance matrix estimation. I think it should be on par with the cost of matrix multiplication.</p>
</div></div></div>
Mon, 03 Dec 2012 18:13:55 +0000 | mad\zzhan68 | comment 1717235 at https://software.intel.com
Thank you Ilya. I didn't know
https://software.intel.com/pt-br/comment/1716922#comment-1716922
<a id="comment-1716922"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Thank you, Ilya. I didn't know about these functions!</p>
<p>But I don't see any computational cost given in the reference manual for these functions!</p>
<p>Jean-François</p>
</div></div></div>
Thu, 29 Nov 2012 13:46:11 +0000 | JeanFrancois | comment 1716922 at https://software.intel.com
Hello Jean-François,
https://software.intel.com/pt-br/comment/1716921#comment-1716921
<a id="comment-1716921"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Hello Jean-François,</p>
<p>There is dedicated functionality for covariance matrix estimation in the Math Kernel Library. See the "Statistical Functions - Summary Statistics" chapter in the <a href="http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm">Reference Manual</a>. You can also check the following example: ./vslc/source/vslsbasicstats.c</p>
<p>Ilya</p>
</div></div></div>
Thu, 29 Nov 2012 13:37:53 +0000 | mad\iburylov | comment 1716921 at https://software.intel.com