<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Wed, 25 Nov 2009 04:38:42 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/feed" rel="self" type="application/rss+xml" />
    <title>Intel Software Network - <![CDATA[ Intel® VTune™ Performance Analyzer ]]> feed</title>
    <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Bus transactions and cache miss</title>
      <description><![CDATA[ <div>
<div>Hi,</div>
<div><br /></div>
<div>I've been trying to understand how the bus and memory performance counters actually work. My first question is about BUS_TRANS_BRD. According to the definition, it counts the number of "burst read" transactions, including L1 data cache read misses, L2 hardware prefetches and IFU misses. This is not clear enough to me. Does it also count the L1 requests that miss L2? If so, why would a multi-threaded program running on two cores that share an L2 cache produce almost no L2 misses (L2_LINES_IN) but a lot of BUS_TRANS_BRD events:</div>
<div>L2_LINES_IN is about 1000,</div>
<div>BUS_TRANS_BRD is about 100 million.</div>
<div><br /></div>
<div>If this counter only counts L1 data read misses, then why does a single-threaded program with a small data set (about 100K, too large to fit in L1) not produce any BUS_TRANS_BRD events?</div>
<div><br /></div>
<div>My second question is about L2_LINES_IN and MEM_LOAD_RETIRED:L2_LINE_MISS. According to the definitions, L2_LINES_IN counts the number of lines allocated in L2, and L2_LINE_MISS counts the number of loads that missed L2. Based on these definitions, L2_LINES_IN should always be at least as large as L2_LINE_MISS, because whenever a load misses the cache, a line must be allocated in L2. However, for my multithreaded application, with two threads sharing an L2 cache, there are almost no cache misses (L2_LINES_IN) because the data is small and fits in L2, yet there are plenty of MEM_LOAD_RETIRED:L2_LINE_MISS events (90 million). How can this be?</div>
<div><br /></div>
<div>My multithreaded program allocates a shared integer array of length 25 (or 40K in another test), then spawns two threads that swap numbers within this shared array. Before each swap, the threads lock the data (spinlocks, one lock per item in the array) and then do the swap.</div>
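<div><br /></div>
<div>A minimal sketch of the kind of program described above, not the actual code. Every name here (worker_arg, swap_worker, run_swaps, next_rand) is hypothetical, and two details are assumptions the description does not state: the indices to swap are picked pseudo-randomly, and the two locks are always taken in ascending index order to avoid deadlock.</div>

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is a sketch, not the original code. */
typedef struct {
    int *data;                 /* shared integer array */
    pthread_spinlock_t *locks; /* one spinlock per array element */
    int n;                     /* number of elements */
    long iters;                /* swap attempts made by this thread */
    unsigned seed;             /* private PRNG state for this thread */
} worker_arg;

/* Small LCG so each thread keeps its own random state. */
static unsigned next_rand(unsigned *s) {
    *s = *s * 1103515245u + 12345u;
    return *s >> 8;
}

static void *swap_worker(void *p) {
    worker_arg *w = p;
    for (long k = 0; k < w->iters; k++) {
        int i = (int)(next_rand(&w->seed) % (unsigned)w->n);
        int j = (int)(next_rand(&w->seed) % (unsigned)w->n);
        if (i == j) continue;
        /* Assumption: ascending lock order, so the threads cannot deadlock. */
        int lo = i < j ? i : j;
        int hi = i < j ? j : i;
        pthread_spin_lock(&w->locks[lo]);
        pthread_spin_lock(&w->locks[hi]);
        int t = w->data[lo];
        w->data[lo] = w->data[hi];
        w->data[hi] = t;
        pthread_spin_unlock(&w->locks[hi]);
        pthread_spin_unlock(&w->locks[lo]);
    }
    return NULL;
}

/* Runs two swapping threads over an n-element array.
   Returns 1 if the array is still a permutation of 0..n-1, else 0. */
int run_swaps(int n, long iters) {
    int *data = malloc((size_t)n * sizeof *data);
    pthread_spinlock_t *locks = malloc((size_t)n * sizeof *locks);
    char *seen = calloc((size_t)n, 1);
    assert(data && locks && seen);
    for (int i = 0; i < n; i++) {
        data[i] = i;
        int rc = pthread_spin_init(&locks[i], PTHREAD_PROCESS_PRIVATE);
        assert(rc == 0);
        (void)rc;
    }

    pthread_t tid[2];
    worker_arg args[2];
    for (int t = 0; t < 2; t++) {
        args[t] = (worker_arg){ data, locks, n, iters, (unsigned)(t + 1) };
        int rc = pthread_create(&tid[t], NULL, swap_worker, &args[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < 2; t++) pthread_join(tid[t], NULL);

    /* Swaps under correct locking only permute values, never lose them. */
    int ok = 1;
    for (int i = 0; i < n; i++) {
        int v = data[i];
        if (v < 0 || v >= n || seen[v]) { ok = 0; break; }
        seen[v] = 1;
    }
    for (int i = 0; i < n; i++) pthread_spin_destroy(&locks[i]);
    free(seen);
    free(locks);
    free(data);
    return ok;
}
```

<div>Calling run_swaps(25, 100000) or run_swaps(40 * 1024, 100000) from a small main() and building with -pthread should give a test case that can be profiled with the counters above. Note that in this sketch both the array elements and the lock words are written by both threads, so lines holding either can move between the two cores.</div>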
<div><br /></div>
<div>Thanks!</div>
</div> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70096/</link>
      <pubDate>Mon, 23 Nov 2009 23:27:40 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70096/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>performance counters for intel core i7</title>
      <description><![CDATA[ Hi,<br />Where can I find a good documentation with all the event names and descriptions for core i7 ? <br />http://www.intel.com/software/products/documentation/vlin/ - this document only has them for core 2 duo (and some others).<br /><br />thanks ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70088/</link>
      <pubDate>Mon, 23 Nov 2009 14:51:28 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70088/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>What's going wrong with this code?</title>
      <description><![CDATA[ Hi,<br /><br />This is the first time I have used VTune, and I am trying to tune a fairly complex piece of C code.<br />All it basically does is "calculate x, add it to an unsigned char, and clip the result to 255", for a lot of pixels.<br />Because of the complex nature of the code, it's hardly possible to optimize it :-/<br /><br />VTune tells me a lot of time is spent modifying the data itself:<br /><img src="http://93.83.133.214/vtune.jpg" /><br /><br />READ/WRITE are basically pointer-access wrapper macros, and clip255 is a simple clipping function.<br /><br />Any ideas why so many cycles are spent here?<br /><br />Furthermore, is this really the assembler generated for the C code, or does VTune mix things up?<br />I can only read assembler a little, but clip255 should generate at least some kind of conditional operation such as a cmov or a compare+jump, and I don't see anything like that in the code.<br /><br />Thank you in advance, Clemens ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70071/</link>
      <pubDate>Sun, 22 Nov 2009 13:59:07 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70071/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>About VTune metrics</title>
      <description><![CDATA[ Hi,<br /><br />I have been using VTune for a while and I would appreciate some advice about the metrics I'm trying to measure (I'm using a processor from the Harpertown family, Core microarchitecture):<br /><br />1) <span style="text-decoration: underline;">Stall time</span>: The processor's documentation states that it can issue/retire up to 4 instructions per cycle. Assuming that the ideal CPI in this case is 0.25, may I compute the relative stall time as (Measured_CPI-0.25)/Measured_CPI? E.g., assuming that the measured CPI is 1.25, is it correct to say that the total stall time is 80% (1/1.25)?<br /><br />2) <span style="text-decoration: underline;">L2 miss penalty</span>: How accurate is it to compute the stall time due to L2 misses as L2_misses * avg_mem_latency? Also, what is the most precise way to measure average memory latency? I've tried to use the counter "BUS_REQUEST_OUTSTANDING", as suggested in the Intel 64 and IA-32 Optimization Reference Manual, but the results from this counter do not make sense (in some cases, VTune reports more BUS_REQUEST_OUTSTANDING events than CPU_CLK_UNHALTED.CORE events).<br /><br />3) <span style="text-decoration: underline;">L2 cache miss rate</span>: I was wondering whether the built-in "L2 Cache Miss Rate" ratio provided by VTune is inconsistent with what most of us consider a "miss rate" (number of misses in L2 divided by number of accesses to L2). Since "L2 Cache Miss Rate" is computed as L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY, shouldn't it be called "misses per instruction"? Is it correct to compute the L2 miss rate as:
<div>L2_RQSTS.SELF.ANY.I_STATE / L2_RQSTS.SELF.ANY.MESI ?</div> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70064/</link>
      <pubDate>Sat, 21 Nov 2009 18:19:02 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70064/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>call graph crash</title>
      <description><![CDATA[ I recently tried the VTune Call Graph feature and it fails to start any application. I found this article:<br /> http://software.intel.com/en-us/articles/application-crashes-when-attempting-call-graph-profiling/ but even after setting the instrumentation level for msvcr80d.dll to minimal, it still fails. The JIT debugger does not start as described in the article, though. It also raises a warning related to ntdll: "Instrumented module name must be identical to original modu..." (I can't read further). <br /><br />I'm on Windows 7 and I use Visual Studio 2005. <br /><br />Is this a known issue, or is something wrong on my end?<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70062/</link>
      <pubDate>Sat, 21 Nov 2009 14:25:23 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/70062/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Crash when invoking "modules"; java process uses 1365M and 100% CPU</title>
      <description><![CDATA[ <p>CPU is Intel(R) Xeon(R) E5335 @ 2.00GHz x2 (8 cores)<br />OS is Fedora<br />Compiler is ICC 11<br />I want to analyze my app. VTune could analyze it on my previous machine, but it crashes on this server.<br />The app runs fine under VTune, but whenever I invoke "modules" VTune crashes and I have to kill it.<br />I used "top" and found that java was using 100% CPU and more than 1300M of memory.<br />At first I thought it might be a Hyper-Threading problem, but the E5335 does not have Hyper-Threading.<br />Can you give me a hand?<br />Thanks a lot</p> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69998/</link>
      <pubDate>Thu, 19 Nov 2009 06:35:34 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69998/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Support for more than 128 cores</title>
      <description><![CDATA[ <p>The release notes for VTune 9.1 contain a note asking users to contact Intel if the tool will be used on systems with more than 128 cores. Is there a configurable parameter in VTune that needs to be tuned to support systems with more than 128 cores? Also, is there a maximum number of cores that will be supported?<br /><br />Thanks for your time</p> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69949/</link>
      <pubDate>Tue, 17 Nov 2009 10:33:06 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69949/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Support for RedHat Enterprise Linux 5.3</title>
      <description><![CDATA[ <p>I have recently upgraded my compute nodes to RHEL 5.3 and I want to know whether this is a supported distribution for Intel VTune Performance Analyzer 9.1. The release notes for 9.1 mention that RHEL5 is unofficially supported and that the call graph feature doesn't work. Is that still the case?</p> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69948/</link>
      <pubDate>Tue, 17 Nov 2009 10:25:04 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69948/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Does VTune support OPENmpi based applications?</title>
      <description><![CDATA[ Since the release notes of Intel(r) VTune(TM) Performance Analyzer 9.1 don't mention support for applications written using the OPENmpi 1.3 libraries on Linux, I want to know whether they are officially supported.<br /><br />Thanks for your time. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69947/</link>
      <pubDate>Tue, 17 Nov 2009 10:20:53 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69947/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>OOO Bursts Stall</title>
      <description><![CDATA[ Hello All,<br />I was doing cycle accounting with the methodology given by David Levinthal for the Core microarchitecture.<br />I observe that my applications show 15 - 25% OOO Burst. According to the paper, OOO Burst is an execution unit stall.<br /><br />Can you please explain OOO Burst in more detail: what causes it and how to minimize it?<br /><br />Thank you,<br /><br />Regards,<br />Dny<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69911/</link>
      <pubDate>Mon, 16 Nov 2009 03:41:57 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-vtune-performance-analyzer/topic/69911/</guid>
      <category>ISN General</category>
    </item>
  </channel></rss>