Intel® Performance Tuning Utility (Archived)

Intel Vtune Event CPU_CLK_UNHALTED.CORE

Here is a fundamental question about the VTune event CPU_CLK_UNHALTED.CORE. For multi-threaded applications running on multiple cores, how does VTune count this event? For instance, if thread 1 of the multithreaded application runs for x cycles (unhalted) on core 1 and thread 2 runs for y cycles (unhalted) on core 2, what would be the value of CPU_CLK_UNHALTED.CORE? Will it be (x+y) cycles?

vtune rdc not sending kernel modules

I am using the vtune remote agent and the vtune analyzer, both running on Linux. When the vtune analyzer gets binaries from the remote agent, it is not able to get the kernel modules. It shows a pop-up window with the message "vtserver:root@172.31.0.210:e1000e" not found. Also, "choose file from different location" is greyed out. What should I do here? I do not want to copy these kernel modules to the vtune analyzer machine manually.

Issue while profiling a 32 bit application running on 64 bit OS

Hi, I am trying to profile a 32-bit application running on 64-bit RHEL4 on x86_64 (Xeon - Nehalem), using the x86_64 version of PTU 3.2 in statistical callgraph mode. I am getting a lot of warning messages suggesting that it is not able to load ibvtssagent.so.

vtdpview question

Hi, so I've just started working with the Performance Tuning Utility (on a 64-bit Red Hat machine), and I'm trying to determine some of the caching behaviors of my application. So far, I've run "vtsarun -d60 -em " and then I run vtdpview on the .tb5 file that was generated. When I do that, nothing really happens: all that prints out is Intel's copyright header, nothing more. Am I doing something wrong? Thanks for any help with this.

Remote memory access per core

Hi,

Before I ask the question, I've been going through the following document: http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf

Question: How can I find the number of remote memory accesses per core? As far as I can see, I can get the following:
-- L2 misses / core
-- Local L3 misses [uncore]
-- QPI requests [uncore]

However, this does not tell me which of the cores is actually issuing the QPI request.

PTU, How do I select the predefined event ratio?

Hi, Experts,
I can see some predefined configurations in PTU, but the .vtr file shows that there are far more event ratios I could use. I know one way is to specify the related events manually. Is there a way to avoid that effort and simply select the ratios for my specific CPU?
I also know that collecting many events carries a large penalty in VTune. Would that still be a problem for PTU?
Thanks.

Intel Xeon X5365 8-core Processor PAPI performance analysis anomaly

Hi,

I am a student working on analysis of libraries for their performance using hardware counters.

I am using Intel Xeon X5365 8-core processor. I am reading the hardware counter values using PAPI code written in C.

I have simple code that initializes 8192 integers in order to fill up the 32 kB L1 cache (64-byte line size, so 16 integers * 512 cache lines = 8192 integers).

The number of L1 cache misses measured with PAPI after initialization is 93. This may be correct, assuming that hardware prefetching reduces the compulsory misses from 512 to 93.
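
The arithmetic in this post can be checked with a small sketch in C. It is not the poster's PAPI code; the function names are hypothetical, and it assumes 4-byte integers (implied by "16 integers" per 64-byte line), an array aligned to a line boundary, a cold cache, and no hardware prefetching.

```c
#include <assert.h>
#include <stddef.h>

/* Number of lines in a cache of cache_bytes with line_bytes per line. */
size_t cache_lines(size_t cache_bytes, size_t line_bytes)
{
    assert(line_bytes > 0 && cache_bytes % line_bytes == 0);
    return cache_bytes / line_bytes;
}

/* Number of elements of elem_bytes that fit in one cache line. */
size_t elems_per_line(size_t line_bytes, size_t elem_bytes)
{
    assert(elem_bytes > 0 && line_bytes % elem_bytes == 0);
    return line_bytes / elem_bytes;
}

/*
 * Compulsory misses expected for one sequential pass over n_elems
 * elements. Assumptions (not from the post): the array starts on a
 * line boundary, the cache is cold, and nothing is prefetched, so
 * each distinct line touched costs exactly one miss.
 */
size_t cold_miss_bound(size_t n_elems, size_t elem_bytes, size_t line_bytes)
{
    size_t per_line = elems_per_line(line_bytes, elem_bytes);
    return (n_elems + per_line - 1) / per_line; /* round up to whole lines */
}
```

Calling cache_lines(32 * 1024, 64) gives the 512 lines quoted in the post, and cold_miss_bound(8192, 4, 64) gives the same 512 as the no-prefetch miss count that the measured value of 93 is being compared against.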
