Intel® Performance Tuning Utility

PTU and VTune with Remote Analysis

Hi, I have installed VTune on two Linux computers (Debian Lenny, kernel 2.6.26-1-amd64): one with the VTune Analyzer only (PC1), the other with the Collectors only (PC2), to do remote analysis. On PC1, I have integrated the VTune plugin into my Eclipse (Option 2 "Integrate to" in the Eclipse Options menu). On PC2, I have installed the VTune driver, the vtserver ... I can do remote analysis on PC2 from PC1, and everything works for VTune ... Now I would like to install PTU: I would like to integrate the PTU plugin into my Eclipse on PC1 as well.

Cache-references and Cache-misses counters

I hope this is an appropriate place to post my question.

Using the Linux /proc/mtrr interface, I have configured all physical memory space to be uncachable.
I then ran 'perf stat myapp' and looked at the cache counters for references and misses.

Since I used mtrr to set all physical memory as uncachable, I was expecting the
cache misses to be 0 (zero), as uncachable memory should not reference the cache to begin with.
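
Before interpreting the counters, it may help to confirm what /proc/mtrr actually reports. The following is only a minimal sketch: it assumes the usual line layout printed by the kernel ("regNN: base=0x... (...MB), size=...MB, count=N: type"), the helper names parse_mtrr and non_uncachable are hypothetical, and SAMPLE is made-up illustrative text, not output from the poster's machine.

```python
import re

# Assumed /proc/mtrr line layout, e.g.
# "reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back"
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size>\d+)(?P<unit>[KM])B, count=(?P<count>\d+): (?P<type>\S+)"
)
UNIT_BYTES = {"K": 1024, "M": 1024 * 1024}


def parse_mtrr(text):
    """Parse /proc/mtrr-style text into a list of region dicts."""
    regions = []
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        regions.append({
            "reg": int(m.group("reg")),
            "base": int(m.group("base"), 16),
            "size": int(m.group("size")) * UNIT_BYTES[m.group("unit")],
            "type": m.group("type"),
        })
    return regions


def non_uncachable(regions):
    """Return the regions whose memory type is anything other than 'uncachable'."""
    return [r for r in regions if r["type"] != "uncachable"]


# Illustrative sample only (hypothetical values, not measured data).
SAMPLE = (
    "reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: uncachable\n"
    "reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back\n"
)

if __name__ == "__main__":
    # On a real system one would read the file instead of SAMPLE:
    #   with open("/proc/mtrr") as f: text = f.read()
    for region in non_uncachable(parse_mtrr(SAMPLE)):
        print("still cacheable:", region)
```

Any region listed by non_uncachable is still mapped with a cacheable type, which would be one possible source of non-zero cache counters. The sketch only checks the listing; it does not explain the counter values by itself.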

Not able to install Intel PTU driver

Hi,

My system is Ubuntu 10.04, 64-bit. When I build PTU's vdk driver, it complains:

yuantang@Octave:~/tool_src/Intel_PTU/ptu32_001_lin_intel64/vdk/src$ sudo ./build-driver
[sudo] password for yuantang:

Options in brackets "[ ... ]" indicate default values
that will be used when only the ENTER key is pressed.

C compiler to use: [ /usr/bin/gcc ]

Make command to use: [ /usr/bin/make ]

Kernel source directory: [ /lib/modules/2.6.322.6.32-24-generic-beta/source ]

Basic data access profiling using the load latency event

Hi,

I did some experiments with the load latency event of the Intel Nehalem (MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD). The machine I'm doing the experiments on has two Xeon E5520 processors. As I'm mostly interested in high latency DRAM accesses, I thought that by setting the threshold to a value larger than the latency of the on-core caches, I would mostly get samples with DRAM operations. To my surprise, the percentage of off-core samples doesn't substantially increase with large thresholds. The table below shows the results:

Intel VTune Event CPU_CLK_UNHALTED.CORE

Here is a fundamental query about the VTune event CPU_CLK_UNHALTED.CORE. In the case of multi-threaded applications running on multiple cores, how does VTune count this event? For instance, if thread 1 of the multithreaded application runs for x cycles (unhalted) on core 1 and thread 2 runs for y cycles (unhalted) on core 2, what would be the value of CPU_CLK_UNHALTED.CORE? Will it be (x+y) cycles?
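
To make the question concrete, here is a small sketch of the (x+y) hypothesis only: it assumes each core keeps its own unhalted-cycle count and that a per-process total is formed by summing them. Whether VTune reports the event this way is exactly what is being asked; the function name and the cycle values are hypothetical.

```python
def total_unhalted_cycles(per_core_cycles):
    """Sum per-core unhalted cycle counts (the x+y hypothesis, not confirmed VTune behavior)."""
    return sum(per_core_cycles.values())


# Hypothetical values: x cycles on core 1 (thread 1), y cycles on core 2 (thread 2).
x = 3_000_000
y = 2_000_000
per_core = {1: x, 2: y}

if __name__ == "__main__":
    print(total_unhalted_cycles(per_core))
```

Under this assumption the total can exceed the number of cycles that elapsed on any single core, since the cores count in parallel.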
