Intel® Performance Tuning Utility (Archived)

Unsuccessful with PTU 4.0 u3

I have been using PTU 3.2 with success on Nehalem EPs and switched to PTU 4.0 u3 so that I could profile Nehalem EXs. So far I've been unsuccessful.

The kernel is RHEL5 (Linux 2.6.18-128.el5), and the sep3 and pax drivers build and install. The "vtsarun -cl" command runs and lists the processor as I74. But when I use the Eclipse interface to run experiments, vtdpview dumps core, and the last lines of the console are:
17:09:07.538 Processing module 33 / 45 (dgraph458357)
17:09:14.712 Processing module 33 / 45 (dgraph458357)

PTU and VTune with Remote Analysis

Hi, I have installed VTune on 2 Linux computers (Debian Lenny, kernel 2.6.26-1-amd64): one with the VTune Analyzer only (PC1), the other with the Collectors only (PC2), to do remote analysis. On PC1, I have installed the VTune plugin into my Eclipse (option 2, "Integrate to", in the Eclipse Options menu). On PC2, I have installed the VTune driver, the vtserver ... I can do remote analysis on PC2 from PC1; everything works for VTune ... Now I would like to install PTU: I would like to integrate the PTU plugin into my Eclipse on PC1 as well. And I would like to install the PTU driver (and other binaries

Cache-references and Cache-misses counters

I hope this is an appropriate place to post my question.

Using the Linux /proc/mtrr interface, I have configured all physical memory to be uncacheable.
I then ran 'perf stat myapp' and looked at the cache counters for references and misses.

Since I used MTRR to set all physical memory as uncacheable, I was expecting the
cache misses to be 0 (zero), as uncacheable memory should not reference the cache to begin with.
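
To make the expectation concrete, the two counters that 'perf stat' reports can be reduced to a single miss ratio. The sketch below is illustrative only: the function name and the counter values are hypothetical and are not measurements from this system.

```python
def cache_miss_ratio(references: int, misses: int) -> float:
    """Return misses / references, or 0.0 when the cache was never referenced."""
    if references < 0 or misses < 0:
        raise ValueError("counter values must be non-negative")
    if references == 0:
        # No cache references at all: define the ratio as zero.
        return 0.0
    return misses / references


# Hypothetical counter values, for illustration only.
example_references = 1000
example_misses = 250
print(f"miss ratio: {cache_miss_ratio(example_references, example_misses):.2%}")
```

If the cache were truly bypassed, one would expect both counters, not only the misses, to stay near zero; nonzero references would already suggest that something is still being counted against the cache.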

Not able to install Intel PTU driver


My system is Ubuntu 10.04, 64-bit. When I build PTU's vdk driver, it complains:

yuantang@Octave:~/tool_src/Intel_PTU/ptu32_001_lin_intel64/vdk/src$ sudo ./build-driver
[sudo] password for yuantang:

Options in brackets "[ ... ]" indicate default values
that will be used when only the ENTER key is pressed.

C compiler to use: [ /usr/bin/gcc ]

Make command to use: [ /usr/bin/make ]

Kernel source directory: [ /lib/modules/2.6.322.6.32-24-generic-beta/source ]

Basic data access profiling using the load latency event


I did some experiments with the load latency event of the Intel Nehalem (MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD). The machine I'm doing the experiments on has two Xeon E5520 processors. As I'm mostly interested in high latency DRAM accesses, I thought that by setting the threshold to a value larger than the latency of the on-core caches, I would mostly get samples with DRAM operations. To my surprise, the percentage of off-core samples doesn't substantially increase with large thresholds. The table below shows the results:


Here is a fundamental question about the VTune event CPU_CLK_UNHALTED.CORE. In the case of multi-threaded applications running on multiple cores, how does VTune count this event? For instance, if thread 1 of the multi-threaded application runs for x cycles (unhalted) on core 1 and thread 2 runs for y cycles (unhalted) on core 2, what would be the value of CPU_CLK_UNHALTED.CORE? Will it be (x+y) cycles?
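
The (x+y) hypothesis in the question can be written out as a small calculation. The sketch below assumes, without confirmation from this thread, that each core counts its own unhalted cycles and that the reported total is the sum across cores; the function name and the cycle counts are hypothetical.

```python
from typing import Mapping


def total_unhalted_cycles(per_core_cycles: Mapping[int, int]) -> int:
    """Sum per-core cycle counts (assumed aggregation, not confirmed VTune behavior)."""
    return sum(per_core_cycles.values())


# Hypothetical values: thread 1 runs x cycles on core 1, thread 2 runs y cycles on core 2.
x = 3_000_000
y = 2_000_000
per_core = {1: x, 2: y}
print(total_unhalted_cycles(per_core))
```

Under this assumption the total can exceed the elapsed wall-clock cycles of the run, which would be worth keeping in mind when comparing it against a single-threaded measurement.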
