Intel® Performance Tuning Utility

Overhead of Last Branch Record

Hi folks,

a) Can you give a rough idea of how much LBR slows down the execution of common programs, both CPU-intensive and I/O-intensive? I saw no substantial overhead (~2-5%) with LBR turned on in some microbenchmarks.

b) Is the branch prediction mechanism turned off when LBR tracing is on?

Thanks!

Classifying D-TLB misses using Data access profiling

Hello, I need to estimate the number of DTLB misses caused by accesses to the heap region versus those caused by accesses to the stack region. The Data Access Profiling section (2.5) of the PTU user guide mentions that PTU can determine the linear address of a memory operand for an event. If the linear address of an event can be found, then it seems possible to figure out whether the access goes to a dynamically allocated region or not. However, I am not clear on how I can use this facility of PTU to classify the DTLB misses as described above. Can anybody provide me with some leads on this? Thanks, Arka
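The classification step described above can be sketched independently of PTU. This is a minimal sketch under stated assumptions, not a PTU feature: it assumes the linear addresses of the sampled events have already been exported as plain integers, and that the process memory map is available in the Linux `/proc/<pid>/maps` text format. The sample map, the addresses, and the function names are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical memory map in Linux /proc/<pid>/maps format (made-up addresses).
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/app
01a3c000-01a5d000 rw-p 00000000 00:00 0 [heap]
7ffd1c2e0000-7ffd1c301000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Return a list of (start, end, label) regions sorted by start address."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        # The sixth field is the pathname or a tag such as [heap] or [stack];
        # anonymous mappings have no sixth field.
        label = fields[5] if len(fields) > 5 else "[anon]"
        regions.append((start, end, label))
    regions.sort()
    return regions


def classify(addr, regions, starts):
    """Return the label of the region containing addr, or '[unmapped]'."""
    i = bisect_right(starts, addr) - 1
    if i >= 0:
        start, end, label = regions[i]
        if addr < end:  # region ends are exclusive
            return label
    return "[unmapped]"


def count_by_region(addresses, maps_text):
    """Count sampled addresses per memory region."""
    regions = parse_maps(maps_text)
    starts = [r[0] for r in regions]
    return Counter(classify(a, regions, starts) for a in addresses)


# Hypothetical linear addresses of sampled DTLB-miss events.
misses = [0x01A3C010, 0x01A40000, 0x7FFD1C2FF000, 0x00401000]
counts = count_by_region(misses, SAMPLE_MAPS)
print(counts)
```

Note that the `[heap]` tag covers only the program break area. Large allocations that the allocator serves through `mmap`, and the stacks of secondary threads, appear as separate anonymous mappings, so a real classification would need extra rules for those regions.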

What is supposed to happen when you start the auto-tuner?

So I've been having trouble OCing this new machine; it's a P67 Intel board with an i5-2500K. Changing the settings manually in the BIOS seems to have no effect, so I'm trying the auto-tuner. After starting it, however, all that happens is that the PC reboots, the fans churn for a second, it beeps once, and it starts over. I thought maybe it was part of the process, but it has been doing that for about 2 hours now, with no change. That doesn't seem like it should be normal...

Statistics About QPI

Hi there, I am working on a project related to QPI, and we need to collect some statistics. There are two CPUs (CPU A and CPU B) connected to each other by a QPI link. Each CPU has direct access to its own RAM, an SSD, and a Niantic. CPU A may need to access RAM B, which is attached to CPU B. In that case the data path is: CPU A => CPU B (through QPI) => RAM B. The statistic we need is: Time{CPU A access RAM B} / Time{CPU A access RAM A}.
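Once per-access latencies have been measured for both paths, the ratio itself is simple to compute. The sketch below is only an illustration: it assumes the samples were collected separately (for example, by a memory-latency microbenchmark pinned to CPU A with its buffer placed in RAM A or RAM B), and the sample values are made up, not measurements. It uses the median rather than the mean so that a few outliers do not skew the result.

```python
from statistics import median


def remote_to_local_ratio(remote_ns, local_ns):
    """Return Time{CPU A access RAM B} / Time{CPU A access RAM A}.

    Both arguments are per-access latency samples in nanoseconds.
    """
    if not remote_ns or not local_ns:
        raise ValueError("both sample lists must be non-empty")
    return median(remote_ns) / median(local_ns)


# Hypothetical latency samples in nanoseconds (made-up values).
local_samples = [60.0, 62.0, 61.0, 59.0, 200.0]   # CPU A -> RAM A, one outlier
remote_samples = [100.0, 98.0, 102.0, 99.0, 101.0]  # CPU A -> QPI -> RAM B

ratio = remote_to_local_ratio(remote_samples, local_samples)
print(f"remote/local latency ratio: {ratio:.2f}")
```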

Heap profiler with TBB malloc

I'm using tbbmalloc_proxy.dll to replace the malloc in my application and I'd like to do some heap profiling with PTU. However, I'm having a problem:

A:Source\\pin\\pin\\image.cpp:LEVEL_PINCLIENT::RTN_Size:1155: assertion failed: end > RTN_Address(rtn)
NO STACK TRACE AVAILABLE
@CHARM-VERSION: $Id: version.cpp 18313 2008-03-30 23:51:30Z hgpatil $
@CHARM-BUILDER: BUILDER
@CHARM-COMPILER: MS-cl 1400
@CHARM-TARGET: ia32e
@CHARM-CFLAGS: __OPTIMIZE__=__OPTIMIZE__ __NO_INLINE__=__NO_INLINE__

"Show Utility Chart" - Explanation

Hello,

Regarding the charts available from "Show Utility Chart" in Memory Hotspots view:

- the Access Stride Distribution

- the Working Set

- the Array of Structures Distribution

I could not find a description of them in the User Guide, so I would appreciate it if you could let me know what the X and Y axes represent, whether by default they use data from the whole experiment (as it seems), and what the purpose of each chart is.

Thanks

Gabriele

Data Access and Latency Histogram Pane

Hello,

I have two questions about the Data Access and Latency Histogram Pane:

1. How can I change the bin size?

2. Page 76 of the User Guide says that "The chart on the right shows a number of references occurred with a specific latency value. This chart is empty if no events providing precise latency information are collected. Such events are present on the Intel Itanium processors family only." However, my chart is not empty even though I am running on Nehalem. Could you please explain why?

Thanks
Gabriele
