Intel® Performance Tuning Utility (Archived)

vtssrun not profiling child processes


I am trying to run vtssrun with an Apache server, where one master process spawns several child processes.
vtssrun does not seem to be collecting profile data for the child processes, and I can't figure out why.

This is how I am invoking vtssrun

vtssrun --

After the run completes, I see a lot of empty .vtss files.
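To see which of those files actually contain data, a quick check along these lines could help. This is only a sketch: the function name list_vtss_sizes is made up for illustration, and it assumes the .vtss files sit somewhere under the directory passed in.

```shell
# Sketch: list every .vtss file under a directory and flag the empty ones.
# list_vtss_sizes is a hypothetical helper, not part of PTU.
list_vtss_sizes() {
    result_dir="${1:-.}"
    find "$result_dir" -type f -name '*.vtss' | sort | while IFS= read -r f; do
        # wc -c may pad its output with spaces on some systems; strip them.
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -eq 0 ]; then
            echo "EMPTY     $f"
        else
            echo "$size bytes  $f"
        fi
    done
}
```

Calling `list_vtss_sizes .` from the directory where vtssrun was started prints one line per file, which makes it easier to tell whether any of the non-empty files belong to a child process.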

vtssview -p
gives some results but very few samples and none from the child process.

I also tried attaching to one of the Apache child processes with

vtssrun -a

PTU Error with OpenGL apps and new NVIDIA drivers

Running on Red Hat 5 64-bit Linux machines (tried on several different ones) with NVIDIA driver 185.18.14 or later, if I try to profile (vtssrun) any OpenGL app using PTU 3.2, I receive the following errors:

2714602317306492 : 2a95f18680 : CRITICAL : 'dlsym' for '_Exit' error Bad file descriptoropen log '(null)': Bad address
write log: Bad file descriptor
2714602317342532 : 2a95f18680 : CRITICAL : 'dlsym' for '_exit' faild. Error Bad file descriptoropen log '(null)': Bad address
write log: Bad file descriptor

PTU on cluster system

I'd like to profile on a cluster system, as shown below.

%>ssh -f host1 ptu_start.csh
%>ssh -f host2 ptu_start.csh
%>mpiexec -n 2 -host host1 host2
%>ssh -f host1 ptu_stop.csh
%>ssh -f host2 ptu_stop.csh

vtsarun ./exp -s -d 0

vtsarun ./exp --stop

But I got only one output file (.tb5), and it contained the profile for just one node.
I expected one output file per node.

Do you have any ideas for profiling on multiple nodes with PTU?
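One thing that may be worth ruling out, assuming both nodes see the same working directory (for example a shared NFS home), is that every node writes to the same ./exp and the results overwrite each other. Below is a minimal sketch of a per-host experiment directory. The function name exp_dir_for_host and the ./exp_<host> naming are hypothetical, the vtsarun lines in the comments only reuse the options shown above, and the sketch is written for a POSIX shell, so it would need translating for the .csh scripts.

```shell
# Sketch: derive a per-host experiment directory so nodes do not share ./exp.
# exp_dir_for_host is a hypothetical helper, not part of PTU.
exp_dir_for_host() {
    base="${1:-./exp}"
    host="${2:-$(hostname)}"
    # Strip any domain part so host1.example.com becomes host1.
    echo "${base}_${host%%.*}"
}

# Intended use inside the start/stop scripts (not executed here):
#   vtsarun "$(exp_dir_for_host ./exp)" -s -d 0
#   vtsarun "$(exp_dir_for_host ./exp)" --stop
```

With this naming, host1 and host2 would write to ./exp_host1 and ./exp_host2 respectively, so a missing second .tb5 file could no longer be explained by a directory collision.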


Intel Core i7 processor uncore event availability in PTU

Dear Performance Tuning Experts,

Is it possible to collect sampling data for uncore performance events using PTU, such as, for example, UNC_QMC_NORMAL_READS.CH0?

If the answer is yes, how would one go about it in the PTU framework?

As far as I understand, PTU has the ability to count some uncore events using the OFFCORE_RESPONSE_0.REQUEST.RESPONSE counter with an appropriate REQUEST.RESPONSE encoding, but a lot of uncore counters do not fall into that category.

Thanks very much,

PTU and TBB, analyze scheduler performance

Hey, all.

I'm looking into different scheduler policies under the hood of TBB, and I'd like to use PTU to optimize the code I've (re)written within the TBB library calls. When I run a program designed to exercise them under PTU, I only see user-code represented in the Basic Sampling results, which doesn't give me any insight into what parts of the scheduler aren't performing. Is there a configuration somewhere I can change to profile that information as well?


vtdpview 64 bits segmentation fault

I have the following problem when vtdpview loads the sampling data:
18:05:05.546 Processing module 127 / 127 ( Segmentation fault

It seems the problem happens when it analyzes '', which is generated by my software. If I remove this library and relaunch vtdpview, there is no problem (vtdpview doesn't find this module and bypasses it). The library is compiled from assembly code (with as, the GNU assembler 2.17) and linked with gcc 4.0.2.

PTU loading data forever

I started sampling with:
vtssrun.exe ./experiment -a 1234

Then I performed the experiment that I wanted to profile by sending data to process 1234 before stopping with:
vtssrun.exe ./experiment --stop

I started Eclipse and tried to import the experiment, but the progress window shows 'Loading data...' forever (well, at least for an hour before I killed it). The green dots are moving, so PTU hasn't crashed, but it never finishes loading the experiment data and never generates an error. How can I determine the problem?

