Better performance while profiling with Vtune XE 2013


Francis J.

Hi,

I have a test application that essentially writes log files.
Every minute, if the current file is larger than 10MB, the application starts a new file.
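A minimal sketch of the rotation rule just described, written in Python purely for illustration (the actual test application is not shown in the thread). The class name `MinuteRotatingLog`, the `test_NNNN.log` naming scheme and the `tick()` method are hypothetical; the caller is assumed to call `tick()` once per minute, and "10MB" is assumed to mean 10 MiB.

```python
import os

LIMIT_BYTES = 10 * 1024 * 1024  # "10MB" from the post, assumed to mean MiB


class MinuteRotatingLog:
    """Hypothetical sketch: append lines to a log file; when tick() is called
    (assumed once per minute), start a new file if the current one is over
    the size limit."""

    def __init__(self, directory, limit_bytes=LIMIT_BYTES):
        self.directory = directory
        self.limit_bytes = limit_bytes
        self.index = 0
        self._file = self._open_current()

    def _path(self, index):
        # Hypothetical naming scheme: test_0000.log, test_0001.log, ...
        return os.path.join(self.directory, f"test_{index:04d}.log")

    def _open_current(self):
        return open(self._path(self.index), "a", encoding="utf-8")

    def write(self, line):
        self._file.write(line + "\n")

    def tick(self):
        """Once-per-minute check: rotate if the current file exceeds the limit.
        Returns True if a new file was started."""
        self._file.flush()  # make sure the on-disk size is up to date
        if os.path.getsize(self._path(self.index)) > self.limit_bytes:
            self._file.close()
            self.index += 1
            self._file = self._open_current()
            return True
        return False

    def close(self):
        self._file.close()
```

Because the size is only checked at each `tick()`, a file can grow well past the limit within a minute; at the rates reported below, each file would hold roughly one minute of output.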

When I run my application from Visual Studio 2010 without debugging (Release, x64), I get about 40-50MB per minute.
I get the same result when running the executable directly.

With the same configuration and the same build, running under a "Basic Hotspots" analysis, I get up to 380MB/min.

No other notable software is running during the test (nothing CPU- or disk-intensive).
I've run the test several times and the result is always the same.

In other words, I get roughly 7.6 to 9.5 times the writing speed while profiling (380 vs. 40-50MB/min)!

Is there any possible reason why it's faster under the Basic Hotspots analysis?

Is there anything that VTune disables or enables that would give a faster writing speed?

Did I miss something?

Here are my PC specs:
Windows Professional 64-bit SP1
Intel Xeon E5462 @ 2.80GHz (2 processors)
12 GB of RAM
RAID 5

We have the latest version of VTune according to the Intel Software Manager.
GlobalFlag is disabled (set to zero) in every case; we previously had problems profiling with GlobalFlag set.

Any suggestions would be greatly appreciated.

Thanks for your help,

Francis

iliyapolak

To begin with, I would try monitoring disk I/O with Xperf. Could it be a coincidence that I/O speed increases while VTune is analyzing the system?

Francis J.

OK, I did that. It seems that I/O is used less often with the VTune analysis enabled.

Could VTune gain performance by processing the output with bigger buffers (so they don't need to be flushed as often)?
Or is a priority boost used to speed up the process, like:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684828(v=vs.85).aspx
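Whether VTune does anything like this is not established anywhere in this thread; the sketch below only illustrates the buffering idea being asked about. It wraps a hypothetical in-memory `CountingSink`, which stands in for the disk file and counts how many writes actually reach it, in Python's `io.BufferedWriter`. Flushing after every line pushes each line down separately, while a large buffer (1 MiB here, an arbitrary choice) coalesces many lines into one write. In a C or C++ program the analogous knobs would be `setvbuf` on the `FILE*` and avoiding per-line `fflush`.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a disk file: keeps the bytes in memory and
    counts write calls (each would be a system call for a real file)."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b
        return len(b)


def write_log(lines, buffer_size, flush_each_line):
    """Write byte lines through a BufferedWriter; return (raw write calls, bytes written)."""
    sink = CountingSink()
    writer = io.BufferedWriter(sink, buffer_size=buffer_size)
    for line in lines:
        writer.write(line)
        if flush_each_line:
            writer.flush()  # forces the buffered line down to the sink now
    writer.flush()  # push out whatever is still buffered
    writer.close()
    return sink.calls, len(sink.data)
```

For example, with 1,000 lines of 100 bytes, `write_log(lines, 4096, True)` produces 1,000 writes to the sink, whereas `write_log(lines, 1 << 20, False)` produces a single write of all 100,000 bytes.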

Thanks for your feedback

Francis

iliyapolak

Do you measure total system disk I/O, or only your application's?

If you are measuring total throughput in MB/s, Xperf can show you a nice graphical breakdown of the various threads issuing I/O requests.

Francis J.

I measured the total for the system, and as I said:

 It seems that I/O is used less often with the VTune analysis enabled.

I don't measure the speed directly, but since a new file is created every minute, the file sizes give me the average writing speed per minute.
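The per-minute estimate described above can be sketched as follows. It assumes each rotated file covers exactly one minute, which holds at these rates because 40-50MB per minute always exceeds the 10MB threshold at the once-a-minute check; with a slower writer a file could span several minutes and this estimate would be too high. The function names are hypothetical, and "MB" is assumed to mean MiB.

```python
import os

MIB = 1024 * 1024  # assuming "MB" in the thread means MiB


def sizes_of(paths):
    """On-disk sizes of the rotated log files from one run."""
    return [os.path.getsize(p) for p in paths]


def average_mib_per_minute(file_sizes_bytes):
    """Average write rate, assuming each rotated file covers exactly one minute."""
    if not file_sizes_bytes:
        raise ValueError("need at least one file")
    return sum(file_sizes_bytes) / len(file_sizes_bytes) / MIB


def speedup(rate_profiled, rate_plain):
    """How many times faster the profiled run wrote than the plain run."""
    return rate_profiled / rate_plain
```

With the figures quoted in the first post, `speedup(380, 50)` is 7.6 and `speedup(380, 40)` is 9.5.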

Could VTune's Basic Hotspots analysis be doing something that improves the writing speed? Something like:

- Buffering the disk I/O 

- Using a thread priority boost (triggered by I/O completion, window input or keyboard input), such as: http://msdn.microsoft.com/en-us/library/windows/desktop/ms684828

- Issuing DeviceIoControl calls, like: http://msdn.microsoft.com/en-us/library/windows/desktop/aa363216

- Other things that I haven't thought of...

Thanks for your help,

Francis

iliyapolak

You can investigate VTune's own disk I/O with Xperf and subtract it from the total system disk writing speed.

>>>Is there a way that the Basic Hotspots Analysis in VTune do something to improve the writing speed? Something like:>>>

Sorry, but I do not really know how VTune could improve the writing speed. I still suspect that you are counting additional disk writes caused by the VTune analysis itself.

VTune relies on the clock interrupt, which masks DIRQL (device interrupt request level) interrupts serviced by the disk.sys driver. So when a clock interrupt occurs, outstanding writes to disk will be blocked until the IRQL drops back below DIRQL, and I suppose that in this case you would see a decreased writing speed.

Francis J.

Considering that VTune also writes large profiling files (in addition to the test's log files), is there some kind of synchronization wrapping all disk write accesses?

Otherwise, could the VTune process be keeping a high priority level, making all of this a bit faster?

Thanks again,

Francis

iliyapolak

Besides the clock-level IRQL, VTune threads can raise their priority level, even to real time. You can check thread priority levels with Process Explorer. Maybe the thread performing the I/O sets its priority to a high level (above normal priority, which is 8), and thus the CPU spends more time executing the NtWriteFile function.
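As a complement to Process Explorer, the application could log its own thread priority settings in both runs and compare them. The sketch below calls the Win32 functions `GetThreadPriority` and `GetThreadPriorityBoost` through Python's `ctypes`; a native application can make the same two calls directly. Note that `GetThreadPriority` returns the priority relative to the process priority class (0 is `THREAD_PRIORITY_NORMAL`), not the absolute base priority of 8 shown by Process Explorer, and neither call reports temporary dynamic boosts. The function name `thread_priority_info` is hypothetical, and on non-Windows systems it simply returns `None`.

```python
import ctypes
import sys

THREAD_PRIORITY_ERROR_RETURN = 0x7FFFFFFF  # MAXLONG, GetThreadPriority's error value


def thread_priority_info():
    """Return (relative_priority, boost_disabled) for the calling thread on
    Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    kernel32.GetThreadPriority.argtypes = [wintypes.HANDLE]
    kernel32.GetThreadPriority.restype = ctypes.c_int
    kernel32.GetThreadPriorityBoost.argtypes = [
        wintypes.HANDLE, ctypes.POINTER(wintypes.BOOL)]
    kernel32.GetThreadPriorityBoost.restype = wintypes.BOOL

    thread = kernel32.GetCurrentThread()  # pseudo-handle, no CloseHandle needed
    priority = kernel32.GetThreadPriority(thread)
    if priority == THREAD_PRIORITY_ERROR_RETURN:
        raise ctypes.WinError(ctypes.get_last_error())

    disabled = wintypes.BOOL()  # TRUE means dynamic boosting is disabled
    if not kernel32.GetThreadPriorityBoost(thread, ctypes.byref(disabled)):
        raise ctypes.WinError(ctypes.get_last_error())
    return priority, bool(disabled.value)
```

Logging this value once at startup, both with and without VTune attached, would show whether the writing thread's configured priority or boost setting differs between the two runs.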
