Intel VTune Amplifier on Xeon Phi

Hi All,

I have a few questions about Intel VTune Amplifier, which I plan to use on a Xeon Phi 7210:

  • Since Intel VTune Amplifier is a GUI-based application, can anyone share how much overhead it adds to the system?
  • How can I be sure that the data I get from Intel VTune Amplifier corresponds only to the application being profiled?
  • Can this tool be used to analyze Intel Optimized Caffe?
  • Can I collect data without the GUI, i.e. using a command-line version of Intel VTune Amplifier, if one is available?

Thanks.

Chetan Arvind Patil

Hello Chetan Arvind Patil,

Please find below answers on your questions:

  • Since Intel VTune Amplifier is a GUI-based application, can anyone share how much overhead it adds to the system?

On Linux, the VTune Amplifier GUI can occupy up to 20-25% of one core. So if you profile a high-throughput application, I recommend collecting from the command line and then opening the result in the VTune GUI, to eliminate side effects from the GUI and the application under profiling sharing the same core.

Please also note that, because of the relatively low single-thread performance of Xeon Phi, the VTune GUI may feel a bit sluggish there. One recommended approach is to collect results from the command line on the Xeon Phi target and then transfer them (or use a file share) to a client machine with better single-thread performance. You can also run a remote collection from a client machine against the Xeon Phi; VTune then automatically copies the traces and the files needed for symbol resolution to the client machine.
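A minimal sketch of the command-line-then-GUI workflow described above, written in Python for illustration. It assumes the VTune Amplifier command-line tool is `amplxe-cl` (its name in the 2017/2018 releases) and is on `PATH` on the Xeon Phi target; the helper names, the application `./my_app` and its arguments, and the result directory `r000hs` are hypothetical.

```python
import shlex
import shutil
import subprocess
from typing import List, Sequence


def build_collect_cmd(app: Sequence[str], analysis: str = "hotspots",
                      result_dir: str = "r000hs") -> List[str]:
    """Build an amplxe-cl command that launches `app` under VTune (no GUI).

    Assumption: the VTune Amplifier CLI is named amplxe-cl (2017/2018 releases).
    """
    if not app:
        raise ValueError("app must contain at least the executable path")
    # "--" separates VTune options from the profiled application and its arguments.
    return ["amplxe-cl", "-collect", analysis, "-result-dir", result_dir, "--", *app]


def run_collection(cmd: List[str]) -> int:
    """Run the collection if amplxe-cl is installed; return its exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical application and arguments; only prints the command line.
    cmd = build_collect_cmd(["./my_app", "--threads", "256"])
    print(shlex.join(cmd))
```

After `run_collection` finishes on the target, the result directory can be copied (or shared) to the client machine and opened there in the VTune GUI, so the GUI never competes with the profiled application for a Xeon Phi core.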

  • How can I be sure that the data I get from Intel VTune Amplifier corresponds only to the application being profiled?

If you use the "Launch Application" mode and specify the application to profile, VTune shows performance information related to your application and its child processes (unless child-process profiling is explicitly switched off). However, some metrics are based on system-wide monitoring, such as memory bandwidth derived from uncore events, and these include everything that happened on the system.

The question about Caffe was addressed in another thread, I believe.

BTW, are you interested in algorithmic optimization, or are you also going down to the micro-architecture level?

Thanks & Regards, Dmitry


Hi Dmitry,

I am interested in software optimization based on how a framework or application utilizes an architecture like Xeon Phi.

Thanks. 

Chetan Arvind Patil

Hello Chetan Arvind Patil,

I recommend trying the "HPC Performance Characterization" analysis, which shows several important aspects of application performance on Xeon Phi at once: parallelism and CPU utilization (with insight into the efficiency of parallel runtimes such as OpenMP), memory access efficiency, and some vectorization efficiency information.
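A minimal sketch of running this analysis without the GUI, in Python for illustration. It assumes the command-line tool is `amplxe-cl` and that the CLI analysis-type name for HPC Performance Characterization is `hpc-performance` (as in the 2017/2018 releases); the helper name, `./my_app`, and the result directory `r001hpc` are hypothetical. It returns two command lines: one to collect and one to print a text summary on the target.

```python
from typing import List, Sequence, Tuple


def hpc_characterization_cmds(app: Sequence[str],
                              result_dir: str = "r001hpc") -> Tuple[List[str], List[str]]:
    """Return (collect, report) amplxe-cl command lines.

    Assumption: "hpc-performance" is the CLI name of the
    HPC Performance Characterization analysis.
    """
    if not app:
        raise ValueError("app must contain at least the executable path")
    collect = ["amplxe-cl", "-collect", "hpc-performance",
               "-result-dir", result_dir, "--", *app]
    # Text summary on the target; the same result directory also opens in the GUI.
    report = ["amplxe-cl", "-report", "summary", "-result-dir", result_dir]
    return collect, report


if __name__ == "__main__":
    # Hypothetical application; only prints the two command lines.
    collect, report = hpc_characterization_cmds(["./my_app"])
    print(" ".join(collect))
    print(" ".join(report))
```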

VTune also now includes a lightweight tool, Application Performance Snapshot, that produces a quick performance overview as a command-line and HTML report. The tool is named "aps" and is located in the <install_dir>/bin64 directory.

Thanks & Regards, Dmitry


Hi Dmitry,

Can "HPC Performance Characterization" do thread level analysis?

Thanks.

Chetan Arvind Patil
