Product Owner Corner - The Starter

Hi,

I am Valery Cherepennikov, the VTune Product Owner: the guy who runs the development process and makes most of the decisions regarding this product, but not all of them :)  With this post, I am starting a blog about Intel® VTune™ Amplifier on behalf of our team. The plan is to post monthly, aligned with my status report schedule :)

We pursue several goals here:

First, to explain how to use the tool for performance tuning. This process is known to be complicated, and our goal is to make it a bit simpler.

Second, to highlight new features of the product that you may find useful for your tasks.

Third (probably the most important), we want to make the development of the tool more interactive, relying on your feedback. We want to understand what works and what doesn’t, what parts of the product look puzzling, and what we could improve.

Also, you can expect some funny stories from our developers’ daily life, to make the process of "product cooking" more transparent :)

As for features to highlight in VTune Amplifier 2018 Gold, I will (arbitrarily) start with these three:

  1. To make VTune Amplifier easier to use, we have published a cookbook: a set of recipes for identifying and resolving common performance problems with our tool. It currently contains 8 recipes, and we plan to extend it with the most interesting use cases. If you have any such use cases, please let us know.

  2. Application Performance Snapshot (APS), a free, lightweight tool with a simple interface that helps you characterize your application. It is targeted mostly at HPC apps, but it could be useful in other areas too. You can try it by downloading it here.

  3. Profiling of apps running in container environments (Docker*, Mesos*). This is a brand-new feature, and we would like to hear your feedback on it. You can find more information in this cookbook article.

Follow the forum: we are going to keep you posted on our progress with the new features and share technical articles from our experts.

Finally, I would like to ask you a few questions, since we are currently working on simplifying VTune Amplifier’s interface and making the tuning flow more intuitive. So:

  • Which VTune Amplifier analyses do you use most often?

  • What type of analysis do you usually start with?

  • Which analysis looks the most difficult or puzzling to you?

Thanks in advance for your responses to the above and for any other thoughts you may share.

 Yours,

Valery   


General Exploration seems puzzling at first, but it is the most valuable for my work. The Top-Down method is really effective.

To Dmitry A.: Thank you, we are working on making General Exploration more intuitive. What bothers you the most about it? The amount of data? The number of columns we expose? The lack of a high-level picture? Anything else?

Thanks for your thoughts in advance.

Valery, I have heard that VTune has preview support for some RTOSes (such as GHS and QNX). Are you going to make it public? Do you think it would be of interest outside Intel?

Valery, for me it was difficult to understand what to do when the Summary pane appeared after collection. The list of potential problems looks impressive and the descriptions are thorough, but the Top-Down methodology is mentioned neither on the Summary pane nor on the Analysis Type tab. TMAM (Top-down Microarchitecture Analysis Method) is too powerful a tool to hide inside the help.

Also, you may find it beneficial to briefly describe how event-based sampling works, so that users understand more clearly what the numbers in the Bottom-up and Event Count panes mean.

To Kirill U.: Yes, we are considering support for some real-time OSes, but we are still not sure how many customers program for an RTOS, and, BTW, which ones. Is anyone here on the forum interested in this?

Are there any plans for OpenMPI support in APS?

Are there any scalability numbers for MPI? Any profile size estimates for 1K ranks?

To Ca.T.: APS scales well to 1K ranks. The amount of data is about 50 KB per rank, so the total is going to be roughly 50 MB.

OpenMPI support is in our plans. I'm not sure exactly when, but I believe quite soon (by the end of the year).

Are you interested in VTune support, or APS only?
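The size estimate in this reply is simple linear scaling. A minimal sketch of that back-of-the-envelope calculation, assuming "~50K/rank" means about 50 KB per rank, decimal units (1 MB = 1000 KB), and total size growing linearly with the rank count; the helper name is hypothetical and not part of APS:

```python
def estimate_aps_profile_mb(ranks: int, kb_per_rank: float = 50.0) -> float:
    """Hypothetical helper: rough APS result size in MB.

    Assumes a fixed amount of data per MPI rank (default ~50 KB)
    and linear growth with the number of ranks.
    """
    # Decimal units: 1 MB = 1000 KB.
    return ranks * kb_per_rank / 1000.0


print(estimate_aps_profile_mb(1000))  # prints 50.0
```

Real per-rank data volume will depend on the application and collection settings, so treat this only as an order-of-magnitude check.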

Let me ask - what do you use APS for?

Thanks,

Valery 

About the new features in VTune 2018: profiling native and Java apps within containers is becoming popular in enterprise environments. It's great that VTune supports Docker, which is the de facto container standard right now. My question is about orchestration engines (e.g., Swarm for Docker containers), which are usually built into the same tooling. Do you think profiling orchestration engines would add more value to container support in VTune?

Hi Valery,

I don't often deal with performance problems and performance tuning, but I have some experience in that area.

Almost all the problems I detected were found with algorithmic analyses such as Advanced Hotspots and Concurrency. These analysis types are not as powerful as microarchitecture analysis, but for a general user they are mostly enough. GPU Hotspots is another analysis that helped my friends and me tune an OpenCL program, and once again the cause was an algorithmic imperfection...

As I said, I find microarchitecture analysis the most difficult. I know these analyses are very powerful, and I have witnessed amazing performance improvements achieved with them, but I have had none of my own. I think they are applicable in a narrow segment and have a steep learning curve.

I also like VTune Amplifier's progress in the host-target flow: I can check performance on several platforms and analyze the results on my workstation, in my usual environment, with almost no target setup.

To Denis P.:

I am not sure how much value profiling orchestration engines like Swarm, Kubernetes, etc. would add. How much performance overhead does orchestration add?

Honestly, I do not know enough about orchestration engines. Does anyone here use them in the cloud? What is the typical usage?

To Pavel Gerasimov:

Pavel, let me pre-announce a couple of things: we are working on simplifying the tuning flow in general and on a more intuitive presentation of microarchitecture analysis. At some point we will be ready to bring the new concept up for discussion here on the forum.

Stay tuned:)

BTW, what type of host-target flow do you use? And what else could we do to help with tuning in your environment?

 

Valery, 

I prefer Linux-to-Linux connections. I often deal with embedded Unix-based OSes, for example ones built with the Yocto build system. Currently the connection is set up via SSH, so an SSH daemon must be built in. That's not a big problem, because the build system generally allows adding it, but it may be difficult for an end user when the OS has certain limitations.
There are also non-Unix-based targets, for example embedded Windows, which currently need to be profiled as the host.

As for the microarchitecture analysis question you asked Dmitry: for me, the lack of a high-level picture is the main problem. All my attempts to use it for performance tuning stopped quickly, so I am very glad to hear that there is work on simplifying the flow.
