Intel® VTune™ Amplifier XE Tuning Guides
Our tuning guides explain how to identify common software performance issues using VTune Amplifier XE, and give suggestions for optimization.
A long time ago, a mentor encouraged me to wake up earlier each day and spend some quiet time thinking about what’s going on in my life. While my job responsibilities and family were growing, my attention span was shrinking. I was becoming someone that life just “happened to.” I didn’t like it and he was tired of hearing me complain about it. So now I slow down a little each morning to thoughtfully consider my priorities and the bigger picture of life. It’s really helped me to prioritize my time (as well as remember all my kids’ names).
While writing the sample code for this post, I was amazed at how little effort it took to reach a more than 20-fold performance improvement.
When I last had a chance to play with this code, I experimented with using multiple locks to enable multiple simultaneous (and disjoint) interactions between pairs of bodies. It helped, but performance still didn’t beat the single-threaded baseline. Overhead in the loop could be reduced by using only one scoped lock instead of two, but that would require an array of locks indexed by i and j.
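The two-lock scheme described above can be sketched roughly as follows. This is a minimal Python illustration, not the post's original code: the names `force`, `body_locks`, `interact` and `worker`, the body count, and the dummy "interaction" are all hypothetical. Each body gets its own lock, and a pair update takes both locks in a fixed index order so that two threads working on overlapping pairs cannot deadlock.

```python
import threading
from itertools import combinations

N = 4                                   # number of bodies (hypothetical)
force = [0.0] * N                       # accumulated per-body value
body_locks = [threading.Lock() for _ in range(N)]  # one lock per body

def interact(i, j):
    """Apply a dummy equal-and-opposite update to bodies i and j."""
    lo, hi = sorted((i, j))             # fixed acquisition order avoids deadlock
    with body_locks[lo], body_locks[hi]:
        force[i] += 1.0
        force[j] -= 1.0

def worker(pairs):
    """Process a share of the body pairs."""
    for i, j in pairs:
        interact(i, j)

pairs = list(combinations(range(N), 2))
# Split the pairs between two threads; disjoint pairs can proceed concurrently.
threads = [threading.Thread(target=worker, args=(pairs[k::2],)) for k in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Pairs that share no body never contend for the same lock, which is what allows the simultaneous updates. The cost is two lock acquisitions per interaction, which is the loop overhead mentioned above.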
General questions:
The Intel® Energy Checker (Intel® EC) SDK provides simple mechanisms to import and export measures of "useful work". For an e-mail application, this could be things like the following:
At times threaded software requires some critical sections, mutexes or locks. Do developers always know which of the objects in their code has the most impact? If I want to examine my software to minimize the impact or restructure data to eliminate some of these synchronization objects and improve performance, how do I know where I should make changes to get the biggest performance improvement? Intel Parallel Amplifier can help me determine this.
[Warning: Math and physics alert! Math and physics alert!]
I think that you've all seen this equation before:
P = a * C * V^2 * f
Where P is power, C is capacitance, V is the voltage across the gate (typically, Vdd), f is the clock frequency and a is some constant.
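To make the quadratic dependence on voltage concrete, here is a small sketch that evaluates the equation for two operating points. The function name `dynamic_power` and all of the numeric values are made up for illustration; they are not measurements of any real part.

```python
def dynamic_power(a, c, v, f):
    """Evaluate P = a * C * V^2 * f for the given values."""
    return a * c * v ** 2 * f

# Hypothetical operating points: same constant, capacitance and frequency,
# but the second one runs at half the voltage.
base = dynamic_power(a=1.0, c=1e-9, v=1.2, f=2e9)
half_voltage = dynamic_power(a=1.0, c=1e-9, v=0.6, f=2e9)
half_frequency = dynamic_power(a=1.0, c=1e-9, v=1.2, f=1e9)

voltage_ratio = half_voltage / base      # power relative to the base point
frequency_ratio = half_frequency / base  # power relative to the base point
```

With everything else held fixed, halving the voltage cuts power to about a quarter, while halving the frequency only cuts it to about half.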