How should I optimize for speed and memory access cycles with the Intel C++ compiler?


My program contains a lot of loops with memory accesses. I currently build with O2 (Maximize Speed). Should I use O3 (Highest Optimizations) instead? What else can I adjust?

Best Reply

-O3 mainly adds optimizations for multi-level loop nests, possibly at the expense of larger generated code. You can see what it adds for your application by comparing compiler reports, e.g. -Qopt-report-file=source.txt -Qopt-report4. Those reports are invaluable for showing which optimizations the compiler applied to your critical loops and what kind they are. The exact meaning of the numeric suffix on opt-report varies with compiler version.
As you've no doubt read elsewhere, you should start with analysis to determine the location and nature of any performance bottlenecks. ICL offers the -Qprofile-loops option for this purpose. Recent VTune releases have also greatly improved the general analysis category.
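
As a rough illustration of the kind of multi-level loop the reply is talking about, the sketch below writes the same matrix multiply with two loop orders. The function names (`matmul_ijk`, `matmul_ikj`) and the file names in the comments are made up for this example, and the build lines only reuse the option spellings that appear in this thread; check them against your compiler version. Whether a given compiler interchanges the loops of the first version on its own is not guaranteed, which is what comparing the two reports would show.

```cpp
// Hypothetical file: loopnest.cpp
// Assumed build lines for comparing reports (option spellings copied from
// this thread; the exact syntax may differ between compiler versions):
//   icl /O2 -Qopt-report4 /Qopt-report-file:loopnest_O2.txt /c loopnest.cpp
//   icl /O3 -Qopt-report4 /Qopt-report-file:loopnest_O3.txt /c loopnest.cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All matrices are n x n, row-major, stored in flat vectors.

// Original order: the inner k loop reads b with stride n,
// which uses the cache poorly once n is large.
void matmul_ijk(const std::vector<double>& a, const std::vector<double>& b,
                std::vector<double>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Interchanged order: the inner j loop reads b and updates c with unit stride.
void matmul_ikj(const std::vector<double>& a, const std::vector<double>& b,
                std::vector<double>& c, std::size_t n) {
    for (std::size_t i = 0; i < n * n; ++i) {
        c[i] = 0.0;
    }
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both functions compute the same product; only the memory access pattern of the innermost loop differs. Building the file at each optimization level and comparing the two report files side by side is one way to see which transformations were applied to each nest.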

I'm already using VTune. But where are the optimization reports, and how do I use the -Qprofile-loops option?

 

If you use the Visual Studio GUI, I suppose you must add the opt-report and profile-loops options under the additional command line options, and perhaps examine the results in a text editor.

If you want the opt-report results to appear in your build log, you would of course omit the opt-report-file option, but in my opinion that makes it harder to compare reports and see the effect of changing your compile options and source code.

Are you trying to get by without the user guide?

I am going through the user guide. I added /Qprofile-loops:all and /Qopt-report-file:$(IntDir)$(TargetName).rep to the compiler command line. I set it up the following way:

Diagnostics

I now have a file named ParallelSearch.rep, but I did not see any log for profile-loops. The diagnostic file and the report file contain the following:

.diag

icl: command line warning #10333: Loop profiler cannot be used when generating parallel code. Disabling '/Qprofile-loops'

.rep

<;-1:-1;IPO UNREFERENCED VAR REMOVING;;0>
  UNREF VAR REMOVAL ROUTINE-SYMTAB (....)

  UNREF VAR REMOVAL ROUTINE-SYMTAB (....)

  UNREF VAR REMOVAL ROUTINE-SYMTAB (....)

  UNREF VAR REMOVAL ROUTINE-SYMTAB (_main):VARS(8),PACKS (8)

 

I did not understand any of this. What do I need to analyze?

It's probably good to begin profiling and vectorization optimizations with threaded parallelization off.
As you're using VTune, you probably don't need the profile-loops, but it's easy to be misled when starting out in VTune with parallelization.
Learning the opt-report stuff is particularly important with parallelization.

As the warning says, the loop profiler is disabled when generating parallel code. /Qprofile-loops works by inserting instrumentation calls at each function's entry and exit points and before and after instrumentable loops; these may not work well in a parallel context and would make the results hard to analyze.
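
To make the warning more concrete, here is a hand-written sketch of what entry and exit instrumentation around a loop looks like conceptually. It is not the code the compiler inserts; `ProbeStats`, `LoopProbe`, and `sum_array` are hypothetical names used only for illustration. The probe updates a plain shared record without any synchronization, which is fine in a single thread but would race if several threads ran the instrumented loop at the same time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical record updated by a probe; one per instrumented loop.
struct ProbeStats {
    long long entries = 0;   // number of times the loop was entered
    long long total_ns = 0;  // accumulated wall-clock time in nanoseconds
};

// RAII probe: the constructor marks loop entry, the destructor marks loop exit.
class LoopProbe {
public:
    explicit LoopProbe(ProbeStats& stats)
        : stats_(stats), start_(std::chrono::steady_clock::now()) {}

    ~LoopProbe() {
        const auto end = std::chrono::steady_clock::now();
        // Unsynchronized updates: safe only when a single thread runs the loop.
        stats_.entries += 1;
        stats_.total_ns +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start_).count();
    }

    LoopProbe(const LoopProbe&) = delete;
    LoopProbe& operator=(const LoopProbe&) = delete;

private:
    ProbeStats& stats_;
    std::chrono::steady_clock::time_point start_;
};

// Example of an "instrumented" loop: the probe brackets the loop body.
double sum_array(const std::vector<double>& data, ProbeStats& stats) {
    double sum = 0.0;
    {
        LoopProbe probe(stats);  // loop entry
        for (std::size_t i = 0; i < data.size(); ++i) {
            sum += data[i];
        }
    }                            // loop exit: probe destructor records the time
    return sum;
}
```

If the same loop were split across threads, every thread would update `entries` and `total_ns` concurrently, and the per-loop totals would no longer mean much. That is one plausible reading of why the compiler turns the profiler off for parallel builds.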
