performance

Code Timing and Profiling for Linux on 64-Bit Intel® Architecture


Challenge

Measure the time a program and its functions take to execute as part of the diagnosis phase of performance optimization. Such measurements are extremely valuable as a simple means to become familiar with how an application behaves during execution.


Solution

Use either the Linux time command or the clock function in the C library, and profile the application during compilation. The time command is used as follows:

prompt> time 

It gives the following information:
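
For orientation, time typically reports three figures for the command it runs: real (elapsed wall-clock time), user (CPU time spent in user mode), and sys (CPU time spent in the kernel). The clock function mentioned above can be used in a similar spirit from inside the program. The following is a minimal sketch and not the article's own code: sum_of_squares is a hypothetical workload and time_sum_of_squares is a hypothetical helper, both introduced only for illustration.

```c
#include <time.h>

/* Hypothetical workload, used only to have something to measure. */
double sum_of_squares(long n)
{
    double total = 0.0;
    for (long i = 1; i <= n; ++i)
        total += (double)i * (double)i;
    return total;
}

/*
 * Hypothetical helper: runs sum_of_squares(n), stores its result in *out,
 * and returns the processor time the call consumed, in seconds.
 * clock() reports CPU time used by the process, not wall-clock time.
 */
double time_sum_of_squares(long n, double *out)
{
    clock_t start = clock();
    *out = sum_of_squares(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_sum_of_squares from main with a large n and printing the returned value gives a per-function figure to set against the whole-program numbers from time. Because clock has limited resolution, very short functions may need to be run in a loop and the total divided by the iteration count.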

  • itanium
  • performance
  • How-To
  • Analyze Memory Accesses on 64-Bit Intel Architecture


    Challenge

    Determine which memory accesses cause the EXE pipeline stalls accumulated by the BE_EXE_Bubble counter. This counter accumulates stall cycles in the EXE stage of the pipeline, which includes most memory-access stall cycles. These stalls occur mostly because data being loaded into registers is not yet ready for consumption by the functional units.

    These dependency stalls break down into two sources:

  • itanium
  • performance
  • How-To
  • Intel® Itanium® Processors
  • Tuning Guides and Performance Analysis Papers

    Intel® VTune™ Amplifier XE Tuning Guides

    Our tuning guides explain how to identify common software performance issues using VTune Amplifier XE, and give suggestions for optimization.

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • C#
  • C/C++
  • Fortran
  • Java*
  • Advanced
  • Intel® VTune™ Amplifier XE
  • Intel® VTune™ Performance Analyzer
  • Sandy Bridge
  • ivy bridge
  • Many Integrated Core
  • Xeon Phi coprocessor
  • nehalem
  • Xeon
  • Tuning Guide
  • performance
  • optimization
  • software
  • Optimization
  • Optimizing Software Applications for NUMA: Part 6 (of 7)

    3.3 Data Placement Using Explicit Memory Allocation Directives

    Another approach to data placement in NUMA-based systems is to make use of system APIs that explicitly configure the location of memory page allocations. An example of such APIs is the libnuma library for Linux.[1]

    Optimizing Software Applications for NUMA: Part 5 (of 7)

    3.2 Data Placement Using Implicit Memory Allocation Policies

    In the simple case, many operating systems transparently provide support for NUMA-friendly data placement. When a single-threaded application allocates memory, the operating system simply assigns memory pages to the physical memory associated with the requesting thread’s node (CPU package), thus ensuring that the memory is local to the thread and access performance is optimal.
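
    The implicit policy described above can be illustrated with a short sketch. It assumes Linux with its default local-allocation behavior, where a physical page is placed when it is first written rather than when malloc returns. The teaser itself does not name an operating system, so this is an assumption. alloc_and_first_touch is a hypothetical helper, and the 4096-byte fallback page size is likewise assumed.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocates `bytes` and writes one byte per page from
 * the calling thread. Under a first-touch policy (assumed here), each page
 * that has not been touched before is then backed by memory on the node
 * where the calling thread is running. Returns NULL if allocation fails.
 */
char *alloc_and_first_touch(size_t bytes)
{
    char *buf = malloc(bytes);
    if (buf == NULL)
        return NULL;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* assumed fallback if the page size is unavailable */

    for (size_t off = 0; off < bytes; off += (size_t)page)
        buf[off] = 0; /* the first write is what triggers page placement */

    return buf;
}
```

    The point of the sketch is that the thread that initializes a buffer, not the one that merely calls malloc, tends to determine where the pages live. Small allocations may reuse heap pages that were already touched, so the effect is most relevant for large, fresh buffers.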
