performance

Perform Code Timing and Profiling for Linux on 64-Bit Architecture


Challenge

Measure the time a program and its functions take to execute as part of the diagnosis phase of performance optimization. Such measurements are extremely valuable as a simple means to become familiar with how an application behaves during execution.


Solution

Use either the Linux time command or the clock function in the C library to time the code, and profile the application by enabling profiling instrumentation during compilation. The time command is used as follows:

prompt> time

It reports the following information: the elapsed (real) time, the user CPU time, and the system CPU time consumed by the command.

  • Linux*
  • performance
  • Intel® Itanium® Processors
  • Perform Back-End Bubble Root-Cause Analysis on 64-Bit Intel® Architecture


    Challenge

    Identify the root cause of a back-end processor bubble on the Intel® Itanium® processor. A separate item, How to Identify Back-End Bubbles on 64-Bit Intel® Architecture, shows how to use the Intel® VTune™ Performance Analyzer to identify a bubble. In order to resolve this performance issue, the root cause of the bubble must be determined.

  • performance
  • Intel® Itanium® Processors
  • Analyze Memory Accesses on 64-Bit Intel Architecture


    Challenge

    Determine which memory accesses cause the EXE pipeline stalls accumulated by the BE_EXE_Bubble counter. This counter accumulates stall cycles in the EXE stage of the pipeline and captures most memory-access stall cycles. These stalls occur mostly because the data being loaded into registers is not yet ready for consumption by the functional units.

    These dependency stalls break down into two sources:

  • performance
  • Intel® Itanium® Processors
  • Optimizing Software Applications for NUMA: Part 6 (of 7)

    3.3 Data Placement Using Explicit Memory Allocation Directives

    Another approach to data placement in NUMA-based systems is to make use of system APIs that explicitly configure the location of memory page allocations. An example of such APIs is the libnuma library for Linux.[1]

    Optimizing Software Applications for NUMA: Part 5 (of 7)

    3.2. Data Placement Using Implicit Memory Allocation Policies

    In the simple case, many operating systems transparently provide support for NUMA-friendly data placement. When a single-threaded application allocates memory, the operating system simply assigns memory pages to the physical memory associated with the requesting thread’s node (CPU package), thus ensuring that the memory is local to the thread and that access performance is optimal.
