How-To

Solve Prefetch Performance Issues


Challenge

Avoid performance penalties associated with excessive software prefetching. Prefetch instructions are not completely free in terms of bus cycles, machine cycles, and other resources, even though they require minimal clocks and memory bandwidth. Excessive prefetching may lead to performance penalties because of issue penalties in the front-end of the machine and/or resource contention in the memory sub-system. This effect may be severe in cases where the target loops are small and/or cases where the target loop is issue-bound.
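One common way to keep prefetch overhead down is to issue a single prefetch per cache line instead of one per element. The C sketch below illustrates that idea under stated assumptions: `__builtin_prefetch` is a GCC/Clang builtin (other compilers need a different intrinsic), and `CACHE_LINE_BYTES`, `PREFETCH_AHEAD`, and `sum_with_prefetch` are hypothetical names and values chosen for illustration, not taken from the original article.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumed cache line size, in bytes */
#define PREFETCH_AHEAD   8   /* assumed prefetch distance, in cache lines */

/* Sum an array while issuing at most one prefetch per cache line. */
double sum_with_prefetch(const double *x, size_t n)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(double);
    const size_t ahead = PREFETCH_AHEAD * per_line;
    double sum = 0.0;
    size_t i = 0;

    /* Outer loop advances one cache line at a time. */
    for (; i + per_line <= n; i += per_line) {
        /* Only form addresses that lie inside the array. */
        if (i + ahead < n)
            __builtin_prefetch(&x[i + ahead], 0, 0);  /* read, low locality */

        /* Inner loop consumes the current line with no prefetch overhead. */
        for (size_t j = 0; j < per_line; j++)
            sum += x[i + j];
    }

    /* Remainder elements that do not fill a whole line. */
    for (; i < n; i++)
        sum += x[i];

    return sum;
}
```

The best prefetch distance depends on the processor and on the work done per iteration, so both constants should be tuned by measurement rather than taken as given.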

  • Memory cache
  • performance optimization
  • How-To
  • Parallel Computing
  • Send Email from an ASP .NET Environment

    by Rahul Guha


    Challenge

    Sending email from ASP.NET pages has become a common requirement. In pre-.NET days, developers had to use a COM component (usually CDO or CDONTS) to send email messages. They also had to make sure that the component was installed properly and then keep track of its versions.

  • .net
  • How-To
  • Parallel Computing
  • Secure Mobilized Applications and Wireless Clients


    Challenge

    Take simple steps to dramatically increase the security of mobilized applications and wireless clients. Security issues in wireless networking environments are well known. Nevertheless, many common security holes in wireless applications and LANs can be fixed with a minimum of effort.


    Solution

    Make best use of the existing security measures that are available in the wireless LAN environment. Listed below are methods to neutralize some of the most common security vulnerabilities found in wireless LANs:

  • Off-line Synchronization
  • How-To
  • Mobility
  • Schedule Instructions Optimally on 64-Bit Intel® Architecture


    Challenge

    Schedule instructions properly for optimal performance on the Intel® Itanium® processor. Optimal scheduling will minimize the chances of implicit stops or unexpected dispersal-related stalls.


    Solution

    Whenever possible, observe the following heuristics, which are based on best-known methods for instruction scheduling on 64-bit Intel architecture:

  • itanium
  • performance
  • How-To
  • Resolve Memory Access Stalls on 64-Bit Intel® Architecture


    Challenge

    Resolve memory access stalls in the EXE pipeline stage on 64-Bit Intel® Architecture. Memory access stalls occur when data is not available in the caches as expected. Instructions that depend on that data stall until the load has completed. There are two main causes for the data not being available:

  • itanium
  • Stall Analysis
  • How-To
  • Resolve Cache Misses on 64-Bit Intel Architecture


    Challenge

    Resolve cache misses that cause a significant number of stall cycles. These occur when data is not in the desired cache and data retrieval requires access to a slower cache, memory, or disk.


    Solution

    Prefetch the data in advance to ensure availability, or restructure the code so that data use is more localized. This can mean any of the following:

  • itanium
  • How-To
  • compiler
  • Intel® Itanium® Processors
  • Remove Many Bank Conflicts on 64-Bit Intel® Architecture


    Challenge

    Remove bank conflicts from high-level loops. Removing bank conflicts is, in many cases, rather simple. Consider the following double-precision matrix multiply in Fortran:

    do k = 1, MAX
      do j = 1, MAX
        do i = 1, MAX
          a(i,k) = a(i,k) + b(i,j)*c(j,k)
        enddo
      enddo
    enddo


    Solution

    Unroll the inner loop and then interlace the unrolled lines. Unrolling the inner loop is the first step toward better performance.
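As a rough illustration of this step, the sketch below unrolls the inner loop by four and interlaces the unrolled statements. It is written in C with column-major indexing so that it mirrors the Fortran loop nest; the unroll factor, the i, i+2, i+1, i+3 ordering, and the names `IDX` and `matmul_unrolled` are assumptions made for this example, and the ordering that actually avoids bank conflicts depends on the cache bank layout of the target processor.

```c
#include <stddef.h>

/* Column-major index, mirroring Fortran's a(i,k) with 0-based i and k. */
#define IDX(i, k, n) ((size_t)(k) * (n) + (i))

/* Computes a(i,k) += b(i,j) * c(j,k) for n x n column-major matrices.
   Assumes a does not overlap b or c. */
void matmul_unrolled(double *a, const double *b, const double *c, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        for (size_t j = 0; j < n; j++) {
            const double cjk = c[IDX(j, k, n)];  /* invariant in the i loop */
            size_t i = 0;

            /* Inner loop unrolled by 4; statements interlaced so that
               neighbouring elements are not accessed back to back. */
            for (; i + 4 <= n; i += 4) {
                a[IDX(i,     k, n)] += b[IDX(i,     j, n)] * cjk;
                a[IDX(i + 2, k, n)] += b[IDX(i + 2, j, n)] * cjk;
                a[IDX(i + 1, k, n)] += b[IDX(i + 1, j, n)] * cjk;
                a[IDX(i + 3, k, n)] += b[IDX(i + 3, j, n)] * cjk;
            }

            /* Remainder iterations when n is not a multiple of 4. */
            for (; i < n; i++)
                a[IDX(i, k, n)] += b[IDX(i, j, n)] * cjk;
        }
    }
}
```

The unrolled version produces the same result as the original loop nest; only the order of memory accesses inside each group of four iterations changes.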

  • itanium
  • Memory Access
  • How-To
  • Quantify Memory-Stall Penalties on 64-Bit Architecture


    Challenge

    Determine memory-access stall penalties due to simple cache misses. Whenever a load instruction attempts to access data from a data cache array that does not contain the desired data, it encounters a cache miss. All integer loads attempt to access the first-level data cache (L1D) first. All floating-point loads access the L2 first.

  • itanium
  • Stall Analysis
  • How-To
  • Intel® Itanium® Processors