data

Graceful Enhancement

Matt Wolf and I were walking from our hotel to IDF 2011 this morning, talking about various architectures and which ones we each preferred. Matt described how he sees it, I said that sounded like graceful enhancement, and he said that was a great name for it. It is a mutually derived term that may be a better candidate for general usage than Colbert's truthiness.

Replace a Set of Pointers With a Base Pointer to Reduce Data Bloat


Challenge

Reduce data bloat due to the use of many pointers. Pointers in the Itanium® architecture are twice the size of pointers in 32-bit Intel® architecture, which may effectively double the size of data structures that are largely composed of pointers.

The following code defines a structure composed entirely of pointers:

struct f {
    f *Pnextf;
    g *Psibling;
    h *Pparent;
};
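
One way to read the title's suggestion, sketched here under assumptions: keep all nodes in a single contiguous pool, hold one full-width base pointer to that pool, and store each link as a 32-bit index relative to the base. The names `PtrNode`, `CompactNode`, `NodePool`, `NO_NODE`, and `node_at` are hypothetical and do not come from the original article, and the sketch uses a single node type rather than the separate `f`, `g`, and `h` types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pointer-based node: three full-width pointers per node. */
typedef struct PtrNode {
    struct PtrNode *next;
    struct PtrNode *sibling;
    struct PtrNode *parent;
} PtrNode;

/* Sentinel index that plays the role of a NULL pointer (assumption). */
#define NO_NODE UINT32_MAX

/* Compact node: each link is a 32-bit index into one shared pool. */
typedef struct CompactNode {
    uint32_t next;
    uint32_t sibling;
    uint32_t parent;
} CompactNode;

/* The pool carries the only full-width pointer: the base address. */
typedef struct NodePool {
    CompactNode *base;
    uint32_t count;
} NodePool;

/* Convert an index back into a pointer; NO_NODE or an
 * out-of-range index yields NULL. */
static CompactNode *node_at(const NodePool *pool, uint32_t index)
{
    if (index == NO_NODE || index >= pool->count)
        return NULL;
    return pool->base + index;
}
```

On a target with 64-bit pointers, `sizeof(PtrNode)` is typically 24 bytes while `sizeof(CompactNode)` is 12. The trade-off is that every node must live in the same pool and each link costs an extra add when it is followed.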
 

 

  • data
  • Develop for Core processor
  • How-To
  • Parallel Computing
  • Manage Structure Padding to Avoid Data Bloat


    Challenge

    Reduce or eliminate data bloat due to structure padding. With the Itanium® architecture, data boundaries are naturally aligned, instead of freely (any-byte) aligned as on 32-bit Intel® architecture. Depending on the field order in a 64-bit struct, this change in boundaries may lead to padding of 32-bit fields, causing data bloat.

    If the following Win32* code were compiled for the 64-bit Intel architecture, the two variables height and weight would be padded, because they are 32-bit variables bounded by 64-bit boundaries:
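
The listing itself is not part of this excerpt. As a stand-in, the sketch below shows the kind of layout the paragraph describes, assuming the 64-bit neighbors are pointers. Only `height` and `weight` come from the text; the struct names, the `name` and `notes` fields, and the helper `bytes_saved_per_record` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit fields interleaved with pointers. With 8-byte pointers, the
 * compiler typically inserts 4 bytes of padding after each 32-bit
 * field so the following pointer starts on an 8-byte boundary. */
typedef struct PaddedRecord {
    int32_t height;
    char   *name;    /* hypothetical field */
    int32_t weight;
    char   *notes;   /* hypothetical field */
} PaddedRecord;

/* Same fields, widest first: the two 32-bit fields now share one
 * 8-byte slot and no padding is needed. */
typedef struct ReorderedRecord {
    char   *name;
    char   *notes;
    int32_t height;
    int32_t weight;
} ReorderedRecord;

/* Bytes saved per record by reordering (0 when pointers are 32-bit). */
static size_t bytes_saved_per_record(void)
{
    return sizeof(PaddedRecord) - sizeof(ReorderedRecord);
}
```

With 8-byte pointers and natural alignment, the first layout typically occupies 32 bytes and the second 24. With 4-byte pointers both are 16 bytes, which is why the bloat only shows up after the move to 64 bits.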

  • Loop Blocking to Optimize Memory Use on 32-Bit Architecture


    Challenge

    Improve memory utilization by means of loop blocking. The main purpose of loop blocking is to eliminate as many cache misses as possible. Consider the following loop, as it exists before blocking:

    float A[MAX][MAX], B[MAX][MAX];

    for (i = 0; i < MAX; i++) {
        for (j = 0; j < MAX; j++) {
            /* B is walked down a column: each step of j moves to a new row */
            A[i][j] = A[i][j] + B[j][i];
        }
    }
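
For comparison, a blocked form of the same loop might look like the sketch below. It is not the article's own listing: the values of `MAX` and `BLOCK`, the function names, and the helper `min_size` are assumptions chosen for illustration, and a suitable block size depends on the cache of the target processor.

```c
#include <stddef.h>

#define MAX   64   /* assumed matrix dimension */
#define BLOCK 8    /* assumed block edge; tune for the target cache */

/* The loop before blocking, written as a function for comparison. */
void add_transposed_naive(float a[MAX][MAX], float b[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i++)
        for (size_t j = 0; j < MAX; j++)
            a[i][j] = a[i][j] + b[j][i];
}

static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Blocked version: work on one BLOCK x BLOCK tile at a time so the
 * rows of b touched by the tile can stay in cache while reused. */
void add_transposed_blocked(float a[MAX][MAX], float b[MAX][MAX])
{
    for (size_t ii = 0; ii < MAX; ii += BLOCK) {
        for (size_t jj = 0; jj < MAX; jj += BLOCK) {
            size_t i_end = min_size(ii + BLOCK, MAX);
            size_t j_end = min_size(jj + BLOCK, MAX);
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    a[i][j] = a[i][j] + b[j][i];
        }
    }
}
```

Both functions compute the same result; only the order in which elements are visited changes. Each element of `a` is still updated exactly once, so the blocked version can replace the original without affecting the output.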
    
     
    
    

     

  • Avoid Partial Memory Accesses on 32-Bit Intel® Architecture


    Challenge

    Avoid partial memory accesses. Consider a large load that follows a series of small stores to the same area of memory (beginning at memory address mem). The large load stalls in this case, as shown here:

    mov mem, eax        ; store dword to address "mem"
    mov mem + 4, ebx    ; store dword to address "mem + 4"
    :
    :
    movq mm0, mem       ; load qword at address "mem", stalls
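
The same access pattern can be expressed in C, along with one commonly suggested alternative: combine the two halves in a register and write them with a single store that matches the width of the later load. This is a rough sketch rather than the article's own remedy. The `Slot` union and both function names are hypothetical, `volatile` is used only to keep the compiler from removing the memory traffic, and the shift order assumes a little-endian target.

```c
#include <stdint.h>

/* One 8-byte slot, viewed as two dwords or as one qword. */
typedef union Slot {
    uint32_t dword[2];
    uint64_t qword;
} Slot;

/* Pattern from the text: two 32-bit stores followed by one
 * 64-bit load that covers both of them. */
uint64_t load_after_two_small_stores(uint32_t low, uint32_t high)
{
    volatile Slot mem;
    mem.dword[0] = low;    /* store dword to "mem"     */
    mem.dword[1] = high;   /* store dword to "mem + 4" */
    return mem.qword;      /* wide load spanning both stores */
}

/* Alternative: merge the halves first, then issue one 64-bit
 * store so the later 64-bit load matches a single store. */
uint64_t load_after_one_wide_store(uint32_t low, uint32_t high)
{
    volatile Slot mem;
    uint64_t combined = ((uint64_t)high << 32) | low; /* little-endian layout */
    mem.qword = combined;
    return mem.qword;
}
```

On a little-endian machine both functions return the same value. Whether the second form actually avoids the stall depends on the processor and on the code the compiler generates, so it is worth confirming with a profiler.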
    
    

     

  • Manipulate Data Structure to Optimize Memory Use on 32-Bit Intel® Architecture


    Challenge

    Improve memory utilization by manipulating data-structure layout. For certain algorithms, like 3D transformations and lighting, there are two basic ways of arranging the vertex data. The traditional method is the array of structures (AoS) arrangement, with a structure for each vertex, as shown below:

    typedef struct {
        float x, y, z;
        int a, b, c;
        . . .
    } Vertex;

    Vertex Vertices[NumOfVertices];
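
The other arrangement is usually called structure of arrays (SoA): one array per field instead of one structure per vertex. A minimal sketch follows, assuming a fixed vertex count and including only the six fields named above. `NUM_VERTICES`, `VerticesSoA`, and `translate_x` are hypothetical names, not taken from the original article.

```c
#include <stddef.h>

#define NUM_VERTICES 1024   /* assumed vertex count */

/* Structure of arrays: all x values are contiguous, then all y
 * values, and so on. */
typedef struct {
    float x[NUM_VERTICES];
    float y[NUM_VERTICES];
    float z[NUM_VERTICES];
    int   a[NUM_VERTICES];
    int   b[NUM_VERTICES];
    int   c[NUM_VERTICES];
} VerticesSoA;

/* A pass that needs only x reads one dense array and never pulls
 * y, z, a, b, or c into cache. */
void translate_x(VerticesSoA *v, float dx)
{
    for (size_t i = 0; i < NUM_VERTICES; i++)
        v->x[i] += dx;
}
```

With the AoS layout, the same pass would step over the other fields of every vertex on its way to the next `x`. The SoA layout keeps the values it needs adjacent, which also tends to suit SIMD loads.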
    
    

     
