# 如何使用数据结构优化 32 位英特尔® 架构上的内存使用

### 挑战

 ``````typedef struct{ float x,y,z; int a,b,c; ... } Vertex; Vertex Vertices[NumOfVertices]; ``````

### 解决方案

 ``````typedef struct{ float x[NumOfVertices]; float y[NumOfVertices]; float z[NumOfVertices]; int a[NumOfVertices]; int b[NumOfVertices]; int c[NumOfVertices]; ... } VerticesList; VerticesList Vertices; ``````

 ``````; The dot product of an array of vectors (Array) and a ; fixed vector (Fixed) is a common operation in 3D ; lighting operations, ; where Array = (x0,y0,z0),(x1,y1,z1),... ; and Fixed = (xF,yF,zF) ; A dot product is defined as the scalar quantity ; d0 = x0*xF + y0*yF + z0*zF. ; AoS code ; All values marked DC are “don’t-care.” ; In the AOS model, the vertices are stored in the ; xyz format movaps xmm0, Array ; xmm0 = DC, x0, y0, z0 movaps xmm1, Fixed ; xmm1 = DC, xF, yF, zF mulps xmm0, xmm1 ; xmm0 = DC, x0*xF, y0*yF, z0*zF movhlps xmm1, xmm0 ; xmm1 = DC, DC, DC, x0*xF addps xmm1, xmm0 ; xmm0 = DC, DC, DC, ; x0*xF+z0*zF movaps xmm2, xmm1 shufps xmm2, xmm2,55h ; xmm2 = DC, DC, DC, y0*yF addps mm2, xmm1 ; xmm1 = DC, DC, DC, ; x0*xF+y0*yF+z0*zF ; SoA code ; ; X = x0,x1,x2,x3 ; Y = y0,y1,y2,y3 ; Z = z0,z1,z2,z3 ; A = xF,xF,xF,xF ; B = yF,yF,yF,yF ; C = zF,zF,zF,zF movaps xmm0, X ; xmm0 = x0,x1,x2,x3 movaps xmm1, Y ; xmm0 = y0,y1,y2,y3 movaps xmm2, Z ; xmm0 = z0,z1,z2,z3 mulps xmm0, A ; xmm0 = x0*xF, x1*xF, x2*xF, x3*xF mulps xmm1, B ; xmm1 = y0*yF, y1*yF, y2*yF, y3*xF mulps xmm2, C ; xmm2 = z0*zF, z1*zF, z2*zF, z3*zF addps xmm0, xmm1 addps xmm0, xmm2 ; xmm0 = (x0*xF+y0*yF+z0*zF), ... ``````

 ``````NumOfGroups = NumOfVertices/SIMDwidth typedef struct{ float x[SIMDwidth]; float y[SIMDwidth]; float z[SIMDwidth]; } VerticesCoordList; typedef struct{ int a[SIMDwidth]; int b[SIMDwidth]; int c[SIMDwidth]; ... } VerticesColorList; VerticesCoordList VerticesCoord[NumOfGroups]; VerticesColorList VerticesColor[NumOfGroups]; ``````

• 数据进行了整理，可实现更高效的垂直 SIMD 计算
• 比 AoS 更简单/更少的地址生成
• 更少的数据流，可减少页面遗漏
• 由于数据流较少，因此使用的预取较少
• 对并发使用的数据元素进行高效的缓存行打包

1