Writing the sample code for this post I was amazed myself to see how simple it was to reach over 20 times performance improvement with so little effort.
This article shows how to use 256-bit Intel® Advanced Vector Extensions (Intel® AVX) to normalize an array of 3D vectors. We describe a shuffle approach to convert between AOS & SOA on-the-fly in order to make data ready for up to 8-wide SIMD processing.
This white paper describes a code sample that uses Intel® AVX for computing mesh-based cloth simulation. A structure of arrays (SOA) implementation is used to maximize data parallelism enabling usage of 256-bit (8 float) SIMD processing. Code is provided.
Photo-realistic rendering requires accurate simulation of light propagation according to physics laws. The best known way to solve this problem is Monte Carlo ray tracing. We describe a state-of-the-art photo-realistic Monte Carlo rendering engine.
DXT compression is a lossy texture compression algorithm that can reduce texture storage requirements and decrease texture bandwidth. DXT decompression is typically hardware accelerated, which makes it very fast and efficient.
AVX Cloth Intel Corporation
This article details an algorithm and associated sample code for software occlusion culling which is available for download. The technique divides scene objects into occluders and occludees and culls occludees based on a depth comparison with the occluders that are software rasterized to the depth buffer. The sample code uses frustum culling and is optimized with Streaming SIMD Extensions (SSE)...
Improving ETC1 and ETC2 texture compression What is texture compression?
Migrate highly vectorized GPU compute kernels to CPU code using the Intel® SPMD Program Compiler (commonly referred to in previous documents as ISPC). Includes a link to a Github code sample to help you utilize spare CPU cycles to create a richer gaming experience.