When searching for a good, clean working example other than perhaps writing out a "Hello World" string to the console in Netwide Assembler (NASM) for standard x86 architecture...
This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple and can be implemented easily in any programming language. This paper shows that performance improves significantly when different optimization techniques are applied.
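As a rough illustration of what such an optimization can look like, the sketch below contrasts the textbook triple loop with a loop-interchanged (i-k-j) variant. It is not taken from the paper: the function names `matmul_naive` and `matmul_ikj`, the square row-major layout, and the choice of loop interchange as the example technique are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: textbook i-j-k product C = A * B for n x n
 * row-major matrices. The inner loop walks B column-wise (stride n),
 * which tends to use the cache poorly for large n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: same product with the loops reordered to i-k-j.
 * The inner loop now walks both B and C row-wise (stride 1), so memory
 * is accessed contiguously. C must not alias A or B. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    memset(c, 0, n * n * sizeof *c);  /* C accumulates, so start from zero */
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both functions compute the same result; only the memory access order differs. How much the reordered version helps depends on matrix size, compiler flags, and hardware, so any speedup should be measured rather than assumed.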