Last year we published a paper entitled “Mapping and optimization of the AVS Video Decoder on a high-performance chip multiprocessor”. Its main purpose, as the title suggests, was to optimize the Chinese “Audio Video Standard” (AVS) decoder on Intel’s quad-core i7. As part of this work we evaluated the performance of different code versions on a varying number of cores, with and without the Hyper-Threading and Turbo Boost features.
Memory access characteristics in manycore NUMA systems are not always obvious to the programmer. A process may see widely varying latency and bandwidth for memory accesses, depending on which CPU the process is running on and on which memory node the data is located.
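To make the CPU-versus-memory-node dependence concrete, here is a minimal sketch that works with the node distance table Linux exposes under /sys/devices/system/node/nodeN/distance. The helper names (parse_distances, relative_cost, read_sysfs_distances) are hypothetical and chosen for illustration. The values are relative distances reported by firmware, not measured latencies, so treat the ratio as a rough hint only.

```python
from pathlib import Path


def parse_distances(rows):
    """Turn rows of space-separated integers into a square matrix."""
    matrix = [[int(x) for x in row.split()] for row in rows]
    if any(len(r) != len(matrix) for r in matrix):
        raise ValueError("distance table is not square")
    return matrix


def relative_cost(matrix, cpu_node, mem_node):
    """Ratio of the reported distance to the local distance of cpu_node."""
    return matrix[cpu_node][mem_node] / matrix[cpu_node][cpu_node]


def read_sysfs_distances(base="/sys/devices/system/node"):
    """Read one distance row per node on Linux; returns [] if unavailable."""
    root = Path(base)
    if not root.is_dir():
        return []
    # Sort node directories numerically (node0, node1, ..., node10).
    nodes = sorted(root.glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    return [(n / "distance").read_text().strip()
            for n in nodes if (n / "distance").exists()]


if __name__ == "__main__":
    # Assumption: fall back to a made-up two-node table when sysfs is absent.
    rows = read_sysfs_distances() or ["10 21", "21 10"]
    m = parse_distances(rows)
    for cpu in range(len(m)):
        for mem in range(len(m)):
            print(f"cpu node {cpu} -> mem node {mem}: "
                  f"{relative_cost(m, cpu, mem):.2f}x local")
```

A ratio above 1 for a given pair suggests that a process running on that CPU node pays more for data placed on that memory node than for local data; actual latency and bandwidth still need to be measured on the machine in question.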
Here's a quick report of my initial reactions after spending a couple of hours getting oriented to the Manycore Testing Lab (MTL) through "VIP access", from my perspective as a CS prof at a small college.