New white paper: in-place multithreaded transposition with common code for CPU and MIC

A few months ago I posted about the difficulty of writing a matrix transposition code that performs well on Xeon Phi. Since then, I have done more homework and improved the code to reach a satisfactory transposition rate of 113 GB/s on the 7110P (67% of the STREAM copy bandwidth). The white paper is available at: http://research.colfaxinternational.com/post/2013/08/12/Trans-7110.aspx The paper covers the method, code snippets, compiler flags, benchmarks, and a comparison with MKL; the source code is publicly available.


Thank you, Andrey.

For those who want to see Andrey's original post, it is at http://software.intel.com/en-us/forums/topic/391162
