An Overview of Programming for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors

I have written a paper to explain programming for the Intel® Xeon Phi™ coprocessor. What may surprise you is that the paper focuses on parallel programming itself. Understanding how to restructure an application to expose more parallelism is critical to getting the best performance on any device (processors, GPUs or coprocessors). Advice for successful parallel programming can be summarized as “Program with lots of threads that use vectors with your preferred programming languages and parallelism models.” This restructuring will generally yield benefits on most general-purpose computing systems as well, a bonus that comes from the emphasis on common programming languages, models, and tools that span these processors and coprocessors. I refer to this bonus as the dual-transforming-tuning advantage, an advantage you would lose by switching to a CUDA* or OpenCL*-based solution.
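To make the advice "lots of threads that use vectors" concrete, here is a minimal sketch in C. It assumes OpenMP 4.0 as the parallelism model, which is only one possible choice and is not prescribed by the text above; the function name saxpy is hypothetical and chosen for illustration. The single directive asks the compiler to split the loop across threads and to vectorize each thread's share.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: computes y[i] = a * x[i] + y[i] for i in [0, n).
 * Assumption: OpenMP 4.0 or later is the parallelism model.
 * "parallel for" distributes iterations across threads;
 * "simd" asks the compiler to vectorize each thread's chunk.
 * Built without OpenMP support, the pragma is ignored and the
 * loop runs serially with the same result.
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang this would typically be built with -fopenmp. The same source compiles unchanged for any target the compiler supports, which is the portability point made above: the restructuring is expressed in the common language and model rather than in device-specific code.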

Intel Xeon Phi coprocessors are designed to extend the reach of applications that have demonstrated the ability to fully utilize the scaling capabilities of Intel® Xeon® processor-based systems and fully exploit available processor vector capabilities or memory bandwidth. For such applications, the Intel Xeon Phi coprocessors offer additional power-efficient scaling, vector support, and local memory bandwidth, while maintaining the programmability and support associated with Intel Xeon processors.

In my paper, I explain more fully the implications of such high levels of parallelism and the work needed to develop that parallelism, work that benefits your application on processors as well.

I hope you find it useful.


In addition to this paper, there is a succinct document that explains the same concepts in a "flowchart format" and links to additional resources.


1 comment

Jesmin Jahan T.


I am getting an error like this:
HOST--ERROR:myoiOSSetPageAccess: mprotect failed!
Please increase the maximum of memory map areas
i.e. echo 256000 > /proc/sys/vm/max_map_count
How do I fix this?

Also, is it possible for the CPU and the coprocessor to write to the same array simultaneously?

