Knights Corner Micro-Architecture Support

How does a high performance SMP on-a-chip sound to you? I can now share, for the first time, key details about our vision for Knights Corner, a high performance SMP on-a-chip, and our thinking behind the software architecture and features. There is a lot to cover, so I’ll split it across two posts: this overview, “Knights Corner micro-architecture support,” and an introduction to the software stack in “Knights Corner: Open source software stack.”

The software stack and low level programming documentation for pre-production Knights Corner coprocessors are now available for download. These are most useful to software developers who have access to the pre-production systems today.

Knights Corner running Linux provides an environment that is flexible, familiar and very powerful. While most of us will simply use the software to operate Knights Corner, and build our applications on top of it, the ability to experiment by varying (and possibly improving upon) the operating environment will prove irresistible to some. I was particularly impressed by demonstrations of exactly this at SC’11 last November by Professor Yutaka Ishikawa and his team from the University of Tokyo. They will be at ISC in Hamburg, June 17-20, 2012, with “Development of a System Software Stack for Many-Core Based Supercomputers.”

Linux and Tools

The software stack download includes source for an embedded Linux environment which runs on Knights Corner coprocessors along with the driver code that connects Knights Corner coprocessors to a host processor. To build this code stack, Intel has included a minimally modified GCC compiler. To support application development, Intel has enabled GDB. This is just the beginning! The Intel® Parallel Studio XE and Intel® Cluster Studio XE products for Linux are also entering public field-testing, including support for Knights Corner coprocessors. These high-performance, familiar and popular tools simply support Knights Corner as another target without requiring new tools or separate product purchases. A number of other vendors are working on their support for Knights Corner as well in their familiar and popular products.

Low-level programming documentation

The essential Knights Corner low-level programming documents available include the “Knights Corner Instruction Set Reference Manual,” the ABI document “System V Application Binary Interface K1OM Architecture Processor Supplement,” and the “Knights Corner Performance Monitoring Units.”  These documents are aimed primarily at tools and library developers but contain useful information for all developers. Particularly useful are the explanations of the Knights Corner vector capabilities and the performance monitoring capabilities used by tools like the Intel® VTune Amplifier XE.

Knights Corner: Highly programmable while being optimized for power efficiency and highly parallel workloads.

The MIC architecture is specifically designed to provide the programmability of an SMP system while optimizing for power efficiency and highly parallel workloads. Knights Corner is the first product to use this architecture and deliver on this exciting vision. It’s an SMP on-a-chip, with the following key improvements:

  • We have scale and power benefits from using a small, simple micro-architecture. An Intel® Pentium® processor design is extended with 64-bit support and a few additional instructions (like CPUID) which are documented in Appendix B of the “Knights Corner Instruction Set Reference Manual.”
  • For high degrees of data parallelism, we add a Knights Corner vector capability not found in any other Intel processor.
  • The many cores are connected using a high performance on-chip interconnect design.
  • Knights Corner is designed to be a coprocessor that lives on the PCIe bus, and requires a host processor to boot the system. 

This combination of Linux, 64-bit support, and new vector capabilities with an Intel® Pentium® processor-derived core means that Knights Corner is not completely binary compatible with any previous Intel processor. Because of its unique nature, you’ll see statements like this in our code: “Disclaimer: The codes contained in these modules may be specific to the Intel® Software Development Platform codenamed: Knights Ferry, and the Intel® product codenamed: Knights Corner, and are not backward compatible with other Intel® products. Additionally, Intel® makes no commitments for support of the code or instruction set in future products.” This notice speaks to low level details and primarily affects tool vendors. Tools from Intel, GCC and other vendors will support Knights Corner by following the Instruction Set Architecture (ISA) and Application Binary Interface (ABI) documents, so most developers can program at a completely portable level in their applications.

Programs written in high-level languages (C, C++, Fortran, etc.) can easily remain portable despite any ISA or ABI differences. Programming efforts will center on exploiting the high degree of parallelism through vectorization and scaling: vectorization to use the Knights Corner vector instructions, and scaling to use more than 50 cores. This is as familiar as optimizing for a highly-parallel SMP system based on CPUs. The result is highly approachable with familiar programming models.

Twice the bang from a single optimization effort

The keys to a good application for Knights Corner are the same as for an Intel® Xeon® processor: scale and vectorize. As programmers we pick a method to scale (OpenMP*, MPI, Intel TBB, etc.), we pick a method for vectorization (compiler options, pragmas, Intel® Cilk™ Plus, etc.), and we tune. In general, results from tuning for a MIC architecture based machine have also increased performance on the Intel Xeon processor. I think of it as “Two tunes for the price of one.” This gives double the motivation to tune our programs to scale and vectorize well. When you want to work on optimization, it is a nice bonus to have the benefits be widely applicable.
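
To make "scale and vectorize" concrete, here is a minimal sketch in C, not code from the Knights Corner software stack. The function name scale_and_add is hypothetical. It assumes OpenMP as the scaling method and relies on the compiler's auto-vectorizer rather than on any Knights Corner-specific option; if the compiler is run without OpenMP enabled, the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical kernel (not part of any Intel library): y[i] = a * x[i] + y[i].
 *
 * Scaling: the OpenMP pragma divides the iterations among threads when
 * OpenMP is enabled at compile time.
 * Vectorization: 'restrict' promises that x and y do not overlap, which
 * lets the compiler turn the loop body into vector instructions.
 */
void scale_and_add(long n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Nothing in this source names a particular processor, so the same tuning effort carries over to any target the compiler supports, which is the point of "two tunes for the price of one."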

Coprocessor that is powerful enough to boot Linux*

Knights Corner is a coprocessor so it lives in a system with a regular “host” processor that boots up the overall system and orders the coprocessor around. This is a product design, not an architectural feature.

While it is an SMP-on-a-chip, we can think of Knights Corner as an embedded system from the standpoint of the operating system.  Therefore, Knights Corner boots and runs an embedded Linux kernel and is connected to the main system through enabling software which lives in drivers on both the host system and the embedded system.

The changes to open source components are in support of the instruction set choices, the ABI, initializing and controlling an SMP on-a-chip, and the enabling software to support the coprocessor communication with the host system.

For more information on Linux, GCC and GDB for Knights Corner, see my blog “Knights Corner: Open source software stack.”

A place to discuss more

For up-to-date information on the software stack or Knights Corner documents, the Intel® Many Integrated Core (MIC) Architecture Forum is the place to go. Join me there and we can discuss the future of highly parallel computing!

For more complete information about compiler optimizations, see our Optimization Notice.


The keys are the addition of three features beyond what our 64 bit processors offer:
(1) wider vector instructions [512 bits wide, vs. 256 for AVX or 128 for SSE] instead of MMX, SSE or AVX
(2) four threads per core [hyper-threading on Intel Xeon processors has two]
(3) lots more cores [more than 50]

Taking advantage of these is particularly useful for highly parallel programs.
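
As a rough illustration of what those three numbers mean together, the sketch below does the arithmetic in C. The helper names vector_lanes and hardware_threads are hypothetical, and the core count of 50 used in the note after the code is only an assumed lower bound, since the figure given above is "more than 50."

```c
#include <assert.h>

/* Number of elements of a given width that fit in one vector register. */
unsigned vector_lanes(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Total hardware threads available across all cores. */
unsigned hardware_threads(unsigned cores, unsigned threads_per_core)
{
    return cores * threads_per_core;
}
```

With 32-bit floats, vector_lanes(512, 32) gives 16 elements per instruction, against 8 for a 256-bit AVX register and 4 for a 128-bit SSE register. With the assumed 50 cores, hardware_threads(50, 4) gives 200 threads to keep busy.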

The complete Knights Corner Instruction manual is online at

Eugene, good question, but I may not know enough to answer. I am not aware of all the software customers are doing for Knights Corner. I have not heard of anyone working on JavaVM so far. I have seen Python, several MPI ports and multiple scientific library ports - in addition to tens of millions of lines of applications (Fortran, C and C++). It is early, so maybe someone will do it - or you can when you have a machine!