Knights Corner Micro-Architecture Support

How does a high performance SMP on-a-chip sound to you? I can now share, for the first time, key details about our vision for Knights Corner (the aforementioned high performance SMP on-a-chip), and our thinking behind the software architecture and features. There is a lot to cover, so I’ll split it across two posts: this overview, “Knights Corner micro-architecture support,” and an introduction to the software stack in “Knights Corner: Open source software stack.”

The software stack and low level programming documentation for pre-production Knights Corner coprocessors are now available for download. These are most useful to software developers who have access to the pre-production systems today.

Knights Corner running Linux provides an environment that is flexible, familiar and very powerful. While most of us will simply use the software to operate Knights Corner, and build our applications on top of it, the ability to experiment by varying (and possibly improving upon) the operating environment will prove irresistible to some. I was particularly impressed by demonstrations of exactly this at SC’11 last November, given by Professor Yutaka Ishikawa and his team from the University of Tokyo. They will be at ISC in Hamburg, June 17-20, 2012, with “Development of a System Software Stack for Many-Core Based Supercomputers.”

Linux and Tools

The software stack download includes source for an embedded Linux environment which runs on Knights Corner coprocessors, along with the driver code that connects Knights Corner coprocessors to a host processor. To build this code stack, Intel has included a minimally modified GCC compiler. To support application development, Intel has enabled GDB. This is just the beginning! The Intel® Parallel Studio XE and Intel® Cluster Studio XE products for Linux are also entering public field-testing, including support for Knights Corner coprocessors. These high-performance, familiar and popular tools simply support Knights Corner as another target without requiring new tools or separate product purchases. A number of other vendors are also working on Knights Corner support in their own familiar and popular products.

Low-level programming documentation

The essential Knights Corner low-level programming documents available include the “Knights Corner Instruction Set Reference Manual,” the ABI document “System V Application Binary Interface K1OM Architecture Processor Supplement,” and the “Knights Corner Performance Monitoring Units” document. These documents are aimed primarily at tools and library developers but contain useful information for all developers. Particularly useful are the explanations of the Knights Corner vector capabilities and of the performance monitoring capabilities used by tools like the Intel® VTune Amplifier XE.

Knights Corner: Highly programmable while being optimized for power efficiency and highly parallel workloads.

The MIC architecture is specifically designed to provide the programmability of an SMP system while optimizing for power efficiency and highly parallel workloads. Knights Corner is the first product to use this architecture and deliver on this exciting vision. It’s an SMP on-a-chip, with the following key improvements:

  • We have scale and power benefits from using a small, simple micro-architecture. An Intel® Pentium® processor design is extended with 64-bit support and a few additional instructions (like CPUID) which are documented in Appendix B of the “Knights Corner Instruction Set Reference Manual.”
  • To support high degrees of data parallelism, we added the Knights Corner vector capability, which is not found in any other Intel processor.
  • The many cores are connected using a high performance on-chip interconnect design.
  • Knights Corner is designed to be a coprocessor that lives on the PCIe bus, and requires a host processor to boot the system. 

This combination of Linux, 64-bit support, and new vector capabilities with an Intel® Pentium® processor-derived core means that Knights Corner is not completely binary compatible with any previous Intel processor. Because of its unique nature, you’ll see statements like this in our code: “Disclaimer: The codes contained in these modules may be specific to the Intel® Software Development Platform codenamed: Knights Ferry, and the Intel® product codenamed: Knights Corner, and are not backward compatible with other Intel® products. Additionally, Intel® makes no commitments for support of the code or instruction set in future products.” This notice speaks to low level details and affects tool vendors primarily. Tools by Intel, GCC and other vendors will support Knights Corner by following the Instruction Set Architecture (ISA) and Application Binary Interface (ABI) documents, so most developers can program at a completely portable level in their applications.

Programs written in high-level languages (C, C++, Fortran, etc.) can easily remain portable despite any ISA or ABI differences. Programming efforts will center on exploiting the high degree of parallelism through vectorization and scaling: Vectorization to utilize Knights Corner vector instructions and scaling to use more than 50 cores. This has the familiarity of optimizing to use a highly-parallel SMP system based on CPUs. The result is highly approachable with familiar programming models.

Twice the bang from a single optimization effort

The keys to a good application for Knights Corner are the same as for an Intel® Xeon® processor: scale and vectorize. As programmers we pick a method to scale (OpenMP*, MPI, Intel TBB, etc.), we pick a method for vectorization (compiler options, pragmas, Intel® Cilk™ Plus, etc.), and we tune. In general, tuning for a MIC architecture based machine has also increased performance on the Intel Xeon processor. I think of it as “Two tunes for the price of one.” This gives double the motivation to tune our programs to scale and vectorize well. When you want to work on optimization, it is a nice bonus to have the benefits be widely applicable.
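To make “scale and vectorize” concrete, here is a minimal sketch in C of what a single tuned loop can look like. It is only an illustration, not code from the Knights Corner software stack: the function names saxpy and dot are made up for this example, and it assumes a compiler that understands the combined OpenMP “parallel for simd” directive (OpenMP 4.0 or later). The same source is meant to build for either a host processor or a coprocessor target.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
 * "parallel for" spreads iterations across threads (scaling);
 * "simd" asks the compiler to use vector instructions (vectorizing).
 * The restrict qualifiers promise that x and y do not overlap. */
void saxpy(long n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical example kernel: dot product of x and y.
 * The reduction clause gives each thread and vector lane a private
 * partial sum, which is combined once the loop finishes. */
double dot(long n, const float *restrict x, const float *restrict y)
{
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += (double)x[i] * (double)y[i];
    }
    return sum;
}
```

With GCC, the pragmas take effect when building with -fopenmp; without that flag they are ignored and the loops simply run serially with the same results, which makes it easy to check correctness before measuring how well the code scales.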

Coprocessor that is powerful enough to boot Linux*

Knights Corner is a coprocessor so it lives in a system with a regular “host” processor that boots up the overall system and orders the coprocessor around. This is a product design, not an architectural feature.

While it is an SMP-on-a-chip, we can think of Knights Corner as an embedded system from the standpoint of the operating system.  Therefore, Knights Corner boots and runs an embedded Linux kernel and is connected to the main system through enabling software which lives in drivers on both the host system and the embedded system.

The changes to open source components are in support of the instruction set choices, the ABI, initializing and controlling an SMP on-a-chip, and the enabling software to support the coprocessor communication with the host system.

For more information on Linux, GCC and GDB for Knights Corner, see my blog “Knights Corner: Open source software stack.”

A place to discuss more

For up-to-date information on the software stack or Knights Corner documents, the Intel® Many Integrated Core (MIC) Architecture Forum is the place to go. Join me there and we can discuss the future of highly parallel computing!



anonymous's picture


What [arch] kernel does MIC run? Is the source available?

anonymous's picture

Sorry for the delayed response to James's postings, and thanks, James, for your kind comments!

There has been some work afoot to bring ispc to MIC, with some very promising early results. Now that the Knights Corner ISA and intrinsics are public, we're working to pull this work together to make it available in the public ispc release. (Of course, this initial release will only be useful to people who currently have MIC systems, but then once MIC / Xeon Phi are released widely, I hope that ispc on MIC will be well-tested and well-tuned.)

I think that we'll have this work released pretty soon, but can't promise anything specific just yet. Once it is out, I'd be quite interested to hear feedback and suggestions from anyone with a MIC system who tries it out.


anonymous's picture

I see how vectorization is necessary for optimum performance, but what sort of improvements can be expected on MIC with just thread-level parallelism with no vectorization?

anonymous's picture

Yes, I'm coming to see that, James. Lots of modifications. The ramifications of the vector unit bounce back and forth through the architecture; it's much more involved than SSE. The four-way threading is new, and so is 64-bit support, but it's an in-order execution unit. I also like the large number of memory channels, that's huge.

I'm looking forward to seeing the SEP extensions to cover this one. :D

James R.'s picture

RE: memory model and cache coherence. A: Yes, the cores within a Knights Corner coprocessor are cache coherent with each other. I like to call it an SMP-on-a-chip.

RE: which Pentium is the underlying core. A: the original Pentium processor with lots of modifications (like adding 64-bits and SIMD capabilities)

RE: ispc. A: Let's encourage Matt and Bill to comment, or maybe just port it for Knights Corner for us all.

anonymous's picture

James, I read through the ispc docs and discovered that it does indeed target LLVM-IR. Excellent! There will have to be some additions for the fact that MIC is a co-processor with its own memory, but an interim solution would be to insert Intel's LEO and preconfigure the board rather than trying to modify its code on the fly. Does this sound reasonable? :D

anonymous's picture

James, I think it would be helpful to reference the Knights Corner Software Developers Guide, document 488596. The ISA is really only about the deltas, and it can be confusing if you don't have the overview.

Also, *which* Pentium is the underlying core? Core2 or original, or i3?

AFA my personal preference for a development environment, I'd like to see LLVM-IR extended to support MIC. That would allow anyone to choose the front end they wish, from C++ to Ruby.

anonymous's picture

Hi James,

I have some questions about the memory model and cache coherence in the SMP on-a-chip Knights Corner; sorry if these have already been discussed. Is there cache coherence among the cores? Or is there no shared cache among the cores, only shared memory whose consistency has to be managed by programmers?


anonymous's picture

Hi James - Thanks for the reply and your comments concerning ispc and the suitability of SPMD programming models for the massively wide FP vector units that are a part of the Knights Corner (MIC) architecture.

