Knights Corner micro-architecture support

How does a high performance SMP on-a-chip sound to you? I can now share, for the first time, key details about our vision for Knights Corner (the aforementioned high performance SMP on-a-chip) and our thinking behind the software architecture and features. There is a lot to discuss, so I’ll split it across two posts: this overview, “Knights Corner micro-architecture support,” and an introduction to the software stack in “Knights Corner: Open source software stack.”

The software stack and low level programming documentation for pre-production Knights Corner coprocessors are now available for download. These are most useful to software developers who have access to the pre-production systems today.

Knights Corner running Linux provides an environment that is flexible, familiar and very powerful. While most of us will simply use the software to operate Knights Corner, and build our applications on top of it, the ability to experiment by varying (and possibly improving upon) the operating environment will prove irresistible to some. I was particularly impressed by demonstrations of exactly this at SC’11 last November, given by Professor Yutaka Ishikawa and his team from the University of Tokyo. They will be at ISC in Hamburg, June 17-20, 2012, with “Development of a System Software Stack for Many-Core Based Supercomputers.”

Linux and Tools

The software stack download includes source for an embedded Linux environment which runs on Knights Corner coprocessors, along with the driver code that connects Knights Corner coprocessors to a host processor. To build this code stack, we have included a minimally modified GCC compiler. To support application development, we have enabled GDB. This is just the beginning! The Intel Parallel Studio XE and Intel Cluster Studio XE products for Linux are also entering public field-testing, including support for Knights Corner coprocessors. These high-performance, familiar and popular tools simply support Knights Corner as another target, without requiring new tools or separate product purchases. A number of other vendors are also working on Knights Corner support in their own familiar and popular products.



Low-level programming documentation

The essential Knights Corner low-level programming documents available include the “Knights Corner Instruction Set Reference Manual,” the ABI document “System V Application Binary Interface K1OM Architecture Processor Supplement,” and the “Knights Corner Performance Monitoring Units.”  These documents are aimed primarily at tools and library developers but contain useful information for all developers. Particularly useful are the explanations of the Knights Corner vector capabilities and the performance monitoring capabilities used by tools like the Intel® VTune Amplifier XE.

Knights Corner: Highly programmable while being optimized for power efficiency and highly parallel workloads.

The MIC architecture is specifically designed to provide the programmability of an SMP system while optimizing for power efficiency and highly parallel workloads. Knights Corner is the first product to use this architecture and deliver on this exciting vision. It’s an SMP on-a-chip, with the following key improvements:

    • We have scale and power benefits from using a small, simple micro-architecture. We extended an Intel® Pentium® processor design with 64-bit support and a few additional instructions (like CPUID) which are documented in Appendix B of the “Knights Corner Instruction Set Reference Manual.”

    • For high degrees of data parallelism, we added the Knights Corner vector capability not found in any other Intel processor.

    • We connect the many cores using a high performance on-chip interconnect design.

    • Knights Corner is designed to be a coprocessor that lives on the PCIe bus, and requires a host processor to boot the system.



This combination of Linux, 64-bit support, and new vector capabilities with an Intel® Pentium® processor-derived core means that Knights Corner is not completely binary compatible with any previous Intel processor. Because of its unique nature, you’ll see statements like this in our code: “Disclaimer: The codes contained in these modules may be specific to the Intel® Software Development Platform codenamed: Knights Ferry, and the Intel® product codenamed: Knights Corner, and are not backward compatible with other Intel® products. Additionally, Intel® makes no commitments for support of the code or instruction set in future products.” This notice speaks to low level details and affects tool vendors primarily. Tools by Intel, GCC and other vendors will support Knights Corner by following the Instruction Set Architecture (ISA) and Application Binary Interface (ABI) documents, so most developers can program at a completely portable level in their applications.

Programs written in high-level languages (C, C++, Fortran, etc.) can easily remain portable despite any ISA or ABI differences. Programming efforts will center on exploiting the high degree of parallelism through vectorization and scaling: vectorization to utilize Knights Corner vector instructions, and scaling to use more than 50 cores. This has the familiarity of optimizing to use a highly-parallel SMP system based on CPUs. The result is highly approachable with familiar programming models.
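To put rough numbers on the parallelism involved, here is a small Python sketch. It takes the 512-bit vector width and four hardware threads per core from the comments at the end of this post, and uses 50 cores purely as an assumed lower bound, since the count is only described as "more than 50." The function names are illustrative and not part of any Intel tool.

```python
def vector_lanes(vector_bits, element_bits):
    """Return how many elements of a given width fit in one vector register."""
    if vector_bits % element_bits != 0:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits


def hardware_threads(cores, threads_per_core):
    """Return the total number of hardware threads on the chip."""
    return cores * threads_per_core


VECTOR_BITS = 512      # vector width quoted in the comments
THREADS_PER_CORE = 4   # threads per core quoted in the comments
ASSUMED_CORES = 50     # assumption: a lower bound only ("more than 50")

if __name__ == "__main__":
    print(vector_lanes(VECTOR_BITS, 32))   # single-precision lanes: 16
    print(vector_lanes(VECTOR_BITS, 64))   # double-precision lanes: 8
    print(hardware_threads(ASSUMED_CORES, THREADS_PER_CORE))  # at least 200
```

Under these assumptions, a program needs at least 200 independent threads of work, each feeding 16 single-precision lanes, before the hardware is fully occupied.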



Twice the bang from a single optimization effort

The keys to a good application for Knights Corner are the same as for an Intel® Xeon® processor: scale and vectorize. As programmers we pick a method to scale (OpenMP, MPI, TBB, etc.), we pick a method for vectorization (compiler options, pragmas, Cilk™ Plus, etc.), and we tune. In general, results from tuning for a MIC architecture based machine have increased performance on the Intel Xeon processor as well. I think of it as “Two tunes for the price of one.” This gives double the motivation to tune our programs to scale and vectorize well. When you want to work on optimization, it is a nice bonus to have the benefits be widely applicable.
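The two-level structure behind "scale and vectorize" can be sketched in a few lines of Python. This is only a model of the decomposition, not a performance example: Python threads will not scale this loop and nothing here is actually vectorized. The outer level splits the data across workers, as an OpenMP or TBB loop would, and the inner level walks each chunk in fixed-width blocks, as a vectorizing compiler would. The block width of 16 and all function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

VECTOR_LANES = 16  # assumption: 512-bit vectors holding 32-bit elements


def blocked_sum(values):
    """Inner level: consume one worker's chunk in vector-width blocks."""
    total = 0.0
    for start in range(0, len(values), VECTOR_LANES):
        block = values[start:start + VECTOR_LANES]
        total += sum(block)  # stands in for one vector operation
    return total


def split(values, workers):
    """Outer level: divide the data into at most `workers` contiguous chunks."""
    size = max(1, -(-len(values) // workers))  # ceiling division, never zero
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, workers=4):
    """Sum `values` by handing one chunk to each worker."""
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(blocked_sum, chunks))


if __name__ == "__main__":
    data = [1.0] * 1000
    print(parallel_sum(data, workers=4))  # 1000.0
```

The same shape carries over to either kind of processor, which is why one tuning effort tends to pay off on both.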



Coprocessor that is powerful enough to boot Linux

Knights Corner is a coprocessor so it lives in a system with a regular “host” processor that boots up the overall system and orders the coprocessor around. This is a product design, not an architectural feature.

While it is an SMP-on-a-chip, we can think of Knights Corner as an embedded system from the standpoint of the operating system.  Therefore, Knights Corner boots and runs an embedded Linux kernel and is connected to the main system through enabling software which lives in drivers on both the host system and the embedded system.

The changes to open source components are in support of the instruction set choices, the ABI, initializing and controlling an SMP on-a-chip, and the enabling software to support the coprocessor communication with the host system.

For more information on Linux, GCC and GDB for Knights Corner, see my blog “Knights Corner: Open source software stack.”

A place to discuss more



For up-to-date information on the software stack or Knights Corner documents, the Intel® Many Integrated Core (MIC) Architecture Forum is the place to go. Join us there and we can discuss the future of highly parallel computing!

 


Comments

Eugene Krutov (Intel):

Hi James,
This may be a naive question, but are there any plans to port or implement a JavaVM for KNC, or will only native code be available?

James Reinders (Intel):

Eugene, good question, but I may not know enough to answer. I am not aware of all the software customers are doing for Knights Corner. I have not heard of anyone working on JavaVM so far. I have seen Python, several MPI ports and multiple scientific library ports - in addition to tens of millions of lines of applications (Fortran, C and C++). It is early, so maybe someone will do it - or you can when you have a machine!

Eugene Krutov (Intel):

Wow. Python inside MIC - Cool :)

(unnamed commenter):

What about OpenCL support?

Joseph Pingenot:

How different is Knights Corner from standard x86_64?

James Reinders (Intel):

The keys are the addition of three features beyond what our 64 bit processors offer:
(1) wider vector instructions [512 bits wide, vs. 256 for AVX or 128 for SSE] instead of MMX, SSE or AVX
(2) four threads per core [hyper-threading on Intel Xeon processors has two]
(3) lots more cores [more than 50]

Taking advantage of these is particularly useful for highly parallel programs.

The complete Knights Corner Instruction manual is online at http://software.intel.com/en-us/forums/showthread.php?t=105443

James Reinders (Intel):

OpenCL is in our future across all our products. We have released CPU support and support for graphics on Ivy Bridge. We have not yet announced Knights Corner support, and I can't elaborate other than say "stay tuned." Please let us know what you would like to see. We are very interested in input.

Joseph Pingenot:

But it does all the legacy x86 stuff, e.g. real mode?

(unnamed commenter):

Have you considered ispc (http://ispc.github.com/) as a vector (SPMD) compiler for this architecture? I ask because I get the following when running ispc on the ao bench example supplied with its SDK:

[enright@jemez aobench]$ ./ao 10 1920 1080
[aobench ispc]: [7971.015] M cycles (1920 x 1080 image)
Wrote image file ao-ispc.ppm
[aobench ispc + tasks]: [1757.467] M cycles (1920 x 1080 image)
Wrote image file ao-ispc-tasks.ppm
[aobench serial]: [53368.671] M cycles (1920 x 1080 image)
(6.70x speedup from ISPC, 30.37x speedup from ISPC + tasks)
Wrote image file ao-serial.ppm

This result compares ispc to gcc (4.6.3) with -O3 enabled (i.e. with the autovectorizer operating).

Also, what is the design of the on-chip interconnect? I.e. ring, bus, mesh, etc? I can't seem to find this information anywhere.

Thanks. Doug
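The speedup figures in the aobench output above follow directly from the reported cycle counts. A minimal check in Python, using only the numbers printed in that output (the function name is illustrative):

```python
def speedup(baseline_mcycles, optimized_mcycles):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_mcycles / optimized_mcycles


# Cycle counts (in millions) copied from the aobench output above.
SERIAL = 53368.671
ISPC = 7971.015
ISPC_TASKS = 1757.467

if __name__ == "__main__":
    print(f"{speedup(SERIAL, ISPC):.2f}x")        # 6.70x, ispc vs. serial
    print(f"{speedup(SERIAL, ISPC_TASKS):.2f}x")  # 30.37x, ispc + tasks vs. serial
    print(f"{speedup(ISPC, ISPC_TASKS):.2f}x")    # extra factor contributed by tasks
```

The first ratio is the gain from vectorization alone, and the last is the additional gain from spreading work across tasks; multiplied together they give the combined figure.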

James Reinders (Intel):

RE: But it does all the legacy x86 stuff, e.g. real mode?

Yes, real mode is there. You'll see references to it in the ISA document, although the main documentation sources for most functionality (aka "legacy") are the standard documents from Intel that generally apply across our products. The ISA document for Knights Corner documents the additions or changes only.

As you would expect, consistent with other instructions added since the Intel 80286, the new vector instructions are not supported in real-address mode. Segment registers and 512-bit vectors would be an odd combination. It wouldn't take long for Knights Corner vector instructions (512 bits each) to rip through 1 MB of memory!
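For a sense of scale, the arithmetic behind that last remark is short. This Python sketch is purely hypothetical, since the vector instructions are not available in real-address mode; it assumes the nominal 1 MiB (20-bit) real-mode address space and one full 64-byte vector per access.

```python
REAL_MODE_BYTES = 1 << 20   # nominal real-mode address space: 1 MiB
VECTOR_BYTES = 512 // 8     # one 512-bit vector is 64 bytes


def vectors_to_cover(memory_bytes, vector_bytes):
    """Return how many full-width vector accesses touch every byte once."""
    return -(-memory_bytes // vector_bytes)  # ceiling division


if __name__ == "__main__":
    print(vectors_to_cover(REAL_MODE_BYTES, VECTOR_BYTES))  # 16384
```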
