Intel® Direct Ethernet Transport


Product Overview

The Intel® Direct Ethernet Transport (Intel® DET) project provides two components for faster message passing on commodity Ethernet fabrics. The first component is a Linux kernel driver and user-mode library providing RDMA/IPC semantics similar to InfiniBand® and iWARP technologies. The second component is a uDAPL 1.2 library that provides a standardized interface to 3rd party software such as Intel® MPI. In addition to delivering superior message passing performance compared to a traditional TCP/IP socket stack, Intel® DET gives cluster software developers the opportunity to work with RDMA semantics without investing in a specific RDMA technology.
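For readers unfamiliar with uDAPL, the minimal sketch below shows how an application or middleware layer opens a DAT 1.2 provider such as Intel® DET through the standard registry. It is illustrative only: the provider instance name "det0" and the event queue depth are assumptions, and the real name is whatever the provider RPM registers in /etc/dat.conf.

/* Minimal uDAPL 1.2 sketch: open an interface adapter by its registry
 * name and let dat_ia_open() create the default asynchronous EVD.
 * The name "det0" is an assumption; build with: cc open_ia.c -ldat */
#include <stdio.h>
#include <dat/udat.h>

int main(void)
{
    DAT_IA_HANDLE  ia        = DAT_HANDLE_NULL;
    DAT_EVD_HANDLE async_evd = DAT_HANDLE_NULL;  /* NULL => provider creates it */
    DAT_RETURN     ret;

    ret = dat_ia_open("det0", 8 /* async event queue depth */, &async_evd, &ia);
    if (ret != DAT_SUCCESS) {
        fprintf(stderr, "dat_ia_open failed: 0x%x\n", ret);
        return 1;
    }

    /* A real consumer would now create a protection zone, EVDs,
     * endpoints, and register memory before exchanging data. */

    dat_ia_close(ia, DAT_CLOSE_GRACEFUL_FLAG);
    return 0;
}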

Features and Benefits

With zero-copy transmit, a lightweight protocol, and asynchronous queue pair interfaces, message latencies can be significantly reduced compared to the traditional TCP/IP socket interface on the same Ethernet fabric. These benefits can be exploited by cluster application writers and by message passing libraries such as Intel® MPI (a short uDAPL sketch of the asynchronous queue pair model follows the feature list below).

  • uDAPL 1.2 compatible provider library
  • Thoroughly tested with Intel® MPI, demonstrating message latency improvements of as much as 30% over the smm/socket device
  • Superior scaling compared to TCP/IP
  • Compatible with any Ethernet device using the Linux net_device interface
  • IEEE registered ethertype and coexistence with TCP/IP over the same Ethernet interface
  • Zero copy transmit, single copy receive
  • Application development support through manual pages and header files
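The asynchronous queue pair model mentioned above is visible through the uDAPL interface: work is posted to a queue, and completions are reaped later from an event dispatcher. The helper below is a hedged sketch only; it assumes an endpoint that is already connected and a buffer already registered with dat_lmr_create().

/* Sketch of the asynchronous model: post a send, then harvest its
 * completion from the data-transfer (DTO) event dispatcher. */
#include <stdint.h>
#include <dat/udat.h>

DAT_RETURN send_and_wait(DAT_EP_HANDLE ep, DAT_EVD_HANDLE dto_evd,
                         DAT_LMR_CONTEXT lmr_ctx, void *buf, DAT_VLEN len)
{
    DAT_LMR_TRIPLET iov;
    DAT_DTO_COOKIE  cookie;
    DAT_EVENT       event;
    DAT_COUNT       nmore;
    DAT_RETURN      ret;

    iov.lmr_context     = lmr_ctx;                   /* from dat_lmr_create() */
    iov.pad             = 0;
    iov.virtual_address = (DAT_VADDR)(uintptr_t)buf;
    iov.segment_length  = len;
    cookie.as_ptr       = buf;

    /* The post returns immediately; the wire work happens asynchronously. */
    ret = dat_ep_post_send(ep, 1, &iov, cookie, DAT_COMPLETION_DEFAULT_FLAG);
    if (ret != DAT_SUCCESS)
        return ret;

    /* Block until the completion appears on the DTO EVD. */
    ret = dat_evd_wait(dto_evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore);
    if (ret == DAT_SUCCESS &&
        (event.event_number != DAT_DTO_COMPLETION_EVENT ||
         event.event_data.dto_completion_event_data.status != DAT_DTO_SUCCESS))
        ret = DAT_INTERNAL_ERROR;
    return ret;
}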

 

Technical Requirements

Intel® DET kernel driver:
  • The kernel driver and user-mode library are delivered as source that is compatible with Linux kernel versions 2.6.9 and above.
  • The build process is tailored for RPM packaging for easy distribution across a cluster
Intel® DET uDAPL provider library:
  • 64 bit Linux distribution (x86_64)
  • For this technology preview, the uDAPL provider requires a genuine Intel processor.  Attempts to run on non-Intel processors will result in an error message on the standard error device and application exit.

Frequently Asked Questions

Q: Do the Intel® DET kernel driver and protocol coexist with TCP/IP over the same Ethernet interface?

A: Yes. Intel® DET plugs into the standard Linux Ethernet driver interface and uses an IEEE-registered ethertype, allowing the kernel to segregate TCP/IP and Direct Ethernet Transport packets.
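The sketch below is not the Intel® DET driver source; it is a schematic of the standard Linux mechanism that makes this coexistence work. A protocol handler registered with dev_add_pack() sees only frames carrying its ethertype, so IP traffic continues to flow to the TCP/IP stack untouched. The value 0x88B5 (an IEEE local-experimental ethertype) stands in for DET's registered value, which is not listed here, and the handler signature matches later 2.6 kernels.

#include <linux/module.h>
#include <linux/init.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>

#define ETH_P_DET_EXAMPLE 0x88B5  /* stand-in ethertype, not DET's real one */

static int det_example_rcv(struct sk_buff *skb, struct net_device *dev,
                           struct packet_type *pt, struct net_device *orig_dev)
{
    /* Only frames whose Ethernet type field equals ETH_P_DET_EXAMPLE land
     * here; a real transport would process the payload before freeing. */
    kfree_skb(skb);
    return NET_RX_SUCCESS;
}

static struct packet_type det_example_pt = {
    .type = __constant_htons(ETH_P_DET_EXAMPLE),
    .func = det_example_rcv,
};

static int __init det_example_init(void)
{
    dev_add_pack(&det_example_pt);    /* start receiving our ethertype */
    return 0;
}

static void __exit det_example_exit(void)
{
    dev_remove_pack(&det_example_pt);
}

module_init(det_example_init);
module_exit(det_example_exit);
MODULE_LICENSE("GPL");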

Q: Is the Intel® DET protocol routable?

A: No. Intel® DET is a layer 2 protocol intended for small to medium sized clusters connected to a common layer 2 subnet.

Q: Are there any scaling limits?

A: In theory, no. In practice, the usable size of a cluster depends on the communication patterns and the speed of the fabric. We have run workload benchmarks over a 1 GigE fabric employing 128 processes across 64 nodes. In these runs, Intel® DET showed superior scaling to TCP/IP.

Q: Will Intel® DET work with any Ethernet NIC?

A: Yes, Intel® DET uses the standard Linux Ethernet driver interface. However, some NIC drivers support interrupt coalescing, which can defer interrupts. For the best performance, the NIC/driver should be configured to generate an interrupt for each received packet.
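As an illustration, the small user-space program below does the equivalent of "ethtool -C eth0 rx-usecs 0 rx-frames 1": it turns off receive interrupt coalescing so every arriving frame raises an interrupt. The interface name "eth0" is an assumption, and not every NIC driver honors these fields.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
    struct ethtool_coalesce ec;
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);   /* assumed interface name */
    ifr.ifr_data = (char *)&ec;

    memset(&ec, 0, sizeof(ec));
    ec.cmd = ETHTOOL_GCOALESCE;                    /* read current settings */
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("ETHTOOL_GCOALESCE");
        return 1;
    }

    ec.cmd = ETHTOOL_SCOALESCE;                    /* write them back with... */
    ec.rx_coalesce_usecs = 0;                      /* ...no interrupt delay and */
    ec.rx_max_coalesced_frames = 1;                /* ...one interrupt per frame */
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("ETHTOOL_SCOALESCE");
        return 1;
    }

    close(fd);
    return 0;
}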

Q: Is the Intel® DET uDAPL provider compatible with OpenFabrics distributions?

A: Yes, although the RPM installation will complain about a conflicting package name. Refer to the release notes for instructions on installing the provider in an OpenFabrics environment.

Q: Can I use Intel® DET with message passing libraries?

A: Yes, the Intel® DET uDAPL provider can be used with any application or message passing library that is compatible with uDAPL version 1.2.  We have done extensive testing with Intel® MPI.  Refer to the release notes on how to configure the Intel® MPI environment to run with Intel® DET.

 

Please visit the Intel® Direct Ethernet Transport Forum and share your thoughts.

Primary Technical Contacts

Roy Larsen is a software engineer in the Cluster Software Technology Group. Since joining Intel in 1988, Roy has worked on networking protocols from OSI to the message passing software of the Intel Paragon supercomputers, as well as the management network architecture of the world's first teraflop computer. His research interests are in RDMA and direct data placement techniques in cluster environments.

Jerrie Coffman is a software engineer in the Cluster Software and Technology Group. Jerrie joined Intel in 1982, where his background includes system test, firmware, diagnostics, and device driver development for Intel's family of supercomputer systems. In recent years, Jerrie's research includes the design and implementation of high performance I/O and communication protocols, with emphasis on scalable server technologies.

 


Comments


Roy Larsen:

I can supply some sample data points from the NetPIPE micro-benchmark, which measures one-way latency and bandwidth through a ping-pong message exchange. I have a modified NetPIPE that adds DAPL as a transport interface. The sender uses RDMA writes while the receiver polls memory to determine message arrival. This is similar to how MPI libraries use RDMA transports.

The benchmark was run using 3.6 GHz Xeon CPUs with Intel 82598EB 10 Gigabit Ethernet NICs (Oplin) through a Fujitsu XG-700 switch. The Ethernet driver was configured for immediate interrupt on frame arrival. This should give you a sense of the best-case potential for DET.
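In rough uDAPL terms the pattern looks like this (a sketch only, not the modified NetPIPE source; connection setup, memory registration, and the exchange of the remote address/RMR context are assumed to have happened already, and it relies on the arrival flag byte being placed last):

#include <stdint.h>
#include <dat/udat.h>

/* Sender: RDMA-write 'len' payload bytes plus a one-byte arrival flag. */
DAT_RETURN rdma_write_msg(DAT_EP_HANDLE ep, DAT_LMR_CONTEXT lmr_ctx,
                          void *src, DAT_VLEN len,
                          DAT_RMR_CONTEXT rmr_ctx, DAT_VADDR remote_addr)
{
    DAT_LMR_TRIPLET local;
    DAT_RMR_TRIPLET remote;
    DAT_DTO_COOKIE  cookie;

    ((uint8_t *)src)[len] = 1;               /* arrival flag rides at the end */

    local.lmr_context     = lmr_ctx;
    local.pad             = 0;
    local.virtual_address = (DAT_VADDR)(uintptr_t)src;
    local.segment_length  = len + 1;

    remote.rmr_context    = rmr_ctx;
    remote.pad            = 0;
    remote.target_address = remote_addr;
    remote.segment_length = len + 1;

    cookie.as_ptr = src;
    return dat_ep_post_rdma_write(ep, 1, &local, cookie, &remote,
                                  DAT_COMPLETION_DEFAULT_FLAG);
}

/* Receiver: spin on the flag byte instead of waiting for an event. */
void wait_for_msg(volatile uint8_t *dst, DAT_VLEN len)
{
    while (dst[len] == 0)
        ;                                    /* poll memory for message arrival */
    dst[len] = 0;                            /* re-arm for the next message */
}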

Netpipe Latency (usec)

Msg Size      tcp   det/dapl
       1    19.47      10.10
      16    19.48      10.16
      64    19.74      10.33
     256    20.70      11.69
    1024    23.30      14.23
    4096    33.33      22.76
   16384    63.19      53.90

Netpipe Bandwidth (Mbits/s)

Msg Size      tcp   det/dapl
   65536     4031       4228
  131072     4791       4990
  262144     4644       5326
  524288     3999       5250
 1048576     3802       5105
 2097152     3979       5026
 4194304     3763       4973

anonymous:

Any benchmark or laboratory test results for this project?
We will implement a new HPC cluster and we would like 10 Gb instead of
IB....if Intel DET works we could use it for the interconnect

anonymous:

Sounds good. Has anyone run benchmarks?

Jeffrey B.:

Very timely, useful and interesting library. The documentation is lacking (thanks for producing man pages, but they are not sufficient for learning how to use the API). In addition, the abstraction is weak. I am going to try to put a C++ abstraction over the top of the det user space API so that the code can be made more accessible to developers. With multi-core, 10GbE, and rack density improvements, having kernel-bypass communication is a huge win in terms of latency. I just wish a little more time had been put into documentation covering the basic design of the API. The ping-pong code does a good job, but it would be nice if it were in a real document with some diagrams and some commentary around buffer sizes.

/JMB
