Intel® Direct Ethernet Transport

Product Overview

The Intel® Direct Ethernet Transport (Intel® DET) project provides two components for faster message passing on commodity Ethernet fabrics.  The first is a Linux kernel driver and user-mode library providing RDMA/IPC semantics similar to InfiniBand® and iWARP technologies.  The second is a uDAPL 1.2 library that provides a standardized interface to third-party software such as Intel® MPI.  In addition to delivering superior message passing performance compared to a traditional TCP/IP socket stack, Intel® DET gives cluster software developers the opportunity to work with RDMA semantics without investing in a specific RDMA technology.
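For a concrete sense of what these RDMA semantics look like to an application, the following is a minimal sketch (not taken from the Intel® DET sources) of opening a uDAPL 1.2 interface adapter and creating an endpoint, the uDAPL equivalent of a queue pair.  The provider name "det0" is only a placeholder for whatever name the release notes register in /etc/dat.conf, and error handling is abbreviated; link with -ldat.

    /*
     * Sketch only: open a uDAPL 1.2 provider and create an endpoint.
     * "det0" is a placeholder provider name, not the registered one.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <dat/udat.h>

    int main(int argc, char **argv)
    {
        char           *ia_name = (argc > 1) ? argv[1] : "det0"; /* placeholder */
        DAT_IA_HANDLE   ia = DAT_HANDLE_NULL;
        DAT_EVD_HANDLE  async_evd = DAT_HANDLE_NULL, evd = DAT_HANDLE_NULL;
        DAT_PZ_HANDLE   pz = DAT_HANDLE_NULL;
        DAT_EP_HANDLE   ep = DAT_HANDLE_NULL;
        DAT_RETURN      ret;

        /* Open the interface adapter named in /etc/dat.conf; the provider
         * creates the asynchronous error EVD for us. */
        ret = dat_ia_open(ia_name, 8, &async_evd, &ia);
        if (ret != DAT_SUCCESS) {
            fprintf(stderr, "dat_ia_open(%s) failed: 0x%x\n", ia_name, (unsigned)ret);
            return EXIT_FAILURE;
        }

        /* Protection zone for memory registration, plus one event dispatcher
         * collecting both data-transfer and connection events. */
        ret = dat_pz_create(ia, &pz);
        if (ret == DAT_SUCCESS)
            ret = dat_evd_create(ia, 64, DAT_HANDLE_NULL,
                                 DAT_EVD_DTO_FLAG | DAT_EVD_CONNECTION_FLAG, &evd);

        /* The endpoint is the asynchronous queue pair used for sends,
         * receives, and RDMA reads/writes (provider default attributes). */
        if (ret == DAT_SUCCESS)
            ret = dat_ep_create(ia, pz, evd, evd, evd, NULL, &ep);

        printf("endpoint %screated\n", ret == DAT_SUCCESS ? "" : "not ");

        /* Tear down in reverse order. */
        if (ep != DAT_HANDLE_NULL)  dat_ep_free(ep);
        if (evd != DAT_HANDLE_NULL) dat_evd_free(evd);
        if (pz != DAT_HANDLE_NULL)  dat_pz_free(pz);
        dat_ia_close(ia, DAT_CLOSE_ABRUPT_FLAG);
        return ret == DAT_SUCCESS ? EXIT_SUCCESS : EXIT_FAILURE;
    }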

Features and Benefits

With zero copy transmit, a lightweight protocol, and asynchronous queue pair interfaces, message latencies can be significantly reduced compared to the traditional TCP/IP-socket interface on the same Ethernet fabric.  These benefits can be exploited by cluster application writers and message passing libraries such as Intel® MPI.

  • uDAPL 1.2 compatible provider library
  • Thoroughly tested with Intel® MPI, demonstrating message latency improvements of as much as 30% over the smm/socket device
  • Superior scaling compared to TCP/IP
  • Compatible with any Ethernet device using the Linux net_device interface
  • IEEE registered ethertype and coexistence with TCP/IP over the same Ethernet interface
  • Zero copy transmit, single copy receive
  • Application development support through manual pages and header files

 

Technical Requirements

Intel® DET kernel driver:
  • The kernel driver and user-mode library are delivered as source that is compatible with Linux kernel versions 2.6.9 and above.
  • The build process is tailored for RPM packaging for easy distribution across a cluster
Intel® DET uDAPL provider library:
  • 64 bit Linux distribution (x86_64)
  • For this technology preview, the uDAPL provider requires a genuine Intel processor.  Attempting to run on a non-Intel processor results in an error message on standard error and application exit.

Frequently Asked Questions

Q: Do the Intel® DET kernel driver and protocol co-exist with TCP/IP over the same Ethernet interface?

A: Yes.  Intel® DET plugs into the standard Linux Ethernet driver interface and uses a registered IEEE “ethertype,” allowing the kernel to segregate TCP/IP and Direct Ethernet Transport packets.
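For readers curious what that segregation looks like at the driver level, here is a minimal sketch, not drawn from the DET sources, of how a layer-2 protocol module claims an ethertype through the standard Linux packet handler interface (written against a reasonably current kernel; the 2.6.9-era handler signature differs slightly).  The ethertype 0x88B5 is one of the IEEE local-experimental values and, like the function names, is purely illustrative.

    /*
     * Sketch only: a layer-2 protocol module claiming its own ethertype via
     * the standard Linux packet handler interface.  The kernel demultiplexes
     * on the frame's ethertype, so these packets never enter the TCP/IP
     * stack.  ETH_P_EXAMPLE and example_rcv() are illustrative names.
     */
    #include <linux/module.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    #define ETH_P_EXAMPLE 0x88B5    /* IEEE "local experimental" ethertype */

    static int example_rcv(struct sk_buff *skb, struct net_device *dev,
                           struct packet_type *pt, struct net_device *orig_dev)
    {
        /* A real transport would validate its header and hand the payload
         * to the owning queue pair; this sketch simply drops the frame. */
        kfree_skb(skb);
        return NET_RX_SUCCESS;
    }

    static struct packet_type example_packet_type = {
        .type = cpu_to_be16(ETH_P_EXAMPLE),
        .func = example_rcv,
    };

    static int __init example_init(void)
    {
        dev_add_pack(&example_packet_type);  /* start receiving this ethertype */
        return 0;
    }

    static void __exit example_exit(void)
    {
        dev_remove_pack(&example_packet_type);
    }

    module_init(example_init);
    module_exit(example_exit);
    MODULE_LICENSE("GPL");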

Q: Is the Intel® DET protocol routable?

A: No, Intel® DET is a layer 2 protocol intended for small to medium-sized clusters connected to a common layer 2 subnet.

Q: Are there any scaling limits?

A: In theory, no.  In practice, the achievable size of a cluster depends on the communication patterns and the speed of the fabric.  We have run workload benchmarks over a 1GigE fabric with 128 processes across 64 nodes.  In these runs, Intel® DET showed superior scaling to TCP/IP.

Q: Will Intel® DET work with any Ethernet NIC?

A: Yes, Intel® DET uses the standard Linux Ethernet driver interface.  However, some NIC drivers support interrupt coalescing, which can defer interrupts.  For the best performance, the NIC/driver should be configured to generate an interrupt for each received packet.
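The per-packet interrupt setting is normally applied with ethtool -C; as a hedged illustration, the same thing can be done programmatically through the SIOCETHTOOL ioctl, assuming the NIC driver implements the coalescing commands.  The interface name "eth0" is an example and the program must run with network administrator privileges:

    /*
     * Sketch: disable receive interrupt coalescing so the NIC raises an
     * interrupt per received frame (equivalent to ethtool -C).  Assumes the
     * NIC driver supports ETHTOOL_GCOALESCE/ETHTOOL_SCOALESCE.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(void)
    {
        struct ethtool_coalesce ec;
        struct ifreq ifr;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);    /* example interface */
        ifr.ifr_data = (char *)&ec;

        /* Read the current coalescing parameters. */
        memset(&ec, 0, sizeof(ec));
        ec.cmd = ETHTOOL_GCOALESCE;
        if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
            perror("ETHTOOL_GCOALESCE");
            return 1;
        }

        /* One interrupt per received frame: no delay, no frame batching. */
        ec.cmd = ETHTOOL_SCOALESCE;
        ec.rx_coalesce_usecs = 0;
        ec.rx_max_coalesced_frames = 1;
        if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
            perror("ETHTOOL_SCOALESCE");
            return 1;
        }

        close(fd);
        return 0;
    }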

Q: Is the Intel® DET uDAPL provider compatible with OpenFabrics distributions?

A: Yes, although the RPM installation will complain about a conflicting package name.  Refer to the release notes for instructions on installing the provider in an OpenFabrics environment.

Q: Can I use Intel® DET with message passing libraries?

A: Yes, the Intel® DET uDAPL provider can be used with any application or message passing library that is compatible with uDAPL version 1.2.  We have done extensive testing with Intel® MPI.  Refer to the release notes on how to configure the Intel® MPI environment to run with Intel® DET.

 

Please visit the Intel® Direct Ethernet Transport Forum and share your thoughts.

Primary Technical Contacts

Roy Larsen is a software engineer in the Cluster Software Technology Group.  Since joining Intel in 1988, Roy has worked on networking protocols from OSI to the message passing software of the Intel Paragon supercomputers, as well as the management network architecture of the world’s first teraflop computer.  His research interests are RDMA and direct data placement techniques in cluster environments.

Jerrie Coffman is a software engineer in the Cluster Software and Technology Group.  Jerrie joined Intel in 1982; his background includes system test, firmware, diagnostics, and device driver development for Intel’s family of supercomputer systems.  In recent years, Jerrie’s research has included the design and implementation of high-performance I/O and communication protocols, with emphasis on scalable server technologies.

 


Comments

oran j.:

Is Fedora 20 supported? 

stardust496:

Hello,

I have just downloaded det and have started looking at it.  I am not sure how it ties together - it seems like the send involves a context switch to the kernel (but a zero-copy send), while the receive involves an interrupt to the kernel followed by a notification to user space about new data.  Is it possible to use it like in the InfiniBand world - busy polling and reading directly without involving the kernel?  (And also sending directly without a context switch to the kernel?)

Thanks!
MK

anonymous:

It's a very good test about the G41 board, thanks.

Roy Larsen (Intel):

Brian,

I'm sorry for the delayed post. To be clear, the Direct Ethernet Transport does not use IP protocols so I assume you’re asking if something like the InfiniBand unreliable datagram service along with multicast is implemented. The answer is no and we have no plans for implementing it. A reliable connected service is the only one available.

Roy

anonymous:

Thanks a great deal for this; I hope you're finding enough support to continue development.  Could you elaborate a bit on how multicast, and perhaps UDP in general, is handled?

Roy Larsen (Intel):

Marc,

It sounds like you loaded the driver manually, in which case you'd have to make the /dev/det character device node manually with mknod(1).  The preferred way is to invoke the control script, which is found in /etc/init.d/det if you did a formal installation of the package.  If not, you'll find it at det/usr/det in the tarball.  The arguments to the script are [start] | [stop] | [restart].  If you'd still rather do it all manually, you can discover the major device number after the driver is loaded by cat'ing /proc/devices.

Roy
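For anyone scripting this, the manual step Roy describes boils down to the following hedged C sketch, which is not part of the DET package; it assumes the driver registers its character device under the name "det", uses minor number 0, and is run as root.

    /*
     * Sketch: create /dev/det by hand.  Scan /proc/devices for the "det"
     * character major assigned when the driver loaded, then call mknod(2).
     * Assumes minor 0 and world read/write permissions; run as root.
     */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    int main(void)
    {
        FILE *fp = fopen("/proc/devices", "r");
        char line[128], name[64];
        int major = -1, m;

        if (!fp) {
            perror("/proc/devices");
            return 1;
        }
        /* Character devices are listed one per line as "<major> <name>". */
        while (fgets(line, sizeof(line), fp)) {
            if (sscanf(line, "%d %63s", &m, name) == 2 &&
                strcmp(name, "det") == 0) {
                major = m;
                break;
            }
        }
        fclose(fp);

        if (major < 0) {
            fprintf(stderr, "det driver not loaded?\n");
            return 1;
        }
        if (mknod("/dev/det", S_IFCHR | 0666, makedev(major, 0)) < 0) {
            perror("mknod /dev/det");
            return 1;
        }
        printf("created /dev/det with major %d\n", major);
        return 0;
    }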

anonymous:

When I load the module and start det_perf I get:

marc@mlnx_lab01:~/det-1.1/det-1.1.0/kernel$ ../../examples/det_perf

Running as server on eth0
det_open: No such file or directory

I checked the src but cannot see where /dev/det gets created.

Marc

Roy Larsen (Intel):

No, I don’t have QDR results, and I’m not sure what could be gleaned from comparing a 40 Gb/s HCA to a 10 Gb/s commodity NIC.  However, I do have SDR IB results that were measured on the same platform, which are at least more of an apples-to-apples comparison.

NetPIPE Latency (usec)

  Msg Size (bytes)      tcp    det-dapl    ib-dapl (SDR)
                 1    19.47       10.10             3.41
                16    19.48       10.16             3.50
                64    19.74       10.33             3.71
               256    20.70       11.69             4.68
              1024    23.30       14.23             6.27
              4096    33.33       22.76            10.64
             16384    63.19       53.90            25.15

NetPIPE Bandwidth (Mbits/s)

  Msg Size (bytes)      tcp    det-dapl    ib-dapl (SDR)
             65536     4031        4228             6829
            131072     4791        4990             7145
            262144     4644        5326             7313
            524288     3999        5250             7398
           1048576     3802        5105             7444
           2097152     3979        5026             7468
           4194304     3763        4973             7479

anonymous:

I just saw your results today.  Thanks!  I'm still analysing them and trying to understand them.
One more question:
Would you have a chance to compare (to get an idea) your results against IB QDR?

Kindest regards and thanks again,

Mardel

Roy Larsen (Intel):

I neglected to mention that the MTU size was set to 9000 for the NetPIPE test run.
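For completeness, here is a hedged sketch of setting that 9000-byte MTU programmatically with the SIOCSIFMTU ioctl; the usual route is simply ifconfig eth0 mtu 9000, the interface name below is only an example, and both the NIC and the switch must support jumbo frames.

    /*
     * Sketch: set a 9000-byte (jumbo frame) MTU via SIOCSIFMTU, equivalent
     * to "ifconfig eth0 mtu 9000".  Requires network admin privileges and a
     * NIC/switch that support jumbo frames; "eth0" is an example name.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>

    int main(void)
    {
        struct ifreq ifr;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
        ifr.ifr_mtu = 9000;

        if (ioctl(fd, SIOCSIFMTU, &ifr) < 0) {
            perror("SIOCSIFMTU");
            return 1;
        }
        close(fd);
        return 0;
    }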


