| Last Modified On : | April 21, 2009 3:04 PM PDT |
The Intel® Direct Ethernet Transport (Intel® DET) project provides two components for faster message passing on commodity Ethernet fabrics. The first component is a Linux kernel driver and user-mode library providing RDMA/IPC semantics similar to InfiniBand® and iWARP technologies. The second component is a uDAPL 1.2 library that provides a standardized interface to third-party software such as Intel® MPI. In addition to providing superior message passing performance compared to a traditional TCP/IP socket stack, Intel® DET gives cluster software developers the opportunity to work with RDMA semantics without investing in a specific RDMA technology.
Features and Benefits
With zero copy transmit, a lightweight protocol, and asynchronous queue pair interfaces, message latencies can be significantly reduced compared to the traditional TCP/IP-socket interface on the same Ethernet fabric. These benefits can be exploited by cluster application writers and message passing libraries such as Intel® MPI.
Q: Do the Intel® DET kernel driver and protocol co-exist with TCP/IP over the same Ethernet interface?
A: Yes, Intel® DET interfaces to the standard Linux Ethernet driver interface and uses a registered IEEE “ethertype” allowing the kernel to segregate TCP/IP and Direct Ethernet Transport packets.
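The ethertype-based segregation described in this answer can be pictured with a minimal Python sketch. It is an illustration only, not the DET driver's code. The value `0x88B5` is an IEEE local-experimental ethertype used here as a placeholder, because this page does not state DET's registered ethertype, and the function name `classify_frame` is hypothetical.

```python
import struct

# Standard ethertypes for the IP family.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
# ASSUMPTION: placeholder only. 0x88B5 is an IEEE local-experimental
# ethertype; DET's actual registered ethertype is not given on this page.
ETHERTYPE_DET_PLACEHOLDER = 0x88B5

ETH_HEADER_LEN = 14  # 6-byte dst MAC + 6-byte src MAC + 2-byte ethertype


def classify_frame(frame: bytes) -> str:
    """Return which protocol stack a raw Ethernet II frame would go to."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame is shorter than an Ethernet header")
    # The ethertype is a big-endian 16-bit field after the two MAC addresses.
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        return "tcp/ip"
    if ethertype == ETHERTYPE_DET_PLACEHOLDER:
        return "det"
    return "other"
```

Because the decision depends only on the two-byte ethertype field, both kinds of traffic can share one NIC and one Ethernet driver without interfering with each other.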
Q: Is the Intel® DET protocol routable?
A: No, Intel® DET is a layer 2 protocol intended for small to medium sized clusters connected to a common layer 2 subnet.
Q: Are there any scaling limits?
A: In theory, no. In practice, the usable size of a cluster depends on the communication patterns and the speed of the fabric. We have run workload benchmarks over a 1GigE fabric employing 128 processes over 64 nodes. In these runs, Intel® DET showed superior scaling to TCP/IP.
Q: Will Intel® DET work with any Ethernet NIC?
A: Yes, Intel® DET uses the standard Linux Ethernet driver interface. However, some NIC drivers support interrupt coalescing, which can defer interrupts. For the best performance, the NIC/driver should be configured to generate an interrupt for each packet received.
Q: Is the Intel® DET uDAPL provider compatible with OpenFabric distributions?
A: Yes, although the RPM installation will complain about a conflicting package name. Refer to the release notes for installing the provider in an OpenFabric environment.
Q: Can I use Intel® DET with message passing libraries?
A: Yes, the Intel® DET uDAPL provider can be used with any application or message passing library that is compatible with uDAPL version 1.2. We have done extensive testing with Intel® MPI. Refer to the release notes on how to configure the Intel® MPI environment to run with Intel® DET.
Please visit the What If Software Forum and share your thoughts
Roy Larsen is a software engineer in the Cluster Software Technology Group. Since joining Intel in 1988, Roy has worked on networking protocols from OSI to the message passing software of the Intel Paragon supercomputers, as well as the management network architecture of the world's first teraflop computer. His research interests are in RDMA and direct data placement techniques in cluster environments.
Jerrie Coffman is a software engineer in the Cluster Software and Technology Group. Jerrie joined Intel in 1982, and his background includes system test, firmware, diagnostics, and device driver development for Intel’s family of supercomputer systems. In recent years, Jerrie’s research has included the design and implementation of high performance I/O and communication protocols, with emphasis on scalable server technologies.
| December 7, 2008 6:55 AM PST
terrs | Sounds good. Does anyone have benchmark results? |
| May 8, 2009 1:20 AM PDT
Mardel |
Any benchmark or laboratory tests for this project? We will implement a new HPC cluster and we would like 10 Gb instead of IB. If Intel DET works, we could use it for the interconnect. |
| May 8, 2009 12:12 PM PDT
rklarsen |
I can supply some sample data points from the NetPIPE microbenchmark, which measures one-way latency and bandwidth through a ping-pong message exchange. I have a modified NetPIPE that adds DAPL as a transport interface. The sender uses RDMA writes while the receiver polls memory to determine message arrival. This is similar to how MPI libraries use RDMA transports. The benchmark was run using 3.6 GHz Xeon CPUs with Intel 82598EB 10Gig Ethernet NICs (Oplin) through a Fujitsu XG-700 switch. The Ethernet driver was configured for immediate interrupt on frame arrival. This should give you a sense of the best-case potential for DET.

Netpipe Latency (usec)
Msg Size    tcp      det/dapl
1           19.47    10.10
16          19.48    10.16
64          19.74    10.33
256         20.70    11.69
1024        23.30    14.23
4096        33.33    22.76
16384       63.19    53.90

Netpipe Bandwidth (Mbits/s)
Msg Size    tcp      det/dapl
65536       4031     4228
131072      4791     4990
262144      4644     5326
524288      3999     5250
1048576     3802     5105
2097152     3979     5026
4194304     3763     4973 |
| May 8, 2009 12:16 PM PDT
rklarsen |
hmmm, let's see if this is easier to read....

Netpipe Latency (usec)
Msg Size / tcp / det-dapl
1 / 19.47 / 10.10
16 / 19.48 / 10.16
64 / 19.74 / 10.33
256 / 20.70 / 11.69
1024 / 23.30 / 14.23
4096 / 33.33 / 22.76
16384 / 63.19 / 53.90

Netpipe Bandwidth (Mbits/s)
Msg Size / tcp / det-dapl
65536 / 4031 / 4228
131072 / 4791 / 4990
262144 / 4644 / 5326
524288 / 3999 / 5250
1048576 / 3802 / 5105
2097152 / 3979 / 5026
4194304 / 3763 / 4973 |
| May 8, 2009 4:44 PM PDT
rklarsen | I neglected to mention that the MTU size was set to 9000 for the NetPIPE test run. |
| May 12, 2009 9:50 AM PDT
Mardel |
I just saw your results today. Thanks! I'm still analysing them and trying to understand them. One more question: would you have any chance to compare your results against IB QDR, just to get an idea? Kindest regards and thanks again, Mardel |
| May 12, 2009 10:43 AM PDT
rklarsen |
No, I don’t have QDR results and I’m not sure what could be gleaned from comparing a 40 Gb/s HCA to a 10 Gb/s commodity NIC. However, I do have SDR IB results that were measured on the same platform, which are at least more of an apples-to-apples comparison.

Netpipe Latency (usec)
Msg Size / tcp / det-dapl / ib-dapl(SDR)
1 / 19.47 / 10.10 / 3.41
16 / 19.48 / 10.16 / 3.50
64 / 19.74 / 10.33 / 3.71
256 / 20.70 / 11.69 / 4.68
1024 / 23.30 / 14.23 / 6.27
4096 / 33.33 / 22.76 / 10.64
16384 / 63.19 / 53.90 / 25.15

Netpipe Bandwidth (Mbits/s)
Msg Size / tcp / det-dapl / ib-dapl(SDR)
65536 / 4031 / 4228 / 6829
131072 / 4791 / 4990 / 7145
262144 / 4644 / 5326 / 7313
524288 / 3999 / 5250 / 7398
1048576 / 3802 / 5105 / 7444
2097152 / 3979 / 5026 / 7468
4194304 / 3763 / 4973 / 7479 |
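To put the figures from this post in relative terms, the short Python sketch below computes how DET and SDR InfiniBand compare with TCP for a given message size. The numbers are copied from the post; the names `latency_speedup`, `bandwidth_gain`, and `print_summary` are illustrative only.

```python
# NetPIPE results copied from the post: msg size -> (tcp, det-dapl, ib-dapl SDR)
LATENCY_USEC = {
    1: (19.47, 10.10, 3.41),
    16: (19.48, 10.16, 3.50),
    64: (19.74, 10.33, 3.71),
    256: (20.70, 11.69, 4.68),
    1024: (23.30, 14.23, 6.27),
    4096: (33.33, 22.76, 10.64),
    16384: (63.19, 53.90, 25.15),
}
BANDWIDTH_MBITS = {
    65536: (4031, 4228, 6829),
    131072: (4791, 4990, 7145),
    262144: (4644, 5326, 7313),
    524288: (3999, 5250, 7398),
    1048576: (3802, 5105, 7444),
    2097152: (3979, 5026, 7468),
    4194304: (3763, 4973, 7479),
}


def latency_speedup(size):
    """Return (tcp/det, tcp/ib) latency ratios; above 1 means faster than TCP."""
    tcp, det, ib = LATENCY_USEC[size]
    return tcp / det, tcp / ib


def bandwidth_gain(size):
    """Return (det/tcp, ib/tcp) bandwidth ratios; above 1 means more than TCP."""
    tcp, det, ib = BANDWIDTH_MBITS[size]
    return det / tcp, ib / tcp


def print_summary():
    """Print one line per message size with the ratios relative to TCP."""
    for size in LATENCY_USEC:
        det, ib = latency_speedup(size)
        print(f"latency   {size:>8}: det {det:.2f}x, ib {ib:.2f}x vs tcp")
    for size in BANDWIDTH_MBITS:
        det, ib = bandwidth_gain(size)
        print(f"bandwidth {size:>8}: det {det:.2f}x, ib {ib:.2f}x vs tcp")
```

Calling `print_summary()` lists the ratios row by row. For small messages DET roughly halves TCP latency, while SDR InfiniBand remains several times faster than either on this platform.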
| July 18, 2009 9:30 AM PDT
marc |
when I load the module and start det_perf I get

marc@mlnx_lab01:~/det-1.1/det-1.1.0/kernel$ ../../examples/det_perf
Running as server on eth0
det_open: No such file or directory

i checked the src but can not see where /dev/det gets created

Marc |
| July 20, 2009 11:28 AM PDT
rklarsen |
Marc, It sounds like you loaded the driver manually, in which case you'd have to make the /dev/det character device node manually with mknod(1). The preferred way is to invoke the control script, which is found in /etc/init.d/det if you did a formal installation of the package. If not, you'll find it at det/usr/det in the tarball. The arguments to the script are [start] | [stop] | [restart]. If you'd still rather do it all manually, you can discover the major device number after the driver is loaded by running cat on /proc/devices. Roy |
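The manual route described in this reply can be sketched in Python as follows. This is a hedged illustration rather than part of the DET package: it assumes the driver appears in /proc/devices under the name `det` and that minor number 0 is appropriate, neither of which is stated in the thread. The helper names `find_char_major` and `mknod_command` are hypothetical.

```python
def find_char_major(proc_devices_text, name):
    """Return the major number of character device `name`, or None.

    `proc_devices_text` is the content of /proc/devices, which lists
    "<major> <name>" pairs under "Character devices:" and "Block devices:".
    """
    in_char_section = False
    for line in proc_devices_text.splitlines():
        stripped = line.strip()
        if stripped == "Character devices:":
            in_char_section = True
            continue
        if stripped == "Block devices:":
            in_char_section = False
            continue
        if in_char_section and stripped:
            parts = stripped.split(None, 1)
            if len(parts) == 2 and parts[0].isdigit() and parts[1] == name:
                return int(parts[0])
    return None


def mknod_command(major, minor=0, path="/dev/det"):
    """Build the mknod(1) command line for a character device node.

    ASSUMPTION: minor 0 is a guess for illustration only.
    """
    return f"mknod {path} c {major} {minor}"


if __name__ == "__main__":
    with open("/proc/devices") as f:
        # ASSUMPTION: the driver registers itself under the name "det".
        major = find_char_major(f.read(), "det")
    if major is None:
        print("det driver not found in /proc/devices; is the module loaded?")
    else:
        print(mknod_command(major))
```

The script only prints the command so that it can be reviewed before being run as root. Using the control script remains the simpler option.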
| September 22, 2009 7:07 AM PDT
brian | Thanks a great deal for this, I hope you're finding enough support to continue development. Could you elaborate a bit as to how multicast, and perhaps udp in general is handled? |
| October 13, 2009 10:53 AM PDT
rklarsen |
Brian, I'm sorry for the delayed post. To be clear, the Direct Ethernet Transport does not use IP protocols so I assume you’re asking if something like the InfiniBand unreliable datagram service along with multicast is implemented. The answer is no and we have no plans for implementing it. A reliable connected service is the only one available. Roy |
