Communications between MICs and hosts

Hi,

I am evaluating communications between MICs and hosts using MPI. I found that the bandwidth results are not bad between MICs, especially between two MICs attached to different hosts (only ~2.45 MB/s). Is this expected?

Besides, I notice that there are some lower-level communication APIs like SCIF. Would using one of these improve communication performance in terms of throughput? In what cases would people normally consider using SCIF? I tried to google for sample code using SCIF but only found the Intel user guide, so it seems to me that very few people are really using it.
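(For context, the kind of measurement I mean is a simple two-rank MPI ping-pong along the lines of the sketch below; the 4 MiB message size and repetition count are illustrative placeholders, not necessarily the exact settings behind the 2.45 MB/s figure.)

/* Minimal two-rank MPI ping-pong bandwidth sketch (illustrative only). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 100;                    /* placeholder repetition count */
    const int msg_bytes = 4 * 1024 * 1024;   /* placeholder 4 MiB message   */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(msg_bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* Each repetition moves msg_bytes in each direction. */
        double mb = 2.0 * reps * msg_bytes / (1024.0 * 1024.0);
        printf("Bandwidth: %.2f MB/s\n", mb / (t1 - t0));
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

One rank is launched on each coprocessor (e.g. with mpirun and a machine file listing the two mic interfaces), so the timed loop exercises exactly the MIC-to-MIC path in question.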

Thanks!

Andrey Vladimirov:

There is probably a typo in your post, because 2.45 MB/s is more of a "not good" rather than "not bad" performance. If you are using only one system, then you should install OFED to activate the virtual Infiniband interface "ibscif", which will speed up MPI communication between the host and local MICs. If you are using multiple systems, then the only way (as of today) to improve MPI performance is to use InfiniBand interconnects and install the corresponding software. Details can be found in this white paper:

http://software.intel.com/en-us/forums/topic/507126

Thanks! That's indeed a typo. 2.45 MB/s is extremely bad.

Do you have any idea about SCIF?

Taylor Kidd (Intel):

We (Intel) generally discourage using APIs at the SCIF level because it introduces platform dependence, i.e., less portability. A better alternative is COI, which provides not only communication but also process functionality, and is optimized for the platform it is implemented on.

Here are some additional references and guidelines:

Regards
--
Taylor

Frances Roth (Intel):

The Intel® Xeon Phi™ Coprocessor System Software Developers Guide http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-system-software-developers-guide contains more detailed information on coprocessor/host communication, including SCIF (Symmetric Communication Interface). For the most part, processes on the host and coprocessor communicate using SCIF. COI (Coprocessing Offload Infrastructure) is currently built on top of SCIF. When the OFED updates from the MPSS are not installed, MPI uses SCIF to communicate between the host and coprocessor; communication to other nodes then uses the normal network protocols. As Andrey noted, if you have the OFED updates from the MPSS installed (and OFED running), MPI should default to ibscif (InfiniBand over SCIF) instead of SCIF, which can make coprocessor-to-coprocessor communication on the same node, as well as across nodes, more efficient.

There are some developers who have written code directly against SCIF. However, as Taylor noted, using SCIF directly is not generally a good idea. It is fine for some testing, or for code that will only be used for a short time, but your code might not port to future Intel Xeon Phi products. If you are interested in trying SCIF directly, even with these caveats, you might want to look at /usr/share/doc/scif/tutorials/.
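For anyone who wants a feel for what direct SCIF use looks like before opening the tutorials, here is a minimal, untested host-side client sketch. The local and remote port numbers are arbitrary values chosen for the example, the choice of SCIF node 1 for the first coprocessor is an assumption made for illustration, and a matching listener (scif_bind/scif_listen/scif_accept) would have to be running on the coprocessor side; error handling is abbreviated.

/* Minimal host-side SCIF client sketch (illustrative, not from the tutorials).
 * Assumptions: the first coprocessor is SCIF node 1, a listener is already
 * bound to remote port 2050 on that node, and local port 2049 is free. */
#include <scif.h>
#include <stdio.h>

int main(void)
{
    struct scif_portID remote = { .node = 1, .port = 2050 }; /* assumed values */
    char msg[] = "hello from host";
    scif_epd_t epd = scif_open();            /* create an endpoint */

    if (epd < 0) {
        perror("scif_open");
        return 1;
    }
    if (scif_bind(epd, 2049) < 0) {          /* bind to an arbitrary local port */
        perror("scif_bind");
        return 1;
    }
    if (scif_connect(epd, &remote) < 0) {    /* connect to the coprocessor side */
        perror("scif_connect");
        return 1;
    }
    if (scif_send(epd, msg, sizeof(msg), SCIF_SEND_BLOCK) < 0)  /* blocking send */
        perror("scif_send");

    scif_close(epd);
    return 0;
}

This would be built against the SCIF header and library shipped with the MPSS (link with -lscif). For bandwidth-oriented transfers the registered-memory RMA calls (scif_register, scif_writeto, scif_readfrom) are the ones to look at rather than scif_send/scif_recv; the tutorials directory above is the place to start for those.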

Now back to the original question: if I understand what you said, communication between two coprocessors on the same node is poor, and communication between two coprocessors on different nodes is worse. In the second case, did you also measure communication between the nodes themselves to see how much of that time is due to network connectivity versus coprocessor-to-node communication?
