Configuring Intel® Xeon Phi™ coprocessors inside a cluster

Author: Michael Hebenstreit
Contributions: Romain Dolbeau, Jeremy C. Siadal
Version: 0.81, 20130110

Abstract

This paper provides a blueprint for setting up and configuring a cluster whose systems contain the Intel® Xeon Phi™ coprocessor, based on how Intel configured its own Endeavor cluster. Along the way, it shares detailed information on how to compile tools, configure filesystems, and set up network interfaces, to help readers understand how this can be done en masse.

Under current standard cluster usage models, users expect to reach every system that is part of an MPI job via a simple passwordless ssh command, and to find all the filesystems they expect mounted on every node. Satisfying these expectations requires some key administrative setup.

The solution proposed in this document covers the following features:

  • Users access Intel Xeon Phi coprocessors with standard privileges using direct and passwordless ssh
  • The home NFS server is mounted, as well as Lustre* and Panasas* shares
  • Use of bridged networking to avoid routing problems
  • Automated detection of installed Intel Xeon Phi coprocessors via lspci
  • User accounts added to all Intel Xeon Phi coprocessor cards on the system, but no password is set
  • Removal of inetd on the Intel Xeon Phi coprocessors to maximize security
  • Correct MTU and NETMASK settings on the Intel Xeon Phi coprocessors
  • Startup of coi_daemon as USER
  • Enhancement of dropbear ssh environment with ulimits
  • Automated startup of OFED Intel Xeon Phi Coprocessor Communication Link (CCL)
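
As an illustration of the lspci-based detection mentioned in the list above, here is a minimal Python sketch. It is not the script shipped with the article. It assumes that coprocessor entries in lspci output contain both the strings "Co-processor" and "Intel"; the exact device description depends on the local pci.ids database, so the sample text and the function names (find_coprocessors, detect_coprocessors) are hypothetical and may need adjusting on a real system.

```python
import shutil
import subprocess
from typing import List


def find_coprocessors(lspci_output: str) -> List[str]:
    """Return the PCI addresses of lines that look like Intel coprocessors."""
    addresses = []
    for line in lspci_output.splitlines():
        # Assumption: coprocessor lines contain both "Co-processor" and "Intel".
        if "Co-processor" in line and "Intel" in line:
            # The first whitespace-separated field of an lspci line is the PCI address.
            addresses.append(line.split()[0])
    return addresses


def detect_coprocessors() -> List[str]:
    """Run lspci if it is installed and parse its output; return [] otherwise."""
    if shutil.which("lspci") is None:
        return []
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=False)
    return find_coprocessors(result.stdout)


# Illustrative sample only; real lspci output depends on the pci.ids database.
SAMPLE = """\
00:00.0 Host bridge: Intel Corporation Example Host Bridge
05:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
83:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
"""

if __name__ == "__main__":
    print(find_coprocessors(SAMPLE))
```

Keeping the parsing in find_coprocessors separate from the lspci call makes the matching rule easy to check against captured output before using it on cluster nodes.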

Download

The complete article (PDF) and code sample files are available for download.

2 comments

Mahally:

I'm sorry, the mpss version was 3.1.2.

Mahally:

Hi,

I am trying to follow the directions to install mpss-3.2 on a blade (IP 10.10.1.10, netmask 255.255.255.0) equipped with a Xeon Phi. The server runs CentOS 6.5. I'd like to configure the coprocessor to have the IP address 10.10.2.10. I set eth0 to attach to br0 after booting so that users can access the blade through this bridge. But after micctrl --resetconfig, I found that the mic0 IP always returns to the default value (172.31.1.254). I checked the configuration with micctrl --config and this is what I found:

Host IP 172.31.1 and mic0  172.31.1.254

From /sbin/ifconfig I noticed that neither eth0 nor br0 had an IP address any more. In this condition I cannot log in to the server through the network.

Having repeatedly failed to set things up as I wished, I tried to configure the mic0 IP address using the server tool (system-config-network-tui) as follows:

eth0: 10.10.1.10     br0: 10.10.1.50     mic0: 10.10.2.10

After rebooting, I found their IP addresses were now:

eth0: 10.10.1.10     br0: 10.10.1.50     mic0: 172.32.1.254

With this setting I can log in to the server through the network.

In this condition, I checked the new cluster with miccheck after invoking "service mpss start", and everything was OK. I also installed OFED and activated the service, which runs fine. Surprisingly, however, miccheck --ssh fails, even though I can successfully log in to the coprocessor via ssh from the login prompt.

What I want to ask is: is my setting allowed? I mean, will the server and the coprocessor run smoothly in the future with this setting? We really need your suggestions. I am actually an end user, not a computer or network professional.

Regards,

Mahally, Jakarta (Indonesia)
