External bridge and NFS problems

Hi,

I've set up an external bridge for my MICs and they can reach IPs on their own subnet, including machines other than the host they sit in.  Which is great news.

However, my NFS server is on a different subnet and the MICs have no routes and no way I can see of configuring them via micctrl.

Based on the idea that they should use the routes of the host RHEL OS, I tried to add an NFS share (on another thread on this forum I was told that external NFS mounts can be added when using external bridges).  Unfortunately, it didn't work -- micctrl seems to assume that the NFS share is on my host RHEL OS, as it substitutes the host's IP for the server's :-(

I've changed the IPs, but otherwise this is the output.  10.10.10.21 is my host RHEL, my MICs are effectively 10.10.10.22 and .23, and the NFS server is 10.10.20.74.  Remember, I am using genuine, routable IPs; I just don't want to post them publicly :-)

[root@host mic0]# micctrl --addnfs=10.10.20.74:/vol/myfiler/myshare --dir=/export/myshare
[Warning] Export directory '/vol/myfiler/myshare' does not currently exist
[root@host mic0]# more /var/mpss/mic0/etc/fstab
rootfs          /               auto            defaults                1  1
proc            /proc           proc            defaults                0  0
devpts          /dev/pts        devpts          mode=0620,gid=5         0  0
10.10.10.21:/vol/myfiler/myshare /export/myshare  nfs             nolock          1 1
[root@host mic0]#
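
Independently of the address substitution in fstab, the server sits outside the cards' subnet, so a card with no gateway cannot reach it. The following is a minimal sketch of that check, not part of MPSS: `ip_to_int` and `same_subnet` are hypothetical helper names, and the /24 prefix is an assumption, since the mask is not given in the post.

```shell
#!/bin/sh
# Hypothetical helpers (not MPSS commands); the /24 prefix is an assumption.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Exit status 0 when both addresses share a network for the given prefix length.
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# mic0 (10.10.10.22) against the NFS server (10.10.20.74)
if same_subnet 10.10.10.22 10.10.20.74 24; then
    echo "on-link: reachable without a gateway"
else
    echo "off-subnet: needs a route or default gateway on the card"
fi
```

With the placeholder addresses from the post, the server is reported as off-subnet, which matches the symptom that the cards can reach local machines but not the NFS server.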
 

Any ideas?

Thanks, Sally.

 

It is also strange that a failed NFS mount over what would be an account's home directory seems to trash the underlying file setup.

To be more explicit: in my case /export/myshare is actually /export/username, which contains username's home directory.  When I first set up the MICs, I ran micctrl --useradd to add username and its SSH key.  That worked fine and username could SSH in with no password.

After trying to mount username's home directory via NFS and failing, I removed the NFS share (micctrl --remnfs=/export/myshare).  Now, even though the original files, including the SSH keys, are still there under /var/mpss/mic*/export/myshare, they don't make it back into the config.

No matter how many times I re-create the mic*.filelist files with micctrl --resetconfig, the files are still not in the filelists.

The only thing I can do is to remove and re-add the username account.

So the resetconfig option doesn't appear to be working.  Something else must still be holding the information that /export/myshare is an NFS mount, and so leaving it out of the filelist, but I can't see where.  The etc/fstab for each mic (/var/mpss/mic*/etc/fstab) is clear.

Any thoughts on that one as well?

Thanks, Sally.

Hi
You have configured a class A network (10.x.x.x).
On this system, Phi offload seems to drive an RPC-type transport delimited to one class C per card.
I think the problem could be here.

For linking external networks:
If you respect the model of one class C per card, this problem could be worked around with a new bridge,
configured correctly, with some virtual addresses added.
Regards

 

Hi, I'm not sure that's true.  I'm using a genuine class C - I just didn't want to put the IP address range on a public page.  So, I'm not using a class A (10.x.x.x).

The bridge is working correctly for the phi co-processors - they can see everything on the local subnet...including phi coprocessors on another RHEL server.  But they have no route/gateway off the local subnet and I can't see a way to add that.  They don't seem to use the RHEL host's default route either.

I'm not quite clear on what you mean by the phi offload seems to drive type RPC delimited by card.  Can you explain what you mean?

Thanks

Hi
I understand now that you are hiding your real network addressing to stay anonymous.
I note that in your substitution your MICs are effectively 10.10.10.22 and .23.
I don't know which mask you use, but if it is 255.255.255.0, having .22 and .23 in the same class C is strange.
As I read the documentation, this is the type of addressing used when there is more than one card:
172.31.2.1
172.31.3.1
172.31.4.1
172.31.5.1
172.31.6.1
etc...

Each card changes the third octet of the class B range so that it serves its own class C.
Offload (the compiler offload functions, whether over RPC, sockets, or objects (LTO/IPO, gold ELF) imported through a patched external linker dependency)
is managed within a delimited IP range; too large a class could decrease the performance of the card.

You can test this with a bridge and dummy virtual cards, using one real Phi card and five dummies.
This requires that you download and build the source (multimac.tar.gz):

./multimac 5  
brctl addbr br0
brctl addif br0 mic0
brctl addif br0 tap0
ifconfig mic0 down
ifconfig mic0 0.0.0.0 up
ifconfig tap0 0.0.0.0 up
ifconfig br0 172.31.1.1  up

ifconfig tap1  172.31.2.1 up
ifconfig tap2  172.31.3.1 up
ifconfig tap3  172.31.4.1 up
ifconfig tap4  172.31.5.1 up
ifconfig tap5  172.31.6.1 up

Normally, route should then show:

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
....
....
....

172.31.1.0     *               255.255.255.0   U     0      0        0 br0
172.31.2.0     *               255.255.255.0   U     0      0        0 tap1
172.31.3.0     *               255.255.255.0   U     0      0        0 tap2
172.31.4.0     *               255.255.255.0   U     0      0        0 tap3
172.31.5.0     *               255.255.255.0   U     0      0        0 tap4
172.31.6.0     *               255.255.255.0   U     0      0        0 tap5

(Add aliases, maybe, in case the compiler's offload functions** reject the unknown tapX devices)
ifconfig mic0:0  172.31.2.1
route add -net  172.31.2.0 netmask 255.255.255.0   mic0:0
ifconfig mic0:1  172.31.3.1
route add -net  172.31.3.0 netmask 255.255.255.0   mic0:1
etc ....

(multimac) Each tapX receives its own separate MAC address.
Sometimes, in some cases, this may increase bandwidth performance a little.

** (I am writing these instructions from memory.) We don't use this card here, as it is not sold blank; we already have our own
offload management developed.
As a substitute I'll test a cluster built from a dozen GENE-QM87 boards (4th Generation Intel processor). I hope it will be effective
enough, and that customers will accept it as a low-consumption replacement for their existing hardware.
I don't know whether storage attached over USB 3 really performs well; I plan to use one SSD, plus new USB 3 SD cards
for all other elements.

About your mount problem: if all your configuration is correct, maybe use /bin/run-parts --list /etc to see
how the existing script files are mounted.
Without a Phi card in my hands it's difficult for me to find exactly where the problem could be.

Regards

From what I saw in the documentation -- MPSS 3.1 Users Guide, section 18.1.2 -- with multiple phi cards, the same class C subnet is assumed; otherwise the bridge wouldn't work too well.  You can specify an IP for each phi card, or just set one and any others will increment the last octet by one.  For example:

micctrl --network=static --bridge=br0 --ip=10.10.10.6

This results in the first phi card getting 10.10.10.6, the second one getting 10.10.10.7, the third one getting 10.10.10.8, etc.
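
The numbering rule described above can be sketched as follows. This only illustrates the arithmetic; `card_ips` is a hypothetical helper, not a micctrl feature, and it does no checking for running past the end of the subnet.

```shell
#!/bin/sh
# Hypothetical helper: list the address each card would get when the
# last octet of the base address is incremented by one per card.
card_ips() {
    base=$1
    count=$2
    prefix=${base%.*}      # first three octets, e.g. 10.10.10
    last=${base##*.}       # last octet, e.g. 6
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "mic$i $prefix.$((last + i))"
        i=$((i + 1))
    done
}

card_ips 10.10.10.6 3
```

For three cards starting at 10.10.10.6 this lists mic0, mic1 and mic2 with .6, .7 and .8, all inside the same /24.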

However, like I say - the local network config is working fine.  The phi cards can see everything on the local class C subnet.  They cannot ping anything off the subnet -- they have no route to it.

Also, as someone has mentioned on another forum topic - http://software.intel.com/en-us/forums/topic/488000 - the "micctrl --addnfs" command is broken and always inserts the IP address of the host system.  This makes it impossible to add an external NFS server without hacking the files for the mic/phi card and trying again.  I would have done that and been happy with the workaround, but the mic OS can't even ping my NFS server, so when you try this, the mic OS won't boot because it can't mount the NFS share.

Hi
About "with multiple phi cards, the same class C subnet is assumed":
Two cards normally require two distinct class C networks, so that the network can manage two
separate address ranges.
What you read is very strange. Is the card you add a host??
Each card is supposed to work as a network, not as a host.
Maybe you have SIP wires connecting the two cards, to use them as master and slave???

Unless you divide the mask, you can't run two distinct networks on a single class.
In networking convention that is a rule everyone knows.
Now, if the system on these cards works without respecting the conventions, I don't know how it can be aligned with
software such as MPI and other external software.
With a bridge it's possible to do anything you want, even DIY hacks, but the resulting performance
is another problem. (This is not aimed at you; it is about the strange configuration mentioned in your chapter of the documentation.)

It's strange: I can read your answer, but it is not listed on the title page before it.
Probably a bug in the browser of the phone I managed to keep (my wife has confiscated all the computer cables;
she says the garden is not a jungle).
Maybe it's 4G network interference while the lawnmower is on ....

Regards

 

Hi,

I think we've got wires crossed -- this thread/problem is more about how NFS is not working to a different subnet.  Not about the two cards being on the same class C.

Like I've said -- the cards and their network configuration are all working.  They can reach their local subnet.  That is not the problem here.

Thanks,

To back up a bit:

You are running RHEL (version?) on the host

You are using MPSS 3.1 on the coprocessor

You have multiple nodes with multiple coprocessors all on the same class C subnet, all of which are able to talk to each other.

You cannot ping the file server, which is on a different subnet, from any of the coprocessors.

You cannot ping any of the coprocessors from the file server.

You can ping the file server from the hosts.

You can ping the hosts from the file server.

Are these statements all true? And what is the operating system of the file server?

Let's get this working, then we can deal with the second problem - the contents of the directory on the coprocessor disappearing when you disable the nfs mount over that directory.

Hi Frances,

Thanks for the reply - I hope you had a great Thanksgiving!

The answer to the questions is RHEL 6.4 and yes.

On a really positive note, we've been experimenting here and have now got it working.  I'll detail how we've got it done -- my main concern now is that we've had to hack loads of files and if we ever ran "micctrl --resetconfig" then we lose all the changes :-(

The "micctrl --addbridge=... --type=external --ip=..." command seems to get a bit twisted...among the tasks that it does, it creates the following files;

/opt/intel/mic/filesystem/mic0/etc/sysconfig/network/ifcfg-mic0
/opt/intel/mic/filesystem/mic1/etc/sysconfig/network/ifcfg-mic0    <--- yes, really
/opt/intel/mic/filesystem/mic1/etc/sysconfig/network/ifcfg-mic1

The files themselves have duplicate entries for IPADDR and PREFIX and a second GATEWAY that is set to (null).

I've removed the ifcfg-mic0 under the mic1 directory, removed the duplicate fields in the other two files and changed the GATEWAY field to be the proper GATEWAY for my subnet (e.g. 10.10.10.1).
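
The clean-up described here could be scripted along the following lines. This is a sketch under assumptions: the sample file contents are invented to mimic the duplicated fields (the real ifcfg files are not shown in the thread and may use quoting or extra fields), and `clean_ifcfg` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep the first occurrence of each KEY=value line,
# drop any line whose value is "(null)", then set GATEWAY to the given address.
clean_ifcfg() {
    awk -F= '$2 != "(null)" && !seen[$1]++' "$1" |
        sed "s/^GATEWAY=.*/GATEWAY=$2/"
}

# Invented sample that mimics the duplicated fields described above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
IPADDR=10.10.10.22
PREFIX=24
GATEWAY=10.10.10.21
IPADDR=10.10.10.22
PREFIX=24
GATEWAY=(null)
EOF

# Print the cleaned file with the subnet's real gateway (10.10.10.1 in the post).
clean_ifcfg "$sample" 10.10.10.1
rm -f "$sample"
```

The helper writes to standard output rather than editing in place, so the result can be inspected before it replaces anything.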

I've also updated the gateway value to the same setting in the /var/mpss/mic*/etc/network/interfaces files for both MICs.

Then (because addnfs only ever writes the RHEL host IP), I updated /var/mpss/mic*/etc/hosts to add the name and IP of my NFS server, /var/mpss/mic*/etc/fstab to add the NFS mount, and /var/mpss/mic*.filelist to put in the directory to mount.

Then I started the mpss service again (yes, I did stop it before playing with the network settings) and everything works just fine.
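
A rough sketch of the hosts and fstab part of that workaround follows. Assumptions are marked in the comments: `MPSS_ROOT` defaults to a scratch directory so the sketch touches nothing real, `myfiler` is a placeholder host name, and the filelist edit is left out because its format is not shown in the thread. The fstab fields copy the layout of the entry micctrl generated earlier in the thread.

```shell
#!/bin/sh
# Sketch only: MPSS_ROOT defaults to a scratch directory so nothing real is touched.
# On an actual host the per-card trees live under /var/mpss (per the post above).
MPSS_ROOT=${MPSS_ROOT:-$(mktemp -d)}

# Print an fstab entry with the same fields micctrl generated,
# but naming the real NFS server instead of the host.
nfs_fstab_line() {
    printf '%s:%s %s nfs nolock 1 1\n' "$1" "$2" "$3"
}

for card in mic0 mic1; do
    mkdir -p "$MPSS_ROOT/$card/etc"
    # "myfiler" is a placeholder host name for the NFS server.
    echo "10.10.20.74 myfiler" >> "$MPSS_ROOT/$card/etc/hosts"
    nfs_fstab_line 10.10.20.74 /vol/myfiler/myshare /export/myshare \
        >> "$MPSS_ROOT/$card/etc/fstab"
done

cat "$MPSS_ROOT/mic0/etc/fstab"
```

As noted in the post, hand edits like these are lost if micctrl --resetconfig regenerates the files, so keeping them in a script at least makes them repeatable.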

Except, like I said above, I can't ever run "micctrl --resetconfig" again...nor probably any of the config updates via micctrl.

 

OK, now for the good news -

Remember seeing a message when you ran micctrl to add the bridge that said some (unnamed) file would be going away at MPSS 3.2? The files that are going away are those pesky /opt/intel/mic/filesystem/micX/etc/sysconfig/network/ifcfg-micX files.

Let me look and see if there is a fix to the problem of the addnfs option defaulting to the host IP regardless of what you put in the command.

I submitted a premier issue to get the developers to take a look at this IP address problem. I think this is a bug in the command. We'll see what they say.

That's good news about those files going away :-)

There is also the issue when setting up the MICs to use the bridge that they don't take the gateway from the RHEL host O/S, they effectively assume that the RHEL host is a gateway itself and that is generally not going to be true.  This gateway gets set not only in the /opt/intel/mic/* files but also in the ones in /var/mpss.

 

OK, the addnfs problem is a known bug and will be fixed in the MPSS 3.2 release. 

As for the gateway issue, shouldn't the gateway for the coprocessor be the host? The gateway to the host doesn't know where the coprocessor is, only that it is supposed to send any data packets for the coprocessor to the host for further routing. Sounds like a gateway to me. But then again, networking is not my strong suit, so you can tell me if I am wrong.

With regards to the gateway, the MICs can see anything on the local subnet, so if they know that the gateway is as normal for that subnet (e.g. generally x.x.x.1), then they know the way off the subnet.  If the host is meant to be a gateway, then we need to configure it as a router --not something that is desirable due to security.  I guess it comes down to either being able to specify/override the gateway IP or default to the host if you are configuring it as a router.  Either way requires an update to the code.

 

Hi
@Frances Roth (Intel)
(OK, the add nfs problem is a known bug and will be fixed in the MPSS 3.2 release.)
addnfs is your own wrapper script; your team does not say where the bug is in terms of the native Linux network commands.
Your team did not write NFS or the Linux network stack, so it is hard to justify wrapping them without explanation.
I have difficulty understanding how your update relates to a simple NFS mount point and to the bridge, both of which are handled by standard Linux commands.

About default classes:

This simple example shows that the default mask assigned automatically changes with the type of address used.
Address ranges reserved for internal networks: A (10.0.0.1 -- 10.255.255.254), B (172.16.0.1 -- 172.31.255.254), C (192.168.0.1 -- 192.168.255.254)

ifconfig eth1 192.168.1.1

eth1      Link encap:Ethernet  HWaddr f4:6d:04:4b:a3:8d
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:18 Memory:fbc00000-fbc20000
 
ifconfig eth1  172.31.1.1

eth1      Link encap:Ethernet  HWaddr f4:6d:04:4b:a3:8d
          inet addr:172.31.1.1  Bcast:172.31.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:18 Memory:fbc00000-fbc20000
The network is registered with the default class B mask, 255.255.0.0.

ifconfig eth1 10.10.10.1
eth1      Link encap:Ethernet  HWaddr f4:6d:04:4b:a3:8d
          inet addr:10.10.10.1  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:18 Memory:fbc00000-fbc20000
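
The defaults in the three outputs above follow the historical classful rule, which depends only on the first octet. A small sketch of that rule is given here; `classful_mask` is a hypothetical helper written for illustration, not something ifconfig or MPSS provides.

```shell
#!/bin/sh
# Hypothetical helper: print the historical classful default mask for an IPv4 address.
classful_mask() {
    first=${1%%.*}     # the first octet decides the class
    if [ "$first" -lt 128 ]; then
        echo 255.0.0.0         # class A
    elif [ "$first" -lt 192 ]; then
        echo 255.255.0.0       # class B
    elif [ "$first" -lt 224 ]; then
        echo 255.255.255.0     # class C
    else
        echo "none"            # class D/E: no default mask
    fi
}

for ip in 192.168.1.1 172.31.1.1 10.10.10.1; do
    echo "$ip $(classful_mask "$ip")"
done
```

These defaults only apply when no mask is given. An explicit mask, as in `ifconfig eth1 10.10.10.1 netmask 255.255.255.0`, overrides them, so a 10.x.x.x address can sit in a /24 subnet.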

Now look at the default addresses in your documentation, which lie in the reserved class B range.

You can set any mask you want, but sometimes other software may interpret it wrongly ...
All the conventions defined for networking must be strictly respected ..

A single system that hosts several network cards has one network stack.
A configuration like this one is incoherent:
ifconfig eth0 192.168.1.1
ifconfig eth1 192.168.1.2
ifconfig eth2 192.168.1.3

A Phi card that you add to a Xeon machine serves as a network interface...
For the network stack of the Xeon machine it is exactly as if you had added a network card, without any hosted system.
Now ask a network engineer whether this strange addressing is correct on a single system...

The problem is that when you add a Phi card to the Xeon machine, a subnetwork has to be managed for it.
From the host's point of view, the system on the card, with its own address, is the same as any other machine plugged in with an external cable.
What is complex with this card is the network address used for its own system ...
The questions that must be clearly answered:
Does this type of card have two separate interfaces, one for itself and one for the Xeon machine's system?
Are two separate addresses used on the same interface, using IPv6?
Is a single address used, with a strange localhost assigned to the MIC card's system?
Or is it on some other side?
Does the system on this card boot with grub, or is the kernel stored in a bootloader (u-boot image)?
The user must know how his system is mounted; this is elementary and primary when using a Linux system.
An answer is required here before it is possible to change this type of configuration
using native network commands rather than the ones wrapped in your strange script.

To your developer who put two sub-LANs and the card systems in the same class on the same system:
If you run (man exports), you will see that exports can take a network address with a mask.
For the routing table on the Xeon system side, when its ethX receives a query from outside, it may
not be able to identify whether the share point is on the network of mic1 or of mic2.
I perfectly understand how it works when two MICs are subnets (promiscuous by default) of the Xeon system, but that is not
the case when an external network reaches them indirectly through the host's ethX as a first intermediate step.
It is easy to add two vhosts, or a strange bridge aligned with dummy IPs at the ESX level, so that external NFS works, but with other
software, other and more complex problems become possible in some cases for the Xeon's network.
The network is part of Linux and must be respected; it is not made for bolting on hacks to solve a weird alignment that
seems specific to your development. You are not the only one on the system you use.
On a single system, addressing several interfaces in the same class is a fault.

About the default gateway..
It only has a value on the Xeon system, which is the one able to reach the outside.
If you have pppd on a server managing the connection to your internet provider,
you must not set a gateway, so that the IP received from the provider can be
correctly routed.
Several gateways on a single server are complex to manage and require some added programming ..
In your case, with strange addressing on two sides, you need to know exactly where each of the two IPs originates.
When a card has no direct, independent way out and a bridge is used, its default gateway has no value.
You already have promiscuous mode for all the other cards that are children, or considered subnets, in your case..
Maybe it is my poor English, but as the user describes it, it means nothing to me; with two cards in the same class
it is even more incomprehensible. The default gateway is only an escape address used when all others are inaccessible;
it is not designed for multiple routing...
Regards

 

@ Sally - Thanks for helping me to understand. I submitted a premier issue about the gateway not being correct. We'll see what they come back with. I also submitted an issue for the files in the user directories disappearing when you put an nfs mount over them and then not coming back when you eliminated the nfs mount. We'll see if they have a good solution for this. The normal behavior would be for the files to continue to exist but just not be visible when you mount over them. I suspect that the MPSS team chose to remove the files in order to save space on the original RAM disk. If you change things so that you nfs mount all your libraries, you don't want the old libraries sitting around on the RAM disk, taking up space.

@bustaf - So many questions. 

I think, once all the configuration files are in place, networking for the coprocessors does behave as expected for a Linux system. It is just that configuring the system gets troublesome. The micctrl configuration options are trying to manage the configuration files for both the host and the coprocessor and are trying to walk a fine line between what works best for the administrator of a single server and what works best for a cluster administrator.  Because all the files for the coprocessor reside on the host until the coprocessor is booted, micctrl cannot just issue the normal network configuration commands but instead must generate the coprocessor files itself. (It could use the standard network configuration commands if you first booted the coprocessor and then ran the commands, but all your changes would disappear when you halted the system.)

Where you refer to our documentation, I am not sure if you are talking about the User Guide for the MPSS or not. If not, I would suggest that you download and read the User Guide for the MPSS from http://software.intel.com/mic-developer. (Select the Tools and Downloads tab, then click on  Software Drivers: Intel® Manycore Platform Software Stack (Intel® MPSS) and scroll down until you find the User Guide for the latest Linux release.) This document is still undergoing a lot of changes and can be difficult to read at times.

Chapter 13 of the User Guide describes the boot process. Basically, at power-up the card initializes and boots a non-Linux kernel whose job is to wait on the PCIe connection for a Linux kernel. The system does not currently use grub. Instead the mpss daemon selects the kernel and hands it off to the program waiting on the PCIe connection. This Linux kernel gets copied over to the coprocessor memory and starts executing and from here on you are up in Linux.

I am having trouble following your statements about the network addresses. Chapter 18 of the User Guide has drawings showing what an internal and an external bridge look like. When the coprocessor is running Linux, a virtual ethernet connection is created on each PCIe connection. With an external bridge, the host does not have multiple ethernet IP addresses. It has only one (well, assuming there really is only one physical ethernet port on the host). The other addresses belong to the virtual ethernet ports on the coprocessors. In this case the addresses for the host's ethernet connection and the coprocessors' virtual ethernet connections will all be on the same subnet. With an internal bridge the host has one address for its virtual ethernet connection to the coprocessors, and it is on the same subnet as the coprocessors' virtual ethernet addresses but on a different subnet from the host's physical ethernet address.

As I have said, I am definitely not a networking expert. If I am missing something basic here, perhaps some kind soul can explain what it is I am not seeing.

Hi
First (a happy Christmas to you and all participants), thank you for your answer, which is
clearer to me given my poor English.

If I summarize the process step by step ..

1] The Xeon machine boots its system, with a kernel that has the network driver modules for the Phi card
inserted in it, along with all its other network interfaces.
In a directory on the Xeon machine's disk, the card's kernel is stored, inactive.

2] At the same time, the Phi card has started its firmware, which waits to receive a
copy of its kernel from the Xeon machine's storage.
The card can now load its kernel, and a Linux system is ready to work on the Phi card.

It's strange that this card does not have its own passive (NOR) storage able to hold its
kernel permanently ..

Now if you run ifconfig on the Xeon machine's system, you should see all the ethX and all the micX interfaces.
Each micX has an IP address that is treated as a network interface of the Xeon machine's system.
On the network side of the Phi card's own system, that is another network;
with its modules you can add any address you want and use a bridge or VL so that it can
communicate with the Xeon system. The problem is not there, which is what you seem not to understand.
If you address more than one Phi card (each treated by the host as a simple network card)
in the same class, it is not a correct configuration for the Xeon machine's network.
You can add six dozen bridges or VHs if you want; it will not do
anything to solve this problem.
If ifconfig on the Xeon machine shows two LANs in the same class, that is incorrect for its network.
Look at how the routing table works and maybe you will understand more clearly.
In almost thirty years of work I have never yet seen such addressing used.

Regards

 

Hi
Now that the effect of the champagne has passed..
I add some concrete examples for you; I hope they make things clearer.

Run these two tests; they will shed more light.

(route add -net 172.31.1.0 netmask 255.255.255.0  reject mic0)

If you have two MICs addressed 172.31.1.1 and 172.31.1.2,
then all hosts in 172.31.1.0 are now rejected for both mic0 and mic1, which cannot be told apart.

Another test, among many possible (Netfilter firewall):

(iptables -A INPUT -m iprange --src-range 172.31.1.10-172.31.1.45 -j DROP)

This is with your strange addressing of two or more cards in the same class, which your
documentation presents as conforming.
Can you show me how to separate things so that this range of user addresses is blocked (netfilter) for
only one MIC among several?

A LAN is there to manage the range of user addresses its class can hold,
not only to answer on its own address, as the reasoning
in your answer seems to assume.
Remember that the subject of the user's question is an external connection,
and therefore one routed at the first level by the Xeon machine's network.

Regards

 

Sally,

FYI - the problem with the gateway not being set to the host's gateway is now on the known bug list and will be fixed in a future release.

Frances

Hi
@Frances Roth (Intel)
As always, you escape to a secondary subject when you are unable to answer the real question.
I think it is rather your team that needs to be routed to a gateway, to
upgrade to the network conventions (RFCs).
Regards

 

The problems with NFS mounts between networks are fixed in MPSS 3.2. The file list is no longer used for Linux, so any issues with NFS mounts interfering with the file list are moot.
