InfiniBand doesn't work with Intel MPI


I tried to run a parallel program using the Intel MPI Library, but when I started it like this:

mpirun -hosts node01,node02 -np 36 ./prog

it exited without starting, with the following error messages:

node02-ib0:56fa:17a5a440: 705 us(705 us):  open_hca: get lid ERR for mlx4_0 port=2, err=Invalid argument
node01-ib0:4bd1:ceb2a440: 590 us(590 us):  open_hca: get lid ERR for mlx4_0 port=2, err=Invalid argument
node02-ib0:56fa:17a5a440: 522 us(522 us):  open_hca: getaddr_netdev ERROR: No such device. Is ib1 configured?
node02-ib0:56fa:17a5a440: 7180 us(6475 us):  open_hca: device mthca0 not found
node02-ib0:56fa:17a5a440: 7403 us(223 us):  open_hca: device mthca0 not found
node02-ib0:56fa:17a5a440: 7613 us(210 us):  open_hca: device ipath0 not found
node02-ib0:56fa:17a5a440: 7810 us(197 us):  open_hca: device ipath0 not found
node02-ib0:56fa:17a5a440: 8128 us(318 us):  open_hca: device ehca0 not found
node02-ib0:56fa:17a5a440: 1994 us(1472 us):  open_hca: getaddr_netdev ERROR: No such device. Is eth2 configured?
node01-ib0:4bd1:ceb2a440: 596 us(596 us):  open_hca: getaddr_netdev ERROR: No such device. Is ib1 configured?
node01-ib0:4bd1:ceb2a440: 7904 us(7314 us):  open_hca: device mthca0 not found
node01-ib0:4bd1:ceb2a440: 8221 us(317 us):  open_hca: device mthca0 not found
node01-ib0:4bd1:ceb2a440: 8524 us(303 us):  open_hca: device ipath0 not found
node02-ib0:56fa:17a5a440: 869 us(869 us):  ucm_create_services: ERR Cannot allocate memory
node01-ib0:4bd1:ceb2a440: 8784 us(260 us):  open_hca: device ipath0 not found
node01-ib0:4bd1:ceb2a440: 9042 us(258 us):  open_hca: device ehca0 not found
node01-ib0:4bd1:ceb2a440: 2440 us(1844 us):  open_hca: getaddr_netdev ERROR: No such device. Is eth2 configured?
node01-ib0:4bd1:ceb2a440: 859 us(859 us):  ucm_create_services: ERR Cannot allocate memory
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

The InfiniBand devices work, and everything seems to be OK.
The people I asked said that this error is related to permissions on the devices, because the program works when I run it as root but fails with this error when I run it as an ordinary user. Could you please help me?

[root@tisnum-head1 ~]# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.9.1000
        Hardware version: b0
        Node GUID: 0x0002c90300565970
        System image GUID: 0x0002c90300565973
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 70
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c90300565971
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251086a
                Port GUID: 0x0002c90300565972
                Link layer: InfiniBand

[root@tisnum-head1 ~]# lsmod
Module                  Size  Used by
rdma_ucm               12586  0
ib_ucm                 12255  0
rdma_cm                35175  1 rdma_ucm
iw_cm                   8836  1 rdma_cm
ib_addr                 6321  1 rdma_cm
ib_ipoib               84890  0
ib_cm                  38085  3 ib_ucm,rdma_cm,ib_ipoib
ib_sa                  44401  4 rdma_ucm,rdma_cm,ib_ipoib,ib_cm
ib_uverbs              39637  2 rdma_ucm,ib_ucm
ib_umad                12477  6
iw_nes                192353  0
iw_cxgb3              133047  0
cxgb3                 196233  1 iw_cxgb3
mlx4_ib                80171  0
mlx4_en                97664  0
mlx4_core             185193  2 mlx4_ib,mlx4_en
ib_mthca              141407  0
ib_mad                 40497  5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca
ib_core                69979  14 rdma_ucm,ib_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_nes,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad
mpt2sas               173216  0
scsi_transport_sas     35070  1 mpt2sas
raid_class              4804  1 mpt2sas
mptctl                 31976  0
mptbase                93845  1 mptctl
nfsd                  305799  13
lockd                  74270  1 nfsd
nfs_acl                 2647  1 nfsd
auth_rpcgss            44895  1 nfsd
exportfs                4236  1 nfsd
autofs4                26888  3
ipmi_devintf            8049  0
ipmi_si                42401  0
ipmi_msghandler        35992  2 ipmi_devintf,ipmi_si
sunrpc                243758  26 nfsd,lockd,nfs_acl,auth_rpcgss
8021q                  23575  0
garp                    7344  1 8021q
stp                     2173  1 garp
llc                     5642  2 garp,stp
ipv6                  322029  134 ib_addr,ib_ipoib
libcrc32c               1246  1 iw_nes
nls_utf8                1455  1
ext3                  235341  1
jbd                    80337  1 ext3
sg                     30124  0
igb                   157825  0
dca                     7197  1 igb
microcode             112594  0
sr_mod                 16228  0
cdrom                  39771  1 sr_mod
serio_raw               4818  0
amd64_edac_mod         21461  0
edac_core              46773  6 amd64_edac_mod
edac_mce_amd           15488  1 amd64_edac_mod
i2c_piix4              12608  0
i2c_core               31276  1 i2c_piix4
shpchp                 33482  0
ext4                  364410  3
mbcache                 8144  2 ext3,ext4
jbd2                   88738  1 ext4
sd_mod                 39488  7
crc_t10dif              1541  1 sd_mod
usb_storage            49452  0
megaraid_sas           77090  5
ata_generic             3837  0
pata_acpi               3701  0
pata_atiixp             4211  0
ahci                   40455  0
dm_mirror              14101  0
dm_region_hash         12170  1 dm_mirror
dm_log                 10122  2 dm_mirror,dm_region_hash
dm_mod                 81500  2 dm_mirror,dm_log

Thank you!
--
Alexander Kvashnin
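
One way to narrow down the root-versus-ordinary-user difference described above is to compare what each account can actually see. The following is only a diagnostic sketch, not a confirmed fix: it assumes the RDMA device nodes live under /dev/infiniband (the usual location on Linux), and it also prints the locked-memory limit, since a low limit for non-root users is a commonly reported cause of "Cannot allocate memory" errors with InfiniBand. The function names are hypothetical helpers written for this example.

```python
import glob
import os
import resource


def inaccessible_devices(pattern="/dev/infiniband/*"):
    """Return device nodes matching `pattern` that the current user
    cannot open for both reading and writing.

    The default pattern is an assumption about where the RDMA device
    nodes are located on this system.
    """
    return [path for path in sorted(glob.glob(pattern))
            if not os.access(path, os.R_OK | os.W_OK)]


def describe_memlock(limit):
    """Render an RLIMIT_MEMLOCK value (in bytes) as readable text."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("user id:", os.getuid())
    print("locked-memory limit (soft):", describe_memlock(soft))
    print("locked-memory limit (hard):", describe_memlock(hard))

    blocked = inaccessible_devices()
    if blocked:
        print("device nodes this user cannot read/write:")
        for path in blocked:
            print("  ", path)
    else:
        print("no inaccessible device nodes found under /dev/infiniband")
```

Running the script once as root and once as the ordinary user on each compute node, then comparing the two outputs, should show whether the accounts differ in device access or in the locked-memory limit.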
