Trying to get CNC to work on Sockets

Hi, I've been trying to get CNC to work on sockets using the script that's provided, but the lack of documentation on how to get it working doesn't help. I use CentOS machines named etas333-0X-git.host.ualr.edu, with X being any number from 1 to 8. I know the script says to change the number of clients and the my_host name, but that doesn't help when I'm using multiple clients. I know I'm doing something wrong. Can anyone give an example and the command to run it? Thanks! This is the first time this school is using DistCNC.

I got this far. Can anyone tell me what the issue is? 

env CNC_SOCKET_HOST=start.sh `pwd`/distmatrix_inverse 100
Loading cnc_socket...done.
sh: start.sh: command not found
[CnC] runtimes/cnc_api/src/dist/socket_comm/SocketHostInitializer.cpp:119 *** given script does not specify the number of clients via -n option.
    start.sh
, aborting execution.
Aborted

 

start.sh file

 # !!! Adjust the number of expected clients here !!!
_N_CLIENTS_=2
#__my_xterm="env DISPLAY=$DISPLAY xterm -e"
#__my_debugger="gdb --args"

mode=$1
contactString=$2

 # Special mode: query number of clients
if [ "$mode" = "-n" ]; then
    echo $_N_CLIENTS_
    exit 0
fi

# Normal mode: start client process

: ${CNC_NUM_THREADS:=4}
: ${CNC_SCHEDULER:=TBB_TASK}

__my_exe="$CNC_SOCKET_HOST_EXECUTABLE"
__my_wdir=`pwd`

# determine hostname
if [ -r "$HOST_FILE" ]; then
    echo "Using hostfile $HOST_FILE..." 1>&2
    __thishost=`hostname`
    __nhosts=`grep -v -e '^$' $HOST_FILE | grep -v $__thishost | wc -l`
    __hostId=$(($mode % $__nhosts))
    if [ "$__hostId" = "0" ]; then
        __hostId=$__nhosts
    fi
    __my_host=`cat $HOST_FILE | grep -v -e '^$' | grep -v $__thishost | head -n $__hostId | tail -n 1`
else
    __my_host=etas333-02-git
fi

_CLIENT_CMD1_="ssh $__my_host 'cd $__my_wdir && $__my_xterm env CNC_NUM_THREADS=$CNC_NUM_THREADS env CNC_SOCKET_CLIENT=$contactString env DIST_CNC=SOCKETS env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $__my_debugger $__my_exe'"

# start client process
#echo $_CLIENT_CMD1_
eval exec $_CLIENT_CMD1_

 

OK, I got it to work with one client. Now I'm working on getting it to work with multiple clients.

I am glad you could solve the issue. It seems all you needed was to add "./" in front of the script name:

env CNC_SOCKET_HOST=./start.sh

should do the job.
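
For anyone hitting the same "command not found" message: a script name without a slash is looked up on PATH, which usually does not include the current directory. The error above also shows that the host queries the script with the -n option to learn the number of clients. A minimal sketch of a sanity check built on those two points follows; check_start_script is a hypothetical helper, not part of CnC.

```shell
#!/bin/sh
# Hypothetical helper (not part of CnC): sanity-check a start script
# before passing it to CNC_SOCKET_HOST.
check_start_script() {
    script=$1
    # A name without a slash is searched on PATH, which usually does not
    # include the current directory, so insist on an explicit path.
    case "$script" in
        */*) ;;
        *) echo "use ./$script or an absolute path" >&2; return 1 ;;
    esac
    if [ ! -x "$script" ]; then
        echo "$script is missing or not executable" >&2
        return 1
    fi
    # The host asks the script for the client count via -n.
    n=$("$script" -n) || return 1
    case "$n" in
        ''|*[!0-9]*) echo "$script -n did not print a number" >&2; return 1 ;;
    esac
    echo "$n"
}
```

Calling check_start_script ./start.sh should print the value of _N_CLIENTS_ if the script is set up as posted above.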

Using MPI is usually easier as the MPI runtime takes care of all those tedious details.

Let us know how it goes.

 

frank

How do you add multiple clients to replace "localhost"? I've been trying to figure that out, to no avail.

When using the script provided with the package, simply create a file listing all hosts:

etas333-01-git.host.ualr.edu
etas333-02-git.host.ualr.edu
etas333-03-git.host.ualr.edu
etas333-04-git.host.ualr.edu
etas333-05-git.host.ualr.edu
etas333-06-git.host.ualr.edu
etas333-07-git.host.ualr.edu
etas333-08-git.host.ualr.edu

and set the environment variable HOST_FILE to the file you've created, like this:

env HOST_FILE=./<file> CNC_SOCKET_HOST=./start.sh `pwd`/distmatrix_inverse 100
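
To make the host selection easier to follow, here is a sketch of what the hostfile branch of the posted start.sh appears to do: it drops empty lines and the local host, then maps client ids onto the remaining lines round-robin. The function name pick_host and its arguments are hypothetical; the script itself reads $mode, $HOST_FILE and `hostname` directly.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the host selection in start.sh.
# Usage: pick_host CLIENT_ID HOST_FILE THIS_HOST
pick_host() {
    client_id=$1
    host_file=$2
    this_host=$3
    # Candidate hosts: non-empty lines that do not mention the local host.
    candidates=$(grep -v -e '^$' "$host_file" | grep -v -e "$this_host")
    nhosts=$(printf '%s\n' "$candidates" | grep -c . || true)
    if [ "$nhosts" -eq 0 ]; then
        echo "no remote hosts found in $host_file" >&2
        return 1
    fi
    # Round-robin: client ids 1, 2, 3, ... map onto lines 1..nhosts.
    idx=$((client_id % nhosts))
    if [ "$idx" -eq 0 ]; then
        idx=$nhosts
    fi
    printf '%s\n' "$candidates" | sed -n "${idx}p"
}
```

Note that the number of clients still comes from _N_CLIENTS_ in the script, not from the number of lines in the host file, so that value needs to match how many clients you want started.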

 

Ok thanks!!

It doesn't seem to work. This is what I got:

env HOST_FILE=./host_file CNC_SOCKET_HOST=./start.sh /home/rjmoulliet/cnc0.8/matrix_inverse/matrix_inverse/distmatrix_inverse 100

Loading cnc_socket...done.

starting clients via script:

./start.sh <client_id> 0:1026_111@144.167.99.101

Using hostfile ./host_file...

Using hostfile ./host_file...

ssh etas333-02-git 'cd /home/rjmoulliet/cnc0.8/matrix_inverse/matrix_inverse && env CNC_NUM_THREADS=-1 env CNC_SOCKET_CLIENT=0:1026_111@144.167.99.101 env DIST_CNC=SOCKETS env LD_LIBRARY_PATH=/mnt/sw/cnc/0.8/lib/intel64:/mnt/sw/tbb/tbb41_20130613oss/build/linux_intel64_gcc_cc4.5.3_libc2.5_kernel2.6.18_release:/mnt/sw/gcc/4.5.3/lib64:/mnt/sw/gcc/4.5.3/lib /home/rjmoulliet/cnc0.8/matrix_inverse/matrix_inverse/distmatrix_inverse'

ssh etas333-03-git 'cd /home/rjmoulliet/cnc0.8/matrix_inverse/matrix_inverse && env CNC_NUM_THREADS=-1 env CNC_SOCKET_CLIENT=0:1026_111@144.167.99.101 env DIST_CNC=SOCKETS env LD_LIBRARY_PATH=/mnt/sw/cnc/0.8/lib/intel64:/mnt/sw/tbb/tbb41_20130613oss/build/linux_intel64_gcc_cc4.5.3_libc2.5_kernel2.6.18_release:/mnt/sw/gcc/4.5.3/lib64:/mnt/sw/gcc/4.5.3/lib /home/rjmoulliet/cnc0.8/matrix_inverse/matrix_inverse/distmatrix_inverse'

/home/rjmoulliet/.bashrc: line 7: /mnt/sw/Modules//bin/modulecmd: No such file or directory

/home/rjmoulliet/.bashrc: line 7: /mnt/sw/Modules//bin/modulecmd: No such file or directory

--> established socket connection 1, 1 still missing ...

--> established all socket connections to the host.

--> establishing client connections to client 1 ... done

Generating matrix of size 100

Floating point elements per matrix: 100 x 100

Floating point elements per tile: 90 x 90

tiles per matrix: 2 x 2

dim(2) size(100)

tiles created 4 tiles deleted 0 tiles remaining 4

resident memory MB 2.47398   increment MB 2.47398

Invert serially

Serial Total Time: 0.010949 sec

Floating-point operations executed: 0.002000 billion

Floating-point operations executed per unit time:   0.18 billions/sec

tiles created 12 tiles deleted 8 tiles remaining 4

resident memory MB 3.05152   increment MB 0.577536

tiles created 5 tiles deleted 1 tiles remaining 4

resident memory MB 3.31776   increment MB 0.26624

Invert CnC steps

distmatrix_inverse: matrix_inverse.cpp:512: my_tuner::my_tuner(int): Assertion `_np == 1 || ( ntiles * ntiles ) % _np == 0' failed.

Loading cnc_socket...done.

Aborted

ERROR: [rjmoulliet@etas333-01-git matrix_inverse]$ connection closed by peer, receiving remaining 8 of 8 bytes failed

[CnC] runtimes/cnc_api/src/dist/generic_comm/GenericCommunicator.cpp:208 Connection Error, aborting execution.

Loading cnc_socket...done.

ERROR: connection closed by peer, receiving remaining 8 of 8 bytes failed

[CnC] runtimes/cnc_api/src/dist/generic_comm/GenericCommunicator.cpp:208 Connection Error, aborting execution.

bash: line 1: 11700 Aborted                 env CNC_NUM_THREADS=-1 env CNC_SOCKET_CLIENT=0:1026_111@144.167.99.101 env DIST_CNC=SOCKETS env LD_LIBRARY_PATH=/mnt/sw/cnc/0.8/lib/intel64:/mnt/sw/tbb/tbb41_20130613oss/build/linux_intel64_gcc_cc4.5.3_libc2.5_kernel2.6.18_release:/mnt/sw/gcc/4.5.3/lib64:/mnt/sw/gcc/4.5.3/lib /home/rjmoulliet/cnc0.8/matrix_inverse/matrix_inverse/distmatrix_inverse

bash: line 1:  9380 Aborted                 env CNC_NUM_THREADS=-1 env CNC_SOCKET_CLIENT=0:1026_111@144.167.99.101 env DIST_CNC=SOCKETS env LD_LIBRARY_PATH=/mnt/sw/cnc/0.8/lib/intel64:/mnt/sw/tbb/tbb41_20130613oss/build/linux_intel64_gcc_cc4.5.3_libc2.5_kernel2.6.18_release:/mnt/sw/gcc/4.5.3/lib64:/mnt/sw/gcc/4.5.3/lib /home/rjmoulliet/cnc0.8/matrix_inverse/matrix_inverse/distmatrix_inverse

The files work now. As I was told, Frank said that the size must be a multiple of the tile size, 90. Through my tests I've seen that if NO_CONSUMED_ON is undefined, odd multiples don't work properly (i.e., an odd multiple of 90 such as 90 x 5 = 450). With it defined, any multiple will work.
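
Separately from the multiple-of-90 requirement, the assertion in the failed run above can be checked before launching. The sketch below assumes that the number of tiles per dimension is the size divided by the tile size, rounded up (size 100 with tile size 90 was reported as 2 x 2 tiles), and that _np is a process count chosen by the example; neither assumption is confirmed by the sources. check_tiling is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical pre-flight check mirroring the assertion
#   _np == 1 || ( ntiles * ntiles ) % _np == 0
# Usage: check_tiling SIZE TILE_SIZE NP
check_tiling() {
    size=$1
    tile=$2
    np=$3
    # Assumption: tiles per dimension = ceil(size / tile).
    ntiles=$(( (size + tile - 1) / tile ))
    if [ "$np" -eq 1 ] || [ $(( (ntiles * ntiles) % np )) -eq 0 ]; then
        echo "ok: $ntiles x $ntiles tiles over $np processes"
    else
        echo "assertion would fail: $((ntiles * ntiles)) tiles not divisible by $np"
        return 1
    fi
}
```

For example, check_tiling 100 90 3 reports a failure because 4 tiles cannot be split evenly over 3 processes, while check_tiling 100 90 2 passes.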
