CnC for distributed memory (distCnC)

In principle, every clean CnC program should run with distCnC. Most of the mechanics of data distribution etc. is handled inside the runtime and the programmer does not need to bother about the gory details. Of course, there are a few minor changes needed to make a program distCnC-ready, but once that's done, it will run on distributed CnC as well as on "normal" CnC (decided at runtime).

Inter-process communication

Conceptually, distCnC allows data and computation distribution across any kind of network; but currently only a socket-based communication model is provided. Support for KNF or MPI might follow.

Linking for distCnC

distCnC is part of the "normal" CnC distribution, e.g. it comes with the necessary communication libraries (cnc_socket). The communication library is loaded on demand at runtime, hence you do not need to link against extra libraries to create distCnC-ready application. Just link your binaries like a "tradiditional" CnC application (explained in the CnC User Guide, which can be found in the doc directory).

Making your program distCnC-ready

As a distributed version of a CnC program needs to do things which are not required in a shared memory version, the extra code for distCnC is hidden from "normal" CnC headers. To include the features required for a distributed version you need to
 #include <cnc/dist_cnc.h> 
instead of
 <cnc/cnc.h> 
. If you want to be able to create optimized binaries for shared memory and distributed memory from the same source, you might consider protecting distCnC specifics like this:
    #ifdef _DIST_
    # include <cnc/dist_cnc.h>
    #else
    # include <cnc/cnc.h>
    #endif

In "main", initialize an object CnC::dist_cnc_init< list-of-contexts > before anything else; parameters should be all context-types ever used in the program:

    #ifdef _DIST_
        CnC::dist_cnc_init< my_context_type_1 /*, my_context_type_2, ... */ > _dinit;
    #endif

If and only if your items and/or tags are non-standard data types, the compiler will notify you about the need for serialization capability. Serialization of structs/classes without pointers or virtual functions can easily be enabled using CNC_BITWISE_SERIALIZABLE( type ); others need a "serialize" method or function. See Serialization for more details.

Step instances are distributed across clients and the host. By default, they are distributed in a round-robin fashion. Note that every process can put tags (and so prescribe new step instances). The round-robin distribution decision is made locally on each process (not globally).

If the same tag is put multiple times, it might be scheduled for execution on different processes and the preserveTags attribute of tag_collections will then not have the desired effect.

To potentially get around this problem, you can provide a tuner when prescribing steps with a CnC::tag_collection. The tuner should be derived from CnC::default_tuner and provide a method called "pass_on", which takes the tag of the step and the context as arguments and has to return the process number to run the step on. To avoid the aforementioned problem, you simply need to make sure that the return value is independent of the process it is executed on. The pass_on mechanism can be used to provide any distribution strategy which can be computed locally.

    struct my_tuner : public CnC::default_tuner< tag_type, context_type >
    {
        int pass_on( const tag_type & tag, context_type & ) const { return tag % 4; }
    };

Attention:
Pointers as tags are not yet supported by distCnC. It should be possible to implement a serializable wrapper for pointers, though.

Global variables are evil and must not be used within the execution scope of steps.

"Global" attributes of collections (e.g. size(), iteration, etc.) must not be used while steps are being executed (e.g. within the dynamic scope of step-code).

Running with distCnC over sockets

On application start-up, CnC checks the environment variable "CNC_SOCKET_HOST". If it is set to a number, it will print a contact string and wait for the given number of clients to connect. Usually this means that clients need to be started "manually" as follows: set "CNC_SOCKET_CLIENT" to the given contact string and launch the same executable on the desired machine.

If "CNC_SOCKET_HOST" is is not a number it is interpreted as a name of a script. CnC executes the script twice: First with "-n" it expects the script to return the number of clients it will start. The second invocation is expected to launch the client processes.

There is a sample script "misc/start.sh" which you can use. Usually all you need is setting the number of clients and replacing "localhost" with the names of the machines you want the application(-clients) to be started on. It requires password-less login via ssh. It also gives some details of the start-up procedure. For windows, the script "start.bat" does the same, except that it will start the clients on the same machine without ssh or alike. Adjust the script to use your preferred remote login mechanism.


Generated on Tue Aug 31 15:30:27 2010 for CnC by  doxygen 1.5.6