Developer Guide


Distributed Processing

You can use the distributed processing mode for neural network training. Approaches to neural network training in the distributed mode are based on the following kinds of parallelization:
  • Data based
  • Model based
  • Hybrid (data+model based)
The library supports data based parallelization.

Data Based Parallelization

The data based parallelization approach has the following features:
  • The training data is split across local nodes.
  • Instances of the same model are used by local nodes to compute local derivatives.
  • The master node updates the weights and biases of the model using the local derivatives and delivers the updated parameters back to the local nodes.
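The three features above can be illustrated with a minimal sketch in plain Python. It does not use the library API: the one-variable linear model, the function name local_derivatives, the learning rate, and the choice to combine local derivatives by summing them are all assumptions made for illustration only.

```python
# Hypothetical illustration; none of these names come from the library.

def local_derivatives(w, b, xs, ys):
    """Summed squared-error derivatives for y = w*x + b over one node's data."""
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    return dw, db

# Full training set, split across two local nodes.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
node_data = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

w, b = 0.0, 0.0  # every local node holds an instance of the same model

# Each local node computes derivatives on its own share of the data.
partials = [local_derivatives(w, b, nx, ny) for nx, ny in node_data]

# The master combines the local derivatives (a plain sum, by assumption).
dw = sum(p[0] for p in partials)
db = sum(p[1] for p in partials)

# For comparison: derivatives computed on the undivided data set.
full_dw, full_db = local_derivatives(w, b, xs, ys)

# The master takes one gradient-descent step; (w, b) then goes back to the nodes.
learning_rate = 0.01
w -= learning_rate * dw / len(xs)
b -= learning_rate * db / len(xs)
```

In this sketch the summed local derivatives equal the derivatives computed on the whole data set, which is why splitting the data across nodes does not change the update for a loss that is a sum over samples.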
The library supports the following ways to update the parameters of the neural network model for data based parallelization:
  • Synchronous
    The master node updates the model only after all local nodes deliver the local derivatives for a given iteration of the training.
  • Asynchronous
    The master node:
    • Immediately sends the latest version of the model to the local node that delivered the local derivatives.
    • Updates the model as soon as it accumulates a sufficient number of partial results. The requirements of the application define this number.
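The difference between the two update modes can be sketched as follows. This is a hedged illustration in plain Python, not the library API: the Master class, its deliver method, the required_results threshold, and the averaging rule are hypothetical names and choices introduced here.

```python
# Hypothetical sketch; class and method names are illustrative, not library API.

class Master:
    def __init__(self, weights, required_results, learning_rate=0.1):
        self.weights = list(weights)
        self.required_results = required_results  # partial results needed per update
        self.learning_rate = learning_rate
        self.pending = []                          # derivatives not yet applied
        self.updates = 0

    def deliver(self, derivatives):
        """Accept local derivatives and reply with the latest model."""
        self.pending.append(derivatives)
        if len(self.pending) >= self.required_results:
            self._update()
        return list(self.weights)

    def _update(self):
        # Apply the mean of the accumulated derivatives (an assumed rule).
        count = len(self.pending)
        for i in range(len(self.weights)):
            mean = sum(d[i] for d in self.pending) / count
            self.weights[i] -= self.learning_rate * mean
        self.pending.clear()
        self.updates += 1


local_results = ([0.3], [0.6], [0.9])  # derivatives from three local nodes

# Synchronous: the threshold equals the number of local nodes.
sync_master = Master([1.0], required_results=3)
for d in local_results:
    sync_master.deliver(d)

# Asynchronous: the application chooses a smaller threshold.
async_master = Master([1.0], required_results=2)
replies = [async_master.deliver(d) for d in local_results]
```

With a threshold of three, the model changes only once all three nodes have delivered. With a threshold of two, the second delivery already triggers an update, and the third node receives that newer model in its reply while its own derivatives wait for the next update. In a real synchronous run the local nodes would wait for the update rather than use the reply returned before it.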


The flow of the neural network model training using data based parallelization involves these steps:
  1. Initialize the neural network model using the
    method on the master node and propagate the model to local nodes.
  2. Run the training algorithm on local nodes as described in the Usage Model: Training and Prediction > Training section with the following specifics of the distributed computation mode:
    • Provide each
      -th node of the neural network with the local data set of
    • Specify the required
    • Split the data set on a local node into
      data blocks, each to be processed by the local algorithm separately.
    • The
      parameters and
      parameters must be the same on all local nodes for synchronous computations and can be different for asynchronous computations.
    See the figure below to visualize the i-th iteration, corresponding to the i-th data block. After the computations for the i-th data block on a local node are finished, send the derivatives of local weights and biases to the master node.
    The training algorithm on local nodes does not require an optimization solver.
  3. Run the training algorithm on the master node by providing the local derivatives from all local nodes. The algorithm uses the optimization solver provided in its
    parameter. For available algorithms, see Optimization Solvers. After the computations are completed, send the updated weights and biases of the model to all local nodes.
    You can get the latest version of the model by calling the
    method after each run of the training algorithm on the master or local node.
  4. Repeat steps 2 and 3 for all data blocks. Call the
    method of the trained model on the master to get the model to be used for validation and prediction after the training process is completed.
Figure: Neural Network Training Distributed Processing: i-th Iteration Workflow
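The synchronous training flow described in the steps above can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions, not the library API: the linear model, the functions split_into_blocks, local_step, master_step, and train, the plain gradient-descent solver, and the repeated passes over the data are all hypothetical. The sketch also assumes that every local node holds the same number of data blocks.

```python
# Hypothetical sketch of the four steps; names are illustrative, not library API.

def split_into_blocks(data, batch_size):
    """Split one node's local data set into consecutive data blocks."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

def local_step(model, block):
    """Local node: mean squared-error derivatives for y = w*x + b on one block."""
    w, b = model
    n = len(block)
    dw = sum(2 * (w * x + b - y) * x for x, y in block) / n
    db = sum(2 * (w * x + b - y) for x, y in block) / n
    return dw, db

def master_step(model, all_derivatives, learning_rate):
    """Master node: gradient-descent step on the averaged local derivatives."""
    w, b = model
    k = len(all_derivatives)
    w -= learning_rate * sum(d[0] for d in all_derivatives) / k
    b -= learning_rate * sum(d[1] for d in all_derivatives) / k
    return w, b

def train(node_datasets, batch_size, learning_rate=0.4, passes=500):
    model = (0.0, 0.0)  # step 1: initialize on the master, propagate to nodes
    node_blocks = [split_into_blocks(d, batch_size) for d in node_datasets]
    n_blocks = len(node_blocks[0])  # assumed equal on all nodes
    for _ in range(passes):
        for i in range(n_blocks):  # i-th iteration uses the i-th data block
            # Step 2: every local node computes derivatives on its i-th block.
            derivs = [local_step(model, blocks[i]) for blocks in node_blocks]
            # Step 3: the master updates the model and sends it back.
            model = master_step(model, derivs, learning_rate)
    return model  # step 4: the trained model, retrieved from the master

# Noise-free samples of y = 2x + 1, split across two local nodes.
samples = [(k / 8, 2 * (k / 8) + 1) for k in range(8)]
w, b = train([samples[:4], samples[4:]], batch_size=2)
```

Only the master applies a solver step here; local_step returns derivatives and never changes the model, which mirrors the statement that the training algorithm on local nodes does not require an optimization solver.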
