Manage Deep Learning Networks with Caffe* Optimized for Intel® Architecture

Summary

Caffe* is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC). It is written in C++ and CUDA* C++ with Python* and MATLAB* wrappers. It is useful for convolutional neural networks, recurrent neural networks, and multi-layer perceptrons. Various forks of the main Caffe branch add support for detection and classification, segmentation, and Spark compatibility, among others.

Caffe optimized for Intel® architecture is currently integrated with the latest release of Intel® Math Kernel Library (Intel® MKL) 2017, which is optimized for Advanced Vector Extensions 2 (AVX2) and AVX-512 instructions. These instructions are supported in Intel® Xeon® and Intel® Xeon Phi™ processors (among others). That is, Caffe optimized for Intel® architecture contains all the functionality of BVLC Caffe and, in addition, runs efficiently on Intel architecture and can be used for distributed training across multiple nodes. This tutorial describes how to build Caffe optimized for Intel architecture, train deep network models using one or more compute nodes, and deploy networks. In addition, various functionalities of Caffe are explored in detail, including how to fine-tune models, extract and view features of different models, and use the Caffe Python API.

Vocabulary use:

  • weights - also known as kernels, filters, templates, or feature extractors
  • blob - also known as tensor - an N dimensional data structure, that is, an N-D tensor, that contains data, gradients, or weights (including biases)
  • units - also known as neurons - performs a non-linear transformation on a data blob
  • feature maps - also known as channels
  • testing - also known as inference, classification, scoring, or deployment
  • model - also known as topology or architecture

A fast way to become familiar with Caffe is:

Note that the content of this article is based in part on this blog.

Installation

The following instructions apply to Ubuntu* 14.04. Similar instructions for other Linux* or OS X* operating systems or Ubuntu versions can be found in BVLC's Caffe installation website.

Get dependencies:

sudo apt-get update &&
sudo apt-get -y install build-essential git cmake &&
sudo apt-get -y install libprotobuf-dev libleveldb-dev libsnappy-dev &&
sudo apt-get -y install libopencv-dev libhdf5-serial-dev protobuf-compiler &&
sudo apt-get -y install --no-install-recommends libboost-all-dev &&
sudo apt-get -y install libgflags-dev libgoogle-glog-dev liblmdb-dev &&
sudo apt-get -y install libatlas-base-dev

For Ubuntu 16.04, link the following libraries:

find . -type f -exec sed -i -e 's^"hdf5.h"^"hdf5/serial/hdf5.h"^g' -e 's^"hdf5_hl.h"^"hdf5/serial/hdf5_hl.h"^g' '{}' \;
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so

On CentOS* 7 install the dependencies as follows:

sudo yum -y update &&
sudo yum -y groupinstall "Development Tools" &&
sudo yum -y install wget cmake git &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel &&
sudo yum -y install snappy-devel opencv-devel atlas-devel &&
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel

# The following steps are only required if some packages failed to install
# add EPEL repository then install missing packages
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ivh epel-release-latest-7.noarch.rpm
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel

# if packages are still not found--download and install/build the packages, e.g.,
# snappy:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
# atlas:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
# opencv:
wget https://github.com/Itseez/opencv/archive/2.4.13.zip
unzip 2.4.13.zip
cd opencv-2.4.13/
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local ..
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make all -j $NUM_THREADS
sudo make install -j $NUM_THREADS

# optional (not required for Caffe)
# other useful repositories for CentOS are RepoForge and IUS:
wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
sudo rpm -Uvh rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
wget https://rhel7.iuscommunity.org/ius-release.rpm
sudo rpm -Uvh ius-release*.rpm

Reasons for dependencies (source):

  • boost: a C++ library used for its math functions and shared pointer
  • glog, gflags: provides logging and command line utilities. Essential for debugging
  • leveldb, lmdb: database IO. Use for preparing your own data
  • protobuf: used to efficiently define data structure
  • BLAS (Basic Linear Algebra Subprograms): operations such as matrix multiplication, matrix addition, provided by Intel® Math Kernel Library (Intel® MKL), ATLAS*, openBLAS*, and so forth

The Caffe installation guide states: Install "MKL for better CPU performance."

For best performance, use Intel® Math Kernel Library (Intel® MKL) 2017, available for free as a Beta in Intel® Parallel Studio XE 2017 Beta. Intel MKL 2017 production release also known as gold release will be available September 2016.

Alternatively, Intel MKL 11.3.3 (the 2016 version) can be downloaded and installed. To download it, first register for a free community license and follow the installation instructions.

Once installed, the correct environment libraries can be set as follows (the path may need to be modified):

echo 'source /opt/intel/bin/compilervars.sh intel64' >> ~/.bashrc
# alternatively edit <mkl_path>/mkl/bin/mklvars.sh replacing INSTALLDIR in
# CPRO_PATH=<INSTALLDIR> with the actual mkl path: CPRO_PATH=<full mkl path>
# echo 'source <mkl path>/mkl/bin/mklvars.sh intel64' >> ~/.bashrc

Clone and prepare Caffe optimized for Intel architecture for compiling as follows:

cd ~
# For BVLC caffe use:
# git clone https://github.com/BVLC/caffe.git
# For intel caffe use:
git clone https://github.com/intel/caffe.git 
cd caffe
echo "export CAFFE_ROOT=`pwd`" >> ~/.bashrc
source ~/.bashrc
cp Makefile.config.example Makefile.config
# Open Makefile.config and modify it (see comments in the Makefile)
vi Makefile.config

Edit the Makefile.config:

# To run on CPU only and to avoid installing CUDA installers, uncomment
CPU_ONLY := 1

# To use MKL, replace atlas with mkl as follows
# (make sure that the BLAS_DIR and BLAS_LIB paths are correct)
BLAS := mkl
BLAS_DIR := $(MKLROOT)/include
BLAS_LIB := $(MKLROOT)/lib/intel64

# To use MKL2017 DNN primitives as the default engine, uncomment
# (however leave it commented if using multinode training)
# USE_MKL2017_AS_DEFAULT_ENGINE := 1

# To customize the compiler choice, uncomment and set the following
# CUSTOM_CXX := g++

# To train on multinode uncomment and verify path
# USE_MPI := 1
# CXX := /usr/bin/mpicxx

If using Ubuntu 16.04, edit the Makefile:

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/

and create symlinks:

cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so 

If using CentOS 7 and ATLAS (instead of the recommended MKL library), edit the Makefile:

# Change this line
LIBRARIES += cblas atlas
# to
LIBRARIES += satlas

Build Caffe optimized for Intel architecture:

NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
# To save the output stream to file makestdout.log use this instead
# make -j $NUM_THREADS 2>&1 | tee makestdout.log

An alternative to the steps above is to use cmake:

mkdir build
cd build
cmake -DCPU_ONLY=on -DBLAS=mkl -DUSE_MKL2017_AS_DEFAULT_ENGINE=on /path/to/caffe
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS

Install Python dependencies:

# These steps are OPTIONAL but highly recommended to use the Python interface
sudo apt-get -y install gfortran python-dev python-pip
cd ~/caffe/python
for req in $(cat requirements.txt); do sudo pip install $req; done
sudo pip install scikit-image #depends on other packages
sudo ln -s /usr/include/python2.7/ /usr/local/include/python2.7
sudo ln -s /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ \
  /usr/local/include/python2.7/numpy
cd ~/caffe
make pycaffe -j $NUM_THREADS
echo "export PYTHONPATH=$CAFFE_ROOT/python" >> ~/.bashrc
source ~/.bashrc

Other installation options:

# These steps are OPTIONAL to test caffe
make test -j $NUM_THREADS
make runtest #"YOU HAVE <some number> DISABLED TESTS" output is OK

# This step is OPTIONAL to disable cam hardware OpenCV driver
# alternatively, the user can skip this and ignore the harmless 
# libdc1394 error that may occasionally appear
sudo ln /dev/null /dev/raw1394

Data layer

This section is optional and discusses the various data types; understanding it is not required to start using Caffe. It may be useful if you plan to use data in differing formats. The material in this section is based on this and this tutorial, and src/caffe/proto/caffe.proto.

Data enters Caffe through data layers, which lie at the bottom of nets and are defined in a prototxt file. More information on prototxt files is in the Training section. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from files on disk in HDF5 or common image formats.

Common input preprocessing transformations (mean subtraction, scaling, random cropping, and mirroring) are available by specifying transform_param. This is not supported by all data layer types; for example, HDF5Data does not support it. If the required data transformations are performed beforehand, it is not necessary to use this option in the data layer. Common data transformations can be performed as follows:

  transform_param {
    # randomly horizontally mirror the image
    mirror: 1
    # crop a `crop_size` x `crop_size` patch:
    # - at random during training
    # - from the center during testing
    crop_size: 227
    # subtract mean value: these mean_values can equivalently be replaced with a mean.binaryproto file as
    # mean_file: name_of_mean_file.binaryproto
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }

In this example the images are cropped, mirrored, and have the mean subtracted. For other available general data transformations see src/caffe/proto/caffe.proto under message TransformationParameter.
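
To make these transformations concrete, the following is a minimal pure-Python sketch of what a transform_param block does to a single image. It is not Caffe's implementation: the function name transform, the nested-list image layout (rows of pixels, each pixel a list with one value per channel), and the train flag are assumptions made for illustration only.

```python
import random

def transform(image, crop_size, mean_values, mirror=False, train=True, rng=random):
    """Crop, optionally mirror, and mean-subtract one image.

    `image` is a list of rows; each row is a list of pixels; each pixel is a
    list with one value per channel (hypothetical layout for illustration).
    """
    height, width = len(image), len(image[0])
    if train:
        # random crop position during training
        top = rng.randint(0, height - crop_size)
        left = rng.randint(0, width - crop_size)
    else:
        # center crop during testing
        top = (height - crop_size) // 2
        left = (width - crop_size) // 2
    # when mirroring is enabled, flip horizontally with probability 0.5
    do_mirror = mirror and rng.random() < 0.5
    out = []
    for row in image[top:top + crop_size]:
        pixels = row[left:left + crop_size]
        if do_mirror:
            pixels = pixels[::-1]
        # subtract the per-channel mean from every pixel
        out.append([[v - m for v, m in zip(px, mean_values)] for px in pixels])
    return out

# 4x4 image with 3 channels where every pixel is [110, 120, 130]
img = [[[110, 120, 130] for _ in range(4)] for _ in range(4)]
patch = transform(img, crop_size=2, mean_values=[104, 117, 123], train=False)
print(len(patch), len(patch[0]), patch[0][0])
```

With the test-phase settings above, the result is a 2 x 2 center patch whose pixels are the original values minus the three mean_value entries.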

Data

Lightning Memory-Mapped Database (LMDB) and LevelDB formats can be processed efficiently as input data. They are only suitable for 1-of-k classification, for which they are the recommended data formats because Caffe reads them efficiently.

data_params

Required

  • source: the name of the directory containing the database
  • batch_size: the number of inputs to process at one time

Optional

  • backend [default LEVELDB]: choose whether to use a LEVELDB or LMDB
  • rand_skip: skip this number of inputs at the beginning. This can be useful for async sgd

For other available data layer transformations see src/caffe/proto/caffe.proto under message DataParameter.

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: 1
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}

It is common but not required to have the same name for the layer and the top blob coming out of the layer; that is, in the prototxt files in each layer, name and top are usually the same.

Alternatively, the mean can be subtracted by passing a mean image and replacing the three mean_value lines with one mean_file: "data/ilsvrc12/imagenet_mean.binaryproto". This binaryproto file can be created from an LMDB dataset as follows:

cd ~/caffe
build/tools/compute_image_mean examples/imagenet/ilsvrc12_train_lmdb \
  data/ilsvrc12/imagenet_mean.binaryproto

replacing examples/imagenet/ilsvrc12_train_lmdb and data/ilsvrc12/imagenet_mean.binaryproto with the appropriate lmdb folder and desired binaryproto file, respectively.

ImageData

Get images and labels directly from image files.

image_data_params

Required

  • source: the name of the text file containing the path of the data inputs and labels

Optional

  • batch_size [default 1]: the number of inputs to process at one time
  • new_height [default 0]: resizes the height by warping height to this value; this is ignored if set to 0
  • new_width [default 0]: resizes the width by warping width to this value; this is ignored if set to 0
  • shuffle [default 0]: shuffles the data; this is ignored if set to 0
  • rand_skip [default 0]: skip this number of inputs at the beginning; may be useful for async sgd

For other available image data transformation, see src/caffe/proto/caffe.proto under message ImageDataParameter.

In this example the images are shuffled, cropped, mirrored, and have the mean subtracted.

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    shuffle: 1
  }
}

Note that the text file has the image file names and corresponding labels. For example, "train.txt" looks like

/path/to/images/img3423.jpg 2
/path/to/images/img3424.jpg 13
/path/to/images/img3425.jpg 8
...

Input

Uses a blob of zeros as input data with the dimensions specified. This is usually used to time the forward and backward propagations. More information on timing a network is at the end of the Training section.

input_params

Required

  • shape: used to define 1 or multiple shapes to top blob(s)

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}

Equivalently, the layer can be written as:

input: "data"
input_dim: 32
input_dim: 3
input_dim: 227
input_dim: 227

DummyData

Similar to Input except the type of data can be specified. This is usually used for debugging but can also be used to time the forward and backward propagations. Example based on this.

dummy_data_params

Required

  • shape: used to define 1 or multiple shapes to top blob(s)

Optional

  • data_filler [default ConstantFiller with value of 0]: specifies the values used in top blob

layer {
  name: "data"
  type: "DummyData"
  top: "data"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
      value: 0.01
    }
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}
layer {
  name: "data"
  type: "DummyData"
  top: "label"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
    }
    shape {
      dim: 32
    }
  }
}

In this example there are two data layers, one for each top because the data provided to each top blob must be specified. Note that in Data, ImageData, or HDF5Data data layers, the information on the top blob for the label is in the source file.

MemoryData

The memory data layer reads data directly from memory, without copying it. In order to use it, call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) in order to specify a source of contiguous data (as 4D row major array), which is read one batch-sized chunk at a time.

This method can be slow as it may require copying the data into memory prior to using it. However, once in memory it is very efficient.

memory_data_param

Required

  • batch_size, channels, height, width: specify the size of input chunks to read from memory

layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  transform_param {
    crop_size: 227
    mirror: true
    mean_file: "mean.binaryproto"
  }
  memory_data_param {
    batch_size: 32
    channels: 3
    height: 227
    width: 227
  }
}

HDF5Data

Reads arbitrary data from HDF5 files. Good for any task but only uses FP32 and FP64 data (not uint8), so image data will be huge. Does not allow transform_param. Only use this if necessary.

hdf5_data_param

Required

  • source: the name of the text file containing the path of the data inputs and labels
  • batch_size

Optional

  • shuffle [default false]: shuffle the HDF5 files

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "examples/hdf5_classification/data/train.txt"
    batch_size: 32
  }
}

HDF5DataOutput

The HDF5 output layer performs the opposite function of the other layers in this section; it writes its input blobs to disk.

hdf5_output_param

Required

  • file_name

layer {
  name: "data_output"
  type: "HDF5Output"
  bottom: "data"
  bottom: "label"
  include {
    phase: TRAIN
  }
  hdf5_output_param {
    file_name: "output_file.h5"
  }
}

WindowData

Made for detection. Reads windows from image files together with their class labels.

window_data_param

Required

  • source: specify the data source
  • mean_file
  • batch_size

Optional

  • mirror
  • crop_size: randomly crop an image
  • crop_mode [default "warp"]: mode of cropping detection window; for example, "warp" warps to fixed size; "square" crops tightest square around the window
  • fg_threshold [default 0.5]: foreground (object) overlap threshold
  • bg_threshold [default 0.5]: background (non-object) overlap threshold
  • fg_fraction [default 0.25]: fraction of batch that should be foreground objects
  • context_pad [default 0]: amount of contextual padding around a window

For other available window data transformation, see src/caffe/proto/caffe.proto under message WindowDataParameter.

layer {
  name: "data"
  type: "WindowData"
  top: "data"
  top: "label"
  window_data_param {
    source: "/path/to/file/window_train.txt"

    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
    batch_size: 128
    mirror: true
    crop_size: 227
    fg_threshold: 0.5
    bg_threshold: 0.5
    fg_fraction: 0.25
    context_pad: 16
  }
}

Dataset preparation

The recommended data format for 1-of-k classification is LMDB. In order to use Caffe's tools to make LMDBs, the following are required:

  • A folder with the data
  • The output folders, for example, mydataset_train_lmdb, must be non-existent
  • A text file with the image file names and corresponding labels, for example, "train.txt" looks like
img3423.jpg 2
img3424.jpg 13
img3425.jpg 8
...

Note that if the data is dispersed in various folders, train.txt can contain the full path to the data points.

The create_label_file.py script is a simple script that creates training and validation text files for Kaggle's Dogs vs. Cats competition and can easily be adapted to other tasks.
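
As an illustration of the text file format, here is a minimal sketch that builds the "file name, label" lines from a list of file names. It is not the create_label_file.py script itself. It assumes file names of the form <class>.<id>.jpg (for example, cat.0.jpg), and the function name make_label_lines and the class-to-label mapping are hypothetical.

```python
import os

def make_label_lines(file_names, class_to_label):
    """Return 'file_name label' lines for files named '<class>.<id>.jpg'.

    Files whose class prefix is not in `class_to_label` are skipped.
    """
    lines = []
    for name in sorted(file_names):
        # the class name is assumed to be the text before the first dot
        class_name = os.path.basename(name).split(".")[0]
        if class_name in class_to_label:
            lines.append("%s %d" % (name, class_to_label[class_name]))
    return lines

files = ["dog.0.jpg", "cat.0.jpg", "cat.1.jpg", "notes.txt"]
for line in make_label_lines(files, {"cat": 0, "dog": 1}):
    print(line)
```

In practice the file names would come from listing the image folder (for example, with os.listdir), and the resulting lines would be written to train.txt.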

Note that in testing we assume that the labels are missing. If labels are available these same steps can be applied to prepare an LMDB test dataset.

Preparing data with three channels (for example, RGB images)

The example below (based on this) produces a training LMDB, and requires train.txt. It runs from the $CAFFE_ROOT directory.

#!/usr/bin/env sh
# folder containing the training and validation images
TRAIN_DATA_ROOT=/path/to/training/images

# folder containing the file with the name of training images
DATA=/path/to/file
# folder for the lmdb datasets
OUTPUT=/path/to/output/directory
TOOLS=/path/to/caffe/build/tools

# Set to resize the images to 256x256
RESIZE_HEIGHT=256
RESIZE_WIDTH=256
echo "Creating train lmdb..."

# Delete the shuffle line if shuffle is not desired
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT/ \
    $DATA/train.txt \
    $OUTPUT/mydataset_train_lmdb
echo "Done."

Computing the mean of the images in an LMDB dataset:

#!/usr/bin/env sh
# Compute the mean image in lmdb dataset
# folder for the lmdb datasets and output for mean image
OUTPUT=/path/to/output/directory
TOOLS=/path/to/caffe/build/tools

$TOOLS/compute_image_mean $OUTPUT/mydataset_train_lmdb \
  $OUTPUT/train_mean.binaryproto

$TOOLS/compute_image_mean $OUTPUT/mydataset_val_lmdb \
  $OUTPUT/val_mean.binaryproto

Preparing data with various channels

Grayscale images (one channel), RADAR images (two channels), videos (four channels), image+depth (four channels), vibrometry (one channel), and spectrograms (one channel) require a wrapper in order to create the LMDB dataset (see this blog script as a guide).

Resizing images

There are two common approaches to resizing images:

  • warp an image to the desired size
  • proportionally resize with the smaller size being the desired size, and then center crop the large side to the desired size
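
The second approach can be sketched as the arithmetic below, which computes the intermediate image size and the center-crop box. This is only an illustration of the geometry; the function name resize_and_center_crop and the (left, top, right, bottom) box convention are assumptions, and no image library is involved.

```python
def resize_and_center_crop(width, height, target):
    """Scale so the smaller side equals `target`, then center crop the larger side.

    Returns the intermediate (width, height) after proportional resizing and
    the crop box (left, top, right, bottom) within that resized image.
    """
    scale = float(target) / min(width, height)
    # guard against rounding producing a side smaller than the target
    new_w = max(target, int(round(width * scale)))
    new_h = max(target, int(round(height * scale)))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)

print(resize_and_center_crop(640, 480, 256))
```

By contrast, warping simply sets both sides to the target size, which changes the aspect ratio whenever the input is not square.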

Resizing can occur in a number of ways:

  • via OpenCV* as part of making the LMDB folder, for example, build/tools/convert_imageset --resize_height=256 --resize_width=256 warps images to the desired size; convert_imageset calls ReadImageToDatum, which calls ReadImageToCVMat in src/caffe/util/io.cpp
  • via ImageMagick, for example, convert -resize 256x256\! <input_img> <output_img> warps image to desired size
  • via OpenCV using a script that allows for multithreading image conversion in tools/extra/resize_and_crop_images.py proportionally resizes and then center crops. This requires:
sudo pip install git+https://github.com/Yangqing/mincepie.git
sudo apt-get install -y python-opencv
vi tools/extra/launch_resize_and_crop_images.sh # set number of clients (use num_of_cores*2); file.txt, input, and output folders

In addition, as part of the data layer the images can be cropped or resized:

layer {
  name: "data"
  transform_param {
    crop_size: 227
...
}

which crops an image (at random during training and from the center during testing), and

layer {
  name: "data"
  image_data_param {
    new_height: 227
    new_width: 227
...
}

warps the image to new_height and new_width using OpenCV.

Training

Training requires:

  • train_val.prototxt: defines the network architecture, initialization parameters, and local learning rates
  • solver.prototxt: defines optimization/training parameters and serves as the actual file that is called to train a deep network
  • deploy.prototxt: used only in testing. It must be exactly the same as train_val.prototxt except for the input layer(s), loss layer(s), and weight initialization (e.g., weight_filler), as the latter two do not exist in deploy.prototxt.

It is common but not required to have the same name for the layer and the blob coming out of the layer; in the prototxt files, name and top are usually the same in each layer.

A description of what each layer does can be found here. Initialization parameters are extremely important. They are set here. Some additional tips worth mentioning:

  • weight_filler initialization (for ReLU units, MSRAFiller is usually better than xavier, and xavier is usually better than gaussian; note for MSRAFiller and xavier there is no need to manually specify std)
  • gaussian: samples weights from Gaussian distribution N(0,std)
  • xavier: samples weights from uniform distribution U(-a,a), where a=sqrt(3/fan_in), where fan_in is the number of incoming inputs
  • MSRAFiller: samples weights from normal distribution N(0,a), where a=sqrt(2/fan_in)
  • base_lr: initial learning rate (default:.01, change to a smaller number if getting NAN loss in training)
  • lr_mult: for the bias is usually set to 2x the lr_mult for the non-bias weights
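
The three fillers described above can be sketched in a few lines of Python using the formulas from the list. This is an illustration, not Caffe's filler code: the function names are hypothetical, the second argument of N(0, ·) is treated as a standard deviation, and weights are returned as flat lists.

```python
import math
import random

def gaussian_filler(n, std, rng):
    """Sample n weights from N(0, std)."""
    return [rng.gauss(0.0, std) for _ in range(n)]

def xavier_filler(n, fan_in, rng):
    """Sample n weights from U(-a, a) with a = sqrt(3 / fan_in)."""
    a = math.sqrt(3.0 / fan_in)
    return [rng.uniform(-a, a) for _ in range(n)]

def msra_filler(n, fan_in, rng):
    """Sample n weights from N(0, a) with a = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [rng.gauss(0.0, std) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
fan_in = 256
weights = xavier_filler(10000, fan_in, rng)
bound = math.sqrt(3.0 / fan_in)
print(all(-bound <= w <= bound for w in weights))
```

Note that only the gaussian filler needs an explicit std; the other two derive their scale from fan_in, which is why std does not have to be set by hand for them.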

LeNet example lenet_train_test.prototxt, deploy.prototxt, and solver.prototxt described below (comments about what each variable means are included):

solver.prototxt

# The train/validation net protocol buffer definition, that is, the training architecture
net: "examples/mnist/lenet_train_test.prototxt"

# Note: 1 iteration = 1 forward pass over all the images in one batch

# Carry out a validation test every 500 training iterations.
test_interval: 500 

# test_iter specifies how many forward passes the validation test should carry out
#  a good number is num_val_imgs / batch_size (see batch_size in Data layer in phase TEST in train_test.prototxt)
test_iter: 100 

# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9 
weight_decay: 0.0005

# We want to initially move fast towards the local minimum and as we approach it, we want to move slower
# To this end, there are various learning rates policies available:
#  fixed: always return base_lr.
#  step: return base_lr * gamma ^ (floor(iter / step))
#  exp: return base_lr * gamma ^ iter
#  inv: return base_lr * (1 + gamma * iter) ^ (- power)
#  multistep: similar to step but it allows non uniform steps defined by stepvalue
#  poly: the effective learning rate follows a polynomial decay, to be zero by the max_iter: return base_lr * (1 - iter/max_iter) ^ (power)
#  sigmoid: the effective learning rate follows a sigmoid decay: return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
lr_policy: "step"
gamma: 0.1 
stepsize: 10000 # Drop the learning rate in steps by a factor of gamma every stepsize iterations

# Display every 100 iterations
display: 100 

# The maximum number of iterations
max_iter: 10000

# snapshot intermediate results, that is, every 5000 iterations it saves a snapshot of the weights
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_multistep"

# solver mode: CPU or GPU
solver_mode: CPU
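
The learning rate policies listed in the solver comments can be written out as a small Python function, which makes it easy to check how a given setting evolves before training. This is a sketch based on the formulas in the comments above, not Caffe's solver code; the function name learning_rate and its argument names are assumptions.

```python
import math

def learning_rate(policy, it, base_lr, gamma=0.1, power=1.0,
                  stepsize=1, max_iter=1, stepvalues=()):
    """Return the learning rate at iteration `it` for a given lr_policy."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "multistep":
        # like "step", but the drops happen at the listed iterations
        steps_passed = sum(1 for s in stepvalues if it >= s)
        return base_lr * gamma ** steps_passed
    if policy == "poly":
        return base_lr * (1.0 - float(it) / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1.0 + math.exp(-gamma * (it - stepsize))))
    raise ValueError("unknown lr_policy: %s" % policy)

# Settings from the solver above: step policy, gamma 0.1, stepsize 10000
for it in (0, 9999, 10000):
    print(it, "%.4f" % learning_rate("step", it, 0.01, gamma=0.1, stepsize=10000))
```

With stepsize equal to max_iter, as in the solver above, the learning rate stays at base_lr for the whole run, because the first drop would only occur at the final iteration.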

Train the network:

# The name of the output file (aka the trained weights) is in solver.prototxt
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt

Training will produce two types of files (note the 10000 is the number of completed iterations):

  • lenet_multistep_10000.caffemodel: weights of the architecture to be used in testing
  • lenet_multistep_10000.solverstate: used if training dies (for example, power outage) to resume training from current iteration

To train the network and plot the validation accuracy or loss vs iterations:

#CHART_TYPE=[0-7]
#  0: Test accuracy  vs. Iters
#  1: Test accuracy  vs. Seconds
#  2: Test loss  vs. Iters
#  3: Test loss  vs. Seconds
#  4: Train learning rate  vs. Iters
#  5: Train learning rate  vs. Seconds
#  6: Train loss  vs. Iters
#  7: Train loss  vs. Seconds
CHART_TYPE=0
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt 2>&1 | tee logfile.log
python $CAFFE_ROOT/tools/extra/plot_training_log.py.example $CHART_TYPE name_of_plot.png logfile.log

Dropout can be used in connection with a fully connected layer. It is used only during training, where it reduces overfitting by randomly dropping a fraction of the layer's outputs (set by dropout_ratio) during each forward pass, which prevents co-adaptation between units. It is ignored in testing.

layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }   
    bias_filler {
      type: "constant"
      value: 1
    }   
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5 
  }
}
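
The effect of the Dropout layer can be sketched in plain Python as follows. This is an illustration, not Caffe's implementation: the function name dropout_forward is hypothetical, and the sketch assumes the common "inverted dropout" convention in which surviving values are scaled by 1 / (1 - dropout_ratio) during training so that nothing needs to change at test time.

```python
import random

def dropout_forward(activations, dropout_ratio, train, rng=random):
    """Apply dropout to a list of activations.

    In training, each value is zeroed with probability `dropout_ratio` and the
    survivors are scaled up; in testing, the input passes through unchanged.
    """
    if not train:
        return list(activations)
    scale = 1.0 / (1.0 - dropout_ratio)
    return [a * scale if rng.random() >= dropout_ratio else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_out = dropout_forward([1.0] * 8, 0.5, train=True, rng=rng)
test_out = dropout_forward([1.0] * 8, 0.5, train=False)
print(train_out)
print(test_out)
```

With dropout_ratio: 0.5, as in the layer above, roughly half of the fc6 outputs are zeroed on each training pass and the rest are doubled, so the expected value of each output is unchanged.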

Measuring forward and backward propagation time (not weight updates):

# Computes 50 iterations and returns forward, backward, and total time and the average
# note that the training samples and mean.binaryproto may be required or
# alternatively, use dummy variables
NUMITER=50
/path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

For consistency in the timings, the Linux utility numactl can be used to allocate memory buffers in MCDRAM:

numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

Model Zoo

The Caffe Model Zoo is a collection of trained deep learning models and/or prototxt files used for a variety of tasks. These models can be used in fine-tuning or testing.

Multinode distributed training

The material in this section is based on Intel's Caffe Github wiki. There are two main approaches to distribute the training across multiple nodes: model parallelism and data parallelism. In model parallelism, the model is divided among the nodes and each node has the full data batch. In data parallelism, the data batch is divided among the nodes and each node has the full model. Data parallelism is especially useful when the number of weights in a model is small and when the data batch is large. A hybrid model and data parallelism is possible where layers with few weights such as the convolutional layers are trained using the data parallelism approach and layers with many weights such as fully connected layers are trained using the model parallelism approach. Intel has published a theoretical analysis to optimally trade between data and model parallelism in this hybrid approach.
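
The idea behind data parallelism can be illustrated with a toy model. In the sketch below, every "node" holds the same single weight w, computes the gradient on its own equal-sized shard of the batch, and the gradients are then averaged. The model (y = w * x with squared loss), the function names, and the assumption that the batch size is divisible by the number of nodes are all illustrative choices, not part of Caffe.

```python
def gradient(w, batch):
    """Mean gradient of 0.5 * (w * x - y)^2 with respect to w over a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_gradient(w, batch, num_nodes):
    """Split the batch across nodes, compute local gradients, then average."""
    shard_size = len(batch) // num_nodes
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_nodes)]
    # each local gradient could be computed independently on a separate node
    local_gradients = [gradient(w, shard) for shard in shards]
    # averaging plays the role of the communication step between nodes
    return sum(local_gradients) / num_nodes

batch = [(float(x), 3.0 * x) for x in range(1, 9)]
full = gradient(0.5, batch)
parallel = data_parallel_gradient(0.5, batch, num_nodes=4)
print(abs(full - parallel) < 1e-12)
```

Because the averaged shard gradients match the full-batch gradient, each node only needs its slice of the data, at the cost of exchanging gradients for every weight. This is why the approach suits models with relatively few weights.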

Given the recent popularity of deep networks with fewer weights such as GoogLeNet and ResNet and the success of distributed training using data parallelism, Caffe optimized for Intel architecture supports data parallelism. Multinode distributed training is currently under active development with newer features being evaluated.

To train across various nodes, make sure these two lines are in Makefile.config:

USE_MPI := 1
# update with the path to binary MPI library
CXX := /usr/bin/mpicxx

Using multinode is as simple as:

mpirun --hostfile path/to/hostfile -n <num_processes> /path/to/caffe/build/tools/caffe train --solver=/path/to/solver.prototxt --param_server=mpi

where <num_processes> is the number of nodes to use, and hostfile contains the IP addresses of the nodes, one per line. Note that solver.prototxt points to the train.prototxt in each node, and each train.prototxt needs to point to a different portion of the dataset. For more details, click here.
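
One simple way to give each node a different portion of the dataset is to split the list of training samples before building the per-node datasets. The sketch below does a round-robin split of the lines of a train.txt style list. The function name shard_lines is hypothetical, and how the shards are turned into per-node data sources (for example, one LMDB per node) is left open.

```python
def shard_lines(lines, num_nodes):
    """Round-robin split: node i receives lines i, i + num_nodes, and so on."""
    return [lines[i::num_nodes] for i in range(num_nodes)]

lines = ["img%d.jpg %d" % (i, i % 2) for i in range(5)]
for node, shard in enumerate(shard_lines(lines, 2)):
    print(node, shard)
```

A round-robin split keeps the shards within one sample of each other in size, and, if the original list is shuffled, gives each node a similar class mix.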

Fine-tuning

Recycle the layer definition prototxt file and make two changes.

1. Change the data layer to include the new data (note the scale is 1/255):

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "newdata_lmdb" # CHANGED THIS LINE TO THE NEW DATASET
    batch_size: 64
    backend: LMDB
  }
}

2. Change the last layer, in this case ip2 (in testing, make this same change to the deploy.prototxt file):

layer {
  name: "ip2-ft" # CHANGED THIS LINE
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2-ft" # CHANGED THIS LINE
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2 #CHANGED THIS LINE TO THE NUMBER OF CLASSES IN NEW DATASET
    bias_filler {
      type: "constant"
    }
  }
}

Invoke Caffe:

#From the command line on $CAFFE_ROOT
./build/tools/caffe train -solver /path/to/solver.prototxt -weights  /path/to/trained_model.caffemodel

Fine-tuning guidelines

  • Learn the last layer first (earlier layer weights won't change very much in fine-tuning)
  • Drop the initial learning rate (in the solver.prototxt) by 10x or 100x
  • Caffe layers have local learning rates: lr_mult
  • Freeze all but the last layer (and perhaps second to last layer) for fast optimization, that is, lr_mult=0 in local learning rates
  • Increase local learning rate of last layer by 10x and second to last by 5x
  • Stop if good enough or keep fine-tuning other layers
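The interaction between the global learning rate and lr_mult in the guidelines above can be illustrated with a little arithmetic: the rate applied to a layer is the solver's learning rate multiplied by that layer's lr_mult. The sketch below is plain Python rather than Caffe code; the layer names, the starting rate of 0.01, and the multipliers are hypothetical values chosen to mirror the guidelines.

```python
def effective_rates(base_lr, lr_mults):
    """Return the per-layer learning rate: global rate times the local lr_mult."""
    return {name: base_lr * mult for name, mult in lr_mults.items()}


# Hypothetical settings: the original base_lr of 0.01 is dropped by 10x,
# the early layers are frozen, and the last two layers are boosted.
base_lr = 0.01 / 10
lr_mults = {"conv1": 0, "conv2": 0, "ip1": 5, "ip2-ft": 10}
rates = effective_rates(base_lr, lr_mults)
for name in sorted(rates):
    print("%-7s %.4f" % (name, rates[name]))
```

With these numbers the frozen layers get a rate of 0, while the new last layer trains at the original rate even though the global rate was reduced.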

What happens under the hood:

  • Creates a new network
  • Copies the previous weights to initialize the network weights
  • Solves in the usual way (see example)
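The weight-copying step works by layer name, which is why the last layer was renamed to ip2-ft above: layers whose names match the pretrained model receive its weights, and a renamed layer keeps its fresh initialization. The following is a minimal sketch of that idea using dictionaries of toy values; it is not Caffe's implementation, and the function name copy_matching_weights is hypothetical.

```python
def copy_matching_weights(pretrained, new_net):
    """Copy weights for layers whose names appear in both networks.

    Layers that exist only in new_net keep their fresh initialization.
    Returns the names of the layers that were copied.
    """
    copied = []
    for name in new_net:
        if name in pretrained:
            new_net[name] = list(pretrained[name])
            copied.append(name)
    return copied


# Toy stand-ins for layer weights (real blobs are N-D arrays)
pretrained = {"conv1": [0.2, -0.1], "ip1": [0.5], "ip2": [0.3, 0.7]}
new_net = {"conv1": [0.0, 0.0], "ip1": [0.0], "ip2-ft": [0.0, 0.0]}
copied = copy_matching_weights(pretrained, new_net)
print("copied: %s" % sorted(copied))
```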

Testing

Testing, also known as inference, classification, or scoring, can be done in Python or with the native C++ utility that ships with Caffe. To classify an image (or signal) or a set of images, the following are needed:

  • Image(s)
  • Network architecture
  • Network weights

Testing using the native C++ utility is less flexible, and using Python is preferred. To test the model, the prototxt file with the model should have phase: TEST in the data layer that points to the testing dataset.

/path/to/caffe/build/tools/caffe test -model /path/to/train_val.prototxt \
  -weights /path/to/trained_model.caffemodel -iterations <num_iter>
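Each test iteration processes one batch, so one reasonable way to pick <num_iter> is to choose the smallest value that covers the test set once. The sketch below shows that arithmetic; the dataset sizes and batch sizes are illustrative assumptions, and the batch size actually used is whatever the TEST-phase data layer specifies.

```python
def iterations_to_cover(num_images, batch_size):
    """Smallest number of batches that visits every image at least once."""
    return -(-num_images // batch_size)  # ceiling division


# Illustrative dataset and batch sizes
print(iterations_to_cover(10000, 100))
print(iterations_to_cover(5823, 50))
```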

This example was adapted from this blog. To classify an image using a pretrained model, first download the pretrained model:

./scripts/download_model_binary.py models/bvlc_reference_caffenet

Next, download the dataset (ILSVRC 2012 in this example) labels (also called the synset file) which is required in order to map a prediction to the name of the class:

./data/ilsvrc12/get_ilsvrc_aux.sh

Then classify an image:

./build/examples/cpp_classification/classification.bin \
  models/bvlc_reference_caffenet/deploy.prototxt \
  models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
  data/ilsvrc12/imagenet_mean.binaryproto \
  data/ilsvrc12/synset_words.txt \
  examples/images/cat.jpg

The output should look like this:

---------- Prediction for examples/images/cat.jpg ----------
0.3134 - "n02123045 tabby, tabby cat"
0.2380 - "n02123159 tiger cat"
0.1235 - "n02124075 Egyptian cat"
0.1003 - "n02119022 red fox, Vulpes vulpes"
0.0715 - "n02127052 lynx, catamount"
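The mapping from class probabilities to the names shown above can be sketched in a few lines of Python. This is an illustration only, not the code of classification.bin: the helper names are hypothetical, the probabilities are made up, and only three lines in the synset_words.txt format are used.

```python
def parse_synset_line(line):
    """Split 'n02123045 tabby, tabby cat' into (wnid, description)."""
    wnid, _, description = line.strip().partition(" ")
    return wnid, description


def top_k(probabilities, labels, k=5):
    """Return the k (probability, label) pairs with the highest probability."""
    ranked = sorted(zip(probabilities, labels), reverse=True)
    return ranked[:k]


# A few lines in the synset_words.txt format and made-up probabilities
synset_lines = ["n02119022 red fox, Vulpes vulpes",
                "n02123045 tabby, tabby cat",
                "n02123159 tiger cat"]
labels = [parse_synset_line(line)[1] for line in synset_lines]
probs = [0.10, 0.31, 0.24]
for p, label in top_k(probs, labels, k=2):
    print("%.4f - %s" % (p, label))
```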

Feature extractor and visualization

In a convolutional layer, the weights from one layer to the next can be represented by a blob of shape output_feature_maps x input_feature_maps x height x width (feature maps are also known as channels). There are two options for using networks trained in Caffe as feature extractors: the first option (recommended) is to use the Python API; the second option is to use the native C++ utility that ships with Caffe:

# Download model params
scripts/download_model_binary.py models/bvlc_reference_caffenet

# Generate a list of the files to process
# Use the images that ship with caffe
find `pwd`/examples/images -type f -exec echo {} \; > examples/images/test.txt

# Add a 0 to the end of each line
# input data structures expect labels after each image file name
sed -i "s/$/ 0/" examples/images/test.txt

# Get the mean of the training set to subtract it from the images
./data/ilsvrc12/get_ilsvrc_aux.sh

# Copy and modify the data layer to load and resize the images:
cp examples/feature_extraction/imagenet_val.prototxt examples/images
vi examples/images/imagenet_val.prototxt

# Extract features
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
  examples/images/imagenet_val.prototxt fc7 examples/images/features 10 lmdb

The feature blob is extracted above from fc7, which represents the highest-level feature of the reference model. Other layers can be used as well, such as conv5 or pool3. The last two parameters above, 10 and lmdb, are the number of mini-batches to process and the database type. The features are stored in the LMDB examples/images/features, ready for access by some other code.
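The file list built with find and sed in the commands above can also be generated from Python. The following is a minimal sketch under the assumption that every file under the image directory should be listed with the same dummy label; the function name write_file_list is hypothetical.

```python
import os


def write_file_list(image_dir, list_path, label=0):
    """Write one '<absolute path> <label>' line per file under image_dir."""
    paths = []
    for root, _, files in os.walk(image_dir):
        for name in sorted(files):
            paths.append(os.path.abspath(os.path.join(root, name)))
    with open(list_path, "w") as f:
        for path in paths:
            f.write("%s %d\n" % (path, label))
    return paths
```

For example, write_file_list('examples/images', 'examples/images/test.txt') would produce a list in the same format as the shell commands.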

Using the Python* API

Understanding this section is not required to start using Caffe. This section is based on this blog. The Python interface is handy for testing, classifying, and feature extraction, and can also be used to define and train networks.

Setting up Python Caffe

Make sure make pycaffe was called when compiling Caffe. In Python first import the caffe module:

# Make sure that caffe is on the python path:
# (alternatively set the PYTHONCAFFE var as explained in the installation section)
import sys 
CAFFE_ROOT = '/path/to/caffe/'
sys.path.insert(0, CAFFE_ROOT + 'python')
import caffe
caffe.set_mode_cpu()

Loading the network architecture

The network architecture can be found in the train_val.prototxt or deploy.prototxt files. To load the network:

net = caffe.Net('train_val.prototxt', caffe.TRAIN)

or if loading a specific set of weights, do this instead:

net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)

caffe.TRAIN is used here because caffe.TEST crashes if run twice, and caffe.TRAIN appears to give the same results.

The net contains data blobs (net.blobs) and parameter weight blobs (net.params). In the commands below conv1 can be replaced with the name of any other layer:

  • net.blobs['conv1']: data output at the conv1 layer known as feature maps
  • net.params['conv1'][0]: weight blob at the conv1 layer
  • net.params['conv1'][1]: bias blob at the conv1 layer
  • net.blobs.items(): returns the data blob for all the layers - useful in a for loop to cycle through the layers

Visualizing the network

To display the network, first install the pydot module and Graphviz:

sudo apt-get install -y graphviz
sudo pip install pydot

Run the draw_net.py python script:

python python/draw_net.py examples/net_surgery/deploy.prototxt train_val_net.png
open train_val_net.png

Data input

Input data into the data layer blob using one of the following techniques:

  • modify data layer to match the size of the image:
import numpy as np
from PIL import Image
# get input image and arrange it as a 4-D tensor
im = np.array(Image.open('/path/to/caffe/examples/images/cat_gray.jpg'))
im = im[np.newaxis, np.newaxis, :, :]
# resize the blob to the size of the input image (needed if the image size differs)
net.blobs['data'].reshape(*im.shape)
# fill the data blob with the input image
net.blobs['data'].data[...] = im
  • modify the input data to match the size of the expected input of the data layer:
shape = net.blobs['data'].data.shape
# load as color only if the data blob expects 3 channels
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg', color=(shape[1] == 3))
# resize the image to the height and width of the data blob
im = caffe.io.resize_image(im, (shape[2], shape[3]))
# move the channel axis first (H x W x C -> C x H x W) and fill the blob
net.blobs['data'].data[...] = im.transpose(2, 0, 1)

Several transformations are commonly applied to the input data:

net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
ilsvrc_mean = 'python/caffe/imagenet/ilsvrc_2012_mean.npy'
transformer.set_mean('data', np.load(ilsvrc_mean).mean(1).mean(1))
# puts the channel as the first dimension
transformer.set_transpose('data', (2,0,1))
# (2,1,0) maps RGB to BGR for example
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
# the batch size can be changed on-the-fly
net.blobs['data'].reshape(1,3,227,227)
# load the image in the data layer
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg')
# transform the image and store it in the net.blob
net.blobs['data'].data[...] = transformer.preprocess('data', im)
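The effect of these settings on a single image can be sketched in plain Python. The following illustrates the individual operations (transpose, channel swap, raw scaling, and mean subtraction, applied in that order) on a tiny nested-list image. It is not the caffe.io.Transformer implementation; the function name preprocess_sketch and the sample values are hypothetical.

```python
def preprocess_sketch(image, channel_swap, raw_scale, mean):
    """Apply transpose (H x W x C -> C x H x W), channel swap, scale, and mean.

    image is a nested list indexed as image[row][col][channel] with values in [0, 1].
    mean holds one value per output channel.
    """
    num_channels = len(image[0][0])
    # transpose: gather each channel into its own H x W plane
    planes = [[[pixel[c] for pixel in row] for row in image]
              for c in range(num_channels)]
    # channel swap: for example (2, 1, 0) maps RGB to BGR
    planes = [planes[c] for c in channel_swap]
    # raw scale followed by per-channel mean subtraction
    return [[[value * raw_scale - mean[c] for value in row] for row in plane]
            for c, plane in enumerate(planes)]


# A 1 x 2 RGB image: one red pixel and one blue pixel
image = [[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]]
blob = preprocess_sketch(image, (2, 1, 0), 255.0, (104.0, 117.0, 123.0))
print(blob)
```

The result has the channel as the first dimension, in BGR order, with values on a 0 to 255 scale minus the per-channel mean.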

To view im:

import matplotlib.pyplot as plt
plt.imshow(im)

Inference

The prediction of the net on the input image can be computed as follows:

# assumes that images are loaded
prediction = net.forward()
print 'predicted class:', prediction['prob'].argmax()

To time the forward propagation in IPython (this ignores the data preprocessing time):

timeit net.forward()
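Outside IPython, the standard timeit module gives a comparable measurement. The sketch below times a placeholder function so that it runs without Caffe; in a real session the placeholder, whose name forward_stand_in is hypothetical, would be replaced by net.forward.

```python
import timeit


def forward_stand_in():
    """Placeholder workload; replace with net.forward in a real session."""
    return sum(i * i for i in range(1000))


runs = 20
total = timeit.timeit(forward_stand_in, number=runs)
print("average time per call: %.6f s" % (total / runs))
```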

Another module that transforms the data and can classify several inputs simultaneously is caffe.Classifier. That is, caffe.Classifier can be used instead of using both caffe.Net and caffe.io.Transformer.

im1 = caffe.io.load_image('/path/to/caffe/examples/images/cat.jpg')
im2 = caffe.io.load_image('/path/to/caffe/examples/images/fish-bike.jpg')
imgs = [im1, im2]
ilsvrc_mean = '/path/to/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy'
net = caffe.Classifier('deploy.prototxt', 'trained_model.caffemodel',
                       mean=np.load(ilsvrc_mean).mean(1).mean(1),
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))
prediction = net.predict(imgs) # predict takes any number of images
print 'predicted classes:', prediction[0].argmax(), prediction[1].argmax()

If using a folder with many images, replace imgs as follows (everything else stays the same):

IMAGES_FOLDER = '/path/to/folder/w/images/'
import os
images = os.listdir(IMAGES_FOLDER)
imgs = [ caffe.io.load_image(IMAGES_FOLDER + im) for im in images ]

The entire test set may not fit in memory. Therefore, the predictions can be computed in batches, for example, batches of 100 images.
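Splitting the work into batches can be sketched with a small generator. The code below is plain Python and uses hypothetical file names in place of a real test set; the function name batches is also an assumption.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical file names standing in for a large test set
image_paths = ["img_%04d.jpg" % i for i in range(250)]
sizes = [len(batch) for batch in batches(image_paths, 100)]
print(sizes)
```

In a real session, the images of each batch would be loaded and passed to net.predict, and the per-batch predictions collected, so that only one batch of images is held in memory at a time.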

To plot the probabilities of all the classes for im1:

plt.plot(prediction[0])

To time the full classification pipeline (including the im1 transformations) for one image with oversampling, where oversampling crops 10 images (the center, the corners, and their mirrors):

timeit net.predict([im1])

If oversample is set to false, it only crops the center:

timeit net.predict([im1], oversample=0)
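The ten oversampling crops can be enumerated with a short sketch: four corners plus the center, each with and without a horizontal mirror. The code below only computes crop positions for an assumed 256x256 image and 227x227 crop; it is an illustration, not Caffe's oversampling code, and the exact rounding of the center position may differ there. The function name oversample_boxes is hypothetical.

```python
def oversample_boxes(image_size, crop_size):
    """Return (top, left, mirrored) for the ten oversampling crops.

    Five positions (four corners and the center), each with and without
    a horizontal mirror.
    """
    height, width = image_size
    crop_h, crop_w = crop_size
    tops = (0, height - crop_h)
    lefts = (0, width - crop_w)
    positions = [(top, left) for top in tops for left in lefts]
    positions.append(((height - crop_h) // 2, (width - crop_w) // 2))
    return [(top, left, mirrored)
            for mirrored in (False, True)
            for top, left in positions]


boxes = oversample_boxes((256, 256), (227, 227))
print(len(boxes))
```

Because each image produces ten crops that all pass through the network, prediction with oversampling does roughly ten times the forward-pass work of a center-only crop.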

Feature extraction and visualization

To examine the data at a particular layer, for example, fc7:

net.blobs['fc7'].data

To retrieve details of the network's layers and shapes:

# Retrieve details of the network's layers
[(k, v.data.shape) for k, v in net.blobs.items()]

# Retrieve weights of the network's layers
[(k, v[0].data.shape) for k, v in net.params.items()]

# Retrieve the features in the last fully connected layer
# prior to outputting class probabilities
feat = net.blobs['fc7'].data[4]

# Retrieve size/dimensions of the array
feat.shape

Visualizing the blobs:

# Assumes that the "net = caffe.Classifier" module has been called
# and data has been formatted as in the example above

# Take an array of shape (n, height, width) or (n, height, width, channels)
# and visualize each (height, width) section in a grid
# of size approx. sqrt(n) by sqrt(n)
def vis_square(data, padsize=1, padval=0):
    # normalize to values between 0 and 1 on a copy,
    # so that the network's own blobs are left unchanged
    data = (data - data.min()) / (data.max() - data.min())

    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))

    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

    plt.imshow(data)

plt.rcParams['figure.figsize'] = (25.0, 20.0)

# visualize the weights after the 1st conv layer
net.params['conv1'][0].data.shape
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))

# visualize the feature maps after 1st conv layer
net.blobs['conv1'].data.shape
feat = net.blobs['conv1'].data[0,:96]
vis_square(feat, padval=1)

# visualize the feature maps after the 2nd conv layer
net.blobs['conv2'].data.shape
feat = net.blobs['conv2'].data[0,:96]
vis_square(feat, padval=1)

# visualize the feature maps after the 2nd pool layer
net.blobs['pool2'].data.shape
feat = net.blobs['pool2'].data[0,:256] # change 256 to number of pool outputs
vis_square(feat, padval=1)

# Visualize the neuron activations for the 2nd fully-connected layer
net.blobs['ip2'].data.shape
feat = net.blobs['ip2'].data[0]
plt.plot(feat.flat)
plt.legend()
plt.show()
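The grid size used by vis_square follows from simple arithmetic: the side of the grid is the square root of the number of tiles, rounded up, and the remaining cells are filled with blank padding tiles. A minimal sketch of that calculation (the function name grid_layout is hypothetical):

```python
import math


def grid_layout(num_tiles):
    """Return (grid side, number of blank padding tiles) for a square grid."""
    side = int(math.ceil(math.sqrt(num_tiles)))
    return side, side ** 2 - num_tiles


for count in (96, 256):
    side, blanks = grid_layout(count)
    print("%d tiles -> %dx%d grid with %d blank tiles" % (count, side, side, blanks))
```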

Defining a network

A network can be defined in Python and saved to a prototxt file as follows:

from caffe import layers as L
from caffe import params as P

def lenet(lmdb, batch_size):
    # auto generated LeNet
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb, transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
    return n.to_proto()

with open('examples/mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_train_lmdb', 64)))

with open('examples/mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_test_lmdb', 100)))

The code above produces prototxt files such as the following (the training network is shown):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00392156862745
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
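The spatial size of each blob in the network above can be worked out by hand. The sketch below applies the usual output-size formula to the convolution and pooling layers, assuming 28x28 MNIST input images and no padding (neither is stated in the prototxt). With these sizes every pooling window tiles its input exactly, so any difference in how a framework rounds partial windows does not arise.

```python
def output_size(input_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


# (layer name, kernel_size, stride) taken from the prototxt above
layers = [("conv1", 5, 1), ("pool1", 2, 2), ("conv2", 5, 1), ("pool2", 2, 2)]
size = 28  # assumed MNIST image height and width
sizes = {}
for name, kernel, stride in layers:
    size = output_size(size, kernel, stride)
    sizes[name] = size
    print("%s: %dx%d" % (name, size, size))

# pool2 has 50 feature maps, so ip1 sees 50 * size * size inputs per image
ip1_inputs = 50 * size * size
print("ip1 inputs: %d" % ip1_inputs)
```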

Training networks

Load the solver in Python and do a forward propagation:

solver = caffe.get_solver('models/bvlc_reference_caffenet/solver.prototxt')
net = solver.net  # the training net owned by the solver, used below to inspect gradients
solver.net.forward()  # train net
solver.test_nets[0].forward()  # test net (there can be more than one)

To compute the gradients:

solver.net.backward()

The gradient values can be displayed as follows:

# data gradients
net.blobs['conv1'].diff

# weight gradients
net.params['conv1'][0].diff

# biases gradients
net.params['conv1'][1].diff

To launch one iteration (a forward propagation, a backward propagation, and the weight update):

solver.step(1)

To launch all the iterations defined in the solver.prototxt as max_iter:

solver.solve()
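What one iteration involves can be illustrated with a toy problem: a single weight, a single input, and a squared-error loss. The sketch below uses plain stochastic gradient descent without momentum or weight decay, which a Caffe solver would normally add; it is an illustration of the forward, backward, and update steps, not Caffe code, and all names and values are hypothetical.

```python
def sgd_step(weight, x, target, learning_rate):
    """One iteration on the loss 0.5 * (weight * x - target) ** 2."""
    prediction = weight * x                    # forward propagation
    gradient = (prediction - target) * x       # backward propagation: dLoss/dweight
    return weight - learning_rate * gradient   # update


weight = 0.0
for _ in range(3):
    weight = sgd_step(weight, x=2.0, target=4.0, learning_rate=0.1)
    print("weight = %.4f" % weight)
```

The weight moves toward 2.0, the value for which the prediction equals the target.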

Debugging

This section is optional and meant for Caffe developers only.

A few tips to help in debugging:

  • remove randomness
  • compare caffemodels
  • use Caffe's debug info

Removing randomness can be beneficial in order to reproduce behaviors and outputs. Removing randomness from non-associative floating point arithmetic operations is outside the scope of this article.

Randomness is introduced at various stages:

  • the weights are usually randomly initialized following some (for example, Gaussian) distribution.
  • the input images can be preprocessed by randomly flipping the image horizontally or randomly cropping various parts of the images (e.g., cropping 227x227 patches from a 256x256 image), and by randomly shuffling the images
  • in the dropout layer, during training, some units are randomly used and others are ignored

One solution is to use a seed. In the solver.prototxt add the line:

# pick some value for random_seed that is greater than or equal to 1, for example:
random_seed: 42

This ensures the same "random" values are used. However, the seed may produce different values on different machines. A more robust alternative when working across machines:

  • Prepare the data using the same set of shuffled images, that is, do not reshuffle with each experiment
  • In train.prototxt, in the ImageData layer, in transform_param: do not crop and do not mirror the images. If smaller images are required, then warp the images using image_data_param:
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
 #   mirror: true
 #   crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    new_height: 224
    new_width: 224
  }
}

In train.prototxt, in the dropout layers, set dropout_ratio: 0.
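Reusing one shuffled order, as suggested in the list above, can be sketched as follows: shuffle the list of training entries once and keep the result. This is plain Python, independent of Caffe; the function name shuffled_once and the entries are hypothetical.

```python
import random


def shuffled_once(lines, seed):
    """Return a shuffled copy of lines, leaving the input list unchanged."""
    rng = random.Random(seed)
    result = list(lines)
    rng.shuffle(result)
    return result


# Hypothetical 'image label' entries
entries = ["img_%02d.jpg %d" % (i, i % 2) for i in range(8)]
first = shuffled_once(entries, seed=42)
second = shuffled_once(entries, seed=42)
print(first == second)
```

Writing the shuffled list to a file once, and pointing every experiment and every machine at that file, avoids depending on a seed producing the same order everywhere.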

Other helpful guidelines

  • In solver.prototxt change the lr_policy to fixed
  • In solver.prototxt add the line debug_info: 1

To compare two caffemodels, the following script returns the sum of the absolute differences between all the weights and biases in the caffemodels:

# Intel Corporation
# Author: Ravi Panchumarthy

import sys, os, argparse, time
import pdb
import numpy as np

def get_args():
    parser = argparse.ArgumentParser('Compare weights of two caffe models')

    parser.add_argument('-m1', dest='modelFile1', type=str, required=True,
                        help='Caffe model weights file to compare')
    parser.add_argument('-m2', dest='modelFile2', type=str, required=True,
                        help='Caffe model weights file to compare against')
    parser.add_argument('-n', dest='netFile', type=str, required=True,
                        help='Network prototxt file associated with model')
    return parser.parse_args()

if __name__ == "__main__":
    import caffe

    args = get_args()
    net = caffe.Net(args.netFile, args.modelFile1, caffe.TRAIN)
    net2compare = caffe.Net(args.netFile, args.modelFile2, caffe.TRAIN)

    wt_sumOfAbsDiffByName = dict()
    bias_sumOfAbsDiffByName = dict()

    for name, blobs in net.params.items():
        wt_diffTensor = np.subtract(net.params[name][0].data, net2compare.params[name][0].data)
        wt_absDiffTensor = np.absolute(wt_diffTensor)
        wt_sumOfAbsDiff = wt_absDiffTensor.sum()
        wt_sumOfAbsDiffByName.update({name : wt_sumOfAbsDiff})

        # if args.layerDebug == 1:
        #     print("%s : %s" % (name,wt_sumOfAbsDiff))

        bias_diffTensor = np.subtract(net.params[name][1].data, net2compare.params[name][1].data)
        bias_absDiffTensor = np.absolute(bias_diffTensor)
        bias_sumOfAbsDiff = bias_absDiffTensor.sum()
        bias_sumOfAbsDiffByName.update({name : bias_sumOfAbsDiff})

    print("\nThe sum of absolute differences of all layers' weights is : %s" % sum(wt_sumOfAbsDiffByName.values()))
    print("The sum of absolute differences of all layers' biases is : %s" % sum(bias_sumOfAbsDiffByName.values()))

    finalDiffVal = sum(wt_sumOfAbsDiffByName.values()) + sum(bias_sumOfAbsDiffByName.values())
    print("The sum of absolute differences of all layers' weights and biases is : %s" % finalDiffVal)
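The metric computed by the script can be checked on toy data without loading Caffe. The sketch below applies the same sum of absolute differences to dictionaries of flattened values; the function name sum_abs_diff and the numbers are hypothetical.

```python
def sum_abs_diff(params_a, params_b):
    """Sum |a - b| over every value of every layer present in params_a."""
    total = 0.0
    for name, values in params_a.items():
        total += sum(abs(a - b) for a, b in zip(values, params_b[name]))
    return total


# Toy flattened weights for two snapshots of the same network
model_1 = {"conv1": [0.5, -0.25], "ip1": [1.0, 2.0, 3.0]}
model_2 = {"conv1": [0.5, 0.25], "ip1": [1.0, 2.5, 3.0]}
print(sum_abs_diff(model_1, model_2))
```

A result of 0 means the two sets of parameters are identical, which is the expected outcome when all sources of randomness have been removed.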

For further debugging, in Makefile.config uncomment the line DEBUG := 1, compile the code and run it with the command:

gdb /path/to/caffe/build/tools/caffe

Once gdb starts, use the run command and add the rest of the arguments:

run train -solver /path/to/solver.prototxt

Examples

LeNet on MNIST

The purpose of this section is to show the steps for a particular experiment with preparing a dataset, training a model, and timing the model. The content is based on this and this.

Preparing datasets:

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh # downloads MNIST dataset
./examples/mnist/create_mnist.sh # creates dataset in LMDB format

Training the model:

# Reduce the number of iterations from 10K to 1K to quickly run through this example
sed -i 's/max_iter: 10000/max_iter: 1000/g' examples/mnist/lenet_solver.prototxt
./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt

Timing the forward and backward propagations (not including weight updates):

./build/tools/caffe time --model=examples/mnist/lenet_train_test.prototxt -iterations 50 # runs on CPU

For consistency in the timings, the utility numactl can be used to allocate memory buffers in MCDRAM:

numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

Testing the trained model. In this example, it is tested on the validation set. In practice, it should be tested on a different dataset using the format below or the format explained above:

# the file with the model should have a 'phase: TEST'
./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt \
  -weights examples/mnist/lenet_iter_1000.caffemodel -iterations 50

Dogs vs Cats

Get an account with Kaggle and download the data. Note that you cannot just do wget because you must log in to Kaggle. Log in to Kaggle, download data, and transfer it to your machine.

Unzip dogvscat.zip and execute the dogvscat.sh script contained in the zip file. This script is shown below for convenience.

#!/usr/bin/env sh
CAFFE_ROOT=/path/to/caffe
mkdir dogvscat
DOG_VS_CAT_FOLDER=/path/to/dogvscat

cd $DOG_VS_CAT_FOLDER
## Download datasets (requires first a login)
#https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
#https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip

# Unzip train and test data
sudo apt-get -y install unzip
unzip train.zip -d .
unzip test1.zip -d .

# Format data
python create_label_file.py # creates 2 text files with labels for training and validation
./build_datasets.sh # build lmdbs

# Download ImageNet pretrained weights (takes ~20 min)
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet 

# Fine-tune weights in the AlexNet architecture (takes ~100 min)
$CAFFE_ROOT/build/tools/caffe train -solver $DOG_VS_CAT_FOLDER/dogvscat_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# Classify test dataset
cd $DOG_VS_CAT_FOLDER
python convert_binaryproto2npy.py
python dogvscat_classify.py # Returns prediction.txt (takes ~30 min)

# A better approach is to train five AlexNets w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks

I submitted my results to Kaggle and got a score of 0.97566 accuracy (which would have placed 15th out of 215 had I competed).
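The averaging of several networks mentioned at the end of the script can be sketched as an element-wise mean of their class probabilities. The code below uses made-up probabilities from three hypothetical networks for a single image; it is an illustration of the idea, not part of the dogvscat scripts.

```python
def average_predictions(per_model_probs):
    """Average class probabilities element-wise across models."""
    num_models = len(per_model_probs)
    return [sum(values) / num_models for values in zip(*per_model_probs)]


# Made-up (cat, dog) probabilities from three hypothetical networks for one image
probs = [[0.6, 0.4], [0.3, 0.7], [0.9, 0.1]]
averaged = average_predictions(probs)
predicted_class = averaged.index(max(averaged))
print("averaged: %s, predicted class: %d" % (averaged, predicted_class))
```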

PASCAL VOC Classification

Unzip voc2012.zip and execute the voc2012.sh script (contained in the zip file and shown below for convenience). Type sudo chmod 700 *.sh to make sure the scripts can be executed. It trains and runs AlexNet.

#!/usr/bin/env sh

# Copy and unzip voc2012.zip (it contains this file) then run this file. But first
#  change paths in: voc2012.sh; build_datasets.sh; solvers/*; nets/*; classify.py

# As you run various files, you can ignore the following error if it shows up:
#  libdc1394 error: Failed to initialize libdc1394

# set Caffe root directory
CAFFE_ROOT=$CAFFE_ROOT
VOC=/path/to/voc2012

chmod 700 *.sh

# Download datasets
# Details: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit
if [ ! -f VOCtrainval_11-May-2012.tar ]; then
  wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
fi
# VOCtrainval_11-May-2012.tar contains the VOC folder with:
#  JPEGImages: all jpg images
#  Annotations: objects and corresponding bounding box/pose/truncated/occluded per jpg
#  ImageSets: breaks the images by the type of task they are used for
#  SegmentationClass and SegmentationObject: segmented images (duplicate directories)
tar -xvf VOCtrainval_11-May-2012.tar

# Run Python scripts to create labeled text files
python create_labeled_txt_file.py
 
# Execute shell script to create training and validation lmdbs
# Note that lmdbs directories w/the same name cannot exist prior to creating them
./build_datasets.sh
 
# Execute following command to download caffenet pre-trained weights (takes ~20 min)
#  if weights exist already then the command is ignored
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet
 
# Fine-tune weights in the AlexNet architecture (takes ~60 min)
# you can also choose one of six solvers: pascal_solver[1-6].prototxt
$CAFFE_ROOT/build/tools/caffe train -solver $VOC/solvers/voc2012_solver.prototxt \
  -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# The lines below are not really needed; they served as examples on how to do some tasks

# Test against the voc2012_val_lmdb dataset (the name of the lmdb is set in the model under phase: TEST)
$CAFFE_ROOT/build/tools/caffe test -model $VOC/nets/voc2012_train_val_ft678.prototxt \
  -weights $VOC/weights_iter_5000.caffemodel -iterations 116

# Classify validation dataset: returns a file w/the labels of the val dataset
#  but it doesn't report accuracy (that would require some adjusting of the code)
python convert_binaryproto2npy.py
mkdir results
python cls_confidence.py
python average_precision.py

# Note to submit results to the VOC scoreboard, retrain NN using the trainval set
# and test on the unlabeled test data provided by VOC

# A better approach is to train five CNNs w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks

Additional VOC information (in case the reader is interested in learning more about VOC):

  • PASCAL VOC datasets
  • To compare methods or design choices:
      • use the entire VOC2007 data, where all annotations (including test annotations) are available
      • report cross-validation results using the VOC2012 "trainval" set alone (no test annotations are provided from 2008 to 2012)
  • The most common metric is average precision (AP): the area under the precision/recall curve
  • VOC 2012 data summary:
      • In 2008, there was a new dataset, and each year more data was added. Therefore, it is common to see published results on VOC2007 and VOC2012 (or VOC2011; no additional data was added for the classification and detection tasks between 2011 and 2012)
      • 20 classes
      • Training: 5,717 images, 13,609 objects
      • Validation: 5,823 images, 13,841 objects
      • Testing: 10,991 images
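The average precision metric mentioned above can be illustrated with a simplified, non-interpolated version: rank the predictions by score and average the precision measured at the rank of each positive example. The official VOC evaluation uses an interpolated variant of the precision/recall curve, so the sketch below is only an approximation of that metric; the scores and labels are made up.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / float(rank))
    return sum(precisions) / len(precisions) if precisions else 0.0


# Made-up classifier scores and ground-truth labels (1 = class present)
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print("AP = %.4f" % average_precision(scores, labels))
```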

Current Caffe usages

This is a short list of popular Caffe usages. For a more comprehensive list, see the Caffe Model-Zoo.

Further reading

For more complete information about compiler optimizations, see our Optimization Notice.