Manage Deep Learning Networks with Caffe* Optimized for Intel® Architecture
By Andres R. (Intel), published on June 15, 2016
Summary
Caffe* is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC). It is written in C++ and CUDA* C++ with Python* and MATLAB* wrappers. It is useful for convolutional neural networks, recurrent neural networks, and multi-layer perceptrons. Various forks of the main Caffe branch add support for detection and classification, segmentation, and Spark compatibility, among others.
Caffe optimized for Intel® architecture is currently integrated with the latest release of Intel® Math Kernel Library (Intel® MKL) 2017, which is optimized for Advanced Vector Extensions 2 (AVX2) and AVX-512 instructions; these are supported in Intel® Xeon® and Intel® Xeon Phi™ processors (among others). That is, Caffe optimized for Intel® architecture includes all the features of BVLC Caffe and, in addition, runs efficiently on Intel architecture and can be used for distributed training across multiple nodes. This tutorial describes how to build Caffe optimized for Intel architecture, train deep network models using one or more compute nodes, and deploy networks. In addition, various functionalities of Caffe are explored in detail, including how to fine-tune, extract and view features of different models, and use the Caffe Python API.
Vocabulary use:
- weights - also known as kernels, filters, templates, or feature extractors
- blob - also known as tensor - an N dimensional data structure, that is, an N-D tensor, that contains data, gradients, or weights (including biases)
- units - also known as neurons - perform a non-linear transformation on a data blob
- feature maps - also known as channels
- testing - also known as inference, classification, scoring, or deployment
- model - also known as topology or architecture
A fast way to become familiar with Caffe is:
- Install it
- Train and test LeNet on MNIST
- Test a pre-trained model, for example, bvlc_googlenet.caffemodel, on some images, for example, cat and fish-bike
- Fine-tune a trained model on the Cats vs Dogs challenge
Note that the content of this article is based in part on this blog.
Installation
The following instructions apply to Ubuntu* 14.04. Similar instructions for other Linux* or OS X* operating systems or Ubuntu versions can be found on BVLC's Caffe installation website. Get dependencies:
sudo apt-get update && sudo apt-get -y install build-essential git cmake && sudo apt-get -y install libprotobuf-dev libleveldb-dev libsnappy-dev && sudo apt-get -y install libopencv-dev libhdf5-serial-dev protobuf-compiler && sudo apt-get -y install --no-install-recommends libboost-all-dev && sudo apt-get -y install libgflags-dev libgoogle-glog-dev liblmdb-dev && sudo apt-get -y install libatlas-base-dev
For Ubuntu 16.04, link the following libraries:
find . -type f -exec sed -i -e 's^"hdf5.h"^"hdf5/serial/hdf5.h"^g' -e 's^"hdf5_hl.h"^"hdf5/serial/hdf5_hl.h"^g' '{}' \;
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
On CentOS* 7 install the dependencies as follows:
sudo yum -y update &&
sudo yum -y groupinstall "Development Tools" &&
sudo yum -y install wget cmake git &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel &&
sudo yum -y install snappy-devel opencv-devel atlas-devel &&
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel
# The following steps are only required if some packages failed to install
# add EPEL repository then install missing packages
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ivh epel-release-latest-7.noarch.rpm
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel
# if packages are still not found--download and install/build the packages, e.g.,
# snappy:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
# atlas:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
# opencv:
wget https://github.com/Itseez/opencv/archive/2.4.13.zip
unzip 2.4.13.zip
cd opencv-2.4.13/
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local ..
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make all -j $NUM_THREADS
sudo make install -j $NUM_THREADS
# optional (not required for Caffe)
# other useful repositories for CentOS are RepoForge and IUS:
wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
sudo rpm -Uvh rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
wget https://rhel7.iuscommunity.org/ius-release.rpm
sudo rpm -Uvh ius-release*.rpm
Reasons for dependencies (source):
- boost: a C++ library used for its math functions and shared pointers
- glog, gflags: provide logging and command-line utilities; essential for debugging
- leveldb, lmdb: database I/O; used for preparing your own data
- protobuf: used to efficiently define data structures
- BLAS (Basic Linear Algebra Subprograms): operations such as matrix multiplication and matrix addition, provided by Intel® Math Kernel Library (Intel® MKL), ATLAS*, OpenBLAS*, and so forth
The Caffe installation guide states: Install "MKL for better CPU performance."
For best performance, use Intel® Math Kernel Library (Intel® MKL) 2017, available for free as a Beta in Intel® Parallel Studio XE 2017 Beta. The Intel MKL 2017 production release, also known as the gold release, will be available in September 2016.
Alternatively, Intel MKL 11.3.3 (the 2016 version) can be downloaded and installed. To download it, first register for a free community license and follow the installation instructions.
Once installed, the correct environment libraries can be set as follows (the path may need to be modified):
echo 'source /opt/intel/bin/compilervars.sh intel64' >> ~/.bashrc
# alternatively edit <mkl_path>/mkl/bin/mklvars.sh replacing INSTALLDIR in
# CPRO_PATH=<INSTALLDIR> with the actual mkl path: CPRO_PATH=<full mkl path>
# echo 'source <mkl path>/mkl/bin/mklvars.sh intel64' >> ~/.bashrc
Clone and prepare Caffe optimized for Intel architecture for compiling as follows:
cd ~
# For BVLC caffe use:
# git clone https://github.com/BVLC/caffe.git
# For intel caffe use:
git clone https://github.com/intel/caffe.git
cd caffe
echo "export CAFFE_ROOT=`pwd`" >> ~/.bashrc
source ~/.bashrc
cp Makefile.config.example Makefile.config
# Open Makefile.config and modify it (see comments in the Makefile)
vi Makefile.config
Edit the Makefile.config:
# To run on CPU only and to avoid installing CUDA installers, uncomment
CPU_ONLY := 1
# To use MKL, replace atlas with mkl as follows
# (make sure that the BLAS_DIR and BLAS_LIB paths are correct)
BLAS := mkl
BLAS_DIR := $(MKLROOT)/include
BLAS_LIB := $(MKLROOT)/lib/intel64
# To use MKL2017 DNN primitives as the default engine, uncomment
# (however leave it commented if using multinode training)
# USE_MKL2017_AS_DEFAULT_ENGINE := 1
# To customize the compiler choice, uncomment and set the following
# CUSTOM_CXX := g++
# To train on multinode uncomment and verify path
# USE_MPI := 1
# CXX := /usr/bin/mpicxx
If using Ubuntu 16.04, edit the Makefile:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
and create symlinks:
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
If using CentOS 7 and ATLAS (instead of the recommended MKL library), edit the Makefile:
# Change this line
LIBRARIES += cblas atlas
# to
LIBRARIES += satlas
Build Caffe optimized for Intel architecture:
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
# To save the output stream to file makestdout.log use this instead
# make -j $NUM_THREADS 2>&1 | tee makestdout.log
An alternative to the steps above is to use cmake:
mkdir build
cd build
cmake -DCPU_ONLY=on -DBLAS=mkl -DUSE_MKL2017_AS_DEFAULT_ENGINE=on /path/to/caffe
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
Install Python dependencies:
# These steps are OPTIONAL but highly recommended to use the Python interface
sudo apt-get -y install gfortran python-dev python-pip
cd ~/caffe/python
for req in $(cat requirements.txt); do sudo pip install $req; done
sudo pip install scikit-image #depends on other packages
sudo ln -s /usr/include/python2.7/ /usr/local/include/python2.7
sudo ln -s /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ \
  /usr/local/include/python2.7/numpy
cd ~/caffe
make pycaffe -j $NUM_THREADS
echo "export PYTHONPATH=$CAFFE_ROOT/python" >> ~/.bashrc
source ~/.bashrc
Other installation options:
# These steps are OPTIONAL to test caffe
make test -j $NUM_THREADS
make runtest #"YOU HAVE <some number> DISABLED TESTS" output is OK
# This step is OPTIONAL to disable cam hardware OpenCV driver
# alternatively, the user can skip this and ignore the harmless
# libdc1394 error that may occasionally appear
sudo ln /dev/null /dev/raw1394
Data layer
This section is optional and discusses the various data types; understanding it is not required to start using Caffe. It may be useful if you plan to use data in differing formats. The material in this section is based on this and this tutorial, and src/caffe/proto/caffe.proto.
Data enters Caffe through data layers, which lie at the bottom of nets and are defined in a prototxt file. More information on prototxt files is in the Training section. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from files on disk in HDF5 or common image formats.
Common input preprocessing transformations (mean subtraction, scaling, random cropping, and mirroring) are available by specifying transform_param (not supported by all data layer types; for example, HDF5Data does not support it). If the required data transformations are performed beforehand, it is not necessary to use this option in the data layer. Common data transformations can be performed as follows:
transform_param {
  # randomly horizontally mirror the image
  mirror: 1
  # crop a `crop_size` x `crop_size` patch:
  # - at random during training
  # - from the center during testing
  crop_size: 227
  # subtract mean value: these mean_values can equivalently be replaced with a mean.binaryproto file as
  # mean_file: name_of_mean_file.binaryproto
  mean_value: 104
  mean_value: 117
  mean_value: 123
}
In this example the images are cropped, mirrored, and have the mean subtracted. For other available general data transformations see src/caffe/proto/caffe.proto under message TransformationParameter.
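To make the test-time behavior of these transformations concrete, here is a minimal sketch that center crops, mirrors, and mean-subtracts a tiny single-channel image stored as nested lists. It is an illustration only, not Caffe's implementation; the function names are hypothetical, and at training time Caffe picks the crop offset and the mirror decision at random instead.

```python
def center_crop(image, crop_size):
    """Return the central crop_size x crop_size patch of a 2-D list."""
    height, width = len(image), len(image[0])
    h_off = (height - crop_size) // 2
    w_off = (width - crop_size) // 2
    return [row[w_off:w_off + crop_size]
            for row in image[h_off:h_off + crop_size]]


def mirror(image):
    """Flip the image horizontally."""
    return [row[::-1] for row in image]


def subtract_mean(image, mean_value):
    """Subtract a single mean value from every pixel."""
    return [[pixel - mean_value for pixel in row] for row in image]


# 4x4 image with pixel values 0..15 (illustrative data)
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patch = subtract_mean(mirror(center_crop(image, 2)), 5)
print(patch)
```

With a real three-channel image, one mean_value per channel would be subtracted in the same way.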
Data
Lightning Memory-Mapped Database (LMDB) and LevelDB database formats can be processed efficiently as input data. They are suited only to 1-of-k classification and, because Caffe reads them efficiently, they are the recommended data formats for that task.
data_param
Required
- source: the name of the directory containing the database
- batch_size: the number of inputs to process at one time
Optional
- backend [default LEVELDB]: choose whether to use LEVELDB or LMDB
- rand_skip: skip this number of inputs at the beginning; this can be useful for asynchronous SGD
For other available data layer parameters, see src/caffe/proto/caffe.proto under message DataParameter.
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: 1
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
It is common but not required to have the same name for the layer and the top blob coming out of the layer; that is, in each layer of a prototxt file, name and top are usually the same.
Alternatively, the mean can be subtracted by passing a mean image and replacing all three mean_value lines with one mean_file: "data/ilsvrc12/imagenet_mean.binaryproto". This binaryproto file can be created from an LMDB dataset as follows:
cd ~/caffe
build/tools/compute_image_mean examples/imagenet/ilsvrc12_train_lmdb data/ilsvrc12/imagenet_mean.binaryproto
replacing examples/imagenet/ilsvrc12_train_lmdb and data/ilsvrc12/imagenet_mean.binaryproto with the appropriate lmdb folder and desired binaryproto file, respectively.
ImageData
Get images and labels directly from image files.
image_data_param
Required
- source: the name of the text file containing the paths of the data inputs and their labels
Optional
- batch_size [default 1]: the number of inputs to process at one time
- new_height [default 0]: resizes the height by warping it to this value; ignored if set to 0
- new_width [default 0]: resizes the width by warping it to this value; ignored if set to 0
- shuffle [default 0]: shuffles the data; ignored if set to 0
- rand_skip [default 0]: skip this number of inputs at the beginning; may be useful for asynchronous SGD
For other available image data parameters, see src/caffe/proto/caffe.proto under message ImageDataParameter.
In this example the images are shuffled, cropped, mirrored, and have the mean subtracted.
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    shuffle: 1
  }
}
Note that the text file has the image file names and corresponding labels. For example, "train.txt" looks like
/path/to/images/img3423.jpg 2
/path/to/images/img3424.jpg 13
/path/to/images/img3425.jpg 8
...
Input
Uses a blob of zeros as input data with the dimensions specified. This is usually used to time the forward and backward propagations. More information on timing a network is at the end of the Training section.
input_param
Required
- shape: used to define one or multiple shapes for the top blob(s)
layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}
Equivalently, the layer can be written as:
input: "data"
input_dim: 32
input_dim: 3
input_dim: 227
input_dim: 227
DummyData
Similar to Input except the type of data can be specified. This is usually used for debugging but can also be used to time the forward and backward propagations. Example based on this.
dummy_data_param
Required
- shape: used to define one or multiple shapes for the top blob(s)
Optional
- data_filler [default ConstantFiller with value of 0]: specifies the values used in the top blob
layer {
  name: "data"
  type: "DummyData"
  top: "data"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
      value: 0.01
    }
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}
layer {
  name: "data"
  type: "DummyData"
  top: "label"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
    }
    shape {
      dim: 32
    }
  }
}
In this example there are two data layers, one for each top because the data provided to each top blob must be specified. Note that in Data, ImageData, or HDF5Data data layers, the information on the top blob for the label is in the source file.
MemoryData
The memory data layer reads data directly from memory, without copying it. To use it, call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) to specify a source of contiguous data (as a 4D row-major array), which is read one batch-sized chunk at a time.
This method can be slow as it may require copying the data into memory prior to using it. However, once in memory it is very efficient.
memory_data_param
Required
- batch_size, channels, height, width: specify the size of input chunks to read from memory
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  transform_param {
    crop_size: 227
    mirror: true
    mean_file: "mean.binaryproto"
  }
  memory_data_param {
    batch_size: 32
    channels: 3
    height: 227
    width: 227
  }
}
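The "4D row-major array" read in batch-sized chunks can be pictured with a little index arithmetic. The sketch below assumes the usual N x C x H x W ordering (image, channel, row, column); the function names are hypothetical and only illustrate the layout.

```python
def blob_offset(n, c, h, w, channels, height, width):
    """Flat index of element (n, c, h, w) in a row-major N x C x H x W array."""
    return ((n * channels + c) * height + h) * width + w


def batch_bounds(batch_index, batch_size, channels, height, width):
    """Half-open [start, end) range of flat indices covered by one batch."""
    start = blob_offset(batch_index * batch_size, 0, 0, 0,
                        channels, height, width)
    return start, start + batch_size * channels * height * width


# Second batch (index 1) of 32 images of size 3 x 227 x 227
print(batch_bounds(1, 32, 3, 227, 227))
```

Because consecutive images are contiguous, each batch is a single contiguous slice of the underlying buffer.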
HDF5Data
Reads arbitrary data from HDF5 files. Good for any task but only uses FP32 and FP64 data (not uint8), so image data will be huge. Does not allow transform_param. Only use this if necessary.
hdf5_data_param
Required
- source: the name of the text file containing the paths of the data inputs and their labels
- batch_size
Optional
- shuffle [default false]: shuffle the HDF5 files
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "examples/hdf5_classification/data/train.txt"
    batch_size: 32
  }
}
HDF5Output
The HDF5 output layer performs the opposite function of the other layers in this section; it writes its input blobs to disk.
hdf5_output_param
Required
- file_name
layer {
  name: "data_output"
  type: "HDF5Output"
  bottom: "data"
  bottom: "label"
  include {
    phase: TRAIN
  }
  hdf5_output_param {
    file_name: "output_file.h5"
  }
}
WindowData
Made for detection. Reads windows from image files together with their class labels.
window_data_param
Required
- source: specify the data source
- mean_file
- batch_size
Optional
- mirror
- crop_size: randomly crop an image
- crop_mode [default "warp"]: mode of cropping the detection window; for example, "warp" warps to a fixed size and "square" crops the tightest square around the window
- fg_threshold [default 0.5]: foreground (object) overlap threshold
- bg_threshold [default 0.5]: background overlap threshold
- fg_fraction [default 0.25]: fraction of the batch that should be foreground objects
- context_pad [default 0]: amount of contextual padding around a window
For other available window data parameters, see src/caffe/proto/caffe.proto under message WindowDataParameter.
layer {
  name: "data"
  type: "WindowData"
  top: "data"
  top: "label"
  window_data_param {
    source: "/path/to/file/window_train.txt"
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
    batch_size: 128
    mirror: true
    crop_size: 227
    fg_threshold: 0.5
    bg_threshold: 0.5
    fg_fraction: 0.25
    context_pad: 16
  }
}
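The two overlap thresholds are easier to reason about with a small worked example. The sketch below assumes that overlap is measured as intersection over union (IoU) between a candidate window and a ground-truth box, that boxes are given as (x1, y1, x2, y2), and that a window is foreground when its overlap is at least fg_threshold and background when it is below bg_threshold. These are assumptions made for illustration; the function names are hypothetical, and the window file format itself is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_window(overlap, fg_threshold=0.5, bg_threshold=0.5):
    """Classify a window by its overlap with the ground truth."""
    if overlap >= fg_threshold:
        return "foreground"
    if overlap < bg_threshold:
        return "background"
    return "ignored"  # only reachable when bg_threshold < fg_threshold


overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(overlap, label_window(overlap))
```

With both thresholds at 0.5, every window is either foreground or background; lowering bg_threshold leaves a band of ambiguous windows that are used for neither.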
Dataset preparation
The recommended data format for 1-of-k classification is LMDB. To use Caffe's tools to make LMDBs, the following are required:
- A folder with the data
- The output folders, for example, mydataset_train_lmdb, must be non-existent
- A text file with the image file names and corresponding labels, for example, "train.txt" looks like
img3423.jpg 2
img3424.jpg 13
img3425.jpg 8
...
Note that if the data is dispersed in various folders, train.txt can contain the full path to the data points.
The create_label_file.py is a simple script that creates a training and validation text file for Kaggle's Dog vs Cats competition and can easily be adapted to other tasks.
Note that in testing we assume that the labels are missing. If labels are available these same steps can be applied to prepare an LMDB test dataset.
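As a rough idea of what a label-file script can look like, here is a minimal sketch. It is not the create_label_file.py mentioned above; it assumes file names of the form cat.<n>.jpg and dog.<n>.jpg, and the function names and the class-to-label mapping are hypothetical.

```python
import os


def make_label_lines(file_names, class_to_label):
    """Build '<file name> <label>' lines from names such as 'cat.12.jpg'."""
    lines = []
    for name in sorted(file_names):
        prefix = name.split(".")[0]
        if prefix in class_to_label:  # skip files that match no class
            lines.append("{} {}".format(name, class_to_label[prefix]))
    return lines


def write_label_file(image_dir, output_path, class_to_label):
    """Write the label file for every image in image_dir; return the line count."""
    lines = make_label_lines(os.listdir(image_dir), class_to_label)
    with open(output_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)


lines = make_label_lines(
    ["dog.0.jpg", "cat.1.jpg", "cat.0.jpg", "notes.txt"],
    {"cat": 0, "dog": 1},
)
print(lines)
```

Splitting the resulting lines into a training file and a validation file (for example, by holding out a fixed fraction) is left to the caller.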
Preparing data with three channels (for example, RGB images)
The example below (based on this) produces a training LMDB, and requires train.txt. It runs from the $CAFFE_ROOT directory.
#!/usr/bin/env sh
# folder containing the training and validation images
TRAIN_DATA_ROOT=/path/to/training/images
# folder containing the file with the name of training images
DATA=/path/to/file
# folder for the lmdb datasets
OUTPUT=/path/to/output/directory
TOOLS=/path/to/caffe/build/tools
# Set to resize the images to 256x256
RESIZE_HEIGHT=256
RESIZE_WIDTH=256
echo "Creating train lmdb..."
# Delete the shuffle line if shuffle is not desired
GLOG_logtostderr=1 $TOOLS/convert_imageset \
  --resize_height=$RESIZE_HEIGHT \
  --resize_width=$RESIZE_WIDTH \
  --shuffle \
  $TRAIN_DATA_ROOT/ \
  $DATA/train.txt \
  $OUTPUT/mydataset_train_lmdb
echo "Done."
Computing the mean of the images in an LMDB dataset:
#!/usr/bin/env sh
# Compute the mean image in lmdb dataset
OUTPUT=/path/to/output/directory # folder for the lmdb datasets and output for mean image
TOOLS=/path/to/caffe/build/tools
$TOOLS/compute_image_mean $OUTPUT/mydataset_train_lmdb $OUTPUT/train_mean.binaryproto
$TOOLS/compute_image_mean $OUTPUT/mydataset_val_lmdb $OUTPUT/val_mean.binaryproto
Preparing data with various channels
Grayscale images (one channel), RADAR images (two channels), videos (four channels), image+depth (four channels), vibrometry (one channel), and spectrograms (one channel) require a wrapper in order to create the LMDB dataset (see this blog script as a guide).
Resizing images
There are two common approaches to resizing images:
- warp an image to the desired size
- proportionally resize so that the smaller side equals the desired size, and then center crop the larger side to the desired size
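The difference between the two approaches comes down to a few lines of arithmetic. The following sketch computes only the target dimensions and crop box, without touching any pixels; the function names are hypothetical and the rounding convention is an assumption.

```python
def warp(width, height, target):
    """Warping ignores the aspect ratio: the result is always target x target."""
    return (target, target)


def proportional_resize_then_crop(width, height, target):
    """Return ((new_w, new_h), crop_box) so that the smaller side equals
    target and the central target x target box is kept."""
    scale = target / float(min(width, height))
    new_w = max(target, int(round(width * scale)))
    new_h = max(target, int(round(height * scale)))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


print(warp(640, 480, 256))
print(proportional_resize_then_crop(640, 480, 256))
```

Warping keeps the whole image but distorts it; the proportional approach preserves shapes but discards the margins of the longer side.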
Resizing can occur in a number of ways:
- via OpenCV* as part of making the LMDB folder; for example, build/tools/convert_imageset --resize_height=256 --resize_width=256 warps the image to the desired size; convert_imageset calls ReadImageToDatum, which calls ReadImageToCVMat in src/caffe/util/io.cpp
- via ImageMagick; for example, convert -resize 256x256\! <input_img> <output_img> warps the image to the desired size
- via OpenCV using a script that allows multithreaded image conversion, tools/extra/resize_and_crop_images.py, which proportionally resizes and then center crops. This requires:
sudo pip install git+https://github.com/Yangqing/mincepie.git
sudo apt-get install -y python-opencv
vi tools/extra/launch_resize_and_crop_images.sh # set number of clients (use num_of_cores*2); file.txt, input, and output folders
In addition, as part of the data layer the images can be cropped or resized:
layer {
  name: "data"
  transform_param {
    crop_size: 227
    ...
  }
}
which crops an image (at random during training and from the center during testing), and
layer {
  name: "data"
  image_data_param {
    new_height: 227
    new_width: 227
    ...
  }
}
warps the image to new_height and new_width using OpenCV.
Training
Training requires:
- train_val.prototxt: defines the network architecture, initialization parameters, and local learning rates
- solver.prototxt: defines optimization/training parameters and serves as the actual file that is called to train a deep network
- deploy.prototxt: used only in testing. It must be exactly the same as train_val.prototxt except for the input layer(s), loss layer(s), and weights initialization (e.g., weight_filler), as the latter two do not exist in deploy.prototxt.
It is common but not required to have the same name for the layer and the blob coming out of the layer. In each layer of a prototxt file, name and top are usually the same.
A description of what each layer does can be found here. Initialization parameters are extremely important. They are set here. Some additional tips worth mentioning:
- weight_filler initialization: for ReLU units, MSRAFiller is usually better than xavier, and xavier is usually better than gaussian (note that for MSRAFiller and xavier there is no need to manually specify std)
  - gaussian: samples weights from the Gaussian distribution N(0, std)
  - xavier: samples weights from the uniform distribution U(-a, a), where a = sqrt(3/fan_in) and fan_in is the number of incoming inputs
  - MSRAFiller: samples weights from the normal distribution N(0, a), where the standard deviation a = sqrt(2/fan_in)
- base_lr: initial learning rate (default: 0.01; change to a smaller number if getting NaN loss in training)
- lr_mult: for the bias, usually set to 2x the lr_mult of the non-bias weights
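The filler formulas above can be checked numerically with a short sketch. It uses only the Python standard library and is not Caffe's filler code; the function names and the example fan_in are hypothetical.

```python
import math
import random


def xavier_bound(fan_in):
    """Bound a of the uniform distribution U(-a, a)."""
    return math.sqrt(3.0 / fan_in)


def msra_std(fan_in):
    """Standard deviation of the zero-mean normal distribution."""
    return math.sqrt(2.0 / fan_in)


def sample_xavier(fan_in, count, rng):
    a = xavier_bound(fan_in)
    return [rng.uniform(-a, a) for _ in range(count)]


def sample_msra(fan_in, count, rng):
    return [rng.gauss(0.0, msra_std(fan_in)) for _ in range(count)]


fan_in = 3 * 11 * 11  # illustrative layer: 3 input channels, 11x11 kernels
rng = random.Random(0)
print(xavier_bound(fan_in), msra_std(fan_in))
print(sample_xavier(fan_in, 3, rng), sample_msra(fan_in, 3, rng))
```

Since the variance of U(-a, a) is a^2/3, the xavier bound gives a weight variance of 1/fan_in, while the MSRA filler uses 2/fan_in, twice as much, to compensate for ReLU zeroing roughly half of the activations.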
The LeNet example uses lenet_train_test.prototxt, deploy.prototxt, and solver.prototxt; the solver is described below (comments about what each variable means are included):
solver.prototxt
# The train/validation net protocol buffer definition, that is, the training architecture
net: "examples/mnist/lenet_train_test.prototxt"
# Note: 1 iteration = 1 forward pass over all the images in one batch
# Carry out a validation test every 500 training iterations.
test_interval: 500
# test_iter specifies how many forward passes the validation test should carry out
# a good number is num_val_imgs / batch_size (see batch_size in Data layer in phase TEST in train_test.prototxt)
test_iter: 100
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# We want to initially move fast towards the local minimum and as we approach it, we want to move slower
# To this end, there are various learning rate policies available:
# fixed: always return base_lr.
# step: return base_lr * gamma ^ (floor(iter / stepsize))
# exp: return base_lr * gamma ^ iter
# inv: return base_lr * (1 + gamma * iter) ^ (- power)
# multistep: similar to step but it allows non uniform steps defined by stepvalue
# poly: the effective learning rate follows a polynomial decay, to be zero by the max_iter: return base_lr * (1 - iter/max_iter) ^ (power)
# sigmoid: the effective learning rate follows a sigmoid decay: return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
lr_policy: "step"
gamma: 0.1
stepsize: 10000 # Drop the learning rate in steps by a factor of gamma every stepsize iterations
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results, that is, every 5000 iterations it saves a snapshot of the weights
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_multistep"
# solver mode: CPU or GPU
solver_mode: CPU
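The learning rate policies listed in the solver comments can be written out as a small function to see how each one behaves. This is a sketch transcribed from those comment formulas, not Caffe's solver code; the function name and the keyword defaults are hypothetical.

```python
import math


def learning_rate(policy, iteration, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, max_iter=1, stepvalues=()):
    """Effective learning rate at a given iteration for each lr_policy."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (iteration // stepsize)
    if policy == "exp":
        return base_lr * gamma ** iteration
    if policy == "inv":
        return base_lr * (1.0 + gamma * iteration) ** (-power)
    if policy == "multistep":
        # count how many of the (non-uniform) step boundaries have been passed
        steps_passed = sum(1 for value in stepvalues if iteration >= value)
        return base_lr * gamma ** steps_passed
    if policy == "poly":
        return base_lr * (1.0 - iteration / float(max_iter)) ** power
    if policy == "sigmoid":
        return base_lr / (1.0 + math.exp(-gamma * (iteration - stepsize)))
    raise ValueError("unknown lr_policy: {}".format(policy))


for it in (0, 9999, 10000):
    print(it, learning_rate("step", it, 0.01, gamma=0.1, stepsize=10000))
```

With the solver values above (stepsize equal to max_iter), the step policy never actually lowers the rate before training ends; a smaller stepsize would be needed to see a drop.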
Train the network:
# The name of the output file (aka the trained weights) is in solver.prototxt
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt
Training will produce two types of files (note that 10000 is the number of completed iterations):
- lenet_multistep_10000.caffemodel: weights of the architecture to be used in testing
- lenet_multistep_10000.solverstate: used to resume training from the current iteration if training dies (for example, because of a power outage)
To train the network and plot the validation accuracy or loss vs iterations:
#CHART_TYPE=[0-7]
# 0: Test accuracy vs. Iters
# 1: Test accuracy vs. Seconds
# 2: Test loss vs. Iters
# 3: Test loss vs. Seconds
# 4: Train learning rate vs. Iters
# 5: Train learning rate vs. Seconds
# 6: Train loss vs. Iters
# 7: Train loss vs. Seconds
CHART_TYPE=0
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt 2>&1 | tee logfile.log
python $CAFFE_ROOT/tools/extra/plot_training_log.py.example $CHART_TYPE name_of_plot.png logfile.log
Dropout can be used in connection with a fully connected layer. It is used only during training to reduce overfitting: during each forward pass it drops a random fraction (dropout_ratio) of the layer's outputs, which prevents co-adaptation between the weights. It is ignored in testing.
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
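What a Dropout layer does to its input can be sketched in a few lines. The example assumes the common "inverted dropout" convention, in which the surviving values are scaled by 1/(1 - dropout_ratio) during training so that nothing needs to change at test time; the function name is hypothetical and this is not Caffe's layer code.

```python
import random


def dropout(values, dropout_ratio, train, rng):
    """Zero each value with probability dropout_ratio during training and
    rescale the survivors; pass values through unchanged in testing."""
    if not train:
        return list(values)
    scale = 1.0 / (1.0 - dropout_ratio)
    return [0.0 if rng.random() < dropout_ratio else value * scale
            for value in values]


activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, 0.5, True, random.Random(0)))   # training
print(dropout(activations, 0.5, False, random.Random(0)))  # testing
```

Because a different random subset is dropped on every forward pass, no unit can rely on the presence of any particular other unit.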
Measuring forward and backward propagation time (not weight updates):
# Computes 50 iterations and returns forward, backward, and total time and the average
# note that the training samples and mean.binaryproto may be required or
# alternatively, use dummy variables
NUMITER=50
/path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER
For consistency in the timings, the Linux utility numactl can be used to allocate memory buffers in MCDRAM:
numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER
Model Zoo
The Caffe Model Zoo is a collection of trained deep learning models and/or prototxt files used for a variety of tasks. These models can be used in fine-tuning or testing.
Multinode distributed training
The material in this section is based on Intel's Caffe Github wiki. There are two main approaches to distribute the training across multiple nodes: model parallelism and data parallelism. In model parallelism, the model is divided among the nodes and each node has the full data batch. In data parallelism, the data batch is divided among the nodes and each node has the full model. Data parallelism is especially useful when the number of weights in a model is small and when the data batch is large. A hybrid model and data parallelism is possible where layers with few weights such as the convolutional layers are trained using the data parallelism approach and layers with many weights such as fully connected layers are trained using the model parallelism approach. Intel has published a theoretical analysis to optimally trade between data and model parallelism in this hybrid approach.
Given the recent popularity of deep networks with fewer weights such as GoogLeNet and ResNet and the success of distributed training using data parallelism, Caffe optimized for Intel architecture supports data parallelism. Multinode distributed training is currently under active development with newer features being evaluated.
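The idea behind data parallelism can be shown on a toy problem: each node computes the gradient on its own shard of the batch, and averaging the per-node gradients reproduces the full-batch gradient. The sketch below uses a one-weight linear model and equally sized shards; all names and data are hypothetical, and it says nothing about how Caffe communicates gradients between nodes.

```python
import math


def gradient(weight, samples):
    """Gradient of the mean squared error of y = weight * x over the samples."""
    return sum(2.0 * (weight * x - y) * x for x, y in samples) / len(samples)


def data_parallel_gradient(weight, samples, num_nodes):
    """Split the batch into equal shards, compute one gradient per node
    (every node holds the full model), then average the results."""
    shard_size = len(samples) // num_nodes  # assumes an even split
    shards = [samples[i * shard_size:(i + 1) * shard_size]
              for i in range(num_nodes)]
    local_gradients = [gradient(weight, shard) for shard in shards]
    return sum(local_gradients) / num_nodes


samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
print(gradient(0.5, samples), data_parallel_gradient(0.5, samples, 2))
```

Only the gradients (one value per weight) are exchanged, which is why this approach suits models with few weights and large batches.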
To train across multiple nodes, make sure these two lines are in Makefile.config:
USE_MPI := 1
# update with the path to binary MPI library
CXX := /usr/bin/mpicxx
Using multinode is as simple as:
mpirun --hostfile path/to/hostfile -n <num_processes> /path/to/caffe/build/tools/caffe train --solver=/path/to/solver.prototxt --param_server=mpi
where <num_processes> is the number of nodes to use, and hostfile lists the IP addresses of the nodes, one per line. Note that solver.prototxt points to the train.prototxt in each node, and each train.prototxt needs to point to a different portion of the dataset. For more details, click here.
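One simple way to give each node a different portion of the dataset is to split the image listing round-robin before building one dataset per node. The sketch below is only one possible approach; the function name and file names are hypothetical.

```python
def shard_lines(lines, num_nodes):
    """Assign line i of the listing to node i % num_nodes."""
    return [lines[rank::num_nodes] for rank in range(num_nodes)]


listing = ["a.jpg 0", "b.jpg 1", "c.jpg 0", "d.jpg 1", "e.jpg 0"]
for rank, shard in enumerate(shard_lines(listing, 2)):
    print(rank, shard)
```

Each shard could then be written to its own text file and converted to an LMDB that the train.prototxt on the corresponding node points to.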
Fine-tuning
Recycle the layer definition prototxt file and make two changes.
1. Change the data layer to include the new data (note that the scale is 1/255):
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "newdata_lmdb" # CHANGED THIS LINE TO THE NEW DATASET
    batch_size: 64
    backend: LMDB
  }
}
2. Change the last layer, in this case ip2 (in testing, make this same change to the deploy.prototxt file):
layer {
  name: "ip2-ft" # CHANGED THIS LINE
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2-ft" # CHANGED THIS LINE
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2 #CHANGED THIS LINE TO THE NUMBER OF CLASSES IN NEW DATASET
    bias_filler {
      type: "constant"
    }
  }
}
Invoke Caffe:
#From the command line on $CAFFE_ROOT
./build/tools/caffe train -solver /path/to/solver.prototxt -weights /path/to/trained_model.caffemodel
Fine-tuning guidelines
- Learn the last layer first (earlier layer weights won't change very much in fine-tuning)
- Drop the initial learning rate (in solver.prototxt) by 10x or 100x
- Caffe layers have local learning rates: lr_mult
- Freeze all but the last layer (and perhaps the second to last layer) for fast optimization, that is, set lr_mult=0 in the local learning rates
- Increase the local learning rate of the last layer by 10x and of the second to last by 5x
- Stop if good enough or keep fine-tuning other layers
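The interaction between the global and local learning rates in these guidelines is a simple product: the effective rate of a layer is base_lr multiplied by its lr_mult. The sketch below illustrates this with hypothetical layer names and values.

```python
def effective_learning_rates(base_lr, lr_mults):
    """Effective per-layer learning rate = base_lr * lr_mult."""
    return {name: base_lr * mult for name, mult in lr_mults.items()}


# base_lr dropped 10x from 0.01; early layer frozen, last layers boosted
rates = effective_learning_rates(0.001, {"conv1": 0, "ip1": 5, "ip2-ft": 10})
for name in sorted(rates):
    print(name, rates[name])
```

A layer with lr_mult set to 0 keeps its pretrained weights unchanged, while the new last layer here still trains at the original rate of 0.01.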
What happens under the hood:
- Creates a new network
- Copies the previous weights to initialize the network weights
- Solves in the usual way (see example)
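The copy step can be pictured as a lookup by layer name, which is why the last layer was renamed to ip2-ft: a layer whose name has no match in the trained model keeps its fresh initialization. The sketch below assumes this name-based matching and uses plain dictionaries in place of real networks; all names and values are hypothetical.

```python
def init_from_pretrained(new_layers, pretrained_weights):
    """Return the weights for the new network: pretrained values where the
    layer name matches, the fresh initialization otherwise."""
    return {name: list(pretrained_weights.get(name, fresh))
            for name, fresh in new_layers.items()}


pretrained = {"ip1": [0.3, 0.7], "ip2": [0.9, 0.1, 0.5]}
new_net = {"ip1": [0.0, 0.0], "ip2-ft": [0.0, 0.0]}
print(init_from_pretrained(new_net, pretrained))
```

Here ip1 inherits the trained weights, while ip2-ft starts from scratch with the output size of the new task.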
Testing
Testing, also known as inference, classification, or scoring, can be done in Python or using the native C++ utility that ships with Caffe. To classify an image (or signal) or set of images, the following is needed:
- Image(s)
- Network architecture
- Network weights
Testing using the native C++ utility is less flexible, and using Python is preferred. The prototxt file with the model should have phase: TEST in the data layer with the testing dataset in order to test the model.
/path/to/caffe/build/tools/caffe test -model /path/to/train_val.prototxt -weights /path/to/trained_model.caffemodel -iterations <num_iter>
This example was adapted from this blog. To classify an image using a pretrained model, first download the pretrained model:
./scripts/download_model_binary.py models/bvlc_reference_caffenet
Next, download the dataset (ILSVRC 2012 in this example) labels (also called the synset file), which are required in order to map a prediction to the name of the class:
./data/ilsvrc12/get_ilsvrc_aux.sh
Then classify an image:
./build/examples/cpp_classification/classification.bin models/bvlc_reference_caffenet/deploy.prototxt models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel data/ilsvrc12/imagenet_mean.binaryproto data/ilsvrc12/synset_words.txt examples/images/cat.jpg
The output should look like this:
---------- Prediction for examples/images/cat.jpg ----------
0.3134 - "n02123045 tabby, tabby cat"
0.2380 - "n02123159 tiger cat"
0.1235 - "n02124075 Egyptian cat"
0.1003 - "n02119022 red fox, Vulpes vulpes"
0.0715 - "n02127052 lynx, catamount"
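The mapping from class probabilities to the names printed above can be sketched in a few lines of Python. This assumes the synset file has one class per line, in the same order as the network's outputs; the probabilities and labels below are made up for illustration.

```python
def top_k(probs, synset_lines, k=5):
    """Return the k most probable (probability, label) pairs, best first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(probs[i], synset_lines[i].strip()) for i in ranked]

# Hypothetical three-class example; line i of the synset file names class i
probs = [0.05, 0.7, 0.25]
synset_lines = ["n0 class a\n", "n1 class b\n", "n2 class c\n"]
best = top_k(probs, synset_lines, k=2)
```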
Feature extractor and visualization
In a convolutional layer the weights from one layer to the next can be represented by a blob of shape output_feature_maps x input_feature_maps x height x width (feature maps are also known as channels). There are two options for using networks trained in Caffe as feature extractors. The first option (recommended) is to use the Python API. The second option is to use the native C++ utility that ships with Caffe:
# Download model params
scripts/download_model_binary.py models/bvlc_reference_caffenet

# Generate a list of the files to process
# Use the images that ship with caffe
find `pwd`/examples/images -type f -exec echo {} \; > examples/images/test.txt

# Add a 0 to the end of each line
# input data structures expect labels after each image file name
sed -i "s/$/ 0/" examples/images/test.txt

# Get the mean of the training set to subtract it from images
./data/ilsvrc12/get_ilsvrc_aux.sh

# Copy and modify the data layer to load and resize the images:
cp examples/feature_extraction/imagenet_val.prototxt examples/images
vi examples/images/imagenet_val.prototxt

# Extract features
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/images/imagenet_val.prototxt fc7 examples/images/features 10 lmdb
The feature blob is extracted above from fc7, which represents the highest-level feature of the reference model. Other layers can be used as well, such as conv5 or pool3. Of the last two parameters above, 10 is the number of mini-batches to process and lmdb is the database format. The features are stored in the LMDB examples/images/features, ready for access by some other code.
Using the Python* API
Understanding this section is not required to start using Caffe. This section is based on this blog. The Python interface is handy for testing, classifying, and feature extraction, and can also be used to define and train networks.
Setting up Python Caffe
Make sure make pycaffe was called when compiling Caffe. In Python, first import the caffe module:
# Make sure that caffe is on the python path:
# (alternatively set PYTHONCAFFE var as explained in the installation)
import sys
CAFFE_ROOT = '/path/to/caffe/'
sys.path.insert(0, CAFFE_ROOT + 'python')
import caffe
caffe.set_mode_cpu()
Loading the network architecture
The network architecture can be found in the train_val.prototxt or deploy.prototxt files. To load the network:
net = caffe.Net('train_val.prototxt', caffe.TRAIN)
or if loading a specific set of weights, do this instead:
net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)
caffe.TRAIN is used here because caffe.TEST crashes if run twice and caffe.TRAIN appears to give the same results.
The net contains data blobs (net.blobs) and parameter weight blobs (net.params). In the commands below conv1 can be replaced with the name of any other layer:
- net.blobs['conv1']: data output at the conv1 layer, known as feature maps
- net.params['conv1'][0]: weight blob at the conv1 layer
- net.params['conv1'][1]: bias blob at the conv1 layer
- net.blobs.items(): returns the data blob for all the layers - useful in a for loop to cycle through the layers
Visualizing the network
To display the network, first install graphviz and the pydot module:
sudo apt-get install -y graphviz
sudo pip install pydot
Run the draw_net.py Python script:
python python/draw_net.py examples/net_surgery/deploy.prototxt train_val_net.png
open train_val_net.png
Data input
Input data into the data layer blob using one of the following techniques:
- modify data layer to match the size of the image:
import numpy as np
from PIL import Image
# get input image and arrange it as a 4-D tensor
im = np.array(Image.open('/path/to/caffe/examples/images/cat_gray.jpg'))
im = im[np.newaxis, np.newaxis, :, :]
# resize the blob to be the size of the input image
net.blobs['data'].reshape(*im.shape)  # if the image input is different
# copy the image into the data blob
net.blobs['data'].data[...] = im
- modify the input data to match the size of the expected input of the data layer:
shape = net.blobs['data'].data.shape  # (batch, channels, height, width)
# load as color only if the data blob expects 3 channels
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg', color=(shape[1] == 3))
# resize the image to the height and width of the data blob
im = caffe.io.resize_image(im, (shape[2], shape[3]))
# reorder from (height, width, channels) to (channels, height, width)
# and copy the image into the data blob
net.blobs['data'].data[...] = im.transpose(2, 0, 1)
Several transformations are commonly applied to the input data:
net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
ilsvrc_mean = 'python/caffe/imagenet/ilsvrc_2012_mean.npy'
transformer.set_mean('data', np.load(ilsvrc_mean).mean(1).mean(1))
# puts the channel as the first dimension
transformer.set_transpose('data', (2,0,1))
# (2,1,0) maps RGB to BGR for example
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
# the batch size can be changed on-the-fly
net.blobs['data'].reshape(1,3,227,227)
# load the image in the data layer
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg')
# transform the image and store it in the net.blob
net.blobs['data'].data[...] = transformer.preprocess('data', im)
To view im:
import matplotlib.pyplot as plt
plt.imshow(im)
Inference
The prediction of the net on the input image can be computed as follows:
# assumes that images are loaded
prediction = net.forward()
print 'predicted class:', prediction['prob'].argmax()
To time the forward propagation (this ignores the data preprocessing time), use the IPython %timeit magic:
%timeit net.forward()
Another module that transforms the data and can be used to classify various data inputs simultaneously is caffe.Classifier. That is, caffe.Classifier can be used instead of having to use both caffe.Net and caffe.io.Transformer.
im1 = caffe.io.load_image('/path/to/caffe/examples/images/cat.jpg')
im2 = caffe.io.load_image('/path/to/caffe/examples/images/fish-bike.jpg')
imgs = [im1, im2]
ilsvrc_mean = '/path/to/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy'
net = caffe.Classifier('deploy.prototxt', 'trained_model.caffemodel',
                       mean=np.load(ilsvrc_mean).mean(1).mean(1),
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))
prediction = net.predict(imgs)  # predict takes any number of images
print 'predicted classes:', prediction[0].argmax(), prediction[1].argmax()
If using a folder with many images, replace imgs as follows (everything else stays the same):
IMAGES_FOLDER = '/path/to/folder/w/images/'
import os
images = os.listdir(IMAGES_FOLDER)
imgs = [ caffe.io.load_image(IMAGES_FOLDER + im) for im in images ]
The entire test set may not fit in memory. Therefore, the predictions can be computed in batches, for example, batches of 100 images.
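A minimal sketch of batched prediction is shown below. The helper names batches and predict_in_batches are hypothetical; predict_fn stands for any function that takes a list of inputs and returns one result per input.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def predict_in_batches(predict_fn, items, batch_size=100):
    """Run predict_fn on one batch at a time and concatenate the results."""
    results = []
    for batch in batches(items, batch_size):
        results.extend(predict_fn(batch))
    return results

# Stand-in for a real classifier: records batch sizes, returns dummy outputs
calls = []
def fake_predict(batch):
    calls.append(len(batch))
    return [item.upper() for item in batch]

out = predict_in_batches(fake_predict, ["a", "b", "c", "d", "e"], batch_size=2)
```

With Caffe, items could be a list of file paths and predict_fn could be lambda paths: net.predict([caffe.io.load_image(p) for p in paths]), so that only one batch of images is held in memory at a time.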
To view the probabilities of all the classes for im1 as a plot:
plt.plot(prediction[0])
To time the full classification pipeline (including the im1 transformations) for 1 image with oversampling, use the IPython %timeit magic. Oversampling crops 10 images: the center, the corners, and their mirrors:
%timeit net.predict([im1])
If oversample is set to False, it only crops the center:
%timeit net.predict([im1], oversample=False)
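The ten crops can be enumerated with a short sketch. This is an illustration of the idea rather than Caffe's own code: it assumes integer crop coordinates (Caffe's rounding of the center crop may differ by a pixel), and the function name is hypothetical.

```python
def oversample_boxes(image_hw, crop_hw):
    """List the 10 (box, mirrored) crops: 4 corners + center, each also mirrored.

    Boxes are (y0, x0, y1, x1) in pixel coordinates.
    """
    im_h, im_w = image_hw
    crop_h, crop_w = crop_hw
    boxes = []
    # Four corner crops
    for y0 in (0, im_h - crop_h):
        for x0 in (0, im_w - crop_w):
            boxes.append((y0, x0, y0 + crop_h, x0 + crop_w))
    # Center crop
    cy, cx = (im_h - crop_h) // 2, (im_w - crop_w) // 2
    boxes.append((cy, cx, cy + crop_h, cx + crop_w))
    # Each crop is used once as-is and once mirrored horizontally
    return [(box, mirrored) for mirrored in (False, True) for box in boxes]

# 227x227 crops from a 256x256 image, as in the example above
crops = oversample_boxes((256, 256), (227, 227))
```

The predictions for the ten crops are then averaged, which is why oversampling takes roughly ten times longer than a single center crop.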
Feature extraction and visualization
To examine the data at a particular layer, for example, fc7:
net.blobs['fc7'].data
To retrieve details of the network's layers and shapes:
# Retrieve details of the network's layers
[(k, v.data.shape) for k, v in net.blobs.items()]
# Retrieve weights of the network's layers
[(k, v[0].data.shape) for k, v in net.params.items()]
# Retrieve the features in the last fully connected layer
# prior to outputting class probabilities
feat = net.blobs['fc7'].data[4]
# Retrieve size/dimensions of the array
feat.shape
Visualizing the blobs:
# Assumes that "net = caffe.Classifier(...)" has been called
# and data has been formatted as in the example above

# Take an array of shape (n, height, width) or (n, height, width, channels)
# and visualize each (height, width) section in a grid
# of size approx. sqrt(n) by sqrt(n)
def vis_square(data, padsize=1, padval=0):
    # values between 0 and 1
    data -= data.min()
    data /= data.max()
    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    plt.imshow(data)

plt.rcParams['figure.figsize'] = (25.0, 20.0)

# visualize the weights of the 1st conv layer
net.params['conv1'][0].data.shape
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))

# visualize the feature maps after the 1st conv layer
net.blobs['conv1'].data.shape
feat = net.blobs['conv1'].data[0,:96]
vis_square(feat, padval=1)

# visualize the feature maps after the 2nd conv layer
net.blobs['conv2'].data.shape
feat = net.blobs['conv2'].data[0,:96]
vis_square(feat, padval=1)

# visualize the feature maps after the 2nd pool layer
net.blobs['pool2'].data.shape
feat = net.blobs['pool2'].data[0,:256]  # change 256 to number of pool outputs
vis_square(feat, padval=1)

# visualize the neuron activations for the 2nd fully-connected layer
net.blobs['ip2'].data.shape
feat = net.blobs['ip2'].data[0]
plt.plot(feat.flat)
plt.legend()
plt.show()
Defining a network
A network can be defined in Python and saved to a prototxt file as follows:
import caffe
from caffe import layers as L
from caffe import params as P

def lenet(lmdb, batch_size):
    # auto generated LeNet
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
    return n.to_proto()

with open('examples/mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_train_lmdb', 64)))

with open('examples/mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_test_lmdb', 100)))
The code above will produce the following prototxt file:
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00392156862745
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
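To check a generated definition like the one above, it can help to work out the blob shapes and parameter counts by hand. The sketch below does this for the LeNet layers listed, assuming 28x28 single-channel MNIST inputs, no padding, and that every layer has a bias term; the function names are hypothetical.

```python
def out_size(size, kernel, stride=1):
    """Spatial output size of a convolution or pooling with no padding."""
    return (size - kernel) // stride + 1

def lenet_summary(input_hw=28, in_channels=1):
    """Return per-layer output shapes and parameter counts for LeNet."""
    shapes, params = {}, {}
    channels, size = in_channels, input_hw
    # (name, kind, num_output, kernel_size, stride) taken from the prototxt
    spatial_layers = [
        ("conv1", "conv", 20, 5, 1),
        ("pool1", "pool", None, 2, 2),
        ("conv2", "conv", 50, 5, 1),
        ("pool2", "pool", None, 2, 2),
    ]
    for name, kind, num_output, kernel, stride in spatial_layers:
        size = out_size(size, kernel, stride)
        if kind == "conv":
            # weights: out x in x k x k, plus one bias per output feature map
            params[name] = num_output * channels * kernel * kernel + num_output
            channels = num_output
        shapes[name] = (channels, size, size)
    fan_in = channels * size * size
    for name, num_output in [("ip1", 500), ("ip2", 10)]:
        params[name] = fan_in * num_output + num_output
        shapes[name] = (num_output,)
        fan_in = num_output
    return shapes, params

shapes, params = lenet_summary()
total_params = sum(params.values())
```

These numbers can be compared against the shapes reported by net.blobs and net.params once the network is loaded.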
Training networks
Load the solver in Python and do a forward propagation:
solver = caffe.get_solver('models/bvlc_reference_caffenet/solver.prototxt')
net = caffe.Net('train_val.prototxt', caffe.TRAIN)
solver.net.forward()  # train net
solver.test_nets[0].forward()  # test net (there can be more than one)
To compute the gradients:
solver.net.backward()
The gradient values can be displayed as follows:
# data gradients
solver.net.blobs['conv1'].diff
# weight gradients
solver.net.params['conv1'][0].diff
# bias gradients
solver.net.params['conv1'][1].diff
To run one iteration (a forward propagation, a backward propagation, and the weight update):
solver.step(1)
To run all the iterations defined in the solver.prototxt as max_iter:
solver.solve()
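What one solver iteration does can be illustrated on a toy problem. The sketch below is not Caffe code: it fits a single weight w so that w * x matches y, using a plain momentum SGD update without weight decay, and all names and values are hypothetical.

```python
def forward(w, x, y):
    """Forward pass: squared-error loss of the prediction w * x."""
    return 0.5 * (w * x - y) ** 2

def backward(w, x, y):
    """Backward pass: gradient of the loss with respect to w."""
    return (w * x - y) * x

def step(w, v, x, y, lr=0.1, momentum=0.0, iterations=1):
    """Run forward, backward, and update for the given number of iterations."""
    for _ in range(iterations):
        loss = forward(w, x, y)   # forward propagation
        grad = backward(w, x, y)  # backward propagation
        v = momentum * v + lr * grad
        w = w - v                 # weight update
    return w, v

w1, v1 = step(0.0, 0.0, x=1.0, y=2.0, iterations=1)
w50, _ = step(0.0, 0.0, x=1.0, y=2.0, iterations=50)
```

After one iteration the weight has moved a small step toward the solution; after fifty it is close to the value 2 that makes the loss zero.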
Debugging
This section is optional and meant for Caffe developers only.
A few tips to help in debugging:
- remove randomness
- compare caffemodels
- use Caffe's debug info
Removing randomness can be beneficial in order to reproduce behaviors and outputs. Removing randomness from non-associative floating point arithmetic operations is outside the scope of this article.
Randomness is introduced at various stages:
- the weights are usually randomly initialized following some (for example, Gaussian) distribution
- the input images can be preprocessed by randomly flipping the image horizontally or randomly cropping various parts of the images (e.g., cropping 227x227 patches from a 256x256 image), and by randomly shuffling the images
- in the dropout layer during training, some units are randomly kept and the others are ignored
One solution is to use a seed. In the solver.prototxt add the line:
# pick some value for random_seed that is greater or equal to 1, for example:
random_seed: 42
This ensures the same "random" values are used. However, the seed may produce different values on different machines. A more robust alternative when working across machines:
- Prepare the data using the same set of shuffled images, that is, do not reshuffle with each experiment
- In train.prototxt, in the ImageData layer, in transform_param: do not crop and do not mirror the images. If smaller images are required, then warp the images using new_height and new_width in image_data_param:
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    # mirror: true
    # crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    new_height: 224
    new_width: 224
  }
}
- In train.prototxt, in the dropout layers, set dropout_ratio: 0.
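One way to keep the image order fixed across experiments is to shuffle the listing file once with a fixed seed and reuse the result. The sketch below uses only the Python standard library; the function name, seed, and file names are hypothetical.

```python
import random

def shuffle_once(lines, seed=42):
    """Return a shuffled copy of lines; the same seed gives the same order."""
    shuffled = list(lines)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled

# Hypothetical "image label" entries, as used in an ImageData source file
listing = ["img%d.jpg %d" % (i, i % 2) for i in range(10)]
first = shuffle_once(listing)
second = shuffle_once(listing)
```

Writing the shuffled list to train.txt once, and copying that same file to every machine, avoids relying on each machine's random number generator.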
Other helpful guidelines
- In solver.prototxt change the lr_policy to fixed
- In solver.prototxt add the line debug_info: 1
To compare two caffemodels, the following script prints the sum of the absolute differences between all the weights in the caffemodels:
# Intel Corporation
# Author: Ravi Panchumarthy
import argparse
import numpy as np

def get_args():
    parser = argparse.ArgumentParser('Compare weights of two caffe models')
    parser.add_argument('-m1', dest='modelFile1', type=str, required=True,
                        help='Caffe model weights file to compare')
    parser.add_argument('-m2', dest='modelFile2', type=str, required=True,
                        help='Caffe model weights file to compare against')
    parser.add_argument('-n', dest='netFile', type=str, required=True,
                        help='Network prototxt file associated with model')
    return parser.parse_args()

if __name__ == "__main__":
    import caffe
    args = get_args()
    net = caffe.Net(args.netFile, args.modelFile1, caffe.TRAIN)
    net2compare = caffe.Net(args.netFile, args.modelFile2, caffe.TRAIN)

    wt_sumOfAbsDiffByName = dict()
    bias_sumOfAbsDiffByName = dict()
    for name, blobs in net.params.items():
        # index 0 holds the weights of the layer
        wt_diffTensor = np.subtract(net.params[name][0].data, net2compare.params[name][0].data)
        wt_absDiffTensor = np.absolute(wt_diffTensor)
        wt_sumOfAbsDiff = wt_absDiffTensor.sum()
        wt_sumOfAbsDiffByName.update({name : wt_sumOfAbsDiff})

        # index 1 holds the biases of the layer
        bias_diffTensor = np.subtract(net.params[name][1].data, net2compare.params[name][1].data)
        bias_absDiffTensor = np.absolute(bias_diffTensor)
        bias_sumOfAbsDiff = bias_absDiffTensor.sum()
        bias_sumOfAbsDiffByName.update({name : bias_sumOfAbsDiff})

    print("\nThe sum of absolute difference of all layer's weight is : %s" % sum(wt_sumOfAbsDiffByName.values()))
    print("The sum of absolute difference of all layer's bias is : %s" % sum(bias_sumOfAbsDiffByName.values()))
    finalDiffVal = sum(wt_sumOfAbsDiffByName.values()) + sum(bias_sumOfAbsDiffByName.values())
    print("The sum of absolute difference of all layers weight's and bias's is : %s" % finalDiffVal)
For further debugging, in Makefile.config uncomment the line DEBUG := 1, compile the code, and run it with the command:
gdb /path/to/caffe/build/tools/caffe
Once gdb starts, use the run command and add the rest of the arguments:
run train -solver /path/to/solver.prototxt
Examples
LeNet on MNIST
The purpose of this section is to show the steps for a particular experiment with preparing a dataset, training a model, and timing the model. The content is based on this and this.
Preparing datasets:
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh  # downloads MNIST dataset
./examples/mnist/create_mnist.sh  # creates dataset in LMDB format
Training datasets:
# Reduce the number of iterations from 10K to 1K to quickly run through this example
sed -i 's/max_iter: 10000/max_iter: 1000/g' examples/mnist/lenet_solver.prototxt
./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt
Timing the forward and backward propagations (not including weight updates):
./build/tools/caffe time --model=examples/mnist/lenet_train_test.prototxt -iterations 50 # runs on CPU
For consistency in the timings, the utility numactl can be used to allocate memory buffers in MCDRAM:
numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER
Testing the trained model. In this example it is tested on the validation set. In practice, it should be tested with a different dataset using the format below or the format explained above:
# the file with the model should have a 'phase: TEST'
./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_1000.caffemodel -iterations 50
Dogs vs Cats
Get an account with Kaggle and download the data. Note that you cannot just do wget because you must log in to Kaggle. Log in to Kaggle, download data, and transfer it to your machine.
Unzip dogvscat.zip and execute the dogvscat.sh script contained in the zip file. This script is shown below for convenience.
#!/usr/bin/env sh
CAFFE_ROOT=/path/to/caffe
mkdir dogvscat
DOG_VS_CAT_FOLDER=/path/to/dogvscat
cd $DOG_VS_CAT_FOLDER

## Download datasets (requires first a login)
#https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
#https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip

# Unzip train and test data
sudo apt-get -y install unzip
unzip train.zip -d .
unzip test1.zip -d .

# Format data
python create_label_file.py  # creates 2 text files with labels for training and validation
./build_datasets.sh  # build lmdbs

# Download ImageNet pretrained weights (takes ~20 min)
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet

# Fine-tune weights in the AlexNet architecture (takes ~100 min)
$CAFFE_ROOT/build/tools/caffe train -solver $DOG_VS_CAT_FOLDER/dogvscat_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# Classify test dataset
cd $DOG_VS_CAT_FOLDER
python convert_binaryproto2npy.py
python dogvscat_classify.py  # Returns prediction.txt (takes ~30 min)

# A better approach is to train five AlexNets w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks
I submitted my results to Kaggle and got a score of 0.97566 accuracy (which would have placed 15th out of 215 had I competed).
PASCAL VOC Classification
Unzip voc2012.zip and execute the voc2012.sh script (contained in the zip file and shown below for convenience). Type sudo chmod 700 *.sh to make sure the scripts can be executed. It trains and runs AlexNet.
#!/usr/bin/env sh
# Copy and unzip voc2012.zip (it contains this file) then run this file. But first
# change paths in: voc2012.sh; build_datasets.sh; solvers/*; nets/*; classify.py

# As you run various files, you can ignore the following error if it shows up:
# libdc1394 error: Failed to initialize libdc1394

# set Caffe root directory
CAFFE_ROOT=$CAFFE_ROOT
VOC=/path/to/voc2012
chmod 700 *.sh

# Download datasets
# Details: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit
if [ ! -f VOCtrainval_11-May-2012.tar ]; then
  wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
fi
# VOCtrainval_11-May-2012.tar contains the VOC folder with:
# JPEGImages: all jpg images
# Annotations: objects and corresponding bounding box/pose/truncated/occluded per jpg
# ImageSets: breaks the images by the type of task they are used for
# SegmentationClass and SegmentationObject: segmented images (duplicate directories)
tar -xvf VOCtrainval_11-May-2012.tar

# Run Python scripts to create labeled text files
python create_labeled_txt_file.py

# Execute shell script to create training and validation lmdbs
# Note that lmdbs directories w/the same name cannot exist prior to creating them
./build_datasets.sh

# Execute following command to download caffenet pre-trained weights (takes ~20 min)
# if weights exist already then the command is ignored
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet

# Fine-tune weights in the AlexNet architecture (takes ~60 min)
# you can also choose one of six solvers: pascal_solver[1-6].prototxt
$CAFFE_ROOT/build/tools/caffe train -solver $VOC/solvers/voc2012_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# The lines below are not really needed; they served as examples on how to do some tasks

# Test against voc2012_val_lmbd dataset (name of lmdb is the model under PHASE: test)
$CAFFE_ROOT/build/tools/caffe test -model $VOC/nets/voc2012_train_val_ft678.prototxt \
    -weights $VOC/weights_iter_5000.caffemodel -iterations 116

# Classify validation dataset: returns a file w/the labels of the val dataset
# but it doesn't report accuracy (that would require some adjusting of the code)
python convert_binaryproto2npy.py
mkdir results
python cls_confidence.py
python average_precision.py

# Note to submit results to the VOC scoreboard, retrain NN using the trainval set
# and test on the unlabeled test data provided by VOC

# A better approach is to train five CNNs w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks
Additional VOC information (in case the reader is interested in learning more about VOC):
- PASCAL VOC datasets
- To compare methods or design choices
- uses the entire VOC2007 data, where all annotations (including test annotations) are available
- report cross-validation results using VOC2012 "trainval" set alone (no test annotations are provided from 2008 to 2012)
- most common metric is average precision (AP): the area under the precision/recall curve
- VOC 2012 Data Summary
- In 2008, there was a new dataset and each year more data was added. Therefore it is common to see published results in VOC2007 and VOC2012 (or VOC2011--no additional data for the classification and detection task between 2011 and 2012)
- 20 classes
- Training: 5,717 images, 13,609 objects
- Validation: 5,823 images, 13,841 objects
- Testing: 10,991 images
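The average precision metric mentioned in the list above can be sketched as follows. This is one common variant (area under the precision/recall curve after making precision monotonically non-increasing) and is an assumption here, not necessarily the exact computation in the VOC devkit or in the average_precision.py script from the zip file. The scores and labels are made up.

```python
def average_precision(scores, labels):
    """Area under the precision/recall curve for one class.

    scores: classifier confidence per image
    labels: 1 if the image contains the class, else 0
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    true_pos = 0
    precisions, recalls = [], []
    for rank, i in enumerate(order, start=1):
        true_pos += labels[i]
        precisions.append(true_pos / float(rank))
        recalls.append(true_pos / float(num_pos))
    # Precision envelope: each point takes the best precision at any higher recall
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

ap_mixed = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
ap_perfect = average_precision([0.9, 0.8, 0.1], [1, 1, 0])
```

A ranking that places every positive image ahead of every negative one gets an AP of 1, and each negative ranked above a positive lowers it.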
Current Caffe usages
This is a short list of popular Caffe usages. For a more comprehensive list, see the Caffe Model-Zoo.
- Ross Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR, 2014. code
- Ross Girshick, "Fast R-CNN." ICCV, 2015. code
- Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, "Faster R-CNN: towards real-time object detection." NIPS, 2015. code
- Jonathan Long, Evan Shelhamer, Trevor Darrell, "Fully convolutional networks for semantic segmentation." CVPR, 2015
Further reading
- Caffe homepage
- Soumith Chintala, "Intel are CPU magicians." Oct. 2015
- Pradeep Dubey, "Myth Busted: General Purpose CPUs Can't Tackle Deep Neural Network Training." Oct. 2015
- Dipankar Das, et al., "Distributed Deep Learning Using Synchronous Stochastic Gradient Descent." Feb. 2016
- Yann LeCun, Yoshua Bengio and Geoffrey Hinton, "Deep Learning." Nature. May 2015
- Ian Goodfellow, Yoshua Bengio and Aaron Courville, "Deep Learning." MIT Press, 2016
- Jeff Donahue, "Sequences in Caffe." CVPR Tutorial, June 2015
- Andrej Karpathy, "Caffe Tutorial." Stanford CS 231n, 2015
- Xinlei Chen, "Caffe Tutorial." Carnegie Mellon University 16824, 2015
- Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks", May 2015
- Oriol Vinyals, et al., "Show and Tell: A Neural Image Caption Generator." CVPR, June 2015
- Wei Hu, et al., "Deep convolutional neural networks for hyperspectral image classification." Journal of Sensors, 2015
- Clarifai demo: Pick an image or video from an URL or give it your own
- MIT Scene Recognition demo: Pick an image of a scene from an URL or give it your own