Library of Kernels

The SSD* network features a number of layers that have no corresponding kernels in the OpenVX* CNN list today. Yet, the Intel® OpenVX implementation supports the "custom kernels" extension (refer to the SDK User Guide) that allows writing OpenVX kernels in OpenCL. For a gentle introduction to the topic, refer to <SDK_SAMPLES_ROOT>/samples/cnn_custom_kernel.

In this release, the kernels specific to SSD (and Yolo*) are implemented as a dedicated library in <SDK_SAMPLES_ROOT>/samples/cnn_custom_kernels_lib.

The library contains two components:

  • A registration plugin for the Model Optimizer tool that defines the mapping between SSD-specific layers and OpenVX kernels. Although it is part of the sample package, its pre-compiled version is shipped with the Model Optimizer tool as $INTEL_CVSDK_DIR/mo/bin/CustomKernelsPlugins/libBuiltinCustomLayers.so. It is used during the Caffe-to-OpenVX code-generation phase.
  • The layer implementations as OpenCL™ kernels for OpenVX itself, used during execution of the resulting (generated) graph. These are also shipped with the Model Optimizer in the $INTEL_CVSDK_DIR/mo/bin/CustomKernelsImpl/ folder, as the libcnn_custom_kernels_lib.so library with the OpenCL code for the kernels in custom_cnn.cl.

Summary of the Workflow

Follow the steps below to get a full application from the SSD* implementation in Caffe* using the Model Optimizer tool:

  1. Clone the SSD Caffe git repository to your machine from https://github.com/weiliu89/caffe/tree/ssd. You need to modify the code of one layer slightly, because OpenVX* does not allow dynamic tensor sizing. The next section covers this topic in more detail.
  2. Compile Caffe with the Model Optimizer special interface wrappers. Set the path to the folder containing the resulting libcaffe.so as the FRAMEWORK_HOME environment variable. Refer to the Model Optimizer Developer Guide for details.
  3. Select a specific model (a prototxt from the models list: https://github.com/weiliu89/caffe/tree/ssd/models). Download the corresponding trained model data: https://github.com/weiliu89/caffe/tree/ssd#models.
  4. Generate the OpenVX code for the topology using the Model Optimizer, as explained in the following sections. The Model Optimizer produces the correct OpenVX graph with custom kernel calls, based on the expected kernel signatures provided by the registration plugin.
  5. Add the pre- and post-processing specific to SSD to the boilerplate code that the Model Optimizer generates, as explained in the Intel Deep Learning Model Optimizer Developer Guide and below. Also, add the code that loads the implementation of the custom kernels, as shown in the sketch after this list.

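For reference, loading the custom kernels implementation into an OpenVX context typically requires a single vxLoadKernels call before the generated graph is created. Below is a minimal sketch; the module name cnn_custom_kernels_lib is an assumption based on the sample library name above and may differ in your setup:

#include <VX/vx.h>
#include <stdio.h>

int main() {
    vx_context context = vxCreateContext();

    // Load the custom kernels module (libcnn_custom_kernels_lib.so must be
    // discoverable, for example via LD_LIBRARY_PATH). The module name here
    // is an assumption based on the sample library name.
    vx_status status = vxLoadKernels(context, "cnn_custom_kernels_lib");
    if (status != VX_SUCCESS) {
        printf("Failed to load custom kernels: %d\n", (int)status);
        vxReleaseContext(&context);
        return 1;
    }

    // ... create and process the generated graph here ...

    vxReleaseContext(&context);
    return 0;
}
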
The rest of the chapter covers some of these steps in detail.

Patching the Original SSD* Code for OpenVX*

Currently, OpenVX* has the limitation of not allowing dynamic tensor sizing. Thus, you need to override this behavior in the original Caffe* code, in the Reshape method of the detection output layer.

Specifically, add the following lines to the DetectionOutputLayer<Dtype>::Reshape method:

// Force a fixed output shape of [1, 1, keep_top_k_, 7] instead of a
// dynamic per-image detection count. Each of the keep_top_k_ rows holds
// [image_id, label, confidence, xmin, ymin, xmax, ymax].
vector<int> top_shape2(2, 1);       // two leading dimensions of size 1
top_shape2.push_back(keep_top_k_);  // maximum number of detections kept
top_shape2.push_back(7);            // seven values per detection
top[0]->Reshape(top_shape2);

Model Optimizer Command Line for SSD*

Here is an example command line and the corresponding output of the Model Optimizer:

$ ./ModelOptimizer -w ./VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.caffemodel 
-d ./deploy.prototxt -p FP32 -f 1 -b 1 --target APLK -c 
Start working...
Framework plugin: CAFFE
Target type: APLK
Network type: CLASSIFICATION
Batch size: 1
Precision: FP32
Layer fusion: false
Horizontal layer fusion: PARTIAL
Output directory: Artifacts
Custom kernels directory: 
Code generation mode: RELEASE
Network input normalization: 1
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.crop 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.deconv 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.permute_flatten 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.2dsoftmax 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel org.khronos.nn_extension.softmax_layer 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.tensorcopy 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.permute 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.flatten 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.normalize 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.reshape 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.detection_output 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel org.khronos.nn_extension.activation_layer 
[ SAMPLE cnn_custom_kernel_lib ] Registering custom kernel com.intel.cnn.sample.prior_box 
Softmax 2d
OpenVX code was generated to 'Artifacts/VGG_VOC0712Plus_SSD_300x300_ft_deploy/generated_code' folder 
Normalized model was saved to: Artifacts/VGG_VOC0712Plus_SSD_300x300_ft_deploy/Binaries/

The tool generates the OpenVX* code and converts the weights for SSD* for the regular floating-point code path.

Refer to the Model Optimizer Developer Guide for details on the command line options.

Model Optimizer-Generated Code Explained

In the output directory (which you can change with the -o option), the Model Optimizer generates the following files with the OpenVX* code:

  • graph.c/.h for the OpenVX* graph creation.

    This is a "read only" file. You should aplly any changes of the network topology or layer parameters to the original network descriptor and generate the code again.

  • load_weights.c/.h for the routine that loads the generated binary data (for example, convolution weights).

    This is a "read only" file. You should aplly any changes of the network topology or layer parameters to the original network descriptor and generate the code again.

  • graph_process.c/.h, which are entry points for image pre- and post-processing.

    These files are just stubs that you are free to change or replace to fit your application needs. For example, the next section covers adding SSD*-specific processing to them (see also the sketch after this list).

  • A number of helper files for building the application, for example, data precision converters, image loading, and so on. You are not expected to change these files.
  • main.c, which is a stub for an application. The following section describes adding some SSD-specific code to it.
  • CMakeLists.txt and build.sh that allow you to compile the resulting code with cmake and make as explained below.
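
To make the post-processing part concrete, here is one possible way to walk the SSD detection output tensor, whose shape the patch above fixed to [1, 1, keep_top_k_, 7]. This is a minimal sketch, assuming the output has already been copied into a plain float buffer; the function name and the confidence threshold parameter are illustrative and not part of the generated code:

#include <stdio.h>

// Illustrative helper (hypothetical name): parse SSD detections from a
// float buffer of shape [1, 1, keep_top_k, 7]. Each row holds
// [image_id, label, confidence, xmin, ymin, xmax, ymax], with the box
// coordinates normalized to [0, 1].
static void print_detections(const float* output, int keep_top_k,
                             int image_width, int image_height,
                             float confidence_threshold) {
    for (int i = 0; i < keep_top_k; ++i) {
        const float* det = output + i * 7;
        if (det[0] < 0.0f)
            continue;  // an image_id of -1 marks an unused row
        if (det[2] < confidence_threshold)
            continue;  // skip low-confidence detections
        printf("label %d, confidence %.2f, box (%d, %d)-(%d, %d)\n",
               (int)det[1], det[2],
               (int)(det[3] * image_width), (int)(det[4] * image_height),
               (int)(det[5] * image_width), (int)(det[6] * image_height));
    }
}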

Refer to the Model Optimizer Developer Guide for further details.
