Intel Optimized Caffe - MKL vs MKL-DNN

Hi All,

Are there any plans to use MKL-DNN instead of MKL for Intel Optimized Caffe? I think MKL-DNN will map code to the hardware better than MKL does, but has Intel published any case studies comparing the two?

Thanks.

Chetan Arvind Patil

Dear Chetan,

If you build the latest version of Intel Caffe, you can see both the MKL2017 and MKL-DNN packages being downloaded. The Intel documentation says that MKL-DNN comes with highly vectorized and threaded building blocks for implementing CNNs, with C and C++ interfaces.

To test your network with MKL-DNN, either build with a Makefile.config that sets USE_MKLDNN_AS_DEFAULT_ENGINE := 1, or build with CMake using -DUSE_MKLDNN_AS_DEFAULT_ENGINE=1. (I think Caffe now builds with both MKL2017 and MKL-DNN by default, but to be sure you can set this option in the Makefile or CMake build.)

After that, when running caffe time, pass --engine=MKLDNN. Kindly let me know how it goes.
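As a rough illustration of the step above, the following Python sketch assembles the caffe time command for a chosen engine and, optionally, runs it. The helper names build_time_command and run_benchmark are hypothetical, the binary path and model path are the ones used elsewhere in this thread, and capturing stderr assumes the default glog behaviour of logging there. It is a sketch under those assumptions, not an official Intel Caffe script.

```python
import subprocess


def build_time_command(engine,
                       model="models/bvlc_alexnet/train_val.prototxt",
                       caffe_bin="./build/tools/caffe"):
    """Assemble the `caffe time` command line for one engine.

    `engine` is passed through unchanged, e.g. "MKLDNN" or "MKL2017".
    The default paths are assumptions taken from this thread.
    """
    return [caffe_bin, "time", f"--model={model}", f"--engine={engine}"]


def run_benchmark(engine):
    """Run the benchmark and return its log text.

    Assumes Caffe's glog output goes to stderr, so stderr is merged
    into the captured stdout stream.
    """
    result = subprocess.run(build_time_command(engine),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True,
                            check=True)
    return result.stdout
```

Calling run_benchmark("MKLDNN") and run_benchmark("MKL2017") from the Caffe source directory would then give two logs to compare.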

Thanks

Anand


Hi Anand,

It seems both MKL and MKL-DNN are compiled together. I am sharing the benchmarking logs below. MKL-DNN shows some improvement over MKL (average forward-backward time of 423.64 ms vs. 494.52 ms, about 14% lower), but training will reveal the real performance gain.

Does Intel suggest MKL-DNN or MKL for training? Should I just stick with MKL-DNN as the default engine for my analysis?

Benchmarking AlexNet with MKL-DNN: 

./build/tools/caffe time --model=models/bvlc_alexnet/train_val.prototxt -engine=MKLDNN
I0929 11:14:41.197069 94698 caffe.cpp:603] Average time per layer:
I0929 11:14:41.197131 94698 caffe.cpp:606]       data   forward: 10.1842 ms.
I0929 11:14:41.197214 94698 caffe.cpp:610]       data   backward: 0.00414 ms.
I0929 11:14:41.197298 94698 caffe.cpp:606]      conv1   forward: 30.2765 ms.
I0929 11:14:41.197381 94698 caffe.cpp:610]      conv1   backward: 24.602 ms.
I0929 11:14:41.197463 94698 caffe.cpp:606]      relu1   forward: 2.43648 ms.
I0929 11:14:41.197728 94698 caffe.cpp:610]      relu1   backward: 3.9487 ms.
I0929 11:14:41.197835 94698 caffe.cpp:606]      norm1   forward: 8.51826 ms.
I0929 11:14:41.197949 94698 caffe.cpp:610]      norm1   backward: 4.96228 ms.
I0929 11:14:41.198061 94698 caffe.cpp:606]      pool1   forward: 3.1976 ms.
I0929 11:14:41.198420 94698 caffe.cpp:610]      pool1   backward: 5.53408 ms.
I0929 11:14:41.198541 94698 caffe.cpp:606]      conv2   forward: 33.187 ms.
I0929 11:14:41.199218 94698 caffe.cpp:610]      conv2   backward: 60.0811 ms.
I0929 11:14:41.199319 94698 caffe.cpp:606]      relu2   forward: 1.63926 ms.
I0929 11:14:41.199645 94698 caffe.cpp:610]      relu2   backward: 2.55934 ms.
I0929 11:14:41.199739 94698 caffe.cpp:606]      norm2   forward: 4.28016 ms.
I0929 11:14:41.200063 94698 caffe.cpp:610]      norm2   backward: 3.17122 ms.
I0929 11:14:41.200389 94698 caffe.cpp:606]      pool2   forward: 2.03388 ms.
I0929 11:14:41.200470 94698 caffe.cpp:610]      pool2   backward: 3.67968 ms.
I0929 11:14:41.200551 94698 caffe.cpp:606]      conv3   forward: 19.3136 ms.
I0929 11:14:41.200902 94698 caffe.cpp:610]      conv3   backward: 46.9721 ms.
I0929 11:14:41.200984 94698 caffe.cpp:606]      relu3   forward: 0.5654 ms.
I0929 11:14:41.201064 94698 caffe.cpp:610]      relu3   backward: 1.00884 ms.
I0929 11:14:41.201405 94698 caffe.cpp:606]      conv4   forward: 14.7033 ms.
I0929 11:14:41.201486 94698 caffe.cpp:610]      conv4   backward: 35.66 ms.
I0929 11:14:41.201567 94698 caffe.cpp:606]      relu4   forward: 0.60776 ms.
I0929 11:14:41.201654 94698 caffe.cpp:610]      relu4   backward: 1.22586 ms.
I0929 11:14:41.201755 94698 caffe.cpp:606]      conv5   forward: 10.1247 ms.
I0929 11:14:41.201835 94698 caffe.cpp:610]      conv5   backward: 24.1622 ms.
I0929 11:14:41.201916 94698 caffe.cpp:606]      relu5   forward: 0.42664 ms.
I0929 11:14:41.202003 94698 caffe.cpp:610]      relu5   backward: 0.59368 ms.
I0929 11:14:41.202083 94698 caffe.cpp:606]      pool5   forward: 0.69412 ms.
I0929 11:14:41.202163 94698 caffe.cpp:610]      pool5   backward: 1.39514 ms.
I0929 11:14:41.202242 94698 caffe.cpp:606]        fc6   forward: 10.6318 ms.
I0929 11:14:41.202324 94698 caffe.cpp:610]        fc6   backward: 16.6964 ms.
I0929 11:14:41.202405 94698 caffe.cpp:606]      relu6   forward: 0.31638 ms.
I0929 11:14:41.202486 94698 caffe.cpp:610]      relu6   backward: 0.0632 ms.
I0929 11:14:41.202567 94698 caffe.cpp:606]      drop6   forward: 0.4264 ms.
I0929 11:14:41.202647 94698 caffe.cpp:610]      drop6   backward: 0.22182 ms.
I0929 11:14:41.202736 94698 caffe.cpp:606]        fc7   forward: 4.7359 ms.
I0929 11:14:41.202816 94698 caffe.cpp:610]        fc7   backward: 18.8064 ms.
I0929 11:14:41.202898 94698 caffe.cpp:606]      relu7   forward: 0.2818 ms.
I0929 11:14:41.202985 94698 caffe.cpp:610]      relu7   backward: 0.0984 ms.
I0929 11:14:41.203065 94698 caffe.cpp:606]      drop7   forward: 0.37228 ms.
I0929 11:14:41.203145 94698 caffe.cpp:610]      drop7   backward: 0.27132 ms.
I0929 11:14:41.203225 94698 caffe.cpp:606]        fc8   forward: 2.2675 ms.
I0929 11:14:41.203305 94698 caffe.cpp:610]        fc8   backward: 3.74484 ms.
I0929 11:14:41.203478 94698 caffe.cpp:606]       loss   forward: 1.24444 ms.
I0929 11:14:41.203562 94698 caffe.cpp:610]       loss   backward: 0.28544 ms.
I0929 11:14:41.203680 94698 caffe.cpp:616] Average Forward pass: 163.006 ms.
I0929 11:14:41.203742 94698 caffe.cpp:619] Average Backward pass: 260.251 ms.
I0929 11:14:41.203807 94698 caffe.cpp:621] Average Forward-Backward: 423.64 ms.
I0929 11:14:41.203871 94698 caffe.cpp:624] Total Time: 21182 ms.
I0929 11:14:41.203935 94698 caffe.cpp:625] *** Benchmark ends ***

Benchmarking AlexNet with MKL:

./build/tools/caffe time --model=models/bvlc_alexnet/train_val.prototxt -engine=MKL2017
I0929 11:15:51.072782 94877 caffe.cpp:603] Average time per layer:
I0929 11:15:51.073092 94877 caffe.cpp:606]       data   forward: 9.21434 ms.
I0929 11:15:51.073175 94877 caffe.cpp:610]       data   backward: 0.0047 ms.
I0929 11:15:51.073508 94877 caffe.cpp:606]      conv1   forward: 29.5236 ms.
I0929 11:15:51.073595 94877 caffe.cpp:610]      conv1   backward: 24.226 ms.
I0929 11:15:51.073701 94877 caffe.cpp:606]      relu1   forward: 0.01546 ms.
I0929 11:15:51.074129 94877 caffe.cpp:610]      relu1   backward: 10.8779 ms.
I0929 11:15:51.074255 94877 caffe.cpp:606]      norm1   forward: 32.068 ms.
I0929 11:15:51.074375 94877 caffe.cpp:610]      norm1   backward: 19.3456 ms.
I0929 11:15:51.074625 94877 caffe.cpp:606]      pool1   forward: 6.0133 ms.
I0929 11:15:51.074765 94877 caffe.cpp:610]      pool1   backward: 8.86312 ms.
I0929 11:15:51.075006 94877 caffe.cpp:606]      conv2   forward: 34.2566 ms.
I0929 11:15:51.075117 94877 caffe.cpp:610]      conv2   backward: 57.471 ms.
I0929 11:15:51.075455 94877 caffe.cpp:606]      relu2   forward: 0.01504 ms.
I0929 11:15:51.075541 94877 caffe.cpp:610]      relu2   backward: 7.05758 ms.
I0929 11:15:51.075891 94877 caffe.cpp:606]      norm2   forward: 17.995 ms.
I0929 11:15:51.076226 94877 caffe.cpp:610]      norm2   backward: 15.1878 ms.
I0929 11:15:51.076309 94877 caffe.cpp:606]      pool2   forward: 3.74608 ms.
I0929 11:15:51.076390 94877 caffe.cpp:610]      pool2   backward: 5.1986 ms.
I0929 11:15:51.076489 94877 caffe.cpp:606]      conv3   forward: 17.9821 ms.
I0929 11:15:51.076572 94877 caffe.cpp:610]      conv3   backward: 47.6729 ms.
I0929 11:15:51.076653 94877 caffe.cpp:606]      relu3   forward: 0.0151 ms.
I0929 11:15:51.076776 94877 caffe.cpp:610]      relu3   backward: 2.3493 ms.
I0929 11:15:51.076859 94877 caffe.cpp:606]      conv4   forward: 14.1661 ms.
I0929 11:15:51.076942 94877 caffe.cpp:610]      conv4   backward: 32.9311 ms.
I0929 11:15:51.077039 94877 caffe.cpp:606]      relu4   forward: 0.01496 ms.
I0929 11:15:51.077121 94877 caffe.cpp:610]      relu4   backward: 2.36124 ms.
I0929 11:15:51.077201 94877 caffe.cpp:606]      conv5   forward: 9.6799 ms.
I0929 11:15:51.077289 94877 caffe.cpp:610]      conv5   backward: 21.8215 ms.
I0929 11:15:51.077371 94877 caffe.cpp:606]      relu5   forward: 0.01402 ms.
I0929 11:15:51.077452 94877 caffe.cpp:610]      relu5   backward: 1.7863 ms.
I0929 11:15:51.077539 94877 caffe.cpp:606]      pool5   forward: 1.374 ms.
I0929 11:15:51.077635 94877 caffe.cpp:610]      pool5   backward: 1.42016 ms.
I0929 11:15:51.077746 94877 caffe.cpp:606]        fc6   forward: 10.3947 ms.
I0929 11:15:51.077836 94877 caffe.cpp:610]        fc6   backward: 16.6605 ms.
I0929 11:15:51.077922 94877 caffe.cpp:606]      relu6   forward: 0.06918 ms.
I0929 11:15:51.078013 94877 caffe.cpp:610]      relu6   backward: 0.0984 ms.
I0929 11:15:51.078094 94877 caffe.cpp:606]      drop6   forward: 0.39066 ms.
I0929 11:15:51.078174 94877 caffe.cpp:610]      drop6   backward: 0.24138 ms.
I0929 11:15:51.078356 94877 caffe.cpp:606]        fc7   forward: 4.5181 ms.
I0929 11:15:51.078440 94877 caffe.cpp:610]        fc7   backward: 17.9564 ms.
I0929 11:15:51.078522 94877 caffe.cpp:606]      relu7   forward: 0.06822 ms.
I0929 11:15:51.078619 94877 caffe.cpp:610]      relu7   backward: 0.09658 ms.
I0929 11:15:51.078735 94877 caffe.cpp:606]      drop7   forward: 0.38122 ms.
I0929 11:15:51.078821 94877 caffe.cpp:610]      drop7   backward: 0.22254 ms.
I0929 11:15:51.078917 94877 caffe.cpp:606]        fc8   forward: 2.2504 ms.
I0929 11:15:51.078999 94877 caffe.cpp:610]        fc8   backward: 3.9006 ms.
I0929 11:15:51.079077 94877 caffe.cpp:606]       loss   forward: 1.06348 ms.
I0929 11:15:51.079164 94877 caffe.cpp:610]       loss   backward: 0.2898 ms.
I0929 11:15:51.079257 94877 caffe.cpp:616] Average Forward pass: 195.639 ms.
I0929 11:15:51.079319 94877 caffe.cpp:619] Average Backward pass: 298.556 ms.
I0929 11:15:51.079391 94877 caffe.cpp:621] Average Forward-Backward: 494.52 ms.
I0929 11:15:51.079465 94877 caffe.cpp:624] Total Time: 24726 ms.
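
The two logs above can also be compared layer by layer rather than only by their totals. The sketch below is one way to do that in Python: it parses the per-layer lines of each log and reports the ratio of baseline time to candidate time. The function names are hypothetical, the regular expression assumes the exact line format shown in these logs, and the sample strings are two lines copied from each run, labelled according to the headings above.

```python
import re

# Matches per-layer lines such as:
#   I0929 ... caffe.cpp:606]      conv1   forward: 30.2765 ms.
# Summary lines ("Average Forward pass: ...") do not match.
LAYER_RE = re.compile(r"\]\s+(\S+)\s+(forward|backward):\s+([0-9.]+) ms\.")


def parse_layer_times(log_text):
    """Return {(layer, direction): milliseconds} for each per-layer line."""
    times = {}
    for line in log_text.splitlines():
        match = LAYER_RE.search(line)
        if match:
            layer, direction, ms = match.groups()
            times[(layer, direction)] = float(ms)
    return times


def compare_layers(baseline_log, candidate_log):
    """Return (layer, direction, baseline_ms, candidate_ms, ratio) tuples.

    A ratio above 1 means the candidate engine was faster for that layer.
    Layers missing from either log, or with a zero candidate time, are skipped.
    """
    base = parse_layer_times(baseline_log)
    cand = parse_layer_times(candidate_log)
    rows = []
    for (layer, direction), base_ms in base.items():
        cand_ms = cand.get((layer, direction))
        if cand_ms:
            rows.append((layer, direction, base_ms, cand_ms, base_ms / cand_ms))
    return rows


# Two lines copied from each log in this post.
MKL_SAMPLE = """\
I0929 11:15:51.074255 94877 caffe.cpp:606]      norm1   forward: 32.068 ms.
I0929 11:15:51.073701 94877 caffe.cpp:606]      relu1   forward: 0.01546 ms.
"""
MKLDNN_SAMPLE = """\
I0929 11:14:41.197835 94698 caffe.cpp:606]      norm1   forward: 8.51826 ms.
I0929 11:14:41.197463 94698 caffe.cpp:606]      relu1   forward: 2.43648 ms.
"""

for layer, direction, mkl_ms, dnn_ms, ratio in compare_layers(MKL_SAMPLE, MKLDNN_SAMPLE):
    print(f"{layer:>6} {direction:<8} MKL={mkl_ms:9.5f} ms  "
          f"MKL-DNN={dnn_ms:9.5f} ms  ratio={ratio:.2f}")
```

On these two sample layers the ratio is above 1 for norm1 and well below 1 for relu1, so the overall gain hides sizeable per-layer differences in both directions.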

Thanks.

Chetan Arvind Patil

Dear Chetan,

I cannot give a recommendation at this point; the choice is up to you. Intel says MKL-DNN has primitives designed for many layer types, such as convolution and ReLU, but I am not sure what Intel's roadmap is for MKL and MKL-DNN. Given a choice, I would prefer MKL-DNN for its advantages.

Can this thread be closed, or do you need more information on this?

Thanks

Anand

Hi Anand,

Yes.

Thanks.

Chetan Arvind Patil
