SSE4 vs. AVX2

I'm getting nearly identical inference speed when running CNNs on the CPU with AVX2 and with SSE4 as the extension (selecting the corresponding extension library in each case).

Looking at the Intel hardware specification, AVX2 can do 32 single-precision FP operations per cycle, while SSE4.2 can only do 8. That is a 4x difference in peak throughput. Why am I not seeing a similar difference in actual performance?
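
For reference, here is a rough sketch of where the 32 and 8 figures come from. It assumes two vector execution units per core, and that AVX2 code uses fused multiply-add (counted as two operations per lane) while SSE4.2 code does not. The function name and parameters are made up for illustration.

```python
def peak_sp_flops_per_cycle(vector_bits, fma, vector_units=2):
    """Theoretical peak single-precision FP operations per cycle per core.

    vector_bits:  SIMD register width in bits (128 for SSE, 256 for AVX2)
    fma:          True if fused multiply-add is used (2 operations per lane)
    vector_units: number of vector execution units (assumed to be 2 here)
    """
    lanes = vector_bits // 32           # 32-bit floats per register
    ops_per_lane = 2 if fma else 1      # an FMA counts as a multiply plus an add
    return lanes * ops_per_lane * vector_units


avx2_peak = peak_sp_flops_per_cycle(256, fma=True)    # 8 lanes * 2 ops * 2 units
sse42_peak = peak_sp_flops_per_cycle(128, fma=False)  # 4 lanes * 1 op  * 2 units
print(avx2_peak, sse42_peak, avx2_peak / sse42_peak)  # prints: 32 8 4.0
```

These are theoretical peaks only; they say nothing about which code path actually runs.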

My platform:
CPU: Core i7-8700K
OS: Ubuntu 16.04 64-bit
RAM: 32 GB
OpenVINO version: 2018.2.319
CNN configuration: 4 convolution layers + 3 FC layers (8 MB of coeffs)

Python code snippet for setting the CPU mode:

global plugin
plugin = IEPlugin(device="CPU", plugin_dirs= "")
net = cvsdk_det_net(model_xml, model_bin)
net.infer(inputs={input_blob: im})    # Time taken for infer is measured
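
The timing around the infer call could be done along these lines. This is only a sketch: `time_call` is a hypothetical helper, not part of OpenVINO, and the warm-up and run counts are arbitrary choices.

```python
import statistics
import time


def time_call(fn, warmup=3, runs=20):
    """Return the median wall-clock time in seconds of calling fn()."""
    # Warm-up calls are not timed, so one-time setup cost is excluded.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(samples)
```

With the snippet above it would be called as `time_call(lambda: net.infer(inputs={input_blob: im}))`, once per extension setting.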




If your topology has only two types of layers (convolution and FC), then the extension library isn't used at all. You can find the list of layers included in the extension library here:

Most of the layers distributed as part of the CPU plugin detect CPU features at runtime, so AVX2 code was executed in both cases on your platform.
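
To illustrate what runtime CPU feature detection means in general, here is a minimal sketch. It is not the CPU plugin's actual code: it assumes a Linux system that exposes feature flags in /proc/cpuinfo, and the function names and the returned kernel labels are made up for the example.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def select_kernel(flags):
    """Pick the widest instruction set that the CPU reports."""
    if "avx2" in flags:
        return "avx2"
    if "sse4_2" in flags:
        return "sse4_2"
    return "generic"


def detect_kernel(cpuinfo_path="/proc/cpuinfo"):
    """Choose a code path from what the running CPU supports (Linux only)."""
    path = Path(cpuinfo_path)
    text = path.read_text() if path.exists() else ""
    return select_kernel(parse_cpu_flags(text))
```

On a CPU that reports the `avx2` flag, `detect_kernel()` returns "avx2" no matter which build was requested, which is the same effect as described above: the choice is made from the hardware at runtime, not from the extension library that was selected.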
