I see an accuracy drop when using the model optimizer mo.py to convert models that use batch-norm layers, when the output data type is FP16. With FP32 output there is no issue.
I first saw this with a custom YOLO model and our own testing scripts, but I also reproduced it with the mobilenet-ssd model from the model zoo, scoring on the Pascal VOC 2007 test set.
In both cases a Caffe model was used as input to the model optimizer. The issue occurs when running on a MYRIAD device (Myriad X).
By default the model optimizer fuses batch-norm layers, and I suspect that in combination with the FP16 data type something goes wrong there. When I disable fusing with the "--disable_fusing" option the score is fine, but then of course the latency is twice as high. I test with confidence_threshold in deploy.prototxt set to 0.01, so the whole confidence range is scored; the difference in score is then even bigger.
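For reference, the two conversions I compare look roughly like this (model file names and output directories are placeholders, not my actual files; --input_model, --input_proto, --data_type and --disable_fusing are the mo.py options referred to above):

```shell
# FP16 conversion with default batch-norm fusing -- this is the case
# that scores lower on MYRIAD:
python3 mo.py \
    --input_model mobilenet_ssd.caffemodel \
    --input_proto deploy.prototxt \
    --data_type FP16 \
    --output_dir fp16_fused

# Same conversion with fusing disabled -- the score is fine,
# but inference latency roughly doubles:
python3 mo.py \
    --input_model mobilenet_ssd.caffemodel \
    --input_proto deploy.prototxt \
    --data_type FP16 \
    --disable_fusing \
    --output_dir fp16_unfused
```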
Mobilenet SSD, Pascal VOC 2007, FP16 on MYRIAD without fusing: 74.01 mAP
Mobilenet SSD, Pascal VOC 2007, FP16 on MYRIAD with fusing: 73.38 mAP