I've noticed that the CORE-AVX2 or xHost option sometimes produces AVX-128 code (roughly equivalent to SSE) where the AVX option produces AVX-256. I've submitted a Premier report in case this is accepted as a bug.
It seems I was over-confident in assuming that AVX2 should perform at least as well as AVX. That expectation seems to hold more often with F77-style source code (explicit DO loops), combined with testing the various directives, which means a lot of ifdefing of directives by architecture. !dir$ simd or !dir$ vector aligned may work with an array assignment, but of course !$omp simd does not, since it applies only to DO loops.
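To illustrate the directive placement described above, here is a minimal sketch. The program name, array names, and sizes are made up for illustration. Whether !dir$ simd has any effect on the array assignment depends on the compiler version, and compilers other than ifort treat the line as an ordinary comment. The !$omp simd line is only honored when OpenMP (or OpenMP SIMD) is enabled at compile time.

```fortran
program directive_placement
  implicit none
  integer, parameter :: n = 1024   ! illustrative size, not from the post
  real :: a(n), b(n), c(n)
  integer :: i

  b = 1.0
  c = 2.0

  ! Intel-specific directive placed ahead of an array assignment.
  ! Compilers that do not recognize it see a plain comment.
!dir$ simd
  a = b + 2.0 * c

  ! OpenMP 4.0 simd applies to DO loops only, so the operation has to be
  ! written as an explicit loop before this directive can be used.
!$omp simd
  do i = 1, n
    a(i) = a(i) + b(i)
  end do

  print *, a(1), a(n)
end program directive_placement
```

In a real code base, each directive line would typically sit inside a preprocessor #ifdef keyed on the target architecture, so that a directive that helps for one target can be switched off for another.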
In some of these cases, /QaxAVX2 removes vectorization entirely even though /QaxAVX produces both AVX and SSE2 vector code. The vector speedup estimate claims that SSE vectorization would kill performance (on some long-disappeared CPU?), and the vec-report advises using directives. Unfortunately, directives sometimes ruin performance in cases where vectorization is already good without them. Where I do have to tinker with directives, the estimated vector speedup may be OK for one of the alternatives but wrong for the others.
The vector speedup estimate is obtained through the vec-report7 option and a python script with compilers 13.1 and 14.0. It moves to opt-report4 with 15.0. I'm guessing the numbers quoted there may relate to the "seems inefficient" diagnostic issued when the compiler decides not to vectorize.