Hi, I'm trying to figure out why I get a 20% Haswell slowdown porting some C++ IIR filter loops from AVX128 to AVX256. Advisor's no help as both Check Dependencies and Check Memory Access Patterns always fail with internal errors all variations I've had time to try so far. Find Trip Counts and FLOPS fails too as it eventually causes a crash in the target process; it seems to add no information over Survey Target (which is the only bit of the workflow clearly working) but might reach the loops in question. Output from
"C:\Program Files (x86)\IntelSWTools\Advisor\bin32\advixe-feedback.exe" -create-bug-report Advisor2017InternalErrors
If there's a workaround to try to get unblocked short of factoring code out into a standalone .exe that'd be great. VTune is able to run the same profiling target without issue in all of its analyses but lacks the specificity I'm looking for as to why replacing _mm_load/store_pd() with _mm256_load/store_pd() per optimization manual guidance seems to result in degraded memory access.