Question: Does the SDE allow me to estimate performance gains with AVX?

Hi,
I'm new to the AVX instruction set and the SDE emulation tools, etc... I've just been reading the overview material re the SDE and am not sure if it will allow me to estimate the performance gains that I might see when using the AVX-aware Intel C/C++ compiler. (Yes, I saw that such a compiler is in Beta, and might apply to participate in the Beta, but...) From what I've read, the SDE and tools allow me to look at the instruction mix, basic blocks, memory/register contents, etc. But I'm not sure what facilities are available to help me estimate how much faster my code will actually run with AVX (other than by manually counting instructions and memory references, etc). Can someone please let me know if/how the tools can help with such performance estimation?

Thanks and regards,

David


No, the public SDE doesn't include a count of expected clock cycles or anything similar. My informal approach has been to profile performance on a current Core i7 and determine the time spent in loops that qualify for 32-byte-aligned vectorization, which ought to see the performance improvement; cf. Amdahl's law. Even a restricted SDE that counted minimum cycles would be unlikely to give a useful result for a large application, and we haven't been encouraged to think otherwise.
Others posting here seem to be interested in serial code, for later versions of AVX where fused multiply-add could shorten dependency chains. There again, one would assess the hot spots that would qualify for such a speedup.
We have been doing a lot of testing of the beta compiler on SSE2 machines, with the dual-path -axAVX build, to locate compiler failures, e.g. where it fails to compile or breaks the SSE2 code. We are hoping to see improvement in the large code-size expansion, which is necessarily big with the dual paths. It looks inescapable that large sections of a dual-path executable will require a no-AVX specification, depending on profiling to make the tradeoff function by function according to potential AVX performance gain, but the only obvious way to do that is with a Makefile. Even a counting SDE seems unlikely to tell how much is lost to the increased instruction-cache misses that come with the code-size growth.

You might want to check out the Intel Architecture Code Analyzer tool posted on whatif.intel.com last night. See: http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/
