I've been using VTune Amplifier XE to monitor the performance of some software for which the CPI goes up as I increase the number of threads, while the LLC misses are always zero. In addition, I have noticed that the CPU around the part of the software that handles floating point operations also increases in line with the number of threads. A tooltip in the VTune GUI hints that the CPI might be going up due to port saturation. This is where my question comes from, and it is more related to the Sandy Bridge architecture than to VTune per se. I'm trying to get a handle on what is meant by "ports". Is VTune Amplifier alluding to the scheduler ports? There might be something in this, because the saturation could be on the ports associated with the processing of floating point instructions. Can someone please point me in the right direction as to which "ports" VTune is referring to?