Threads overhead Nehalem vs Sandy-bridge vs Ivy-bridge

Sergey Kostrov wrote:

>>What does "DSB" stand for?

When somebody uses an abbreviation like DSB without explaining what it means, it is the most unpleasant thing on any forum. Personally, I don't have time to hunt down all these unexplained abbreviations on the Internet if I don't know what they mean.

Yes I completely agree with you.

DSB is short for decode stream buffer, which is the uop cache. It is probably googleable. I asked that same question on this forum, and there is a topic I started here covering that and many more of the acronyms Intel uses.

>>> If you're at 3+ in IPC.. and you're not hitting in the DSB>>>

This is obvious, but the SB sustained rate should be 4 uops per cycle, and even AVX instructions are decoded into single uops. So when you are dealing with code that uses a lot of SSE/AVX instructions needing fewer than 3 uops to execute, there should not be any starvation.

Excuse me, what is obvious to you?

You don't get uops from the ILD if you can't fetch enough bytes to decode; that seems to make sense to me. This was my original inquiry: how big are the instructions, and what is the IPC? If you haven't run many tests to isolate the theoretical limits of the ILD or DSB on your chips, then don't treat anything as obvious. It is unpleasant, especially for someone like me who has run those tests and knows the differences in capability between the ILD on SB and IB, to have you state that this is obvious.

I've determined, in my workloads, my customers' workloads and the SPEC suite, what the sources of uops (ILD, DSB or MS) are on SB and IB. If you haven't done that, or haven't written directed tests to study these issues and collected performance counter data, please refrain from stating that something is obvious.
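For readers who want to do the same kind of breakdown, here is a minimal sketch of turning three uop counts into per-source shares. It assumes you have already collected one count per source (legacy decode path fed by the ILD, DSB, and MS); on Sandy Bridge these are commonly read from the IDQ.MITE_UOPS, IDQ.DSB_UOPS and IDQ.MS_UOPS events, but treat those event names as an assumption and check them against the event list for your own chip. The function name and the sample counts are hypothetical.

```python
def uop_source_shares(mite_uops, dsb_uops, ms_uops):
    """Return the fraction of delivered uops that came from each source.

    mite_uops: uops from the legacy decode path (fed by the ILD)
    dsb_uops:  uops from the DSB (uop cache)
    ms_uops:   uops from the microcode sequencer
    """
    total = mite_uops + dsb_uops + ms_uops
    if total == 0:
        raise ValueError("no uops counted")
    return {
        "MITE": mite_uops / total,
        "DSB": dsb_uops / total,
        "MS": ms_uops / total,
    }


if __name__ == "__main__":
    # Made-up counts, purely for illustration; not measured data.
    shares = uop_source_shares(mite_uops=600, dsb_uops=300, ms_uops=100)
    for source, share in shares.items():
        print(f"{source}: {share:.1%}")
```

A high legacy-decode share on a loop running near the machine's IPC limit is the case under discussion here, since that is where fetch and decode bandwidth can become the bottleneck.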


This sentence is obvious to me: "If you're at 3+ in IPC.. and you're not hitting in the DSB.. you may degrade performance". You probably misunderstood my post.


And it is clear that there is a direct dependency on the front end's x86 instruction fetch bandwidth.
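To put rough numbers on that dependency, here is a back-of-the-envelope sketch of the IPC ceiling when code is not hitting in the DSB and has to come through fetch and legacy decode. The defaults (16 bytes fetched per cycle, 4 instructions decoded per cycle) are assumptions for illustration, not measured values for any particular chip, and the model deliberately ignores fetch alignment, macro-fusion, length-changing prefixes and multi-uop instructions. The function name is hypothetical.

```python
def fetch_bound_ipc(avg_instr_bytes, fetch_bytes_per_cycle=16.0, decode_width=4.0):
    """Upper bound on instructions per cycle through the legacy front end.

    The bound is the smaller of the decode width and the number of
    average-sized instructions that fit in one cycle's worth of fetched bytes.
    Both defaults are assumptions; substitute values for your own hardware.
    """
    if avg_instr_bytes <= 0:
        raise ValueError("avg_instr_bytes must be positive")
    return min(decode_width, fetch_bytes_per_cycle / avg_instr_bytes)


if __name__ == "__main__":
    for size in (3, 4, 6, 8):
        print(f"{size} bytes/instruction -> at most {fetch_bound_ipc(size):.2f} IPC")
```

Under these assumptions, an average instruction longer than about 5.3 bytes already caps the legacy path below 3 IPC, which is why instruction size matters for the "3+ in IPC and not hitting in the DSB" case.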


