Question about address translation on Xeon 5600s L3 cache

zhangyihere:

Hi, all!
I am using a Xeon 5650 processor. Its last-level cache is listed as 12MB, shared among the 6 cores.

What I am wondering is: 12 is not a power of 2, so if an address falls in the 12MB to 16MB range, how is an L3 cache position allocated for it?

Hussam Mousa (Intel):

Hello,

The last-level cache is actually split evenly across the 6 cores. So while each core can access (load) from the entire 12MB range, each core's requests will only be cached into its own slice.

Most caches, including the LLC, use set associativity. This means that when an address is mapped into the cache, there are several locations the line can be written to, as opposed to direct mapping, which has only one possible location per cache line.

You can read more about cache associativity on Wikipedia: CPU cache
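
To make the difference concrete, here is a minimal sketch of the two mapping schemes. The line size, cache size, and associativity below are assumptions for illustration only, not the 5650's actual geometry:

/* Sketch: how a direct-mapped cache vs. a set-associative cache picks
 * candidate slots for an address.  All sizes are illustrative assumptions. */
#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE  64              /* bytes per cache line (assumed)       */
#define NUM_LINES  (32 * 1024)     /* total lines in the cache (assumed)   */
#define WAYS       8               /* associativity for the set-assoc case */

int main(void)
{
    uint64_t addr = 0x12345678ULL;
    uint64_t line = addr / LINE_SIZE;          /* which memory line this is */

    /* Direct-mapped: exactly one slot this line can occupy. */
    uint64_t dm_slot = line % NUM_LINES;

    /* Set-associative: the line maps to exactly one set, but may be
     * placed in any of the WAYS slots inside that set.                */
    uint64_t num_sets = NUM_LINES / WAYS;
    uint64_t set      = line % num_sets;

    printf("direct-mapped slot: %llu\n", (unsigned long long)dm_slot);
    printf("set-associative: set %llu, any of %d ways\n",
           (unsigned long long)set, WAYS);
    return 0;
}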

I hope this helps,
Hussam

zhangyihere:

Thank you Hussam,

I am sorry, I don't think I stated my question clearly enough. Let me try again.

If the last-level cache can be accessed entirely by all 6 cores, that also means that on each core the whole address space (we neglect physical address holes here) can use the last-level cache.

BUT on the Xeon 5650, because the size of the last-level cache is not a power of 2, if we directly divide an address by the cache size, not every address divides evenly. Those addresses that don't divide evenly should still be able to use the last-level cache. So here is my question: how are those addresses mapped into the last-level cache? If the remainder is used directly as the cache index, it is unavoidable that some cache sets serve more accesses than others, so accesses to the last-level cache would not be evenly distributed.
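
To illustrate the naive scheme I am imagining (just a sketch of my assumption, not what the hardware necessarily does):

/* Naive indexing idea: use the remainder of the address divided by the
 * 12MB cache size directly as the position.  Addresses above 12MB wrap
 * around, so some positions would serve more addresses than others.    */
#include <stdio.h>
#include <stdint.h>

#define LLC_SIZE  (12ull * 1024 * 1024)   /* 12 MB, not a power of 2 */

int main(void)
{
    uint64_t addr1 = 3ull  * 1024 * 1024;    /*  3 MB                      */
    uint64_t addr2 = 13ull * 1024 * 1024;    /* 13 MB, past the 12 MB mark */

    printf("addr1 -> position %llu\n", (unsigned long long)(addr1 % LLC_SIZE));
    printf("addr2 -> position %llu\n", (unsigned long long)(addr2 % LLC_SIZE));
    /* addr2 wraps to the 1 MB position, colliding with low addresses. */
    return 0;
}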

Am I correct? Or is there some additional design in the cache?

Thank you in advance!

Yi Zhang

suhailinternational:

Thank you Hussam, your answer is very helpful for me. I personally thank you.

Hussam Mousa (Intel):
Hello Yi Zhang,

Let me make some clarifications below.

Quoting zhangyihere: If the last-level cache can be accessed entirely by all 6 cores, that also means that on each core the whole address space (we neglect physical address holes here) can use the last-level cache.

Access to the portions of the last-level cache differs by core. Each core "owns" a part of the LLC into which its reads are brought, i.e. if a line is not in the LLC it will be read from memory, sent to the core, AND written into the portion of the LLC assigned to this particular core. If another core then reads this same line, it will be able to access it in the first core's LLC portion.

This means that if a request from a core needs to replace a line in the LLC, it will only replace lines in the portion of the LLC allocated to this core and not to another core.
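
A toy model of the behavior described here (the sizes and fill policy are simplified assumptions, not the actual hardware design):

/* Toy model: six per-core LLC portions, where a miss fills only the
 * requesting core's portion, but any core can hit on a line cached in
 * another core's portion.  Purely illustrative.                        */
#include <stdio.h>
#include <stdint.h>

#define CORES          6
#define LINES_PER_CORE 4      /* tiny portions, just for the model */

static uint64_t portion[CORES][LINES_PER_CORE];
static int      fill_pos[CORES];

/* Look for the line in every core's portion (a shared LLC lookup). */
static int lookup(uint64_t line)
{
    for (int c = 0; c < CORES; c++)
        for (int i = 0; i < LINES_PER_CORE; i++)
            if (portion[c][i] == line)
                return c;
    return -1;
}

/* On a miss, the line is written only into the requesting core's portion,
 * replacing one of that core's own entries.                              */
static void access_line(int core, uint64_t line)
{
    int owner = lookup(line);
    if (owner >= 0) {
        printf("core %d hits line 0x%llx in core %d's portion\n",
               core, (unsigned long long)line, owner);
    } else {
        int slot = fill_pos[core] % LINES_PER_CORE;
        fill_pos[core]++;
        portion[core][slot] = line;
        printf("core %d misses line 0x%llx, fills its own portion\n",
               core, (unsigned long long)line);
    }
}

int main(void)
{
    access_line(0, 0x40);   /* core 0 misses and fills its own portion */
    access_line(3, 0x40);   /* core 3 hits in core 0's portion         */
    return 0;
}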

Quoting zhangyihere: BUT on the Xeon 5650, because the size of the last-level cache is not a power of 2, if we directly divide an address by the cache size, not every address divides evenly. Those addresses that don't divide evenly should still be able to use the last-level cache. So here is my question: how are those addresses mapped into the last-level cache? If the remainder is used directly as the cache index, it is unavoidable that some cache sets serve more accesses than others, so accesses to the last-level cache would not be evenly distributed.

The LLC is set associative. What this means is that each address from the physical address space will map into exactly one position in the LLC, however each position has several slots that can store several memory lines that have all mapped to this same position.

For example, imagine addresses X1, X2, X3 all map to position A in the LLC, and imagine position A has 2 slots. A read to X1 will bring it to [position A, slot 1]. A later read to X2 will bring X2 to [position A, slot 2]. If X3 is read later, then the LLC will need to decide whether to evict X1 or X2, since X3 can only be written to position A (slots 1 or 2).
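
Here is a tiny sketch of that 2-slot example (purely illustrative; the eviction choice below is simplified, real caches use LRU-like policies):

/* Toy model of one 2-way set: X1, X2, X3 all map to the same position A,
 * but A has only 2 slots, so inserting X3 must evict an earlier line.   */
#include <stdio.h>
#include <stdint.h>

#define WAYS 2

static uint64_t set_A[WAYS];   /* lines currently held in position A */
static int      used;

static void insert(uint64_t line)
{
    if (used < WAYS) {
        set_A[used] = line;                      /* a free slot is available */
        printf("0x%llx -> slot %d\n", (unsigned long long)line, used);
        used++;
    } else {
        /* Position A is full: evict the line in slot 0 (simplified choice). */
        printf("0x%llx evicts 0x%llx from slot 0\n",
               (unsigned long long)line, (unsigned long long)set_A[0]);
        set_A[0] = line;
    }
}

int main(void)
{
    insert(0x1000);   /* X1 */
    insert(0x2000);   /* X2 */
    insert(0x3000);   /* X3 forces an eviction */
    return 0;
}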

In this example this is called a 2-way set associative cache. In general you can have an N-way set associative cache. The number of physical addresses that can map to each position is equal to (ADDRESS_SPACE / (CACHE_SIZE / SET_ASSOCIATIVITY_DEGREE)).

The operator is an integer DIV, so they don't need to be perfect multiples in general, although in practice the value of (CACHE_SIZE / SET_ASSOCIATIVITY_DEGREE) is what needs to be a perfect divisor of the ADDRESS_SPACE.
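
A concrete sketch with the 12MB LLC follows. The 64-byte line size and 16-way associativity are assumed values for illustration (not a statement of the 5650's exact geometry); the point is that the set index is taken modulo the number of sets, which does not have to be a power of 2:

/* Worked numbers for a 12 MB cache.  Line size and associativity are
 * assumptions for illustration only.                                  */
#include <stdio.h>
#include <stdint.h>

#define CACHE_SIZE  (12ull * 1024 * 1024)   /* 12 MB                 */
#define LINE_SIZE   64ull                   /* assumed               */
#define WAYS        16ull                   /* assumed associativity */

int main(void)
{
    uint64_t num_sets = CACHE_SIZE / (LINE_SIZE * WAYS);   /* 12288 sets */

    /* Every address gets a set index in [0, num_sets); the modulo works
     * whether or not num_sets (or the cache size) is a power of 2.     */
    uint64_t addr  = 13ull * 1024 * 1024;     /* an address above 12 MB */
    uint64_t index = (addr / LINE_SIZE) % num_sets;

    printf("sets = %llu, address 0x%llx -> set %llu\n",
           (unsigned long long)num_sets,
           (unsigned long long)addr,
           (unsigned long long)index);
    return 0;
}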

Regarding the segregation of the LLC across the cores, the key lies in understanding how the sets are distributed across the cores. Each portion of the LLC allocated to a core will have slots that represent all the possible positions that a physical address can map to.

I hope this clarifies things some more.
-Hussam
