Cache Identifier

Hi everyone!

I'd like to find out which cores share a particular cache. With the 'cpuid' command I found lots of useful information, but I still need some sort of unique cache identifier to really determine which cores use which cache.

Does anyone know how to get this information? Or is there another way to get the information?

Thanks in advance!
Robert

Roman Dementiev (Intel)

Robert,

Did you try the cache topology enumeration algorithm/utility described in the article "Intel 64 Architecture Processor Topology Enumeration"?

Roman

Hey Roman

Yes I quickly looked at it, but I stumbled over these 'affinity masks'. As far as I understood it, this information is from the operating system, right? (or can this information be obtained from the hardware?)

The problem is that I cannot rely on an operating system to do the work, since the code is for an operating system :)

Patrick Fay (Intel)

The affinity masks are used to just get the cpuid information from each cpu.
We have to read the cpuid info from all the cpus.
What are you trying to accomplish?
An OS independent way of figuring out cache sharing?
You can't really get a method that doesn't use some aspect of an OS.
Or do you want a way that works on multiple OSes?
Pat

Yes, it should be an OS independent way of figuring out cache sharing. Because this code is the part of the operating system that gathers this information...

So there is no way to get this information from the hardware directly? ... Then I guess I have to come up with another technique to determine which caches belong to which core ...

Patrick Fay (Intel)

I would say it like this:
I don't see how you can get the cpuid information from all the cpus without using some facility of the OS to switch your software from one cpu to another cpu.
Certainly you can get the cpuid info from the cpu you are currently running on without the OS but you need the cpuid info from ALL the cpus.

Most of the enumeration library is OS independent (it builds on both Windows and Linux), and the OS-specific code is in util_os.c.
The library code is not "part of the operating system" but util_os.c does call OS routines to move the thread from 1 cpu to the next.
Hope this helps,
Pat
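
As a rough illustration of the "move the thread from one cpu to the next" step Pat describes, here is a minimal sketch for Linux user space. It is not taken from util_os.c; the names `per_cpu_fn`, `for_each_allowed_cpu` and `count_visit` are made up for this example, and it assumes glibc's `sched_getaffinity`/`sched_setaffinity`. Inside a kernel you would use whatever mechanism that kernel already has for running code on a given CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once while pinned to OS cpu `cpu`. */
typedef void (*per_cpu_fn)(int cpu, void *ctx);

/*
 * Pin the calling thread to each CPU it is allowed to run on, one at a
 * time, and call fn there. Returns the number of CPUs visited, or -1 if
 * the current affinity mask cannot be read.
 */
int for_each_allowed_cpu(per_cpu_fn fn, void *ctx)
{
    cpu_set_t original;
    if (sched_getaffinity(0, sizeof(original), &original) != 0)
        return -1;

    int visited = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &original))
            continue;                 /* not available to this process */

        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        if (sched_setaffinity(0, sizeof(one), &one) != 0)
            continue;                 /* could not pin, skip this CPU */

        fn(cpu, ctx);                 /* e.g. execute CPUID here */
        ++visited;
    }

    /* Best effort: restore the affinity mask we started with. */
    sched_setaffinity(0, sizeof(original), &original);
    return visited;
}

/* Example callback: only counts how often it was called. */
void count_visit(int cpu, void *ctx)
{
    (void)cpu;
    ++*(int *)ctx;
}
```

A real callback would execute CPUID on the current cpu and record the result under the index `cpu`, so that the per-cpu values can be compared afterwards.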

Patrick Fay (Intel)

Also, on Windows, system routines like GetLogicalProcessorInformationEx() will detail which cpus share a cache. See http://msdn.microsoft.com/en-us/library/windows/desktop/dd405488%28v=vs.85%29.aspx

Yes the switching is necessary and this is already done!

Edit:
Thanks, then I'll have a closer look at the enumeration algorithm since it can be executed independently!

Hi again

I looked at the topology enumeration algorithm provided by Intel. I think I understood the basic concept, but there are still some things that work incorrectly.
For example I wrote some lines to gather the information about a cache at a specific level (see below).
The log_roundToNearestPof2 performs the same operation as described in the documentation (and cpuid just calls CPUID and stores the values of all registers in the parameters).
This piece of code is then executed on all levels (subLevelIndex) and on all processors.

	uint32_t eax, ebx, ecx, edx;

	/* CPUID leaf 1: EBX[31:24] is the initial APIC ID */
	eax = 1;
	ecx = 0;
	cpuid(&eax, &ebx, &ecx, &edx);
	const uint8_t initialAPICID = 0xff & (ebx >> 24);

	/* CPUID leaf 4: cache parameters for sub-leaf subLevelIndex */
	eax = 4;
	ecx = subLevelIndex;
	cpuid(&eax, &ebx, &ecx, &edx);
	const uint8_t levelType = 0xf & eax;
	const char* levelName[] = {"Invalid", "Data Cache      ", "Instruction Cache", "Unified Cache"};
	const uint16_t cacheMaskWidth = log_roundToNearestPof2(((eax >> 14) & 0xfff) + 1);
	const uint32_t mask = ~((-1) << cacheMaskWidth);
	const uint8_t threadsSharingCache = ((eax >> 14) & 0xfff) + 1;
	const uint32_t cacheID = mask & initialAPICID;
	printf("Level: %d (%s),\t %d threads/cache, \tCache ID = %d\n",
			levelType, levelName[levelType], threadsSharingCache, cacheID);

Does anyone see where the problem lies in this code?

Thanks in advance!
Robert
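
For readers following along, here is a small sketch of how the EAX value returned by CPUID leaf 4 can be decoded, together with one possible version of the mask-width helper the post mentions but does not show. The names `cache_leaf`, `decode_leaf4_eax` and `mask_width` are invented for this example, and `mask_width` is only an assumption about what `log_roundToNearestPof2` does (log2 of the count rounded up to a power of two). The bit positions follow the Intel SDM layout: type in bits 4:0, level in bits 7:5, and the maximum number of addressable IDs sharing the cache, minus one, in bits 25:14. Note that the posted code prints the type field under the label "Level", and that the sharing field is an upper bound on addressable IDs rather than a count of threads actually present.

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of CPUID leaf 4 EAX. */
typedef struct {
    uint32_t type;            /* bits 4:0: 0 none, 1 data, 2 instruction, 3 unified */
    uint32_t level;           /* bits 7:5: cache level, starting at 1 */
    uint32_t max_sharing_ids; /* bits 25:14 plus 1: addressable IDs sharing the cache */
} cache_leaf;

cache_leaf decode_leaf4_eax(uint32_t eax)
{
    cache_leaf c;
    c.type = eax & 0x1f;
    c.level = (eax >> 5) & 0x7;
    c.max_sharing_ids = ((eax >> 14) & 0xfff) + 1;
    return c;
}

/*
 * Smallest width w such that (1 << w) >= count, i.e. the number of low
 * APIC ID bits needed to tell `count` sharers apart.
 */
uint32_t mask_width(uint32_t count)
{
    uint32_t width = 0;
    while (width < 32 && (UINT32_C(1) << width) < count)
        ++width;
    return width;
}
```

With a sharing field of 2 the width is 1, and with 16 it is 4, which are the two cases that appear in this thread.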

Patrick Fay (Intel)

Hello Robert,
Can you give us a clue?
Perhaps include the output?
Thanks,
Pat

Sorry forgot all about that.
I executed the code on all 8 cores (with taskset) on my Linux system. On each core the code was executed for the subLevelIndex values 0 to 3.

This is the output

Running on core 0
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 0
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 1
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 1
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 2
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 0
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 3
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 1
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 3
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 4
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 0
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 5
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 1
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 6
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 0
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 0
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 7
Level: 1 (Data Cache      ),	 2 threads/cache, 	Cache ID = 1
Level: 2 (Instruction Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 2 threads/cache, 	Cache ID = 1
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 7

When I execute cpu_topology (the enumeration algorithm provided by Intel), I get the following output:

	Software visible enumeration in the system:
Number of logical processors visible to the OS: 8
Number of logical processors visible to this process: 8
Number of processor cores visible to this process: 4
Number of physical packages visible to this process: 1

	Hierarchical counts by levels of processor topology:
 # of cores in package  0 visible to this process: 4 .
	 # of logical processors in Core 0 visible to this process: 2 .
	 # of logical processors in Core  1 visible to this process: 2 .
	 # of logical processors in Core  2 visible to this process: 2 .
	 # of logical processors in Core  3 visible to this process: 2 .

	Affinity masks per SMT thread, per core, per package:
Individual:
	P:0, C:0, T:0 --> 1
	P:0, C:0, T:1 --> 2
Core-aggregated:
	P:0, C:0 --> 3
Individual:
	P:0, C:1, T:0 --> 4
	P:0, C:1, T:1 --> 8
Core-aggregated:
	P:0, C:1 --> c
Individual:
	P:0, C:2, T:0 --> 10
	P:0, C:2, T:1 --> 20
Core-aggregated:
	P:0, C:2 --> 30
Individual:
	P:0, C:3, T:0 --> 40
	P:0, C:3, T:1 --> 80
Core-aggregated:
	P:0, C:3 --> c0
Pkg-aggregated:
	P:0 --> ff

	APIC ID listings from affinity masks
Affinity mask 00000001 - apic id 0
Affinity mask 00000002 - apic id 1
Affinity mask 00000004 - apic id 2
Affinity mask 00000008 - apic id 3
Affinity mask 00000010 - apic id 4
Affinity mask 00000020 - apic id 5
Affinity mask 00000040 - apic id 6
Affinity mask 00000080 - apic id 7

Package 0 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')

L1D is Level 1 Data cache, size(KBytes)= 32,  Cores/cache= 2, Caches/package= 4
L1I is Level 1 Instruction cache, size(KBytes)= 32,  Cores/cache= 2, Caches/package= 4
L2 is Level 2 Unified cache, size(KBytes)= 256,  Cores/cache= 2, Caches/package= 4
L3 is Level 3 Unified cache, size(KBytes)= 6144,  Cores/cache= 8, Caches/package= 1

      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    0     1|    2     3|    4     5|    6     7|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|    1     2|    4     8|   10    20|   40    80|
CmbMsk|    3      |    c      |   30      |   c0      |
      +-----------+-----------+-----------+-----------+
Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+
Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+
Cache |   L3                                          |
Size  |   6M                                          |
CmbMsk|   ff                                          |
      +-----------------------------------------------+

I hope this helps!
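
The CmbMsk row in the cpu_topology output can be reproduced by hand from the APIC ID listing. The following is only a sketch of that idea, not the library's code: the function name `sharing_mask` is invented here, and it assumes that OS cpu number i has affinity mask `1 << i` and APIC ID `apic_ids[i]`, as in the listing above. Two logical processors are treated as sharing a cache when their APIC IDs agree in all bits above the cache's mask width (1 for L1/L2 and 4 for L3 on this machine).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Combined affinity mask of all cpus sharing a cache with `cpu`.
 * apic_ids[i] is the APIC ID of OS cpu i; width is the number of low
 * APIC ID bits that distinguish the sharers of one cache.
 */
uint64_t sharing_mask(const uint8_t *apic_ids, size_t n, size_t cpu, unsigned width)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < n && i < 64; ++i) {
        /* Same upper bits means same cache instance. */
        if ((apic_ids[i] >> width) == (apic_ids[cpu] >> width))
            mask |= UINT64_C(1) << i;
    }
    return mask;
}
```

With APIC IDs 0 to 7, a width of 1 gives the masks 3, c, 30 and c0 for the four cores, and a width of 4 gives ff for every cpu, matching the table.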

Patrick Fay (Intel)

Thanks Robchip,
Your output looks reasonable.
But, as to whether the code will work on all Intel chips, I would have to go through the cpu_topology code, extract out the relevant lines, and compare it to what you've done.
I don't have the time to go through the code like this right now.
If you've extracted out the relevant code from the library correctly then it should work.
Sorry to not be more helpful,
Pat

Hey Pat

thanks for the answer!
You said that the output looks reasonable, but then I have trouble understanding it.

How do I have to interpret the last line for each core?

Running on core 0
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 1
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 2
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 3
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 3
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 4
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 5
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 6
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 7
...
Level: 3 (Unified Cache),	 16 threads/cache, 	Cache ID = 7

Shouldn't this cache ID always be the same, since there is only one L3 cache shared by all threads?

Patrick Fay (Intel)

Your cache ID probably should be the same.
Are you using the same method as the code in the cpu_topology library?
If not, why not just use the library?
Pat

Hmm, I looked at the code but I'm only querying the hardware not interpreting the information.
Yeah - using the library is a good idea, of course - but at the moment I'd just like to find the bug in the code :)

Thanks a lot for your help!

Hey everyone

Just for completeness, I'd like to post the solution to the problem.
The bug was in line 18 of the original code, the line that computes cacheID. The cache ID has to be calculated differently:

const uint32_t cacheID = initialAPICID & (-1 ^ mask);

That way all threads sharing a cache get the same cacheID, and each cache at a given level gets its own.

Thanks everyone for the help!
Robert
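
To make the difference concrete, here is a small self-contained sketch of both computations. The function names `sub_id` and `cache_id` are invented for this example; `width` stands for the cache mask width (cacheMaskWidth in the posted code). Keeping only the low bits, as the first version did, yields the thread's position within the group of sharers, while clearing the low bits yields an ID that is common to all sharers.

```c
#include <assert.h>
#include <stdint.h>

/* Low `width` bits set, with a guard against shifting by 32 or more. */
static uint32_t low_bits(uint32_t width)
{
    return (width >= 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
}

/* What the first version computed: position of the thread among the sharers. */
uint32_t sub_id(uint32_t apic_id, uint32_t width)
{
    return apic_id & low_bits(width);
}

/* The corrected version: identical for every thread sharing the cache. */
uint32_t cache_id(uint32_t apic_id, uint32_t width)
{
    return apic_id & ~low_bits(width);
}
```

For the machine in this thread (APIC IDs 0 to 7), a width of 1 gives cache IDs 0, 0, 2, 2, 4, 4, 6, 6 for L1/L2, and a width of 4 gives 0 for all eight threads at L3.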
