Purpose of CPUID Deterministic Cache Parameters Leaf

Hello, 

Can someone explain the purpose of having two separate cache leaves (leaves 2 and 4) for the cpuid instruction? I ask because on my Intel Xeon 5650 system, the data from leaf 2 does not include any info for the L1 data cache. Is it standard to put this in the info from leaf 4? Please advise. 

Please take a look at the datasheet for your CPU on the ark.intel.com web page (on the right side, once the CPU is selected). The datasheets contain significantly more technical information specific to each CPU.

This information can be found in the manual "IA-32 Intel Architecture Software Developer's Manual (vol 2a)".

Thank you for your responses. Allow me to clarify my question:

@Sergey Kostrov: Unfortunately, I need to be able to read this information (cache sizes, etc.) at install time so that my application's code can tune itself appropriately (I'm making something similar to the ATLAS linear algebra project), so I need to understand how to read the cache information programmatically.

@iliyapolak: I have read the relevant manual sections, and here is where I am confused: 

1. How is the cache data divided up between the two leaves? All of the caches I see when I use CPUID(eax=2) appear to be TLBs according to the table in the manual. Additionally, one of the cache descriptor bytes is 0xff, which according to the manual indicates: 

"CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters"

But what does this mean? Does it mean that leaf 2 should not report ANY cache parameters (and therefore all of the codes are junk?), or do I only need to call CPUID(eax=4) for a certain defined subset of caches, and keep the values from CPUID(eax=2) for the others? Please advise.

For reference, these are all of the cache codes I get reading from eax, ebx, ecx, edx (four bytes per register, in the order read):

eax: 01 5a 03 55
ebx: ff b2 f0 00
ecx: 00 00 00 00
edx: 00 00 ca 00
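Reassembled least significant byte first, the bytes above correspond to eax = 0x55035a01, ebx = 0x00f0b2ff, ecx = 0x00000000, edx = 0x00ca0000. A minimal C sketch of how such a list is produced from the four leaf 2 registers, assuming the SDM rules that a register with bit 31 set holds no valid descriptors and that the low byte of EAX (01H) is a count rather than a descriptor. `leaf2_descriptors` is a hypothetical name, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split the four CPUID leaf 2 registers (eax, ebx,
 * ecx, edx) into descriptor bytes, least significant byte first.
 * - The low byte of EAX is the iteration count (01H), so it is skipped.
 * - A register with bit 31 set holds no valid descriptors, so it is skipped.
 * - Null (0x00) bytes carry no information and are dropped.
 * Returns the number of bytes written to out (at most 15). */
size_t leaf2_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t n = 0;
    for (int r = 0; r < 4; r++) {
        if (regs[r] & 0x80000000u)
            continue; /* register contents are not descriptors */
        for (int b = 0; b < 4; b++) {
            if (r == 0 && b == 0)
                continue; /* AL: count byte, not a descriptor */
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0x00)
                out[n++] = d;
        }
    }
    return n;
}
```

With the posted values this yields the seven non-null descriptors 5a 03 55 ff b2 f0 ca, in that order.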

3. Why is leaf 4 called "Deterministic"? What is the significance of this word?

@Samuel

Tomorrow I will read the section relevant to your question and try to give you an answer.

Regarding the meaning of "deterministic", I suppose it means that the cache parameters are reported directly, as explicit fields, rather than as descriptor bytes that have to be looked up in a table.

>>>For reference, these are all of the cache codes I get reading from eax,ebx,ecx,edx:

Code: 01 Code: 5a Code: 03 Code: 55 Code: ff Code: b2 Code: f0 Code: 00 Code: 00 Code: 00 Code: 00 Code: 00 Code: 00 Code: 00 Code: ca Code: 00>>>

I assume that you have called cpuid with eax == 2 and that the order of the "code" values corresponds to the information actually returned in registers eax, ebx, ecx, edx. Null values can be ignored because they do not contain any encoded information.

Starting from eax LSB to MSB order:

eax == 0x01 - this indicates that cpuid must be executed once with an input value of 2 in order to obtain full information about the caches and TLBs.

eax == 0x5a - Data TLB0: 2-MByte or 4 MByte pages, 4-way set associative, 32 entries

eax == 0x03 - Data TLB: 4 KByte pages, 4-way set associative, 64 entries

eax == 0x55 - Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries

Starting from ebx LSB to MSB order:

ebx == 0xff - CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters

ebx == 0xb2 - Instruction TLB: 4KByte pages, 4-way set associative, 64 entries

ebx == 0xf0 - 64-Byte prefetching

Skipping the null bytes in ebx, ecx and edx, the next descriptor is edx == 0xca:

edx == 0xca - Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries
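The decoding in this reply is a plain table lookup. A C sketch covering only the seven descriptor values decoded here (the strings are copied from the list above; the complete table is the CPUID leaf 2 descriptor table in the SDM, and 0x01 is left out because it is the count byte). `leaf2_describe` and `leaf2_defers_to_leaf4` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of the SDM leaf 2 descriptor table: only the values seen in this thread. */
struct leaf2_desc {
    uint8_t code;
    const char *text;
};

static const struct leaf2_desc kDescs[] = {
    {0x03, "Data TLB: 4 KByte pages, 4-way set associative, 64 entries"},
    {0x55, "Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries"},
    {0x5a, "Data TLB0: 2-MByte or 4 MByte pages, 4-way set associative, 32 entries"},
    {0xb2, "Instruction TLB: 4KByte pages, 4-way set associative, 64 entries"},
    {0xca, "Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries"},
    {0xf0, "64-Byte prefetching"},
    {0xff, "CPUID leaf 2 does not report cache descriptor information, "
           "use CPUID leaf 4 to query cache parameters"},
};

/* Returns the description of a descriptor byte, or NULL if it is not in this subset. */
const char *leaf2_describe(uint8_t code)
{
    for (size_t i = 0; i < sizeof kDescs / sizeof kDescs[0]; i++)
        if (kDescs[i].code == code)
            return kDescs[i].text;
    return NULL;
}

/* Returns 1 if any descriptor is 0xFF (cache parameters must come from leaf 4), else 0. */
int leaf2_defers_to_leaf4(const uint8_t *codes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (codes[i] == 0xff)
            return 1;
    return 0;
}
```

A production version would need every entry from the SDM table and should treat unknown bytes as "not decoded" rather than as errors, since new descriptors are added over time.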

As you pointed out, the returned byte ebx == 0xff has a somewhat cryptic meaning. Please refer to table 3-17 on page 3-149, try to execute cpuid with eax == 4 and ecx == 0, and post the results.
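A minimal C sketch of running leaf 4 for every cache rather than only ECX = 0, assuming GCC or Clang on x86 (their `<cpuid.h>` header provides `__get_cpuid_max` and `__cpuid_count`). It follows the SDM rule that subleaves are queried with ECX = 0, 1, 2, ... until the cache type field EAX[4:0] reads 0, and it first checks that the maximum basic leaf is at least 4, since a CPU may not implement leaf 4 at all. `read_leaf4` is a hypothetical name; on non-x86 builds it simply returns 0.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang: __get_cpuid_max, __cpuid_count */
#endif

/* Hypothetical helper: executes CPUID with EAX = 4 and ECX = 0, 1, 2, ...
 * and stores each subleaf's {eax, ebx, ecx, edx} in out, stopping when the
 * cache type field EAX[4:0] is 0 (no more caches) or when max_subleaves
 * entries have been stored. Returns the number of entries stored; returns 0
 * on non-x86 builds or when the maximum basic leaf is below 4. */
int read_leaf4(uint32_t out[][4], int max_subleaves)
{
#if defined(__i386__) || defined(__x86_64__)
    if (__get_cpuid_max(0, NULL) < 4)
        return 0;
    int n = 0;
    for (unsigned int sub = 0; n < max_subleaves; sub++) {
        unsigned int a, b, c, d;
        __cpuid_count(4, sub, a, b, c, d);
        if ((a & 0x1f) == 0)
            break; /* cache type 0: end of the list */
        out[n][0] = a;
        out[n][1] = b;
        out[n][2] = c;
        out[n][3] = d;
        n++;
    }
    return n;
#else
    (void)out;
    (void)max_subleaves;
    return 0;
#endif
}
```

Usage: `uint32_t regs[8][4]; int n = read_leaf4(regs, 8);` then print each row byte by byte. On the Xeon in this thread it would be expected to stop after four caches.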

Here is the raw data that I get from cpuid(EAX=4, ECX=0,1,2,3...), in bytes. Registers are in the order eax, ebx, ecx, edx, separated by a blank line, and the registers returned for different ECX values are separated by '---------------------'.

I give my interpretation of some of the values. They show an L1 data cache, an L1 instruction cache, and an L2 unified cache. I assume that the last cache is the L3 cache. All of the values that I analyzed match the output of the MacCPUID program, so this is presumably where it gets its information.

If anyone out there knows, it would be nice to know if the proper procedure is

1. Read all caches from leaf 2

2. If the 0xff byte is present, read all caches from leaf 4 

3. Combine the lists to have all of the caches.

Right now, I don't know the answer to the key questions:

1. When 0xff is set, are the leaf 2 caches valid?

2. Is leaf 4 guaranteed not to duplicate leaf 2?

----------------------------------------------------------------------------

BEGIN REGISTER DATA

21 - 00100001 - 001 = Level 1, 00001 = Data cache
41 - 01000001 - 01=2 threads, 0000= Reserved,0=Not fully associative,1 = Self-initializing cache
00
3c

3f - 00111111=64 B system coherency line size
00 - 00000000
c0 - 11000000, 0000000000=1 Physical line partitions
01 - 00000001, 0000000111=8 way associative

3f
00
00
00

00
00
00
00

---------------------
22 - 00100010 - 001 = Level 1, 00010 = Instruction cache
41 - 01000001 - 01=2 threads, 0000= Reserved,0=Not fully associative,1 = Self-initializing cache
00
3c

3f - 00111111=64 B system coherency line size
00 - 00000000
c0 - 11000000, 0000000000=1 Physical line partitions
00 - 00000000, 0000000011=4 way associative

7f
00
00
00

00
00
00
00

---------------------
43 - 01000011 - 010 = Level 2, 00011 = Unified cache
41 - 01000001 - 01=2 threads, 0000= Reserved,0=Not fully associative,1 = Self-initializing cache
00
3c

3f - 00111111=64 B system coherency line size
00 - 00000000
c0 - 11000000, 0000000000=1 Physical line partitions
01 - 00000001, 0000000111=8 way associative

ff
01
00
00

00
00
00
00

---------------------
63
c1
07
3c

3f
00
c0
03

ff
2f
00
00

02
00
00
00

---------------------
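The hand decoding above can be checked mechanically. A C sketch that extracts the SDM's leaf 4 bit fields, assuming (as in the dump) that each register's bytes are listed least significant first, so the first subleaf is eax = 0x3c004121, ebx = 0x01c0003f, ecx = 0x0000003f, edx = 0. `decode_leaf4` and `struct leaf4_cache` are hypothetical names; the fields the SDM stores as "value minus one" are incremented here.

```c
#include <stdint.h>

/* Decoded fields of one CPUID leaf 4 subleaf (layout per the SDM). */
struct leaf4_cache {
    unsigned type;        /* EAX[4:0]: 1 data, 2 instruction, 3 unified, 0 none */
    unsigned level;       /* EAX[7:5] */
    unsigned self_init;   /* EAX[8] */
    unsigned fully_assoc; /* EAX[9] */
    unsigned threads;     /* EAX[25:14] + 1: max logical processors sharing it */
    unsigned core_ids;    /* EAX[31:26] + 1: max addressable core IDs in package */
    unsigned line_size;   /* EBX[11:0] + 1, in bytes */
    unsigned partitions;  /* EBX[21:12] + 1 */
    unsigned ways;        /* EBX[31:22] + 1 */
    uint64_t sets;        /* ECX + 1 */
    unsigned inclusive;   /* EDX[1] */
    uint64_t size_bytes;  /* ways * partitions * line_size * sets */
};

struct leaf4_cache decode_leaf4(uint32_t eax, uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    struct leaf4_cache c;
    c.type = eax & 0x1f;
    c.level = (eax >> 5) & 0x7;
    c.self_init = (eax >> 8) & 0x1;
    c.fully_assoc = (eax >> 9) & 0x1;
    c.threads = ((eax >> 14) & 0xfff) + 1;
    c.core_ids = ((eax >> 26) & 0x3f) + 1;
    c.line_size = (ebx & 0xfff) + 1;
    c.partitions = ((ebx >> 12) & 0x3ff) + 1;
    c.ways = ((ebx >> 22) & 0x3ff) + 1;
    c.sets = (uint64_t)ecx + 1;
    c.inclusive = (edx >> 1) & 0x1;
    c.size_bytes = (uint64_t)c.ways * c.partitions * c.line_size * c.sets;
    return c;
}
```

Applied to the four posted subleaves, it gives a 32 KB 8-way L1 data cache, a 32 KB 4-way L1 instruction cache, a 256 KB 8-way unified L2, and, for the last subleaf (EAX byte 0x63: level 3, type 3), a 12 MB 16-way unified L3 that is inclusive (EDX bit 1) and shared by up to 32 threads.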

>>>I give my interpretation of some of the values. They show an L1 Data cache, an L1 Instruction cache, and an L2 Data Cache. I assume that the last cache is the L3 data cache. All of the values that I analyzed match the output of the MacCPUID program, so this must be where it gets its information>>>

Thanks for posting the eax == 4 data. I suppose that the procedure you used is the right one. Have you decoded all the values? How many indices (ecx values) have you used?

Here is my description of the eax, ebx, ecx, edx values that you have not decoded:

eax[31:26]: the top byte of eax is 0x3c = 00111100, so bits 31:26 = 001111 = 15. This field is the maximum number of addressable IDs for processor cores in the physical package, minus one, so the value is 16.

ecx[31:0] == 0x3f = 63: this is the number of sets minus one. The manual says to add one to the returned value, so this cache has 0x3f + 1 = 64 sets.

Null values skipped.
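The add-one rule applies to every count field in leaf 4, and the SDM combines those fields into the cache size. As a worked check with the posted values (the first subleaf, and the last subleaf with its bytes reassembled least significant first: ebx = 0x03c0003f, ecx = 0x00002fff):

```latex
\text{Size} = (W+1)(P+1)(L+1)(S+1), \quad
W = \mathrm{EBX}[31{:}22],\; P = \mathrm{EBX}[21{:}12],\; L = \mathrm{EBX}[11{:}0],\; S = \mathrm{ECX}

\text{Subleaf 0 (L1 data):}\quad (7+1)(0+1)(63+1)(63+1) = 8 \cdot 1 \cdot 64 \cdot 64 = 32768 \text{ bytes}

\text{Last subleaf (level 3):}\quad (15+1)(0+1)(63+1)(12287+1) = 16 \cdot 1 \cdot 64 \cdot 12288 = 12582912 \text{ bytes}
```

That is 32 KB for the L1 data cache and 12 MB for the last-level cache.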

>>>1. When 0xff is set, are the leaf 2 caches valid?>>>

Can you execute cpuid with eax == 2, then eax == 4, then eax == 2 again, and compare the results of the two cpuid.eax == 2 calls?
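A minimal C sketch of this experiment, assuming GCC or Clang on x86 (`__get_cpuid_max`, `__cpuid` and `__cpuid_count` from `<cpuid.h>`). `leaf2_stable_across_leaf4` is a hypothetical name, and the three CPUID calls are assumed to run on the same logical processor; pin the thread first if that matters on your system.

```c
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang: __get_cpuid_max, __cpuid, __cpuid_count */
#endif

/* Hypothetical helper: runs CPUID leaf 2, then leaf 4 (subleaf 0), then
 * leaf 2 again, and compares the two leaf 2 results. Returns 1 if they are
 * identical, 0 if they differ, and -1 if the check cannot run (non-x86
 * build, or maximum basic leaf below 4). */
int leaf2_stable_across_leaf4(void)
{
#if defined(__i386__) || defined(__x86_64__)
    if (__get_cpuid_max(0, NULL) < 4)
        return -1;
    unsigned int first[4], second[4], a, b, c, d;
    __cpuid(2, first[0], first[1], first[2], first[3]);
    __cpuid_count(4, 0, a, b, c, d); /* executed only for its position in the sequence */
    (void)a; (void)b; (void)c; (void)d;
    __cpuid(2, second[0], second[1], second[2], second[3]);
    return memcmp(first, second, sizeof first) == 0 ? 1 : 0;
#else
    return -1;
#endif
}
```

As the next reply points out, a result of 1 on one machine shows only that the leaf 2 data is unaffected there; it says nothing about how to interpret the 0xff byte.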

Regarding the eax=4 data, I decoded all of it; I just didn't want to type it all out in the above format. It all makes sense and matches MacCPUID. I think that your interpretation of the bytes I didn't decode is correct.

>>> Can you execute eax ==2 next eax == 4 and eax == 2 and compare the results of  cpuid.eax==2 calls?

I'm sorry, but I am confused by what you mean here. Do you want me to compare the output of eax=4 and eax=2? I think that we already have all of the relevant data for that. Regardless, I don't think that executing the call on my one system can tell us whether the leaf 2 values are guaranteed to be valid on all systems (even though they appear to be on mine), or whether eax=4 always returns information that does not duplicate eax=2. I think we just have to wait for someone who knows the instruction implementation.

>>>I'm sorry, but I am confused by what you mean here. Do you want me compare the output of eax=4 and eax=2?>>>

No. I simply wanted to compare the results of the two cpuid.eax == 2 calls, because you expressed concern about the validity of the data after leaf 4 is executed.

Oh, I am sorry, that is not what I meant. I do not believe that leaf 4 would corrupt the leaf 2 data; I was simply wondering about the correct interpretation of the 0xff byte returned by the leaf 2 call. It says "CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters", and I did not know whether seeing 0xff means that you should not use leaf 2 at all on this processor. Since leaf 4 returns only four caches, it seems that 0xff means you are supposed to call both and aggregate the results. It would be good to have that clarified, though.

It is OK. I think that your explanation of the 0xff byte makes sense. Unfortunately, the official Intel documentation does not explain it in great detail.

FWIW, my understanding is that leaf 2 was used in early CPU models and then was deprecated in favor of leaf 4, which reports information in a more flexible and complete form. All Sandy Bridge (SB) and Ivy Bridge (IB) CPUs I tried reported cache information in leaf 4 rather than leaf 2.

@andysem

Sorry for the off-topic question, but what does FWIW stand for?

Regarding your leaf 4 explanation, it does make sense. I suppose that the leaf 2 information was kept, maybe for compatibility reasons.

I do not think it is exclusively a compatibility reason. Leaf 4 does not return all of the information present in leaf 2; for example, no TLBs are given by leaf 4. So it seems always necessary to call both for full information. 

@iliyapolak

FWIW = For What It's Worth.

Thanks @andysem
