Haswell TLBs undefined in Intel cpu spec

I am currently upgrading my CPUID-based detection of Intel TLBs and have a Haswell i7-4770 CPU. In the four registers returned by CPUID with EAX=2, I observe the descriptors 0xc1 and 0xb6, which are not defined in the Intel CPU spec even though the i7-4770 is a released CPU.

Can someone at Intel please update the spec for TLB detection in leaf EAX=2 and let me know what is missing? I use this in my high-performance code for TLB detection, and currently I don't detect any second-level TLB.

Perfwise

Do you have Intel(R) Processor Identification and the CPUID Instruction Application Note 485 ( May 2012 edition )?

I just verified this; take a look at Table 5.8 in Application Note 485.

Do you have a link to the document? I can only find that it is missing, and Intel refers one to their architecture manuals, which are missing the pertinent data I previously highlighted. Did you find out what descriptors 0xc1 and 0xb6 mean?

Here it is - Attached.

Attachment: intelappnote485.zip (667.54 KB)

Sergey, thanks for uploading it. It does not contain info on what descriptors 0xc1 and 0xb6 mean, though. If you had a Haswell, which is what I am on, you would find these descriptors returned by CPUID with EAX=2, but I need to know what TLB info they correspond to, which Intel is missing in its documentation. Thanks anyway.

Perfwise

Can someone at Intel update the CPUID spec in their documents? You're missing descriptors that appear in the CPUID results returned by my released i7-4770K CPU. The results it returns for CPUID with EAX=2 are:

eax input: 0x2
eax output: 0x76036301
ebx output: 0x00f0b6ff
ecx output: 0x00000000
edx output: 0x00c10000

The 0xff descriptor signifies that I should use the EAX=4 leaf for detecting cache parameters, but TLBs are specified by leaf EAX=2 on Intel CPUs. 0xc1 and 0xb6 are descriptors for which your current spec gives no description. I'd like to be able to detect these in the CPUID code I've built. Detecting the TLB size can be very useful for blocking high-performance code so you don't overrun the memory mapped by the TLB and incur misses. Thanks for any help.
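For readers following along, here is a minimal sketch of how the register dump above breaks down into individual descriptor bytes. It assumes the usual leaf 2 conventions: the low byte of EAX is a call count rather than a descriptor, a register with bit 31 set carries no valid descriptors, and 0x00 is the null descriptor. The function name `leaf2_descriptors` is made up for illustration, and the code works on register values passed in rather than executing CPUID itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the one-byte descriptors from the four registers (EAX, EBX,
 * ECX, EDX) returned by CPUID leaf 2. Returns the number of descriptors
 * written to out, which is at most 15. Hypothetical helper name. */
size_t leaf2_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t n = 0;
    assert(regs != NULL && out != NULL);
    for (int r = 0; r < 4; r++) {
        /* Bit 31 set means the register holds no valid descriptors. */
        if (regs[r] & 0x80000000u)
            continue;
        /* The low byte of EAX is the call count, not a descriptor. */
        for (int b = (r == 0) ? 1 : 0; b < 4; b++) {
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0x00)      /* 0x00 is the null descriptor */
                out[n++] = d;
        }
    }
    return n;
}
```

Fed the four values from the dump, this yields seven descriptors: 0x63, 0x03 and 0x76 from EAX, 0xff, 0xb6 and 0xf0 from EBX, and 0xc1 from EDX.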

Perfwise

b6 is ITLB: 128 entries, 8-way, 4K pages

c1 is L2 STLB: 1024 entries, 8-way, 4K/2M pages
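As a rough sketch of how these two descriptors could be folded into detection code, the table below records only what this reply states; the struct layout and the names `tlb_desc`, `tlb_lookup` and `tlb_reach_bytes` are invented for illustration. The reach helper simply multiplies entries by page size, which is the figure a blocking scheme would try to stay under.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tlb_desc {
    uint8_t     id;       /* leaf 2 descriptor byte */
    const char *what;     /* TLB level and page sizes, as quoted in the thread */
    unsigned    entries;
    unsigned    ways;
};

/* Only the two descriptors explained in the reply above. */
static const struct tlb_desc thread_tlb_table[] = {
    { 0xb6, "ITLB, 4K pages",        128, 8 },
    { 0xc1, "L2 STLB, 4K/2M pages", 1024, 8 },
};

/* Returns the table entry for id, or NULL if it is not one of the two. */
const struct tlb_desc *tlb_lookup(uint8_t id)
{
    size_t count = sizeof thread_tlb_table / sizeof thread_tlb_table[0];
    for (size_t i = 0; i < count; i++)
        if (thread_tlb_table[i].id == id)
            return &thread_tlb_table[i];
    return NULL;
}

/* Memory covered when every entry maps one page of page_bytes. */
uint64_t tlb_reach_bytes(unsigned entries, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    return (uint64_t)entries * page_bytes;
}
```

With 1024 entries, the reach is 4 MiB for 4K pages, and 2 GiB for 2M pages if one assumes every entry can hold a 2M page.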

Thank you very much for the information Shih Kuo.  

Shih Kuo,

I don't see any L1 TLB entries specified for 2M pages. Is this actually the case, or is it just a typo in the CPUID spec? From your answer I see there's 2M page support in the "shared" L2 TLB for data and instruction requests, but no L1 support? I don't think that's accurate. Does descriptor 0x03 include 2M/4M page support? Please clarify if you can.

Perfwise

Just replying in the hope of getting a response. It seems there must be L1 DTLB entries for 2M pages, but none are reported by the CPUID instruction. Seems odd.

Perfwise

I am sad to see AppNote 485 is gone - all links now point to the reference manuals.

Unfortunately the current combined manual's Table 3-22 does not document descriptor B5 reported by Haswell. May we please know its meaning?

Jan, take a look at one of my previous posts (7th from your post) for a zip file with AppNote 485. I just verified and downloaded it (no problems).

Sergey, thanks for uploading, it is newer than the one I had.

The issue is that Intel will apparently no longer update the appnote, considering it superseded by the other manuals.

Jan, the other issue is that the documentation for CPUID in the manuals is dated. They've not updated it to match the App Note, nor have they documented the TLBs for Haswell. Case in point is my question: what's the size of the L1 DTLB for 2MB pages, which Linux now supports for anonymous memory? All the documentation I've gotten is the size of the L2 TLB, which, per Intel's reply above, is shared between instruction and data requests and has 1024 entries for 4K or 2M pages.

Perfwise

Inquiring again: your TLB descriptors do not specify the L1 DTLB configuration for 2M and 4M pages. However, look at what the architect said:

http://www.theregister.co.uk/2012/09/20/intel_haswell_microarchitecture_deep_dive/?page=3

There are 32 entries for 2M/4M page support in the L1 DTLB. Where are these in the descriptors? I just looked at the latest Intel doc, and there's no record of them, and they are not reported by the processor. Does anyone know where this is documented, or is a descriptor not properly documented? It seems these entries exist, but without any documentation.

perfwise
