Performance monitoring counters for Ivy Bridge

Performance monitoring counters for Ivy Bridge

Hi,

 

I am trying to read some events from the performance monitoring counters of processor Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz,

but I have not been able to find out the performance monitoring guidelines for Ivy Bridge processors in the Intel documentation.

Can anybody guide me in this regard?

 

Thanks

7 Beiträge / 0 neu
Letzter Beitrag
Nähere Informationen zur Compiler-Optimierung finden Sie in unserem Optimierungshinweis.

Read Intel Optimization Reference Manual appendix B.

引文:

iliyapolak 写道:

Read Intel Optimization Reference Manual appendix B.

 

Thanks for your reply. Could you please point out where exactly I can find the performance monitoring information for Ivy Bridge

in this manual as I could not see much regarding Ivy Bridge in Appendix B . http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-i...

You are right.I double checked that B Appendix ant it does not have any info about the Ivy Bridge.

Revision 028 of the Optimization Reference Manual contains a statement in Appendix B about an Ivy Bridge performance monitoring event called "CYCLE_ACTIVITY.STALLS_LDM_PENDING".

This appears to relate to Event A3H of the performance counters, but the descriptions of this event in sections 19.2 (Haswell), 19.3 (Ivy Bridge), and 19.2 (Sandy Bridge) of Volume 3 of the SW Developers Manual are quite confusing.    For example, in section 19.3 (Ivy Bridge), Event A3H, Umask 01H indicates "cycles with outstanding L2 load misses", while Umask 04H indicates "cycles with dispatch stalls".   If combining these two masks into 05H is interpreted as a logical AND operation, then this should give a count of the stall cycles that occur while there is at least one load pending that has missed in the L2 cache.

In section 19.2 (Haswell), the Umask 04H event is not included, but the Umask 05H event is name "CYCLE_ACTIVITY.STALLS_L2_PENDING".  The text description does not make sense ("Number of loads missed L2"), but this might indicate that the 01H and 04H masks can be used to count stall cycles while L2 misses are pending.

Looks like an opportunity for Intel to review the documentation....

John D. McCalpin, PhD "Dr. Bandwidth"

 

Are there some events which would be common between Sandy Bridge and Ivy Bridge  architectures.

I am looking for events like L2 cache misses, uops retired etc.

Here is the output of cpuid for one of the cores of Intel Xeon E5 1650.

CPU 0:
   vendor_id = "GenuineIntel"
   version information (1/eax):
      processor type  = primary processor (0)
      family          = Intel Pentium Pro/II/III/Celeron/Core/Core 2/Atom, AMD Athlon/Duron, Cyrix M2, VIA C3 (6)
      model           = 0xd (13)
      stepping id     = 0x7 (7)
      extended family = 0x0 (0)
      extended model  = 0x2 (2)
      (simple synth)  = Intel Core i7-3800/3900 (Sandy Bridge-E C2) / Xeon E5-1600/2600 (Sandy Bridge-E C2/M1), 32nm
   miscellaneous (1/ebx):
      process local APIC physical ID = 0x0 (0)
      cpu count                      = 0x20 (32)
      CLFLUSH line size              = 0x8 (8)
      brand index                    = 0x0 (0)
   brand id = 0x00 (0): unknown
   feature information (1/edx):
      x87 FPU on chip                        = true
      virtual-8086 mode enhancement          = true
      debugging extensions                   = true
      page size extensions                   = true
      time stamp counter                     = true
      RDMSR and WRMSR support                = true
      physical address extensions            = true
      machine check exception                = true
      CMPXCHG8B inst.                        = true
      APIC on chip                           = true
      SYSENTER and SYSEXIT                   = true
      memory type range registers            = true
      PTE global bit                         = true
      machine check architecture             = true
      conditional move/compare instruction   = true
      page attribute table                   = true
      page size extension                    = true
      processor serial number                = false
      CLFLUSH instruction                    = true
      debug store                            = true
      thermal monitor and clock ctrl         = true
      MMX Technology                         = true
      FXSAVE/FXRSTOR                         = true
      SSE extensions                         = true
      SSE2 extensions                        = true
      self snoop                             = true
      hyper-threading / multi-core supported = true
      therm. monitor                         = true
      IA64                                   = false
      pending break event                    = true
   feature information (1/ecx):
      PNI/SSE3: Prescott New Instructions     = true
      PCLMULDQ instruction                    = true
      64-bit debug store                      = true
      MONITOR/MWAIT                           = true
      CPL-qualified debug store               = true
      VMX: virtual machine extensions         = true
      SMX: safer mode extensions              = true
      Enhanced Intel SpeedStep Technology     = true
      thermal monitor 2                       = true
      SSSE3 extensions                        = true
      context ID: adaptive or shared L1 data  = false
      FMA instruction                         = false
      CMPXCHG16B instruction                  = true
      xTPR disable                            = true
      perfmon and debug                       = true
      process context identifiers             = true
      direct cache access                     = true
      SSE4.1 extensions                       = true
      SSE4.2 extensions                       = true
      extended xAPIC support                  = true
      MOVBE instruction                       = false
      POPCNT instruction                      = true
      time stamp counter deadline             = true
      AES instruction                         = true
      XSAVE/XSTOR states                      = true
      OS-enabled XSAVE/XSTOR                  = true
      AVX: advanced vector extensions         = true
      F16C half-precision convert instruction = false
      RDRAND instruction                      = false
      hypervisor guest status                 = false
   cache and TLB information (2):
      0x5a: data TLB: 2M/4M pages, 4-way, 32 entries
      0x03: data TLB: 4K pages, 4-way, 64 entries
      0x76: instruction TLB: 2M/4M pages, fully, 8 entries
      0xff: cache data is in CPUID 4
      0xb0: instruction TLB: 4K, 4-way, 128 entries
      0xf0: 64 byte prefetching
      0xca: L2 TLB: 4K, 4-way, 512 entries
   processor serial number: 0002-06D7-0000-0000-0000-0000
   deterministic cache parameters (4):
      --- cache 0 ---
      cache type                           = data cache (1)
      cache level                          = 0x1 (1)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0xf (15)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x7 (7)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 63
      --- cache 1 ---
      cache type                           = instruction cache (2)
      cache level                          = 0x1 (1)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0xf (15)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x7 (7)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 63
      --- cache 2 ---
      cache type                           = unified cache (3)
      cache level                          = 0x2 (2)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0xf (15)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x7 (7)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 511
      --- cache 3 ---
      cache type                           = unified cache (3)
      cache level                          = 0x3 (3)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1f (31)
      extra processor cores on this die    = 0xf (15)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0xf (15)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = true
      complex cache indexing               = true
      number of sets - 1 (s)               = 12287
   MONITOR/MWAIT (5):
      smallest monitor-line size (bytes)       = 0x40 (64)
      largest monitor-line size (bytes)        = 0x40 (64)
      enum of Monitor-MWAIT exts supported     = true
      supports intrs as break-event for MWAIT  = true
      number of C0 sub C-states using MWAIT    = 0x0 (0)
      number of C1 sub C-states using MWAIT    = 0x2 (2)
      number of C2 sub C-states using MWAIT    = 0x1 (1)
      number of C3/C6 sub C-states using MWAIT = 0x1 (1)
      number of C4/C7 sub C-states using MWAIT = 0x2 (2)
   Thermal and Power Management Features (6):
      digital thermometer                     = true
      Intel Turbo Boost Technology            = true
      ARAT always running APIC timer          = true
      PLN power limit notification            = true
      ECMD extended clock modulation duty     = true
      PTM package thermal management          = true
      digital thermometer thresholds          = 0x2 (2)
      ACNT/MCNT supported performance measure = true
      ACNT2 available                         = false
      performance-energy bias capability      = true
   extended feature flags (7):
      FSGSBASE instructions                   = false
      BMI instruction                         = false
      SMEP support                            = false
      enhanced REP MOVSB/STOSB                = false
      INVPCID instruction                     = false
   Direct Cache Access Parameters (9):
      PLATFORM_DCA_CAP MSR bits = 1
   Architecture Performance Monitoring Features (0xa/eax):
      version ID                               = 0x3 (3)
      number of counters per logical processor = 0x8 (8)
      bit width of counter                     = 0x30 (48)
      length of EBX bit vector                 = 0x7 (7)
   Architecture Performance Monitoring Features (0xa/ebx):
      core cycle event not available           = false
      instruction retired event not available  = false
      reference cycles event not available     = false
      last-level cache ref event not available = false
      last-level cache miss event not avail    = false
      branch inst retired event not available  = false
      branch mispred retired event not avail   = false
   Architecture Performance Monitoring Features (0xa/edx):
      number of fixed counters    = 0x3 (3)
      bit width of fixed counters = 0x30 (48)
   x2APIC features / processor topology (0xb):
      --- level 0 (thread) ---
      bits to shift APIC ID to get next = 0x1 (1)
      logical processors at this level  = 0x2 (2)
      level number                      = 0x0 (0)
      level type                        = thread (1)
      extended APIC ID                  = 0
      --- level 1 (core) ---
      bits to shift APIC ID to get next = 0x5 (5)
      logical processors at this level  = 0xc (12)
      level number                      = 0x1 (1)
      level type                        = core (2)
      extended APIC ID                  = 0
   0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   XSAVE features (0xd/0):
      XCR0 lower 32 bits valid bit field mask = 0x00000007
      bytes required by fields in XCR0        = 0x00000340 (832)
      bytes required by XSAVE/XRSTOR area     = 0x00000340 (832)
      XCR0 upper 32 bits valid bit field mask = 0x00000000
   YMM features (0xd/2):
      YMM save state byte size                = 0x00000100 (256)
      YMM save state byte offset              = 0x00000240 (576)
   LWP features (0xd/0x3e):
      LWP save state byte size                = 0x00000000 (0)
      LWP save state byte offset              = 0x00000000 (0)
   extended feature flags (0x80000001/edx):
      SYSCALL and SYSRET instructions        = true
      execution disable                      = true
      1-GB large page support                = true
      RDTSCP                                 = true
      64-bit extensions technology available = true
   Intel feature flags (0x80000001/ecx):
      LAHF/SAHF supported in 64-bit mode = true
   brand = "       Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz"
   L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
      instruction # entries     = 0x0 (0)
      instruction associativity = 0x0 (0)
      data # entries            = 0x0 (0)
      data associativity        = 0x0 (0)
   L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
      instruction # entries     = 0x0 (0)
      instruction associativity = 0x0 (0)
      data # entries            = 0x0 (0)
      data associativity        = 0x0 (0)
   L1 data cache information (0x80000005/ecx):
      line size (bytes) = 0x0 (0)
      lines per tag     = 0x0 (0)
      associativity     = 0x0 (0)
      size (Kb)         = 0x0 (0)
   L1 instruction cache information (0x80000005/edx):
      line size (bytes) = 0x0 (0)
      lines per tag     = 0x0 (0)
      associativity     = 0x0 (0)
      size (Kb)         = 0x0 (0)
   L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
      instruction # entries     = 0x0 (0)
      instruction associativity = L2 off (0)
      data # entries            = 0x0 (0)
      data associativity        = L2 off (0)
   L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
      instruction # entries     = 0x0 (0)
      instruction associativity = L2 off (0)
      data # entries            = 0x0 (0)
      data associativity        = L2 off (0)
   L2 unified cache information (0x80000006/ecx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x0 (0)
      associativity     = 8-way (6)
      size (Kb)         = 0x100 (256)
   L3 cache information (0x80000006/edx):
      line size (bytes)     = 0x0 (0)
      lines per tag         = 0x0 (0)
      associativity         = L2 off (0)
      size (in 512Kb units) = 0x0 (0)
   Advanced Power Management Features (0x80000007/edx):
      temperature sensing diode      = false
      frequency ID (FID) control     = false
      voltage ID (VID) control       = false
      thermal trip (TTP)             = false
      thermal monitor (TM)           = false
      software thermal control (STC) = false
      100 MHz multiplier control     = false
      hardware P-State control       = false
      TscInvariant                   = true
   Physical Address and Linear Address Size (0x80000008/eax):
      maximum physical address bits         = 0x2e (46)
      maximum linear (virtual) address bits = 0x30 (48)
      maximum guest physical address bits   = 0x0 (0)
   Logical CPU cores (0x80000008/ecx):
      number of CPU cores - 1 = 0x0 (0)
      ApicIdCoreIdSize        = 0x0 (0)
   (multi-processing synth): multi-core (c=6), hyper-threaded (t=2)
   (multi-processing method): Intel leaf 0xb
   (APIC widths synth): CORE_width=5 SMT_width=1
   (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
   (synth) = Intel Xeon E5-1600/2600 (Sandy Bridge-E C2/M1), 32nm

Best Reply

The Xeon E5-1650 is a Sandy Bridge processor, not an Ivy Bridge processor.

The output of cpuid shows that this processor has a "DisplayFamily_DisplayModel" encoding of 06_2AH.

All of the performance monitoring events for this family are presented in Section 19.4 of Volume 3 of the Intel SW Developer's Manual (document 325384, revision 047 is from June 2013).

John D. McCalpin, PhD "Dr. Bandwidth"

Kommentar hinterlassen

Bitte anmelden, um einen Kommentar hinzuzufügen. Sie sind noch nicht Mitglied? Jetzt teilnehmen