Monitoring Integrated Memory Controller Requests in the 2nd, 3rd and 4th generation Intel® Core™ processors

Authors: Roman Dementiev and Angela D. Schmid

Dear Software Tuning, Performance Optimization & Platform Monitoring community,

The recent and upcoming Intel® Core™ processors of the 2nd, 3rd and 4th generations (previously codenamed Sandy Bridge, Ivy Bridge and Haswell) expose model-specific counters that allow monitoring of requests to DRAM.

The counters employ circuitry residing in the memory controller and monitor transaction requests coming from various sources, e.g. the processor cores, the graphics engine, or other I/O agents. The monitoring interface uses memory-mapped I/O reads from physical memory at the offsets specified in Table 1. Memory traffic metrics can be derived as follows:

  • Data read from DRAM in number of bytes:   DRAM_DATA_READS*64
  • Data written to DRAM in number of bytes:   DRAM_DATA_WRITES*64
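
The two formulas above can be sketched in Python as follows. This is a minimal illustration, not part of any Intel tool: the function names are hypothetical, and the 32-bit counter width is an assumption inferred from the 4-byte spacing of the offsets in Table 1. Because the counters run freely, traffic over an interval is computed from the difference between two samples.

```python
CACHE_LINE_BYTES = 64        # each RdCAS/WrCAS moves one 64-byte line
COUNTER_MASK = 0xFFFFFFFF    # assumption: counters are 32 bits wide


def counter_delta(before, after):
    """Difference between two samples, tolerating one wraparound."""
    return (after - before) & COUNTER_MASK


def dram_traffic(reads_before, reads_after,
                 writes_before, writes_after, interval_seconds):
    """Apply DRAM_DATA_READS*64 and DRAM_DATA_WRITES*64 to counter deltas."""
    read_bytes = counter_delta(reads_before, reads_after) * CACHE_LINE_BYTES
    write_bytes = counter_delta(writes_before, writes_after) * CACHE_LINE_BYTES
    return {
        "read_bytes": read_bytes,
        "write_bytes": write_bytes,
        "read_bytes_per_sec": read_bytes / interval_seconds,
        "write_bytes_per_sec": write_bytes / interval_seconds,
    }
```

If the counters are indeed 32 bits wide, the sampling interval has to be short enough that a counter cannot wrap more than once between two samples; otherwise the computed traffic will be too low.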

Users and developers may take advantage of Intel tools to easily access the counters or derived memory performance metrics.

Table 1. Addresses of DRAM Counters.

The DRAM counters below are model specific, meaning they may change or no longer be supported in future processors. The BAR is available (in PCI configuration space) at Bus 0; Device 0; Function 0; Offset 048H.

Counter: DRAM_GT_REQUESTS
Address: BAR + 0x5040
Description: Counts every read/write request entering the Memory Controller to DRAM (sum of all channels) from the GT engine. Each partial write request counts as a request incrementing this counter. However, same-cache-line partial write requests are combined into a single 64-byte data transfer from DRAM. Therefore, multiplying the number of requests by 64 bytes will give an inaccurate GT memory bandwidth. The inaccuracy is proportional to the number of same-cache-line partial writes combined.

Counter: DRAM_IA_REQUESTS
Address: BAR + 0x5044
Description: Counts every read/write request (demand and HW prefetch) entering the Memory Controller to DRAM (sum of all channels) from IA. Each partial write request counts as a request incrementing this counter. However, same-cache-line partial write requests are combined into a single 64-byte data transfer from DRAM. Therefore, multiplying the number of requests by 64 bytes will give an inaccurate IA memory bandwidth. The inaccuracy is proportional to the number of same-cache-line partial writes combined.

Counter: DRAM_IO_REQUESTS
Address: BAR + 0x5048
Description: Counts every read/write request entering the Memory Controller to DRAM (sum of all channels) from all IO sources (e.g. PCIe, Display Engine, USB audio, etc.). Each partial write request counts as a request incrementing this counter. However, same-cache-line partial write requests are combined into a single 64-byte data transfer from DRAM. Therefore, multiplying the number of requests by 64 bytes will give an inaccurate IO memory bandwidth. The inaccuracy is proportional to the number of same-cache-line partial writes combined.

Counter: DRAM_DATA_READS
Address: BAR + 0x5050
Description: Counts every read (RdCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64-byte data transfers from DRAM. Use for accurate memory bandwidth calculations.

Counter: DRAM_DATA_WRITES
Address: BAR + 0x5054
Description: Counts every write (WrCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64-byte data transfers to DRAM. Use for accurate memory bandwidth calculations.
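
To illustrate how the BAR and the offsets in Table 1 fit together, here is a rough Python sketch for Linux. It rests on several assumptions that do not come from this article: the sysfs path for Bus 0, Device 0, Function 0; that the BAR is a 64-bit little-endian register whose low 12 bits hold enable or reserved flags and must be masked off; that each counter is 32 bits wide; that a 0x6000-byte mapping is sufficient; and that the kernel permits mapping this region through /dev/mem. Root privileges are needed, and all function and constant names are hypothetical.

```python
import mmap
import os
import struct

# Assumed Linux sysfs path for Bus 0, Device 0, Function 0.
PCI_CONFIG_PATH = "/sys/bus/pci/devices/0000:00:00.0/config"
BAR_CONFIG_OFFSET = 0x48   # "Offset 048H" from the text above
MAP_SIZE = 0x6000          # assumption: large enough to cover BAR + 0x5054

COUNTER_OFFSETS = {        # offsets from Table 1
    "DRAM_GT_REQUESTS": 0x5040,
    "DRAM_IA_REQUESTS": 0x5044,
    "DRAM_IO_REQUESTS": 0x5048,
    "DRAM_DATA_READS": 0x5050,
    "DRAM_DATA_WRITES": 0x5054,
}


def decode_bar(raw):
    """Turn 8 raw config-space bytes into a page-aligned physical address."""
    if len(raw) != 8:
        raise ValueError("short read from PCI config space (root required?)")
    (value,) = struct.unpack("<Q", raw)
    return value & ~0xFFF  # assumption: low 12 bits are not address bits


def read_bar(path=PCI_CONFIG_PATH):
    """Read the BAR register from PCI configuration space via sysfs."""
    with open(path, "rb") as f:
        f.seek(BAR_CONFIG_OFFSET)
        return decode_bar(f.read(8))


def read_counters(bar):
    """Map the region at `bar` read-only and sample all five counters."""
    fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, MAP_SIZE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ, offset=bar) as region:
            # Note: Python does not guarantee that this copy is a single
            # aligned 32-bit load; C code would use a volatile uint32_t read.
            return {name: struct.unpack_from("<I", region, offset)[0]
                    for name, offset in COUNTER_OFFSETS.items()}
    finally:
        os.close(fd)

# Typical use on supported hardware, as root:
#     counters = read_counters(read_bar())
```

Sampling `read_counters` twice and differencing DRAM_DATA_READS and DRAM_DATA_WRITES gives the inputs for the byte formulas earlier in this post. On platforms where the counters are absent or changed, the values read this way are meaningless, so checking the processor model first is advisable.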

Regards,
Roman Dementiev
Senior Application Engineer
Intel Corporation

Angela D. Schmid
Performance Engineer
Intel Corporation
