Merrifield Uncore Performance Monitoring Events

Using the Merrifield SoC Performance Monitoring Events

This article focuses on the uncore performance monitoring events for the Merrifield SoC.  For an introduction to SoC uncore performance monitoring, please see this article:

Introduction to the Merrifield SoC

Below is a block diagram illustrating the typical Merrifield layout.  The green arrows connecting the blocks represent interfaces whose requests can be monitored to calculate bandwidth.  The gray arrows in the south cluster represent interfaces that are not supported for uncore performance monitoring.  As shown below, all analysis is focused on the north cluster.

Available Groups

The following table documents the available groups for Merrifield.  The group name refers to the predetermined set of events that will be programmed by the software monitoring tools.  The Events column gives the number of events the group contains.  The Clock column indicates whether the group includes counts from the SoC source clock.

Merrifield Uncore Event Group Table

Group Name | Events | Clock | Description
UNC_SOC_Memory_DDR_BW | 5 | Yes | Counts memory read and write requests to memory channels 0 and 1. Determine memory bandwidth by multiplying the event count by 32 bytes.
UNC_SOC_DDR_Self_Refresh | 5 | Yes | Counts the number of cycles that memory channels 0 and 1 are in self-refresh.
UNC_SOC_All_Reqs | 6 | Yes | Counts the number of requests per memory agent. Counts can be used to identify high demand agents or to estimate bandwidth for an agent by multiplying each request by 64 bytes.
UNC_SOC_Module0_BW | 7 | Yes | Counts bandwidth events for Silvermont module 0. Determine bandwidth between Silvermont module 0 and memory by multiplying the event count by the request size (32 or 64 bytes).
UNC_SOC_Module0_Snoops | 3 | Yes | Counts the number of snoop requests and replies for Silvermont module 0.
UNC_SOC_Graphics_BW | 7 | Yes | Counts graphics controller bandwidth events. Determine bandwidth between the graphics controller and memory by multiplying the event count by the request size (32 or 64 bytes).
UNC_SOC_Display_BW | 7 | Yes | Counts display controller bandwidth events. Determine bandwidth between the display controller and memory by multiplying the event count by the request size (32 or 64 bytes).
UNC_SOC_Imaging_BW | 7 | Yes | Counts imaging controller bandwidth events. Determine bandwidth between the imaging controller and memory by multiplying the event count by the request size (32 or 64 bytes).
UNC_SOC_LowSpeedPF_BW | 7 | Yes | Counts bandwidth events for the low speed peripheral fabric. Determine the aggregate memory bandwidth by multiplying the event count by the request size (32 or 64 bytes).

UNC_SOC_Memory_DDR_BW

The UNC_SOC_Memory_DDR_BW group provides counts to compute the total memory bandwidth as seen by the SoC memory controller.  The events provide a per-channel breakdown of bandwidth.  While these events do not show which agent is demanding memory, they are the most accurate way to determine how much memory bandwidth is actually being consumed.

The number of memory channels for Merrifield is SKU dependent and may be either one or two channels.  If there is only one channel, all counts associated with the second channel will be zero.  For more information on memory channel architecture, please see: http://en.wikipedia.org/wiki/Multi-channel_memory_architecture.

The image below illustrates the traffic flow being monitored for this group.

The table below documents the events contained in the UNC_SOC_Memory_DDR_BW group.

Name | Counter | Description
DDR_Chan0-Read32B | 0 | Counts memory read requests to memory channel 0.
DDR_Chan0-Write32B | 1 | Counts memory write requests to memory channel 0.
DDR_Chan1-Read32B | 2 | Counts memory read requests to memory channel 1.
DDR_Chan1-Write32B | 3 | Counts memory write requests to memory channel 1.
Clock_Counter | 4 | SoC clock counter

Analyzing Results

Bandwidth in MB/s can be calculated for each event listed above as follows:

Event metric formula: event_count / seconds_sampled * 32 bytes / 1,000,000 bytes per MB = MB/s

Events can be summed together to form the desired metric, for example:

  • Total memory bandwidth = sum of all event MB/s
  • Total read bandwidth = sum of all read event MB/s
  • Channel 0 bandwidth = sum of channel 0 event MB/s
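The formula and the sums above can be sketched in a few lines of Python.  This is only an illustration: the counts and the sampling time are made-up values, and the helper name ddr_mb_per_s is hypothetical rather than part of any monitoring tool.

```python
# Made-up counts for the UNC_SOC_Memory_DDR_BW group over one sampling
# interval. Only the event names come from the table above.
BYTES_PER_DDR_REQUEST = 32
BYTES_PER_MB = 1_000_000


def ddr_mb_per_s(event_count, seconds_sampled):
    """Apply the event metric formula to a single counter."""
    return event_count * BYTES_PER_DDR_REQUEST / seconds_sampled / BYTES_PER_MB


counts = {
    "DDR_Chan0-Read32B": 6_000_000,
    "DDR_Chan0-Write32B": 2_000_000,
    "DDR_Chan1-Read32B": 5_000_000,
    "DDR_Chan1-Write32B": 3_000_000,
}
seconds_sampled = 2.0  # assumed sampling time in seconds

# Convert every counter to MB/s, then sum the subsets of interest.
per_event = {name: ddr_mb_per_s(c, seconds_sampled) for name, c in counts.items()}
total_bw = sum(per_event.values())
read_bw = sum(v for n, v in per_event.items() if n.endswith("Read32B"))
chan0_bw = sum(v for n, v in per_event.items() if n.startswith("DDR_Chan0"))

print(f"total={total_bw} MB/s, read={read_bw} MB/s, channel0={chan0_bw} MB/s")
```

On a single-channel SKU the two DDR_Chan1 entries would simply be zero, and the same sums still apply.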

Known behaviors

  1. If the Merrifield memory does not have two channels, the second channel counts will be zero.

UNC_SOC_DDR_Self_Refresh

The UNC_SOC_DDR_Self_Refresh group provides counts of the cycles that memory spends in self-refresh.  Self-refresh is a low power state, so these counts can be used for power optimization of the SoC and application.

The table below documents the events contained in the UNC_SOC_DDR_Self_Refresh group.

Name | Counter | Description
DDR_Chan0_Deep_Self_Refresh | 0 | Counts the number of cycles that memory channel 0 is in deep self-refresh.
DDR_Chan0_Shallow_Self_Refresh | 1 | Counts the number of cycles that memory channel 0 is in shallow self-refresh.
DDR_Chan1_Deep_Self_Refresh | 2 | Counts the number of cycles that memory channel 1 is in deep self-refresh.
DDR_Chan1_Shallow_Self_Refresh | 3 | Counts the number of cycles that memory channel 1 is in shallow self-refresh.
Clock_Counter | 4 | SoC clock counter

Analyzing Results

Event metric formula: (event_count * 100) / (time_interval * base_DRAM_frequency) = DDR self-refresh residency (%)
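As a rough sketch of the residency formula, the Python below assumes the time interval is in seconds and the base DRAM frequency is in Hz, so that their product is a cycle count.  The 400 MHz frequency and the event counts are hypothetical values chosen for easy arithmetic, not Merrifield specifications.

```python
def self_refresh_residency(event_count, time_interval_s, base_dram_frequency_hz):
    """Return self-refresh residency as a percentage of the sampled interval."""
    total_cycles = time_interval_s * base_dram_frequency_hz
    return event_count * 100 / total_cycles


# Hypothetical inputs: 1 second sampled at an assumed 400 MHz DRAM clock.
time_interval_s = 1.0
base_dram_frequency_hz = 400_000_000

deep_pct = self_refresh_residency(200_000_000, time_interval_s, base_dram_frequency_hz)
shallow_pct = self_refresh_residency(100_000_000, time_interval_s, base_dram_frequency_hz)

print(f"Chan0 deep: {deep_pct}%  shallow: {shallow_pct}%")
```

The denominator is built from the sampling time and the DRAM frequency, as in the formula, rather than from Clock_Counter.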

Known behaviors

  1. If the Merrifield memory does not have two channels, the second channel counts will be zero.
  2. Counters 0, 1, 2, 3 may be running at a different source clock frequency than counter 4.

UNC_SOC_All_Reqs

The UNC_SOC_All_Reqs group contains per-agent request count events that measure the total number of requests from each available SoC agent in a single, concurrent sampling.  Because it captures all agents concurrently, this group is a key metric for studying any non-static or “bursty” workload.  Unlike the other bandwidth events, which are sampled one or two agents at a time, this metric provides insight into all agents in a single time window.

Per-agent bandwidth can be estimated by multiplying each request count by 64 bytes, and total bandwidth can be estimated by summing the bandwidth of all agents.  It is critical to understand that the result is only an estimate based on the assumption that each request is 64 bytes; it will overcount when transactions are 32 byte or partial sized requests.  The other disadvantage is that there is no read vs. write breakdown per agent.

For an exact bandwidth measurement with a read and write breakdown, the per-agent bandwidth groups must be used, one or two agents at a time.

Name | Counter | Description
Mod0_Reqs | 0 | Counts the number of requests from Silvermont module 0.
Disp_Reqs | 1 | Counts the number of requests from the display controller.
GFX_Reqs | 2 | Counts the number of requests from the graphics controller.
Imaging_Reqs | 3 | Counts the number of requests from the imaging controller.
LowSpeedPF_Reqs | 4 | Counts the aggregate number of requests from the low speed peripheral fabric.
Clock_Counter | 5 | SoC clock counter

Analyzing Results

Event metric formula:

  • event_count / seconds_sampled * 64 bytes / 1,000,000 bytes per MB = Estimated Agent MB/s
  • sum_of_all_agent_event_counts / seconds_sampled * 64 bytes / 1,000,000 bytes per MB = Estimated DDR MB/s
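The estimates can be illustrated with a short Python sketch.  The request counts and sampling time are invented for the example, and Clock_Counter is left out of the sum because it is not an agent request count.

```python
ASSUMED_BYTES_PER_REQUEST = 64  # the estimate treats every request as 64 bytes
BYTES_PER_MB = 1_000_000

# Invented request counts for one sampling interval.
counts = {
    "Mod0_Reqs": 4_000_000,
    "Disp_Reqs": 2_000_000,
    "GFX_Reqs": 1_000_000,
    "Imaging_Reqs": 500_000,
    "LowSpeedPF_Reqs": 500_000,
}
seconds_sampled = 1.0  # assumed sampling time in seconds

# Estimated MB/s per agent, then the estimated total across all agents.
estimated = {
    name: c * ASSUMED_BYTES_PER_REQUEST / seconds_sampled / BYTES_PER_MB
    for name, c in counts.items()
}
estimated_ddr = sum(estimated.values())
busiest_agent = max(estimated, key=estimated.get)

print(f"estimated DDR MB/s: {estimated_ddr}, busiest agent: {busiest_agent}")
```

Because every request is treated as 64 bytes, these numbers are upper-bound style estimates and are better suited to ranking agents than to reporting precise bandwidth.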

Known behaviors

  1. These events count transactions of any request size, so multiplying them by 64 bytes to calculate a MB/s metric is a generalization that may overcount the actual bandwidth observed at the memory channels.

UNC_SOC_Module0_BW

The UNC_SOC_Module0_BW group provides counts to compute the bandwidth of processor module 0 as seen by the system agent.  The events provide a breakdown by request type.

The image below illustrates the traffic flow being monitored for this group.

Name | Counter | Description
Mod0_ReadPartial | 0 | Counts all module 0 read transactions of any data size request. This event count is inclusive of partial, 32 byte and 64 byte transactions.
Mod0_Read32B | 1 | Counts memory read requests of size 32 bytes from Silvermont module 0.
Mod0_Read64B | 2 | Counts memory read requests of size 64 bytes from Silvermont module 0.
Mod0_WritePartial | 3 | Counts all module 0 write transactions of any data size request. This event count is inclusive of partial, 32 byte and 64 byte transactions.
Mod0_Write32B | 4 | Counts memory write requests of size 32 bytes from Silvermont module 0.
Mod0_Write64B | 5 | Counts memory write requests of size 64 bytes from Silvermont module 0.
Clock_Counter | 6 | SoC clock counter

Analyzing Results

Read and write bandwidth can be calculated for the 32 byte and 64 byte events, but the partial event is problematic for bandwidth computations since a partial request has an unknown payload size.  It is also vital to understand that the partial events for this group represent the sum of 64 byte, 32 byte and partial requests.  Each partial event may therefore be thought of as the total read or write count.

Event metric formula:

  • partial_requests - 32_byte_requests - 64_byte_requests = actual partial request count
  • ((Mod0_Read32B_count * 32 bytes) + (Mod0_Read64B_count * 64 bytes)) / seconds_sampled / 1,000,000 bytes per MB = Read MB/s
  • ((Mod0_Write32B_count * 32 bytes) + (Mod0_Write64B_count * 64 bytes)) / seconds_sampled / 1,000,000 bytes per MB = Write MB/s
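The Python sketch below works through these metrics for module 0, dividing by 1,000,000 bytes to express the result in MB/s as in the DDR formula.  The counts are invented, and the function name module0_metrics is hypothetical.

```python
BYTES_PER_MB = 1_000_000


def module0_metrics(counts, seconds_sampled):
    """Derive partial-only request counts and sized bandwidth for module 0."""
    result = {}
    for direction in ("Read", "Write"):
        total = counts[f"Mod0_{direction}Partial"]  # inclusive of all sizes
        n32 = counts[f"Mod0_{direction}32B"]
        n64 = counts[f"Mod0_{direction}64B"]
        # The "partial" event is a total, so subtract the sized requests.
        result[f"{direction.lower()}_partial_only"] = total - n32 - n64
        # Only the 32 byte and 64 byte requests have a known payload size.
        sized_bytes = n32 * 32 + n64 * 64
        result[f"{direction.lower()}_mb_s"] = sized_bytes / seconds_sampled / BYTES_PER_MB
    return result


# Invented counts for a 1 second sample.
sample = {
    "Mod0_ReadPartial": 1_100_000,
    "Mod0_Read32B": 250_000,
    "Mod0_Read64B": 750_000,
    "Mod0_WritePartial": 600_000,
    "Mod0_Write32B": 100_000,
    "Mod0_Write64B": 400_000,
}
metrics = module0_metrics(sample, seconds_sampled=1.0)
print(metrics)
```

The bandwidth figures exclude the partial-only requests, so they understate traffic whenever that count is large.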

Known behaviors

  1. The module 0 and module 1 partial event counts include 32 byte, 64 byte and partial requests and can be thought of as total request counts.

UNC_SOC_Module0_Snoops

UNC_SOC_Module0_Snoops counts the number of snoop requests and snoop replies for Silvermont module 0, as seen by the system agent.  These counts can be used to confirm other traffic counts and correlate core snoop event counts.  Unlike the core based snoop events, the uncore snoop counts are not able to distinguish between cores within a module and count the total for the module rather than per core.

Name | Counter | Description
Mod0_Snoop_Replies | 0 | Counts the number of snoop replies received from module 0.
Mod0_Snoop_Reqs | 1 | Counts the number of snoop requests sent to module 0.
Clock_Counter | 2 | SoC clock counter

Analyzing Results

Analysis of snoop results is usage model specific.

Known behaviors

  1. Snoop counts are per processor module totals with no per core break-down available.

UNC_SOC_Graphics_BW

The UNC_SOC_Graphics_BW group provides counts to compute the bandwidth of the graphics controller, as seen by the system agent.  The events provide a breakdown by request type.

The image below illustrates the traffic flow being monitored by this group.

Name | Counter | Description
GFX_ReadPartial | 0 | Counts graphics controller read transactions with partial sized data requests.
GFX_Read32B | 1 | Counts memory read requests of size 32 bytes from the graphics controller.
GFX_Read64B | 2 | Counts memory read requests of size 64 bytes from the graphics controller.
GFX_WritePartial | 3 | Counts graphics controller write transactions with partial sized data requests.
GFX_Write32B | 4 | Counts memory write requests of size 32 bytes from the graphics controller.
GFX_Write64B | 5 | Counts memory write requests of size 64 bytes from the graphics controller.
Clock_Counter | 6 | SoC clock counter

Analyzing Results

Read and write bandwidth can be calculated for the 32 byte and 64 byte events, but the partial event is problematic for bandwidth computations since a partial request has an unknown payload size.

Event metric formula:

  • ((GFX_Read32B_count * 32 bytes) + (GFX_Read64B_count * 64 bytes)) / seconds_sampled / 1,000,000 bytes per MB = Read MB/s
  • ((GFX_Write32B_count * 32 bytes) + (GFX_Write64B_count * 64 bytes)) / seconds_sampled / 1,000,000 bytes per MB = Write MB/s

Known behaviors

  1. Graphics bandwidth computation is problematic because the Merrifield Graphics Unit does mostly partial writes, which have an unknown payload size.

UNC_SOC_Display_BW

The UNC_SOC_Display_BW group provides events to determine the bandwidth of the display controller and is identical to the UNC_SOC_Graphics_BW group in event-counter configuration and analysis metrics.  Event names use the prefix Disp in place of GFX.

UNC_SOC_Imaging_BW

The UNC_SOC_Imaging_BW group provides events to determine the bandwidth of the imaging controller (imaging signal processor) and is identical to the UNC_SOC_Graphics_BW group in event-counter configuration and analysis metrics.  Event names use the prefix Imaging in place of GFX.

UNC_SOC_LowSpeedPF_BW

The UNC_SOC_LowSpeedPF_BW group provides events to determine the bandwidth of the low speed peripheral fabric, representing aggregate bandwidth for all south cluster units such as USB3, USB2, EMMC, Comms, Crypto and audio.  It is identical to the UNC_SOC_Graphics_BW group in event-counter configuration and analysis metrics.  Event names use the prefix LowSpeedPF in place of GFX.
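Since the Graphics, Display, Imaging and LowSpeedPF groups share one event layout, a single helper keyed on the name prefix can cover all four.  The sketch below assumes the renamed events follow the pattern Disp_Read32B, Imaging_Write64B and so on, which is inferred from the GFX names and not confirmed by a separate event table.  The function name and the counts are hypothetical.

```python
BYTES_PER_MB = 1_000_000


def controller_bandwidth(counts, prefix, seconds_sampled):
    """Return read/write MB/s and raw partial counts for one controller group.

    Event names are built as <prefix>_Read32B and so on, which is an
    assumption based on the GFX names in the Graphics group.
    """
    def mb_per_s(direction):
        sized_bytes = (counts[f"{prefix}_{direction}32B"] * 32
                       + counts[f"{prefix}_{direction}64B"] * 64)
        return sized_bytes / seconds_sampled / BYTES_PER_MB

    return {
        "read_mb_s": mb_per_s("Read"),
        "write_mb_s": mb_per_s("Write"),
        # Partial requests have an unknown payload size, so they are
        # reported as raw counts instead of being converted to bytes.
        "read_partials": counts[f"{prefix}_ReadPartial"],
        "write_partials": counts[f"{prefix}_WritePartial"],
    }


# Invented display controller counts for a 0.5 second sample.
disp_counts = {
    "Disp_ReadPartial": 10,
    "Disp_Read32B": 1_000_000,
    "Disp_Read64B": 2_000_000,
    "Disp_WritePartial": 0,
    "Disp_Write32B": 0,
    "Disp_Write64B": 500_000,
}
disp = controller_bandwidth(disp_counts, "Disp", seconds_sampled=0.5)
print(disp)
```

Passing "GFX", "Imaging" or "LowSpeedPF" as the prefix would apply the same arithmetic to the other groups, provided the counts dictionary uses matching names.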
