Baytrail Uncore Performance Monitoring Events

Using the Baytrail SoC Performance Monitoring Events

This article focuses on the uncore performance monitoring events for the Baytrail SoC. For an introduction to SoC uncore performance monitoring, please see this article:

Introduction to the Baytrail SoC

The block diagram below illustrates the typical Baytrail layout. The green arrows connecting the blocks represent interfaces whose requests can be monitored to calculate bandwidth. The gray arrows in the south cluster represent interfaces that are not supported for uncore performance monitoring. As the diagram shows, all analysis is focused on the north cluster.

Available Groups

The following table documents the available groups for Baytrail. The group name refers to the pre-determined set of events that the software monitoring tools will program. The Events column gives the number of events in the group. The Clock column indicates whether the group includes counts from the SoC source clock.

Baytrail Uncore Event Group Table
Group Name Events Clock Description
UNC_SOC_Memory_DDR_BW 8 No Counts memory read and write requests to memory channel 0 and 1, rank 0 and 1. Determine memory bandwidth by multiplying event count by 64 bytes.
UNC_SOC_Memory_DDR0_BW 5 Yes Counts memory read and write requests to memory channel 0, rank 0 and 1. Determine memory channel 0 bandwidth by multiplying event count by 64 bytes.
UNC_SOC_Memory_DDR1_BW 5 Yes Counts memory read and write requests to memory channel 1, rank 0 and 1. Determine memory channel 1 bandwidth by multiplying event count by 64 bytes.
UNC_SOC_DDR_Self_Refresh 3 Yes Counts the number of cycles that memory channel 0 and 1 are in self-refresh.
UNC_SOC_All_Reqs 8 Yes Counts the number of requests per memory agent. Counts can be used to identify high demand agents or to estimate bandwidth for an agent by multiplying each request by 64 bytes.
UNC_SOC_Module0_BW 7 Yes Counts bandwidth events for Silvermont module 0. Determine bandwidth between Silvermont module 0 and memory by multiplying event count by the request size (32 or 64 bytes).
UNC_SOC_Module1_BW 7 Yes Counts bandwidth events for Silvermont module 1. Determine bandwidth between Silvermont module 1 and memory by multiplying event count by the request size (32 or 64 bytes).
UNC_SOC_Module0_1_BW 8 No Counts bandwidth events for Silvermont module 0 and module 1. Determine bandwidth between Silvermont module 0, module 1 and memory by multiplying event count by the request size (32 or 64 bytes).
UNC_SOC_Module0_1_Snoops 5 Yes Counts the number of snoop requests and replies for Silvermont module 0 and 1.
UNC_SOC_Graphics_BW 7 Yes Counts graphics controller bandwidth events. Determine bandwidth between the graphics controller and memory by multiplying event count by the request size (32 or 64 bytes).
UNC_SOC_Display_BW 7 Yes Counts display controller bandwidth events. Determine bandwidth between the display controller and memory by multiplying event count by the request size (32 or 64 bytes).
UNC_SOC_Imaging_BW 7 Yes Counts imaging controller bandwidth events. Determine bandwidth between the imaging controller and memory by multiplying event count by the request size (32 or 64 bytes).
UNC_SOC_LowSpeedPF_BW 7 Yes Counts bandwidth events for the low speed peripheral fabric. Determine the aggregate memory bandwidth by multiplying event count by the request size (32 or 64 bytes).
UNC_SOC_VED_BW 7 Yes Counts bandwidth events for the video encode/decode controller. Determine bandwidth between the video encode/decode controller and memory by multiplying event count by the request size (32 or 64 bytes).

UNC_SOC_Memory_DDR_BW

The UNC_SOC_Memory_DDR_BW group provides counts to compute the total memory bandwidth as seen by the SoC memory controller. The events provide a per-channel and per-rank breakdown of bandwidth. Although these events do not show which agent is demanding memory, they are the most accurate way to determine how much memory bandwidth is actually being consumed.

The number of memory channels for Baytrail is SKU dependent and may be either one or two channels.  If there is only one channel, all counts associated with the second channel will be zero.  For more information on memory channel architecture, please see: http://en.wikipedia.org/wiki/Multi-channel_memory_architecture.

The number of memory ranks is DRAM part specific and can be determined by referring to the DRAM part number's technical specifications provided by the supplier.  If there is only one rank, all counts associated with the second rank will be zero.  For more information on memory rank architecture, please see http://en.wikipedia.org/wiki/Memory_rank.

The groups UNC_SOC_Memory_DDR0_BW and UNC_SOC_Memory_DDR1_BW are subsets of this group, collecting only memory channel 0 or 1 respectively.

The image below illustrates the traffic flow being monitored for this group.

The table below documents the events contained in the UNC_SOC_Memory_DDR_BW group.

Name Counter Description
DDR_Chan0_Rank0_Read64B 0 Counts memory read requests to memory channel 0, rank 0.
DDR_Chan0_Rank1_Read64B 1 Counts memory read requests to memory channel 0, rank 1.
DDR_Chan0_Rank0_Write64B 2 Counts memory write requests to memory channel 0, rank 0.
DDR_Chan0_Rank1_Write64B 3 Counts memory write requests to memory channel 0, rank 1.
DDR_Chan1_Rank0_Read64B 4 Counts memory read requests to memory channel 1, rank 0.
DDR_Chan1_Rank1_Read64B 5 Counts memory read requests to memory channel 1, rank 1.
DDR_Chan1_Rank0_Write64B 6 Counts memory write requests to memory channel 1, rank 0.
DDR_Chan1_Rank1_Write64B 7 Counts memory write requests to memory channel 1, rank 1.

Analyzing Results

Bandwidth in MB/s can be calculated for each event listed above as follows:

Event metric formula: (event_count * 64 bytes) / seconds_sampled / 1,000,000 bytes per MB = MB/s

Events can be summed together to form the desired metric, for example:

  • Total memory bandwidth = sum of all event MB/s
  • Total read bandwidth = sum of all read event MB/s
  • Channel 0 bandwidth = sum of channel 0 event MB/s
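
As a rough illustration of the formula and sums above, the following Python sketch converts raw event counts into MB/s and combines them. The dictionary layout, the helper names `to_mbps` and `summarize_ddr`, and the sample counts are hypothetical; how counts are actually exported depends on the monitoring tool in use.

```python
# Sketch only: the dictionary layout and sample counts are hypothetical.
BYTES_PER_REQUEST = 64      # each DDR event counts one 64-byte request
BYTES_PER_MB = 1_000_000    # decimal megabytes, as in the formula above


def to_mbps(event_count, seconds_sampled):
    """event_count / seconds_sampled * 64 bytes / 1,000,000 bytes = MB/s."""
    return event_count * BYTES_PER_REQUEST / seconds_sampled / BYTES_PER_MB


def summarize_ddr(counts, seconds_sampled):
    """Convert each event to MB/s, then sum events into combined metrics."""
    mbps = {name: to_mbps(c, seconds_sampled) for name, c in counts.items()}
    return {
        "total": sum(mbps.values()),
        "read": sum(v for n, v in mbps.items() if "_Read" in n),
        "write": sum(v for n, v in mbps.items() if "_Write" in n),
        "chan0": sum(v for n, v in mbps.items() if n.startswith("DDR_Chan0_")),
        "chan1": sum(v for n, v in mbps.items() if n.startswith("DDR_Chan1_")),
    }


# Made-up counts collected over a hypothetical 2-second sample.
sample_counts = {
    "DDR_Chan0_Rank0_Read64B": 1_000_000,
    "DDR_Chan0_Rank1_Read64B": 0,
    "DDR_Chan0_Rank0_Write64B": 500_000,
    "DDR_Chan0_Rank1_Write64B": 0,
    "DDR_Chan1_Rank0_Read64B": 250_000,
    "DDR_Chan1_Rank1_Read64B": 0,
    "DDR_Chan1_Rank0_Write64B": 250_000,
    "DDR_Chan1_Rank1_Write64B": 0,
}

summary = summarize_ddr(sample_counts, seconds_sampled=2.0)
for metric, value in summary.items():
    print(f"{metric}: {value:.1f} MB/s")
```

The zero counts for rank 1 in the sample mimic a single-rank DRAM part; they simply contribute nothing to the sums.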

Known behaviors

  1. If the Baytrail memory does not have two memory ranks, the second rank counts will be zero.
  2. If the Baytrail memory does not have two channels, the second channel counts will be zero.
  3. If the Baytrail has only one memory channel, the MB/s metric will be twice the actual value. Correct it by dividing the result by two, or by multiplying counts by 32 bytes rather than 64 bytes.

UNC_SOC_DDR_Self_Refresh

The UNC_SOC_DDR_Self_Refresh group counts the cycles that memory spends in self-refresh. Self-refresh is a low power state, so these counts can be used for power optimization of the SoC and application.

The table below documents the events contained in the UNC_SOC_DDR_Self_Refresh group.

Name Counter Description
DDR_Chan0_Self_Refresh 0 Counts the number of cycles that memory channel 0 is in self-refresh.
DDR_Chan1_Self_Refresh 1 Counts the number of cycles that memory channel 1 is in self-refresh.
Clock_Counter 2 SoC clock counter

Analyzing Results

Event metric formula: (event_count * 100) / (time_interval * base_DRAM_frequency) = DDR self-refresh residency (%)
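
A minimal Python sketch of the residency calculation follows. It assumes that time_interval is expressed in seconds and base_DRAM_frequency in Hz, which the formula does not state explicitly. The function name, the 800 MHz frequency, and the counts are hypothetical values chosen only for illustration.

```python
def self_refresh_residency(event_count, time_interval_s, base_dram_freq_hz):
    """(event_count * 100) / (time_interval * base_DRAM_frequency), in percent.

    Assumes time_interval is in seconds and the frequency is in Hz.
    """
    return (event_count * 100) / (time_interval_s * base_dram_freq_hz)


# Hypothetical 1-second sample at an assumed 800 MHz base DRAM frequency.
chan0 = self_refresh_residency(400_000_000, 1.0, 800_000_000)
chan1 = self_refresh_residency(0, 1.0, 800_000_000)  # an absent channel reads zero

print(f"channel 0: {chan0:.1f}%  channel 1: {chan1:.1f}%")
```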

Known behaviors

  1. If the Baytrail memory does not have two channels, the second channel counts will be zero.
  2. Counters 0 and 1 may be running at a different source clock frequency than counter 2.

 

UNC_SOC_All_Reqs

The per-agent request count events in the UNC_SOC_All_Reqs group measure the total number of requests for the available SoC agents in a single, concurrent sampling. Because it captures all agents concurrently, this group is a key metric for studying any non-static or "bursty" workload. Unlike the other bandwidth events, which are sampled one or two agents at a time, this metric provides insight into all agents in a single time window.

Per-agent bandwidth can be estimated by multiplying each request count by 64 bytes, and total bandwidth can be estimated by summing the bandwidth of all agents. It is critical to understand that the end result is only an estimate based on the assumption that each request is 64 bytes; it will overcount when transactions are 32 bytes or partial-sized requests. The other disadvantage is that there is no read vs. write breakdown per agent.

For an exact bandwidth measurement with a read and write breakdown, the per-agent bandwidth groups must be used, one or two agents at a time.

Name Counter Description
Mod0_Reqs 0 Counts the number of requests from Silvermont module 0.
Mod1_Reqs 1 Counts the number of requests from Silvermont module 1.
GFX_Reqs 2 Counts the number of requests from the graphics controller.
Disp_Reqs 3 Counts the number of requests from the display controller.
Imaging_Reqs 4 Counts the number of requests from the imaging controller.
VED_Reqs 5 Counts the number of requests from the video encode/decode controller.
LowSpeedPF_Reqs 6 Counts the aggregate number of requests from the low speed peripheral fabric.
Clock_Counter 7 SoC clock counter

Analyzing Results

Event metric formula:

  • (event_count * 64 bytes) / seconds_sampled / 1,000,000 bytes per MB = Estimated Agent MB/s
  • (sum_of_all_agent_request_counts * 64 bytes) / seconds_sampled / 1,000,000 bytes per MB = Estimated DDR MB/s
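
The estimate can be sketched in Python as follows. The function name and the sample counts are hypothetical. The sketch leaves Clock_Counter out of the sum on the assumption that clock ticks should not be treated as memory requests.

```python
BYTES_PER_REQUEST = 64      # assumed size; real requests may be smaller
BYTES_PER_MB = 1_000_000


def estimate_agent_mbps(counts, seconds_sampled):
    """Estimate per-agent and total MB/s from UNC_SOC_All_Reqs counts."""
    per_agent = {
        name: count * BYTES_PER_REQUEST / seconds_sampled / BYTES_PER_MB
        for name, count in counts.items()
        if name != "Clock_Counter"   # assumption: clock ticks are not requests
    }
    return per_agent, sum(per_agent.values())


# Made-up counts from a hypothetical 1-second sample.
sample = {
    "Mod0_Reqs": 2_000_000,
    "Mod1_Reqs": 500_000,
    "GFX_Reqs": 1_000_000,
    "Disp_Reqs": 250_000,
    "Imaging_Reqs": 0,
    "VED_Reqs": 0,
    "LowSpeedPF_Reqs": 125_000,
    "Clock_Counter": 400_000_000,
}

per_agent, total = estimate_agent_mbps(sample, seconds_sampled=1.0)
busiest = max(per_agent, key=per_agent.get)
print(f"estimated DDR bandwidth: {total:.1f} MB/s, busiest agent: {busiest}")
```

Ranking agents by estimated bandwidth, as `busiest` does here, is one way to pick which one or two agents to measure next with the exact per-agent groups.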

 

Known behaviors

  1. It is important to remember that these events count transactions of any request size and that multiplying them by 64 bytes to calculate a MB/s metric is a generalization and may over count actual bandwidth observed at the memory channels.

UNC_SOC_Module0_BW

The UNC_SOC_Module0_BW group provides counts to compute the bandwidth of processor module 0 as seen by the system agent. The events provide a breakdown by request type.

The image below illustrates the traffic flow being monitored for this group.

Name Counter Description
Mod0_ReadPartial 0 Counts all module 0 read transactions of any data size request. This event count is inclusive of partial, 32 byte and 64 byte transactions.
Mod0_Read32B 1 Counts memory read requests of size 32 bytes from Silvermont module 0.
Mod0_Read64B 2 Counts memory read requests of size 64 bytes from Silvermont module 0.
Mod0_WritePartial 3 Counts all module 0 write transactions of any data size request. This event count is inclusive of partial, 32 byte and 64 byte transactions.
Mod0_Write32B 4 Counts memory write requests of size 32 bytes from Silvermont module 0.
Mod0_Write64B 5 Counts memory write requests of size 64 bytes from Silvermont module 0.
Clock_Counter 6 SoC clock counter

Analyzing Results

Read and write bandwidth can be calculated for the 32 byte and 64 byte events, but the partial event is problematic for bandwidth computations since a partial request has an unknown payload size. It is also vital to understand that the partial events for this group represent the sum of 64 byte, 32 byte and partial requests. Each partial event may therefore be thought of as a total read or total write count.

Event metric formula:

  • partial_requests - 32_byte_requests - 64_byte_requests = actual partial request count
  • ((Mod0_Read32B_count * 32_bytes) + (Mod0_Read64B_count * 64_bytes)) / seconds_sampled / 1000000bytes = Read MB/s
  • ((Mod0_Write32B_count * 32_bytes) + (Mod0_Write64B_count * 64_bytes)) / seconds_sampled / 1000000bytes = Write MB/s
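
The sketch below applies these formulas in Python, dividing by 1,000,000 so that the result is expressed in MB/s rather than bytes per second. The function name and the sample counts are hypothetical. Note how the partial events are treated as totals, so the actual partial count is obtained by subtraction.

```python
BYTES_PER_MB = 1_000_000


def module0_metrics(counts, seconds_sampled):
    """Derive read/write MB/s and the actual partial request counts."""
    def mbps(count_32b, count_64b):
        return (count_32b * 32 + count_64b * 64) / seconds_sampled / BYTES_PER_MB

    read32, read64 = counts["Mod0_Read32B"], counts["Mod0_Read64B"]
    write32, write64 = counts["Mod0_Write32B"], counts["Mod0_Write64B"]
    return {
        "read_mbps": mbps(read32, read64),
        "write_mbps": mbps(write32, write64),
        # The "partial" events include 32 byte and 64 byte requests,
        # so subtract those to get the true partial request count.
        "actual_partial_reads": counts["Mod0_ReadPartial"] - read32 - read64,
        "actual_partial_writes": counts["Mod0_WritePartial"] - write32 - write64,
    }


# Made-up counts from a hypothetical 1-second sample.
sample = {
    "Mod0_ReadPartial": 1_100_000,
    "Mod0_Read32B": 200_000,
    "Mod0_Read64B": 800_000,
    "Mod0_WritePartial": 450_000,
    "Mod0_Write32B": 100_000,
    "Mod0_Write64B": 300_000,
}

metrics = module0_metrics(sample, seconds_sampled=1.0)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Because the payload size of the remaining partial requests is unknown, the MB/s figures here leave them out.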

Known behaviors

  1. The module 0 and module 1 partial event counts include 32 byte, 64 byte and partial requests and can be thought of as total request counts.

UNC_SOC_Module1_BW

UNC_SOC_Module1_BW is identical to the UNC_SOC_Module0_BW group except that it is counting events from the second processor module.

UNC_SOC_Module0_1_BW

The UNC_SOC_Module0_1_BW group provides counts to compute the bandwidth of both processor modules concurrently. The partial events have been dropped to make room for measuring module 0 and module 1 at the same time.

Name Counter Description
Mod0_Read32B 0 Counts memory read requests of size 32 bytes from Silvermont module 0.
Mod0_Read64B 1 Counts memory read requests of size 64 bytes from Silvermont module 0.
Mod0_Write32B 2 Counts memory write requests of size 32 bytes from Silvermont module 0.
Mod0_Write64B 3 Counts memory write requests of size 64 bytes from Silvermont module 0.
Mod1_Read32B 4 Counts memory read requests of size 32 bytes from Silvermont module 1.
Mod1_Read64B 5 Counts memory read requests of size 64 bytes from Silvermont module 1.
Mod1_Write32B 6 Counts memory write requests of size 32 bytes from Silvermont module 1.
Mod1_Write64B 7 Counts memory write requests of size 64 bytes from Silvermont module 1.

 

UNC_SOC_Module0_1_Snoops

UNC_SOC_Module0_1_Snoops counts the number of snoop requests and snoop replies for Silvermont module 0 and 1, as seen by the system agent.  These counts can be used to confirm other traffic counts and correlate core snoop event counts.  Unlike the core based snoop events, the uncore snoop counts are not able to distinguish between cores within a module and count the total for the module rather than per core.

Name Counter Description
Mod0_Snoop_Replies 0 Counts the number of snoop replies received from module 0.
Mod0_Snoop_Reqs 1 Counts the number of snoop requests sent to module 0.
Mod1_Snoop_Replies 2 Counts the number of snoop replies received from module 1.
Mod1_Snoop_Reqs 3 Counts the number of snoop requests sent to module 1.
Clock_Counter 4 SoC clock counter

Analyzing Results

Analysis of snoop results is usage-model specific.

Known behaviors

  1. Snoop counts are per processor module totals with no per core break-down available.

UNC_SOC_Graphics_BW

The UNC_SOC_Graphics_BW group provides counts to compute the bandwidth of the graphics controller, as seen by the system agent. The events provide a breakdown by request type.

The image below illustrates the traffic flow being monitored by this group.

Name Counter Description
GFX_ReadPartial 0 Counts graphics controller read transactions with partial sized data requests.
GFX_Read32B 1 Counts memory read requests of size 32 bytes from the graphics controller.
GFX_Read64B 2 Counts memory read requests of size 64 bytes from the graphics controller.
GFX_WritePartial 3 Counts graphics controller write transactions with partial sized data requests.
GFX_Write32B 4 Counts memory write requests of size 32 bytes from the graphics controller.
GFX_Write64B 5 Counts memory write requests of size 64 bytes from the graphics controller.
Clock_Counter 6 SoC clock counter

Analyzing Results

Read and write bandwidth can be calculated for the 32 byte and 64 byte events, but the partial event is problematic for bandwidth computations since a partial request has an unknown payload size.

Event metric formula:

  • ((GFX_Read32B_count * 32_bytes) + (GFX_Read64B_count * 64_bytes)) / seconds_sampled / 1000000bytes = Read MB/s
  • ((GFX_Write32B_count * 32_bytes) + (GFX_Write64B_count * 64_bytes)) / seconds_sampled / 1000000bytes = Write MB/s

Known behaviors

  1. None.

UNC_SOC_Display_BW

The UNC_SOC_Display_BW group provides events to determine the bandwidth of the display controller and is identical to the UNC_SOC_Graphics_BW group in event-counter configuration and analysis metrics. Event names use the Disp prefix in place of GFX.

UNC_SOC_Imaging_BW

The UNC_SOC_Imaging_BW group provides events to determine the bandwidth of the imaging controller (image signal processor) and is identical to the UNC_SOC_Graphics_BW group in event-counter configuration and analysis metrics. Event names use the Imaging prefix in place of GFX.

UNC_SOC_LowSpeedPF_BW

The UNC_SOC_LowSpeedPF_BW group provides events to determine the bandwidth of the low speed peripheral fabric, which represents the aggregate bandwidth of all south cluster units such as USB3, USB2, SATA, eMMC and audio. It is identical to the UNC_SOC_Graphics_BW group in event-counter configuration and analysis metrics. Event names use the LowSpeedPF prefix in place of GFX.

UNC_SOC_VED_BW

The UNC_SOC_VED_BW group provides events to determine the bandwidth of the video encode and decode unit and is identical to the UNC_SOC_Graphics_BW group in event-counter configuration and analysis metrics. Event names use the VED prefix in place of GFX.
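
Because the Graphics, Display, Imaging, LowSpeedPF and VED groups share one event-counter configuration, a single helper parameterized by the event name prefix can cover all of them. The Python sketch below assumes that event names follow the pattern prefix_Read32B, prefix_Read64B and so on (for example VED_Read64B), which is inferred from the prefix substitution described above rather than confirmed. The function name and sample counts are hypothetical.

```python
BYTES_PER_MB = 1_000_000


def agent_bandwidth_mbps(counts, prefix, seconds_sampled):
    """Return (read MB/s, write MB/s) for one agent group.

    Assumes events are named <prefix>_Read32B, <prefix>_Read64B,
    <prefix>_Write32B and <prefix>_Write64B. Partial requests are
    left out because their payload size is unknown.
    """
    def mbps(kind):
        total_bytes = (counts[f"{prefix}_{kind}32B"] * 32
                       + counts[f"{prefix}_{kind}64B"] * 64)
        return total_bytes / seconds_sampled / BYTES_PER_MB

    return mbps("Read"), mbps("Write")


# Made-up counts from hypothetical 2-second samples of two groups.
gfx_counts = {"GFX_Read32B": 1_000_000, "GFX_Read64B": 1_000_000,
              "GFX_Write32B": 0, "GFX_Write64B": 0}
ved_counts = {"VED_Read32B": 0, "VED_Read64B": 500_000,
              "VED_Write32B": 1_000_000, "VED_Write64B": 250_000}

gfx_read, gfx_write = agent_bandwidth_mbps(gfx_counts, "GFX", 2.0)
ved_read, ved_write = agent_bandwidth_mbps(ved_counts, "VED", 2.0)
print(f"GFX read {gfx_read:.1f} MB/s, write {gfx_write:.1f} MB/s")
print(f"VED read {ved_read:.1f} MB/s, write {ved_write:.1f} MB/s")
```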

 
