[PCM] QPI traffic reported all zeros


Zheng L. posted:

Hello everyone, I am trying to get some data from an Intel Xeon E5-2687W by using PCM. For our project, we are mainly interested in finding out how multiple threads that use QPI to read from the PCI card may affect the system. However, the incoming QPI data traffic is always 0, and the outgoing data traffic is always 0 too. I also get some weird readings. Is it possible that the readings are wrong?

I also have two screenshots of that, but I find that I cannot upload pictures to the forum. Is there any way to upload them so that someone can help me analyze them?

Thank you very much.

Might it be that you have a second instance of PCM running? It might also be one that was not cleanly shut down?

 


A screenshot of the complete output would be very helpful. Alternatively, copy and paste it into the comment text.

--

Roman

If you are running a recent version of Linux and using the "perf" interface to the QPI counters, there is an error in the definition of the predefined QPI events for cacheable and non-cacheable data blocks transferred.  In both cases, the standard distributions fail to set the "extra bit" that is needed for those events.  Fortunately, it can be set manually using the "perf" interface.

The reference for the Linux kernel patch is https://lkml.org/lkml/2013/8/2/482

To set the bit manually, note that the predefined event programmed with the command:
                 # perf stat -a -e "uncore_qpi_0/drs_data/"
is the same as
                 # perf stat -a -e "uncore_qpi_0/event=0x02,umask=0x08/"
but it should be
                 # perf stat -a -e "uncore_qpi_0/event=0x102,umask=0x08/"

This last command returns the expected number of data cache lines transferred when I run the STREAM benchmark in a cross-socket configuration.  The same change to the event number causes the "ncb_data" event to return non-zero values as well, but I don't have a test case for that event. 
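To see what the "extra bit" means at the register level, here is a minimal Python sketch. It assumes (check the files under /sys/bus/event_source/devices/uncore_qpi_0/format/ on your own kernel) that the SNB-EP QPI PMU describes the `event` field as `config:0-7,21` and `umask` as `config:8-15`. Under that assumption, bit 8 of the value written as `event=0x102` is scattered into bit 21 of the raw config. The function names are illustrative only.

```python
# Illustrative sketch: pack perf "event"/"umask" values into a raw QPI config.
# ASSUMPTION: the sysfs format files read event = config:0-7,21 and umask = config:8-15.

EVENT_BITS = list(range(0, 8)) + [21]  # value bits 0-7 -> config 0-7, value bit 8 -> config 21
UMASK_BITS = list(range(8, 16))        # value bits 0-7 -> config 8-15


def scatter_bits(value, positions):
    """Place bit i of `value` at config bit positions[i]."""
    if value >> len(positions):
        raise ValueError(f"{value:#x} does not fit in {len(positions)} bits")
    config = 0
    for i, pos in enumerate(positions):
        if (value >> i) & 1:
            config |= 1 << pos
    return config


def qpi_config(event, umask):
    """Raw config for uncore_qpi_N/event=...,umask=.../ under the assumed format."""
    return scatter_bits(event, EVENT_BITS) | scatter_bits(umask, UMASK_BITS)


if __name__ == "__main__":
    for ev in (0x02, 0x102):
        print(f"event={ev:#x},umask=0x08 -> config={qpi_config(ev, 0x08):#x}")
```

Under these assumptions the stock encoding gives config=0x802 and the corrected one gives config=0x200802, i.e. the two differ only in bit 21.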

John D. McCalpin, PhD "Dr. Bandwidth"

Thank you very much. I have already uploaded the screenshots. I hope those will help.

Attachments:

screenshot-1.png (166.41 KB)
screenshot.png (175.76 KB)
Come On !!! Do the research !!!

Hello Roman,

I have already uploaded the screenshots; I hope you can help me with my problem.

Hello, I have already uploaded the screenshots. I hope that helps. I am sure that I was running only one instance of the program when I took the screenshots.

Hello John,

I have not tried perf yet; I have only tried Intel PCM. I will try perf to get some results too.

Zheng L.,

Could you please post the whole output, including all the messages at the beginning, starting from the program invocation? There should be a couple of diagnostic messages that help to understand the issue.

Thanks,

Roman

Quote:

Roman Dementiev (Intel) wrote:

Zheng L.,

could you please post the whole output including all the messages in the beginning just from the program invocation. There should be a couple of diagnostic messages that help to understand the issue.

Thanks,

Roman

[root@cybermech IntelPerformanceCounterMonitorV2.5.1]# ./pcm.x 10

 Intel(r) Performance Counter Monitor V2.5.1 (2013-06-25 13:44:03 +0200 ID=76b6d1f)

 Copyright (c) 2009-2012 Intel Corporation

Num logical cores: 16
Num sockets: 2
Threads per core: 1
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3100000000 Hz
Package thermal spec power: 150 Watt; Package minimum power: 65 Watt; Package maximum power: 230 Watt; 
ERROR: Requested bus number 64 is larger than the max bus number 63
Can not access SNB-EP (Jaketown) PCI configuration space. Access to uncore counters (memory and QPI bandwidth) is disabled.
You must be root to access these SNB-EP counters in PCM. 
Number of PCM instances: 2

Detected Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz "Intel(r) microarchitecture codename Sandy Bridge-EP/Jaketown"

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses 
 L2MISS: L2 cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature

 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE | TEMP

   0    0     0.12   1.43   0.08    1.00     110 K   4508 K    0.98    0.47    0.01    0.07     N/A     N/A     36
   1    0     0.06   1.29   0.05    1.00      96 K   3233 K    0.97    0.42    0.01    0.08     N/A     N/A     33
   2    0     0.02   1.20   0.01    1.00      22 K    894 K    0.97    0.46    0.01    0.08     N/A     N/A     34
   3    0     0.13   1.78   0.07    1.00      46 K   1806 K    0.97    0.68    0.00    0.03     N/A     N/A     32
   4    0     0.00   0.77   0.00    1.00    4006       55 K    0.93    0.64    0.04    0.10     N/A     N/A     34
   5    0     0.00   1.18   0.00    1.00    3611       68 K    0.95    0.50    0.03    0.10     N/A     N/A     32
   6    0     0.00   1.22   0.00    1.00    2468       54 K    0.95    0.41    0.02    0.08     N/A     N/A     31
   7    0     0.00   0.79   0.00    1.00      16 K    208 K    0.92    0.48    0.05    0.13     N/A     N/A     34
   8    1     0.00   1.18   0.00    1.00      48 K    230 K    0.79    0.47    0.10    0.08     N/A     N/A     22
   9    1     0.01   1.17   0.01    1.00      78 K    453 K    0.83    0.38    0.06    0.06     N/A     N/A     23
  10    1     0.00   1.39   0.00    1.00    7367       45 K    0.84    0.54    0.05    0.05     N/A     N/A     23
  11    1     0.00   0.74   0.00    1.00    1211     7967      0.85    0.34    0.09    0.13     N/A     N/A     22
  12    1     0.00   0.88   0.00    1.00    1002     5663      0.82    0.33    0.11    0.12     N/A     N/A     22
  13    1     0.00   0.96   0.00    1.00     818     4268      0.81    0.35    0.11    0.11     N/A     N/A     22
  14    1     0.00   0.96   0.00    1.00     779     3867      0.80    0.31    0.11    0.11     N/A     N/A     23
  15    1     0.00   1.19   0.00    1.00    7827       25 K    0.69    0.35    0.09    0.05     N/A     N/A     21
-------------------------------------------------------------------------------------------------------------------
 SKT    0     0.04   1.49   0.03    1.00     302 K     10 M    0.97    0.51    0.01    0.06    0.00    0.00     31
 SKT    1     0.00   1.18   0.00    1.00     145 K    776 K    0.81    0.42    0.07    0.06    0.00    0.00     21
-------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.02   1.47   0.01    1.00     448 K     11 M    0.96    0.50    0.01    0.06    0.00    0.00     N/A

 Instructions retired:   10 G ; Active cycles: 7274 M ; Time (TSC):   30 Gticks ; C0 (active,non-halted) core residency: 1.47 %

 C1 core residency: 98.53 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %

 PHYSICAL CORE IPC                 : 1.47 => corresponds to 36.86 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.02 => corresponds to 0.54 % core utilization over time interval

Intel(r) QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links):

               QPI0     QPI1    |  QPI0   QPI1  
----------------------------------------------------------------------------------------------
 SKT    0        0        0     |  -2147483648%   -2147483648%   
 SKT    1        0        0     |  -2147483648%   -2147483648%   
----------------------------------------------------------------------------------------------
Total QPI incoming data traffic:    0       QPI data traffic/Memory controller traffic: -nan

Intel(r) QPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through QPI links):

               QPI0     QPI1    |  QPI0   QPI1  
----------------------------------------------------------------------------------------------
 SKT    0     9223372 T   9223372 T   |  -2147483648%   -2147483648%   
 SKT    1     9223372 T   9223372 T   |  -2147483648%   -2147483648%   
----------------------------------------------------------------------------------------------
Total QPI outgoing data and non-data traffic:    0  

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 549.47 Joules
 SKT    1 package consumed 560.42 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    1109.89 Joules

----------------------------------------------------------------------------------------------
 SKT    0 DIMMs consumed 0.00 Joules
 SKT    1 DIMMs consumed 0.00 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                  0.00 Joules

Hi Roman,

I have copied all of the output above; I hope that will be more helpful.

Thanks. This output is helpful. Could you please send the output of lspci command (as root) for further diagnosis?

--

Roman

Also could you specify the vendor of the system (Dell/HP/etc) and the BIOS vendor/version (seen in the output of Linux "dmidecode" command)?

Thanks,

Roman

Hello Roman,

I have put all of the lspci output into the attached lspci.txt file.

Attachments:

lspci.txt (12.74 KB)
Hi Roman,

The system is a Dell, and I have uploaded the output of the dmidecode command as an attachment, dmidecode.txt.

Attachments:

dmidecode.txt (40.25 KB)
Zheng Luo,

Thanks a lot for the detailed output. I have developed a patch (attached) that will allow you to see memory bandwidth from the memory controller. Apply it using "patch < bus_dell_patch.txt".

According to the lspci output, your BIOS hides the QPI performance monitoring devices, so QPI statistics cannot be available (you should see a message similar to the one described in this article). Could you please try to find a newer BIOS for your system, install it, and try PCM again? You may also need to enable a QPI/perfmon/PCM option (or similar) in the BIOS to unhide the QPI performance monitoring devices. If you can't find such an option, you might ask the vendor to provide one.

Best regards,

Roman

Attachments:

bus-dell-patch.txt (1.73 KB)

Hello Roman, 

Thank you very much for the help. I really appreciate it. I updated the BIOS of our Dell T7600 from A5 to A9. You can see the result in the attached dmidecode_after_BIOS_update.txt. I compared it with the previous one, and there do not seem to be many changes. The attached diff_result.txt shows the difference between the previous dmidecode result and the result after the BIOS update.

I rebuilt PCM with your patch. Thanks for the patch. When I run PCM, the QPI readings still have the problem. I attached the new output too (PCM_output.txt).

I checked the BIOS settings, but I did not find anything related to QPI. In our BIOS settings we did disable Hyper-Threading and Intel Turbo Boost to make things easier for our project. I think that will not affect the QPI results of PCM. Am I right?

You said that if I can't find the option, I might ask the vendor to provide one; I will try that. I am really grateful for your help.

Zheng Luo

Zheng Luo,

This version of BIOS seems to be much better. Your BIOS settings changes should not impact QPI perfmon device visibility. Could you please send me lspci output?

I noticed the line "Number of PCM instances: 2" in your PCM output. Either you are running two instances of PCM, or some time ago a PCM instance was killed unexpectedly so that PCM's instance counting became broken (this may also break the QPI statistics display in this version of PCM; to be fixed in the next release). To clean up the instance counting, please do the following:

1. stop all pcm instances

2. as root: rm -rf /dev/shm/sem.*Intel*

3. start pcm again
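Steps 1 and 2 can be wrapped in a small script so that the cleanup is not run while PCM is still active. This is a hypothetical Python helper, not part of PCM: it assumes PCM processes show up in /proc/<pid>/comm with names starting with "pcm" and ending in ".x" (pcm.x, pcm-memory.x, ...) and reuses the /dev/shm/sem.*Intel* pattern from step 2. It defaults to a dry run; actually removing the files still needs root.

```python
import glob
import os


def pcm_running(proc_dir="/proc"):
    """True if a process whose name looks like a PCM binary (assumed pcm*.x) is running."""
    for comm_path in glob.glob(os.path.join(proc_dir, "[0-9]*", "comm")):
        try:
            with open(comm_path) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith("pcm") and name.endswith(".x"):
            return True
    return False


def stale_semaphores(shm_dir="/dev/shm"):
    """POSIX named-semaphore files matching the sem.*Intel* pattern from step 2."""
    return sorted(glob.glob(os.path.join(shm_dir, "sem.*Intel*")))


def clean_pcm_semaphores(shm_dir="/dev/shm", proc_dir="/proc", dry_run=True):
    """Step 1 check plus step 2; returns the paths that were (or would be) removed."""
    if pcm_running(proc_dir):
        raise RuntimeError("a PCM instance is still running; stop it first (step 1)")
    paths = stale_semaphores(shm_dir)
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths


if __name__ == "__main__":
    for path in clean_pcm_semaphores(dry_run=True):
        print("would remove:", path)
```

Run it once as is to review the list, then call clean_pcm_semaphores(dry_run=False) as root before starting pcm again (step 3).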

Please share the output of pcm started after you have done these operations.

Thanks for your cooperation,

Roman

When we got the BIOS update from Dell that supported QPI performance counter access, the default setting was to not enable access.  We had to select an option to enable QPI performance counter access and then reboot.  Fortunately this only needed to be done once.

John D. McCalpin, PhD "Dr. Bandwidth"

Hello John,

Are you using the same computer as me? I am using a Dell T7600; what model are you using? Even after the update, I still cannot see a QPI option. If you are using the same model, can you tell me where in the BIOS I can find the option to turn on the QPI counters? Thank you very much.

Zheng Luo

Thank you very much Roman,

PCM seems to be working now; at least there are no more arbitrary values in the QPI readings. However, I tried to use the commands mentioned in http://software.intel.com/en-us/forums/topic/280235#comment-1755207 to generate some QPI traffic. The commands I used are numactl --cpunodebind=0 --membind=0 ./lat_mem_rd -t 1024 and numactl --cpunodebind=0 --membind=1 ./lat_mem_rd -t 1024. The strange thing is that all the data in the QPI fields is zero, and I don't know why. Is that because those commands do not generate any QPI traffic? If so, how can I generate QPI traffic?

By the way, how do you close a PCM instance? I just use Ctrl+C to close the program...

Zheng Luo

We are running Dell DCS8000 systems, so the BIOS is likely different than yours.   I don't know what command was used to enable the PCI configuration space areas for the QPI counters and was not able to find it in the list of options that I checked.   It is probably best to follow up with Dell.

John D. McCalpin, PhD "Dr. Bandwidth"

Thank you very much.

Quote:

Zheng Luo wrote:

Thank you very much Roman,

PCM seems to be working now; at least there are no more arbitrary values in the QPI readings. I will do more tests with it. By the way, how do you close a PCM instance? I just use Ctrl+C to close the program...

I also used the commands

numactl --cpunodebind=0 --membind=0 ./lat_mem_rd -t 1024

numactl --cpunodebind=0 --membind=1 ./lat_mem_rd -t 1024

mentioned in http://software.intel.com/en-us/forums/topic/280235#comment-1755207 to generate QPI traffic, but all the data I read is zero. This time there are no arbitrary values, but all the QPI data is zero. Is there any other way to generate QPI traffic so that I can see non-zero data?

Zheng Luo

Zheng Luo,

Yes, using Ctrl-C to stop PCM is fine.

Could you share your current output from pcm.x?

You can also try the single-threaded memory bandwidth test included in PCM:

make memoptest

# reading

numactl --cpunodebind=0 --membind=1 ./memoptest 0

# writing

numactl --cpunodebind=0 --membind=1 ./memoptest 1

# writing using non-temporal streaming stores

numactl --cpunodebind=0 --membind=1 ./memoptest 2

Thanks,

Roman
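If building memoptest is not convenient, the same idea can be sketched in Python. This hypothetical script (qpi_traffic.py is an illustrative name, not part of PCM) just copies one large buffer into another in a loop. Launched as `numactl --cpunodebind=0 --membind=1 python3 qpi_traffic.py`, the CPU runs on socket 0 while both buffers are allocated on socket 1, so the copies should cross QPI. With `--membind=0` the same run should stay mostly socket-local, which is why the first lat_mem_rd command above is not expected to show much QPI data traffic. Python overhead means it will reach lower bandwidth than memoptest, and the buffer size should be much larger than the last-level cache.

```python
import sys
import time


def copy_traffic(size_bytes, seconds):
    """Copy a size_bytes buffer repeatedly for about `seconds`; returns bytes copied."""
    # Filling the source writes every page, so the memory is physically allocated
    # (on the node chosen by numactl --membind) before the copy loop starts.
    src = bytearray(b"\x5a") * size_bytes
    dst = bytearray(size_bytes)
    copied = 0
    deadline = time.monotonic() + seconds
    while True:
        dst[:] = src          # one bulk copy: reads src, writes dst
        copied += size_bytes
        if time.monotonic() >= deadline:
            return copied


if __name__ == "__main__":
    size_mib = int(sys.argv[1]) if len(sys.argv) > 1 else 512
    seconds = float(sys.argv[2]) if len(sys.argv) > 2 else 10.0
    start = time.monotonic()
    total = copy_traffic(size_mib << 20, seconds)
    elapsed = time.monotonic() - start
    print(f"copied {total / 1e9:.1f} GB in {elapsed:.1f} s ({total / elapsed / 1e9:.2f} GB/s)")
```

Running pcm.x in a second terminal while this executes should show whether the QPI columns move; if they stay at zero while the READ/WRITE columns change, the counters themselves are the problem rather than the workload.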

I just saw the copy of your post above, with this output:

ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0xffffffff Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0xffffffff Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0xffffffff Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0xffffffff Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).

Hi Roman,

If I get these error messages, does that mean the QPI counters are still not working, and that is why I get all-zero results? And are those problems still caused by the BIOS?

Zheng Luo 

correct

You can tell whether the QPI link-layer performance counters are accessible from the lspci output.   The text description may vary, but for each socket (in your case these will start with "1f:" and "3f:") the output of lspci should include lines like:

1f:08.2 Performance counters: Intel Corporation Sandy Bridge QPI Port 0 Performance Monitor (rev 07)
1f:09.2 Performance counters: Intel Corporation Sandy Bridge QPI Port 1 Performance Monitor (rev 07)
3f:08.2 Performance counters: Intel Corporation Sandy Bridge QPI Port 0 Performance Monitor (rev 07)
3f:09.2 Performance counters: Intel Corporation Sandy Bridge QPI Port 1 Performance Monitor (rev 07)

The first 7 characters are all that matters -- the first field defines the "bus" (in this case the bus number corresponds to a particular socket), the second field defines the "slot" (sometimes referred to as the "device"), and the third field defines the "function". For each socket, slot 8, function 2 corresponds to the QPI interface 0 link layer hardware performance counters, while slot 9, function 2 corresponds to the QPI interface 1 link layer hardware performance counters.
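Since the description text varies, it is safer to match on the numbers alone. A minimal Python sketch of that check, assuming default lspci output (hex bus:slot.function, optionally prefixed by a PCI domain) and the mapping described above (slot 8 is QPI port 0, slot 9 is QPI port 1, function 2 is the link-layer PMON). The helper names are illustrative.

```python
import re

# bus:slot.function at the start of an lspci line, optionally after a domain ("0000:").
BDF_RE = re.compile(r"^(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\b")
PORT_BY_SLOT = {0x08: 0, 0x09: 1}   # slot 8 -> QPI port 0, slot 9 -> QPI port 1
PMON_FUNCTION = 2                   # link-layer performance counters


def parse_bdf(line):
    """Return (bus, slot, function) as integers, or None if the line has no BDF."""
    m = BDF_RE.match(line.strip())
    if m is None:
        return None
    return tuple(int(field, 16) for field in m.groups())


def visible_qpi_pmon_ports(lspci_lines):
    """Map each bus to the set of QPI ports whose PMON function shows up in lspci."""
    ports = {}
    for line in lspci_lines:
        bdf = parse_bdf(line)
        if bdf is None:
            continue
        bus, slot, function = bdf
        if function == PMON_FUNCTION and slot in PORT_BY_SLOT:
            ports.setdefault(bus, set()).add(PORT_BY_SLOT[slot])
    return ports
```

For the four example lines above this yields {0x1f: {0, 1}, 0x3f: {0, 1}}; an empty result means the link-layer counters are not visible to the OS.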

The QPI link layer counters have additional functionality (for "mask" and "match" functions) under devices 8/9, function 6. I don't know if the Intel PCM or VTune tools use these, but from reading the documentation (Xeon E5-2600 uncore performance monitoring guide) they look like they could be useful for some detailed analyses.

With the latest BIOS updates on our systems, we were able to use the "setupbios" utility from Dell to enable access to these PCI configuration space areas.  The text descriptions from lspci don't make any sense -- they don't include the word "performance" or the acronym "QPI" -- but the functions work fine. 

Attempts to read devices/functions in PCI configuration space that are unsupported or non-existent will result in a "master abort" response (all bits set -- i.e., 0xFFFFFFFF).   Most software recognizes this case and returns zero values for the counters rather than aborting execution of the performance monitoring program.
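On Linux, the same distinction (device hidden vs. readable) can be made from user space through sysfs. The sketch below is a hypothetical helper, not taken from PCM or any other tool: it treats an all-ones config dword as the master-abort pattern and a missing /sys/bus/pci/devices/<domain:bus:slot.func> entry as "not enumerated", and reports both as N/A rather than as a zero count. The BDFs in the example are the 1f/3f ones from the lspci listing above.

```python
import os
import struct

MASTER_ABORT = 0xFFFFFFFF


def decode_config_dword(raw):
    """Interpret 4 little-endian config bytes; None means all ones (master abort)."""
    (value,) = struct.unpack("<I", raw[:4])
    return None if value == MASTER_ABORT else value


def read_vendor_device(bdf, sysfs="/sys/bus/pci/devices"):
    """(vendor ID, device ID) of a PCI function, or None if it is not accessible."""
    path = os.path.join(sysfs, bdf, "config")
    try:
        with open(path, "rb") as f:
            raw = f.read(4)
    except FileNotFoundError:
        return None  # not enumerated by the OS, e.g. hidden by the BIOS
    dword = decode_config_dword(raw)
    if dword is None:
        return None
    return dword & 0xFFFF, dword >> 16


if __name__ == "__main__":
    for bdf in ("0000:1f:08.2", "0000:1f:09.2", "0000:3f:08.2", "0000:3f:09.2"):
        ids = read_vendor_device(bdf)
        status = "N/A (not accessible)" if ids is None else f"vendor={ids[0]:#06x} device={ids[1]:#06x}"
        print(bdf, status)
```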

John D. McCalpin, PhD "Dr. Bandwidth"

OK, I see, I will contact Dell to see if they can give any help. Thank you very much.

Zheng Luo

Thank you for the help Roman. I will try to contact Dell to see if I can get any help from there.

Zheng Luo

Thanks for your reply, John. In your reply you mentioned that "Attempts to read devices/functions in PCI configuration space that are unsupported or non-existent will result in a "master abort" response (all bits set -- i.e., 0xFFFFFFFF). Most software recognizes this case and returns zero values for the counters rather than aborting execution of the performance monitoring program". Is that the reason I get all zero values?

Yes -- your "lspci" output did not list the device 8/9, function 2 entries, and your previous posting showed that you were getting the master abort response:

         "ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0xffffffff Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).

The Intel PCM software recognizes this case and simply fills the count fields with zeros.  Perhaps an "N/A" field would be more appropriate, but that would still leave you wondering if the field was "Not Available" (as in your case) or "Not Applicable" (for example on a single-socket chip that has no QPI interfaces).

John D. McCalpin, PhD "Dr. Bandwidth"

I see. Thank you very much for your explanation. I found a thread in the forum that may be helpful, and I will try what it suggests: http://software.intel.com/en-us/forums/topic/385194 The poster there asked Dell to provide an experimental BIOS, and that actually worked.

Zheng Luo

Hello!

I am trying to use Intel PCM to measure the L3 cache misses for a program. I create the PCM object in my program as instructed at https://software.intel.com/en-us/articles/intel-performance-counter-moni....

I compile my file using:
icc -O3 -o myfile myfile.cpp ../PCM/cpucounters.cpp ../PCM/msr.cpp ../PCM/pci.cpp ../PCM/client_bw.cpp -AVX -xhost

When I execute it, I get the output shown in the attached pcm-output.txt.

As you can see, the QPI link layer performance counters are possibly not accessible.

When I run lspci | grep QPI, I get the following output:
7f:08.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 0 (rev 07)
7f:09.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 1 (rev 07)
ff:08.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 0 (rev 07)
ff:09.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 1 (rev 07)

The model of my machine is: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz

Please suggest as to how to get around this problem.

As you can see from the prior comments in this forum thread, your BIOS is not exposing the QPI link-layer counters. On your system these would be devices 7f:08.2, 7f:09.2, ff:08.2, ff:09.2, with associated "mask/match" controls at 7f:08.6, 7f:09.6, ff:08.6, ff:09.6.

Your BIOS may or may not actually support these devices.  As I noted above, once I received the BIOS upgrade that supported these devices I still could not see them.  I had to change a BIOS setting and reboot to actually make them visible.

So you need to work with your system vendor to find out if your BIOS supports these devices, and if so, what BIOS option is required to actually enable them.

John D. McCalpin, PhD "Dr. Bandwidth"

Hey!

I've got a related issue with a Supermicro quad-socket system (X9QRi-F) with 4x Xeon E5-4620 v2.
I have already checked the BIOS settings for options to enable/disable the QPI LL counters, but without success.
I'm afraid that the system does not support the QPI link-layer counters, as mentioned in previous posts, but maybe you can give me a hint as to whether there is an issue with the pcm tool or not. I attached the output of pcm and lspci.

The lspci output contains a line like

Performance counters: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)

for each socket. Can I assume that the QPI LL counters are available, or what kind of monitoring device is listed here?

Thank you for your help!

Kind regards,

Daniel S.

Attachments:

lspci_output.txt (25.4 KB)
pcm.x_output.txt (13.37 KB)

Daniel,

Most probably you have this issue: https://software.intel.com/en-us/articles/bios-preventing-access-to-qpi-performance-counters

It hides these registers (which are also missing on your system according to the lspci output you have provided):

QPI0 Port 0 PMON Registers D8:F2

QPI0 Port 1 PMON Registers D9:F2

QPI1 Port 2 PMON Registers D24:F2

QPI0 Mask/Match Port 0 PMON Registers D8:F6

QPI0 Mask/Match Port 1 PMON Registers D9:F6

QPI1 Mask/Match Port 2 PMON Registers D24:F6
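The list above can be turned into a per-socket check against Linux sysfs. This sketch assumes the PCI segment is 0000 and that the uncore buses are the ones seen in the lspci output posted in this thread (3f, 7f, bf, ff); both are assumptions, and on parts with only two QPI links the Port 2 entries (device 0x18) may legitimately be absent. The function name is hypothetical.

```python
from pathlib import Path

# (device, function) pairs from the list above: QPI PMON (F2) and mask/match (F6).
QPI_PMON_DEVFN = [(0x08, 2), (0x09, 2), (0x18, 2), (0x08, 6), (0x09, 6), (0x18, 6)]


def hidden_qpi_devices(uncore_buses, sysfs_root="/sys/bus/pci/devices"):
    """Return the BDF strings of expected QPI PMON functions that Linux does not see."""
    root = Path(sysfs_root)
    missing = []
    for bus in uncore_buses:
        for dev, fn in QPI_PMON_DEVFN:
            # Assumption: all uncore devices live in PCI segment (domain) 0000.
            bdf = f"0000:{bus:02x}:{dev:02x}.{fn}"
            if not (root / bdf).exists():
                missing.append(bdf)
    return missing


if __name__ == "__main__":
    # Uncore buses taken from the lspci output in this thread (one per socket).
    for bdf in hidden_qpi_devices([0x3F, 0x7F, 0xBF, 0xFF]):
        print("not visible:", bdf)
```

If every F2/F6 entry is reported as not visible while the F0 "QPI Link" devices are present, that matches the BIOS-hiding behaviour described in the linked article.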

Roman

Hi,

I was looking at the pcm-numa.x output from the 2.8 release. I want to understand what the Local and Remote DRAM access columns mean. For example, I am running a database workload and trying to understand how much traffic crosses sockets on my 4-socket machine.

 Intel(R) Xeon(R) CPU E5-4657L v2 @ 2.40GHz "Intel(r) microarchitecture codename Ivy Bridge-EP/EN/EX/Ivytown"

My question is how to interpret these numbers for local and remote DRAM accesses. What exactly do 268K and 187K mean in the first line below? It would be better to quantify the QPI traffic directly, but, as this thread describes, I also always get 0 for QPI traffic, so I plan to look at the BIOS settings today with my sysadmin.

If I cannot get QPI numbers directly, is there any way to estimate the cumulative traffic across sockets from the cumulative remote DRAM accesses? What counts as a DRAM access, and at what granularity? With sequential access there could also be prefetching, whereas with random access only a single page might be touched.

Please help me understand these numbers so I can better interpret my database workload. I have two cases: a scan operator, which I expect to produce sequential accesses when the data comes from remote DRAM, and a hash-table lookup, which could produce random remote DRAM accesses during the probe phase when the hash table resides in remote DRAM.

I tried looking at the code below, but could not understand it. 

https://github.com/erikarn/intel-pcm/blob/master/src/pcm-numa.cpp

Core | IPC  | Instructions | Cycles  | Local DRAM accesses | Remote DRAM accesses
   0   0.16         79 M      485 M       268 K               187 K
   1   0.13         59 M      455 M       262 K               139 K
   2   0.13         60 M      452 M       263 K               137 K
   3   0.13         58 M      447 M       257 K               135 K
   4   0.13         59 M      448 M       255 K               138 K
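A rough way to turn these columns into a remote share and an approximate byte rate is sketched below. It assumes that each count is one 64-byte cache-line fill from DRAM, that "K" means 1,000, and that the sampling interval is 1 second; none of these is confirmed here. It also only sums the five rows shown, and core-originated DRAM accesses do not necessarily cover all QPI traffic (snoops, write-backs and I/O, for example), so treat the result as a lower-bound estimate, not a QPI measurement.

```python
CACHE_LINE_BYTES = 64  # assumption: one count == one 64-byte cache line


def summarize(rows, interval_s=1.0):
    """rows: list of (core, local_accesses, remote_accesses) as plain counts.

    Returns (remote share of all DRAM accesses, approx. remote MB/s).
    """
    total_local = sum(r[1] for r in rows)
    total_remote = sum(r[2] for r in rows)
    total = total_local + total_remote
    if total == 0:
        return 0.0, 0.0
    remote_fraction = total_remote / total
    remote_mb_per_s = total_remote * CACHE_LINE_BYTES / interval_s / 1e6
    return remote_fraction, remote_mb_per_s


# Values from the table above, with "K" read as 1,000 (assumption).
rows = [
    (0, 268_000, 187_000),
    (1, 262_000, 139_000),
    (2, 263_000, 137_000),
    (3, 257_000, 135_000),
    (4, 255_000, 138_000),
]
fraction, mb_s = summarize(rows)
print(f"remote share: {fraction:.1%}, approx. remote read traffic: {mb_s:.1f} MB/s")
```

Comparing this estimate between the scan and the hash-probe runs may still be useful even though the absolute numbers are only approximate.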

OK. So I checked whether the BIOS has any parameters I can enable for QPI. In the QPI configuration I saw QPI Link0s and QPI Link0p, and after enabling them the server refused to even show the BIOS menu. I searched the Internet for these parameters, and from what I read on forums they seem to be harmless.

Apart from this, I could not work out which BIOS parameters to change to enable QPI monitoring support. The motherboard is the following:

    product: X9QR7-TF+/X9QRi-F+ (070715D9)
    vendor: Supermicro

If you have any insight into how to enable this monitoring, that would be great. I am pasting the output of lspci and ls -al /dev/mem below.

 ls -al /dev/mem  --->
crw-r----- 1 root kmem 1, 1 Feb 12 10:02 /dev/mem

lspci --->

00:00.0 Host bridge: Intel Corporation Xeon E5 v2/Core i7 DMI2 (rev 04)
00:02.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 04)
00:03.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 3a (rev 04)
00:04.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 0 (rev 04)
00:04.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 1 (rev 04)
00:04.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 2 (rev 04)
00:04.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 3 (rev 04)
00:04.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 4 (rev 04)
00:04.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 5 (rev 04)
00:04.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 6 (rev 04)
00:04.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 7 (rev 04)
00:05.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 VTd/Memory Map/Misc (rev 04)
00:05.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 IIO RAS (rev 04)
00:05.4 PIC: Intel Corporation Xeon E5 v2/Core i7 IOAPIC (rev 04)
00:11.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Virtual Root Port (rev 06)
00:16.0 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #1 (rev 05)
00:16.1 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #2 (rev 05)
00:1a.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #2 (rev 06)
00:1d.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation C600/X79 series chipset LPC Controller (rev 06)
00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 06)
00:1f.3 SMBus: Intel Corporation C600/X79 series chipset SMBus Host Controller (rev 06)
00:1f.6 Signal processing controller: Intel Corporation C600/X79 series chipset Thermal Management Controller (rev 06)
01:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
02:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
04:01.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
3f:08.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
3f:09.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
3f:0a.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 0 (rev 04)
3f:0a.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 1 (rev 04)
3f:0a.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 2 (rev 04)
3f:0a.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 3 (rev 04)
3f:0b.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
3f:0b.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
3f:0c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
3f:0e.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
3f:0f.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Target Address/Thermal Registers (rev 04)
3f:0f.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 RAS Registers (rev 04)
3f:0f.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:0f.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:0f.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:0f.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:10.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 0 (rev 04)
3f:10.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 1 (rev 04)
3f:10.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 0 (rev 04)
3f:10.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 1 (rev 04)
3f:10.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 2 (rev 04)
3f:10.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 3 (rev 04)
3f:10.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 2 (rev 04)
3f:10.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 3 (rev 04)
3f:13.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
3f:13.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
3f:13.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Registers (rev 04)
3f:13.5 Performance counters: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
3f:16.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 System Address Decoder (rev 04)
3f:16.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
3f:16.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
3f:1c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
3f:1c.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
3f:1d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Target Address/Thermal Registers (rev 04)
3f:1d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 RAS Registers (rev 04)
3f:1d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
3f:1d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
3f:1d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
3f:1d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
3f:1e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 0 (rev 04)
3f:1e.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 1 (rev 04)
3f:1e.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 0 (rev 04)
3f:1e.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 1 (rev 04)
3f:1e.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 2 (rev 04)
3f:1e.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 3 (rev 04)
3f:1e.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 2 (rev 04)
3f:1e.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 3 (rev 04)
40:01.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 1a (rev 04)
40:04.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 0 (rev 04)
40:04.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 1 (rev 04)
40:04.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 2 (rev 04)
40:04.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 3 (rev 04)
40:04.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 4 (rev 04)
40:04.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 5 (rev 04)
40:04.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 6 (rev 04)
40:04.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 7 (rev 04)
40:05.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 VTd/Memory Map/Misc (rev 04)
40:05.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 IIO RAS (rev 04)
40:05.4 PIC: Intel Corporation Xeon E5 v2/Core i7 IOAPIC (rev 04)
41:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
7f:08.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
7f:09.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
7f:0a.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 0 (rev 04)
7f:0a.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 1 (rev 04)
7f:0a.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 2 (rev 04)
7f:0a.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 3 (rev 04)
7f:0b.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
7f:0b.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
7f:0c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
7f:0e.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
7f:0f.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Target Address/Thermal Registers (rev 04)
7f:0f.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 RAS Registers (rev 04)
7f:0f.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:0f.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:0f.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:0f.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:10.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 0 (rev 04)
7f:10.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 1 (rev 04)
7f:10.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 0 (rev 04)
7f:10.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 1 (rev 04)
7f:10.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 2 (rev 04)
7f:10.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 3 (rev 04)
7f:10.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 2 (rev 04)
7f:10.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 3 (rev 04)
7f:13.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
7f:13.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
7f:13.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Registers (rev 04)
7f:13.5 Performance counters: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
7f:16.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 System Address Decoder (rev 04)
7f:16.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
7f:16.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
7f:1c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
7f:1c.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
7f:1d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Target Address/Thermal Registers (rev 04)
7f:1d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 RAS Registers (rev 04)
7f:1d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
7f:1d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
7f:1d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
7f:1d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
7f:1e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 0 (rev 04)
7f:1e.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 1 (rev 04)
7f:1e.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 0 (rev 04)
7f:1e.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 1 (rev 04)
7f:1e.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 2 (rev 04)
7f:1e.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 3 (rev 04)
7f:1e.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 2 (rev 04)
7f:1e.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 3 (rev 04)
80:04.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 0 (rev 04)
80:04.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 1 (rev 04)
80:04.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 2 (rev 04)
80:04.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 3 (rev 04)
80:04.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 4 (rev 04)
80:04.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 5 (rev 04)
80:04.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 6 (rev 04)
80:04.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 7 (rev 04)
80:05.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 VTd/Memory Map/Misc (rev 04)
80:05.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 IIO RAS (rev 04)
80:05.4 PIC: Intel Corporation Xeon E5 v2/Core i7 IOAPIC (rev 04)
bf:08.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
bf:09.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
bf:0a.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 0 (rev 04)
bf:0a.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 1 (rev 04)
bf:0a.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 2 (rev 04)
bf:0a.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 3 (rev 04)
bf:0b.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
bf:0b.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
bf:0c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0c.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0c.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0c.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0c.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0c.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
bf:0e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
bf:0e.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
bf:0f.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Target Address/Thermal Registers (rev 04)
bf:0f.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 RAS Registers (rev 04)
bf:0f.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
bf:0f.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
bf:0f.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
bf:0f.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
bf:10.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 0 (rev 04)
bf:10.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 1 (rev 04)
bf:10.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 0 (rev 04)
bf:10.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 1 (rev 04)
bf:10.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 2 (rev 04)
bf:10.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 3 (rev 04)
bf:10.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 2 (rev 04)
bf:10.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 3 (rev 04)
bf:13.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
bf:13.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
bf:13.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Registers (rev 04)
bf:13.5 Performance counters: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
bf:16.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 System Address Decoder (rev 04)
bf:16.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
bf:16.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
bf:1c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
bf:1c.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
bf:1d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Target Address/Thermal Registers (rev 04)
bf:1d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 RAS Registers (rev 04)
bf:1d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
bf:1d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
bf:1d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
bf:1d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
bf:1e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 0 (rev 04)
bf:1e.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 1 (rev 04)
bf:1e.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 0 (rev 04)
bf:1e.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 1 (rev 04)
bf:1e.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 2 (rev 04)
bf:1e.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 3 (rev 04)
bf:1e.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 2 (rev 04)
bf:1e.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 3 (rev 04)
c0:02.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 04)
c0:02.2 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 2c (rev 04)
c0:03.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 3a (rev 04)
c0:04.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 0 (rev 04)
c0:04.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 1 (rev 04)
c0:04.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 2 (rev 04)
c0:04.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 3 (rev 04)
c0:04.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 4 (rev 04)
c0:04.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 5 (rev 04)
c0:04.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 6 (rev 04)
c0:04.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Crystal Beach DMA Channel 7 (rev 04)
c0:05.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 VTd/Memory Map/Misc (rev 04)
c0:05.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 IIO RAS (rev 04)
c0:05.4 PIC: Intel Corporation Xeon E5 v2/Core i7 IOAPIC (rev 04)
c1:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
c1:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
c3:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
c4:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
ff:08.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
ff:09.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
ff:0a.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 0 (rev 04)
ff:0a.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 1 (rev 04)
ff:0a.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 2 (rev 04)
ff:0a.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 3 (rev 04)
ff:0b.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
ff:0b.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
ff:0c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0c.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0c.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0c.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0c.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0c.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
ff:0e.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
ff:0f.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Target Address/Thermal Registers (rev 04)
ff:0f.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 RAS Registers (rev 04)
ff:0f.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:0f.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:0f.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:0f.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:10.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 0 (rev 04)
ff:10.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 1 (rev 04)
ff:10.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 0 (rev 04)
ff:10.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 1 (rev 04)
ff:10.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 2 (rev 04)
ff:10.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 3 (rev 04)
ff:10.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 2 (rev 04)
ff:10.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 3 (rev 04)
ff:13.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
ff:13.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
ff:13.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Registers (rev 04)
ff:13.5 Performance counters: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
ff:16.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 System Address Decoder (rev 04)
ff:16.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
ff:16.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
ff:1c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
ff:1c.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 1 (rev 04)
ff:1d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Target Address/Thermal Registers (rev 04)
ff:1d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 RAS Registers (rev 04)
ff:1d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
ff:1d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
ff:1d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
ff:1d.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel Target Address Decoder Registers (rev 04)
ff:1e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 0 (rev 04)
ff:1e.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 1 (rev 04)
ff:1e.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 0 (rev 04)
ff:1e.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 1 (rev 04)
ff:1e.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 2 (rev 04)
ff:1e.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 Thermal Control 3 (rev 04)
ff:1e.6 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 2 (rev 04)
ff:1e.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel 0-3 ERROR Registers 3 (rev 04)

Are you sure you have the latest BIOS installed? Sometimes the vendor adds features in newer BIOS releases, but this is no guarantee. When flashing your BIOS, please make sure you follow the proper procedure; things can go very wrong if you do not.

I do not have control over BIOS upgrades, because the machine I am using is managed by a sysadmin who does the maintenance work. We reverted to the factory default settings, and now the machine boots and I can run the pcm-numa.x command again. Earlier I could stop it with Ctrl+C to get a log, without needing sudo kill, but since the reboot it ignores Ctrl+C entirely and I have to use sudo kill.

So I don't think I will get anything out of the BIOS: I am working towards a deadline, and the sysadmin's schedule might not allow an upgrade before then. That is not the main problem anyway; even after an upgrade, we are not sure the PCM tool would report correct QPI values. So for the time being, I want to see whether I can make sense of the local and remote memory access numbers.

I am running some experiments using memory-mapped I/O and trying to see the affinity of data accesses with respect to memory, but the local and remote numbers do not match my expectations. For example, when my task is parallelizable, I expect memory-mapped I/O to keep data affine to the socket that uses it, so that remote memory accesses are minimal, but that is not what I see.

That is why I would highly appreciate it if somebody could explain the meaning of these numbers and how to interpret them, as I asked in my first post here. Please let me know if you have any insights into interpreting them.

Thank you very much. 

As I noted above in https://software.intel.com/en-us/forums/topic/473955#comment-1756915, you need devices 8.2 and 9.2 on each socket (buses 1f, 7f, bf, ff) to get the QPI link-layer counters, and those are clearly not present in the lspci output.
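A quick way to check for those devices in a saved `lspci` listing is sketched below in Python. It is a minimal sketch, assuming plain `lspci` output (optionally with a `0000:` domain prefix); the helper name `missing_qpi_counter_devices` is hypothetical, and the default bus list is simply the one quoted in the post above, so adjust it to the uncore buses your own system shows.

```python
import re

# Matches the "bb:dd.f" address at the start of an lspci line,
# skipping an optional "dddd:" PCI domain prefix.
ADDR_RE = re.compile(r"\s*(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s")

def missing_qpi_counter_devices(lspci_text, buses=("1f", "7f", "bf", "ff")):
    """Hypothetical helper: return the QPI link-layer counter functions
    (devices 08.2 and 09.2 on each given bus) absent from lspci output."""
    present = set()
    for line in lspci_text.splitlines():
        m = ADDR_RE.match(line)
        if m:
            present.add(m.group(1).lower())
    wanted = [f"{bus.lower()}:{dev}" for bus in buses for dev in ("08.2", "09.2")]
    return [addr for addr in wanted if addr not in present]

# Two lines taken from the listing above: only function 0 of each QPI link is visible.
sample = """\
ff:08.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
ff:09.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
"""
print(missing_qpi_counter_devices(sample, buses=("ff",)))  # ['ff:08.2', 'ff:09.2']
```

A non-empty result means the link-layer counters are hidden on those buses, which matches the situation described above.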

There are other counters that can identify local and remote DRAM accesses.  Several of these have bugs on Sandy Bridge -- I don't know if they have been fixed in Ivy Bridge.   Intel's VTune includes (partial) workarounds for the bugs on Sandy Bridge platforms -- I don't know if PCM does this (or if it is necessary).

The "local DRAM accesses" and "remote DRAM accesses" are almost certainly counting cache line transfers.

Core | IPC  | Instructions | Cycles  |  Local DRAM accesses | Remote DRAM Accesses 
   0   0.16         79 M      485 M       268 K               187 K              
   1   0.13         59 M      455 M       262 K               139 K              
   2   0.13         60 M      452 M       263 K               137 K              
   3   0.13         58 M      447 M       257 K               135 K              
   4   0.13         59 M      448 M       255 K               138 K              

The IPC is very low here -- indicating a lot of stalls. The local access count for these 5 cores is about 1.305 million cache lines, or 83.52 million bytes. Assuming 2.4 GHz operation, the 485 million cycles is about 0.202 seconds, so the local bandwidth is 83.5 MB / 0.202 seconds = 413 MB/s, which is very low. Remote bandwidth is lower -- about 233 MB/s.
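The same arithmetic in Python, as a sketch under the assumptions stated in the post: each DRAM access moves one 64-byte cache line, the cores run at 2.4 GHz, and the 485 M cycles of core 0 are taken as the length of the interval. The helper name `bandwidth_mb_s` is hypothetical.

```python
CACHE_LINE_BYTES = 64   # assumed: one DRAM access = one 64-byte cache line
CORE_GHZ = 2.4          # assumed core clock, as in the text

# Columns from the sample table above (thousands of accesses).
local_k = [268, 262, 263, 257, 255]
remote_k = [187, 139, 137, 135, 138]
cycles = 485e6          # largest per-core cycle count, used as the interval length

def bandwidth_mb_s(accesses_k, cycles, ghz=CORE_GHZ):
    """Convert a summed access count (in thousands) over `cycles` into MB/s."""
    seconds = cycles / (ghz * 1e9)
    bytes_moved = sum(accesses_k) * 1e3 * CACHE_LINE_BYTES
    return bytes_moved / seconds / 1e6

print(f"local  ~{bandwidth_mb_s(local_k, cycles):.0f} MB/s")   # local  ~413 MB/s
print(f"remote ~{bandwidth_mb_s(remote_k, cycles):.0f} MB/s")  # remote ~233 MB/s
```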

Another way to look at this is that these 5 threads have 1.3 M local DRAM accesses in 485 M cycles, or one access per 372 cycles. At 2.4 GHz, this is one access every 155 ns. I am guessing that there are other threads running at the same time, and their references need to be added into the formula. The remote access rate of 0.74 M accesses in 485 M cycles is one access every ~655 cycles, or one per ~273 ns.
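Expressed as code, the access-interval calculation is just cycles divided by accesses, then divided by the clock frequency. This is a minimal sketch assuming the same 2.4 GHz clock; `access_interval` is a hypothetical helper name.

```python
CORE_GHZ = 2.4  # assumed core clock, as in the text

def access_interval(accesses, cycles, ghz=CORE_GHZ):
    """Return (cycles per access, nanoseconds per access)."""
    cyc = cycles / accesses
    return cyc, cyc / ghz

local_cyc, local_ns = access_interval(1.305e6, 485e6)
remote_cyc, remote_ns = access_interval(0.74e6, 485e6)
print(f"local:  one access per {local_cyc:.0f} cycles = {local_ns:.0f} ns")    # 372 cycles, 155 ns
print(f"remote: one access per {remote_cyc:.0f} cycles = {remote_ns:.0f} ns")  # 655 cycles, 273 ns
```

Accesses from other threads running at the same time would have to be added to the numerator, which shortens these intervals.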

All of these access rates seem slow, but the very low IPC suggests lots of stalls, so memory accesses seem like the right place to be looking.  You will want to look for remote vs local DRAM access rates across all the cores in the system to see if there is an imbalance that is slowing things down.  You might also want to look at the DRAM traffic counters on each chip to see if the accesses are close to uniformly distributed.

On the Xeon E5-46xx v1 (Sandy Bridge) systems I saw poor sustained bandwidth between sockets, but never came up with a good explanation.   These values look more latency-limited than bandwidth-limited, but more data is needed.

John D. McCalpin, PhD "Dr. Bandwidth"

Hi John,

Thank you very much for the detailed explanation. It is really very helpful, and I appreciate it a lot.

The snippet of remote and local accesses I showed was just a sample. The real accesses from my workload are below. With 48 threads running on all 4 sockets, I get around 10 GB/s of local bandwidth and 20 GB/s of remote bandwidth, with around 25 M local accesses and 50 M remote accesses. The theoretical QPI bandwidth is 32 GB/s.

Core | IPC  | Instructions | Cycles  |  Local DRAM accesses | Remote DRAM Accesses
   0   1.30        518 M      400 M       478 K              1488 K
   1   1.57        600 M      382 M       549 K              1732 K
   2   1.36        444 M      326 M       956 K               737 K
   3   1.32        508 M      384 M       473 K              1460 K
   4   1.39        461 M      331 M       944 K               815 K
   5   1.30        487 M      373 M       470 K              1381 K
   6   1.13        342 M      303 M       458 K               847 K
   7   1.27        473 M      374 M       387 K              1412 K
   8   1.28        378 M      295 M       847 K               592 K
   9   1.48        558 M      376 M       998 K              1124 K
  10   1.47        542 M      370 M       918 K              1143 K
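The post does not state the measurement interval behind the 10 GB/s and 20 GB/s figures. As a rough consistency check, assuming each DRAM access moves one 64-byte cache line, the Python sketch below works backward from the quoted totals to the interval each bandwidth figure implies; `implied_interval_s` is a hypothetical helper.

```python
CACHE_LINE_BYTES = 64  # assumed: one DRAM access = one 64-byte cache line

def implied_interval_s(accesses, gb_per_s):
    """Seconds needed to move `accesses` cache lines at `gb_per_s` (1 GB = 1e9 bytes)."""
    return accesses * CACHE_LINE_BYTES / (gb_per_s * 1e9)

local_s = implied_interval_s(25e6, 10.0)   # totals quoted in the post
remote_s = implied_interval_s(50e6, 20.0)
print(f"local: {local_s:.2f} s, remote: {remote_s:.2f} s")  # local: 0.16 s, remote: 0.16 s
```

Both pairs imply the same interval, so the two bandwidth figures are at least consistent with each other; whether 0.16 s matches the actual PCM sampling interval is not stated in the thread.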

Remote memory bandwidth on the Xeon E5-46xx (v1 "Sandy Bridge") is discussed in a forum thread at

https://software.intel.com/en-us/forums/topic/383121

I did not run cases with combinations of remote and local accesses, but I did see that remote bandwidths were in the range of ~4.4 GB/s for data on the nearest neighbor chip and ~3.8 GB/s for data on the chip in the opposite corner of the square topology.

Your result of 20 GB/s for remote accesses corresponds to ~5 GB/s per socket, which is pretty close to the ~4.4 GB/s socket-to-socket bandwidth that I saw. This suggests that you are running into whatever "feature" is limiting sustained socket-to-socket bandwidth across the QPI interfaces, but as far as I know Intel has provided no explanation of why the sustained data bandwidth across QPI is such a low fraction of the theoretical peak data bandwidth. Even the 2-socket Xeon E5 v1 boxes have relatively low QPI bandwidth efficiency, but this is much improved in Xeon E5 v3 when "home snoop" is enabled.

Unfortunately since I don't understand what is limiting sustained QPI data transfer bandwidth, I don't know how to look for a "signature" of this limiter using various uncore performance counters.  My guess is that there are not enough buffers for some class of transaction, but I have not been able to find anything specific for either the 2-socket or 4-socket boxes.

John D. McCalpin, PhD "Dr. Bandwidth"

Quick follow-up on my note.

Using version 2.3 of the "Intel Memory Latency Checker", I ran a set of tests on a Xeon E5-4650 system to see how the remote memory access types influenced the remote memory bandwidth.

Access Type                    Local GB/s       1 hop Avg GB/s      2 hop Avg GB/s
----------------------------------------------------------------------------------
All Reads                        ~29.0              ~4.4                ~3.8
2 Reads + 1 Write                ~28.8              ~6.0                ~5.1
3 Reads + 1 Write                ~28.6              ~5.9                ~5.1
1 Read + 1 Write                 ~29.9              ~5.5                ~4.3
----------------------------------------------------------------------------------
2 Reads + 1 streaming Write      ~21.4              ~4.4                ~3.8
1 Read + 1 streaming Write       ~19.2              ~4.4                ~3.7
----------------------------------------------------------------------------------

This suggests that an aggregate of 20 GB/s of remote bandwidth across four sockets (5 GB/s per socket) is right in the middle of the range of expected sustainable values.
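The comparison can be restated in a few lines of Python. This sketch only reuses the remote columns of the MLC table and the reported 20 GB/s aggregate, and assumes the remote traffic is split evenly across the 4 sockets.

```python
# Remote-bandwidth columns (GB/s) from the MLC table above.
one_hop = [4.4, 6.0, 5.9, 5.5, 4.4, 4.4]
two_hop = [3.8, 5.1, 5.1, 4.3, 3.8, 3.7]

aggregate_remote_gb_s = 20.0  # reported aggregate remote bandwidth
sockets = 4                   # assumed even split across sockets
per_socket = aggregate_remote_gb_s / sockets

lo, hi = min(one_hop + two_hop), max(one_hop + two_hop)
print(f"per socket {per_socket:.1f} GB/s; MLC remote range {lo}-{hi} GB/s")
# per socket 5.0 GB/s; MLC remote range 3.7-6.0 GB/s
```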

This system can sustain much higher bandwidths from local memory, so any modifications to increase local accesses should help performance.

The Xeon E5-46xx systems provide the ability to configure a very high capacity memory, but I suspect that you would need to go to the Xeon E7 systems to get both very high capacity and high remote bandwidth.  (I have not tested this, since I don't have a recent Xeon E7 for testing, but they have more QPI links, so I expect the aggregate remote BW to be higher.)

John D. McCalpin, PhD "Dr. Bandwidth"

On an Intel(R) Xeon(R) CPU E7-4890 v2, I measured:

Intel(R) Memory Latency Checker - v2.3

Command line parameters: --bandwidth_matrix

Using buffer size of 30.000MB
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using Read-only traffic type
        Memory node
 Socket      0       1       2       3
     0  35823.2 12480.5 12649.8 12643.5
     1  12455.3 35914.9 12661.3 12649.4
     2  12434.5 12629.5 35917.5 12657.9
     3  12441.2 12649.1 12658.5 35874.3

As always, your mileage might vary with DIMM type and population.
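To put the matrix next to the Xeon E5-4650 numbers, the Python sketch below averages the diagonal (local) and off-diagonal (remote) entries. The comparison against the ~4.4 GB/s 1-hop read bandwidth quoted earlier is only rough, since the systems, DIMM populations, and test setups differ.

```python
# MLC --bandwidth_matrix output above (MB/s), rows = socket, columns = memory node.
matrix = [
    [35823.2, 12480.5, 12649.8, 12643.5],
    [12455.3, 35914.9, 12661.3, 12649.4],
    [12434.5, 12629.5, 35917.5, 12657.9],
    [12441.2, 12649.1, 12658.5, 35874.3],
]
n = len(matrix)
local = [matrix[i][i] for i in range(n)]
remote = [matrix[i][j] for i in range(n) for j in range(n) if i != j]

avg_local = sum(local) / len(local)
avg_remote = sum(remote) / len(remote)
e5_one_hop_gb_s = 4.4  # Xeon E5-4650 all-reads 1-hop figure quoted earlier

print(f"local avg {avg_local / 1000:.1f} GB/s, remote avg {avg_remote / 1000:.1f} GB/s")
# local avg 35.9 GB/s, remote avg 12.6 GB/s
print(f"remote vs E5-4650 1-hop: {avg_remote / 1000 / e5_one_hop_gb_s:.1f}x")  # 2.9x
```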

Thanks, Thomas!

The data I pulled was from one of the "largemem" nodes on the TACC Stampede system.  These are loaded with 1 TiB of RAM composed of 32 quad-rank 1.35V 32 GiB DIMMs.   Putting 2 quad-rank DIMMs on a channel is definitely going to reduce the DRAM channel frequency, so this is not a maximum bandwidth configuration even for the Xeon E5-4600.  (The DRAM configuration will only have a significant impact on the local bandwidth -- the remote bandwidth should be approximately the same for any reasonable DRAM configuration.)

I am glad to see that my prediction about the increased remote bandwidth on the Xeon E7 family is correct!

John D. McCalpin, PhD "Dr. Bandwidth"

Thanks Thomas.

Thank you very much, John, for the details. I would like to point to a paper on NUMA-aware parallelism in HyPer, a state-of-the-art database system: http://www-db.in.tum.de/~leis/papers/morsels.pdf

They use two 4-socket machines, one of which is a Nehalem EX (Intel Xeon X7560 at 2.3 GHz). In Table 1 of the paper, they quote a peak local bandwidth of 82.6 GB/s for the analytical database workload TPC-H Query 1, where the theoretical maximum bandwidth is 100 GB/s.

I tried looking at the processor specs to check the theoretical bandwidth, but the processor is discontinued. In any case, based on your comments, is 100 GB/s then the cumulative bandwidth, which comes to around 25 GB/s per socket? Because otherwise, as per your and Thomas's

This paper also gives some details, such as the percentage of data accessed over QPI, for all 22 analytical queries in the TPC-H benchmark.

I thought I would bring it to your notice in case you find something interesting in their experiments.

Yes, you are right. If you divide the overall traffic by 4, you get approximately the memory bandwidth that you can get on a single socket (as measured by MLC).

Intel Xeon X7560 processor ("Nehalem-EX") was followed by Intel Xeon E7 processors ("Westmere-EX"), which was followed by Intel Xeon E7 v2 processors ("Ivybridge-EX"). The latter are the ones that I was running MLC on.

Thank you, Thomas, for the quick reply. My processor is the Intel® Xeon® Processor E5-4657L v2.

Its Intel ark page lists the bandwidth figures below. If the earlier processor had only 25 GB/s of memory bandwidth per socket, does the processor I have then provide 60 GB/s of memory bandwidth and 16 GB/s of QPI bandwidth per socket? Is this correct?

Max Memory Bandwidth: 59.7 GB/s
Intel® QPI Speed: 8 GT/s
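One way to see where such spec-sheet numbers usually come from is the Python sketch below. It rests on assumptions that are not stated on the ark excerpt above: 4 DDR3 channels per socket at 1866 MT/s with an 8-byte data bus, and QPI carrying 2 bytes of payload per transfer in each direction of a link. Under those assumptions the 16 GB/s figure is per link and per direction, not per socket.

```python
# Assumptions (not taken from the ark excerpt): 4 DDR3-1866 channels,
# 8 bytes per DDR transfer, 2 payload bytes per QPI transfer per direction.
DDR_CHANNELS = 4
DDR_TRANSFERS_PER_S = 1866.67e6
BYTES_PER_DDR_TRANSFER = 8

QPI_TRANSFERS_PER_S = 8e9      # 8 GT/s from the ark page
QPI_BYTES_PER_TRANSFER = 2     # assumed payload width per direction

mem_peak_gb_s = DDR_CHANNELS * DDR_TRANSFERS_PER_S * BYTES_PER_DDR_TRANSFER / 1e9
qpi_per_dir_gb_s = QPI_TRANSFERS_PER_S * QPI_BYTES_PER_TRANSFER / 1e9

print(f"memory peak ~{mem_peak_gb_s:.1f} GB/s per socket")  # ~59.7 GB/s
print(f"QPI ~{qpi_per_dir_gb_s:.0f} GB/s per link per direction, "
      f"~{2 * qpi_per_dir_gb_s:.0f} GB/s both directions")  # ~16 and ~32 GB/s
```

As the MLC results earlier in the thread show, sustained remote bandwidth on these parts is well below these theoretical peaks.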
