Statistics About QPI

s-ye

Hi there, I am working on a project related to QPI, and we need to collect some statistics. There are two CPUs (CPU A and CPU B) connected to each other by a QPI link. Each CPU has direct access to its own RAM, an SSD and a Niantic. It is possible that CPU A wants to access RAM B, which is attached to CPU B. The data path is: CPU A => CPU B (through QPI) => RAM B. The statistic we need is: Time{CPU A accesses RAM B} / Time{CPU A accesses RAM A}. There are some other statistics that we are interested in for this topology, but basically the above example shows what we need: we want to compare resource access over QPI with direct access. Has Intel tested this performance before? Are there any results related to this that we can use? Thanks for your time and help! Regards, Ye

Perry Taylor

Yes, it is possible that CPU A will need to access memory connected to CPU B (remote memory access). Intel benchmarks the performance delta between local and remote memory access for both latency and bandwidth. Since usage models vary, some BIOS versions have options to modify the resource allocation for QPI. If you are configured for "non-NUMA", where 50% of all accesses go to remote memory, you may benefit from changing the QPI resources to allow more remote credits.
Have you tried PTU (Intel Performance Tuning Utility)? It may have what you need for basic statistics on the QPI link between the CPUs.

http://software.intel.com/en-us/articles/intel-performance-tuning-utility/

http://software.intel.com/en-us/articles/optimizing-applications-for-numa/

Thomas Willhalm (Intel)

Ye,

The latency depends a lot on processor type, platform and memory. Therefore, it is difficult to provide general numbers. However, it is fairly simple to measure the latencies on your system:

LMbench (available here) contains a microbenchmark, "lat_mem_rd", that measures memory latency. With the tool "numactl" (part of libnuma), you can use it to measure the latency of local and remote memory on your system:

numactl --cpunodebind=0 --membind=0 ./lat_mem_rd -t 1024

numactl --cpunodebind=0 --membind=1 ./lat_mem_rd -t 1024
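To turn the two runs above into the ratio Time{CPU A accesses RAM B} / Time{CPU A accesses RAM A}, something like the following Python sketch could help. It rests on assumptions not stated in this thread: that each run's output was saved to a text file, that each result line has the form "array size in MB, latency in ns", and that array sizes of 64 MB and above are large enough to miss every cache level on the machine. The function names and the 64 MB threshold are hypothetical and should be adapted to your system.

```python
import statistics


def parse_lat_mem_rd(text):
    """Return (size_mb, latency_ns) pairs from saved lat_mem_rd output.

    Assumes result lines look like '64.00000 81.250'. Any other line,
    such as a stride header, is skipped.
    """
    samples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            samples.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return samples


def dram_latency(samples, min_size_mb=64.0):
    """Median latency over arrays of at least min_size_mb.

    The threshold is an assumption: it should exceed the last-level
    cache size so that the loads really go to DRAM.
    """
    latencies = [ns for size_mb, ns in samples if size_mb >= min_size_mb]
    if not latencies:
        raise ValueError("no samples at or above min_size_mb")
    return statistics.median(latencies)


def remote_to_local_ratio(local_text, remote_text, min_size_mb=64.0):
    """Ratio of remote DRAM latency to local DRAM latency."""
    local = dram_latency(parse_lat_mem_rd(local_text), min_size_mb)
    remote = dram_latency(parse_lat_mem_rd(remote_text), min_size_mb)
    return remote / local
```

One way to use it is to save the --membind=0 run as the local text and the --membind=1 run as the remote text, then pass both file contents to remote_to_local_ratio. Check which stream your lat_mem_rd build writes its results to before redirecting; if I remember correctly it is stderr, but that is not confirmed here.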

Kind regards
Thomas
