PCIe transfers vs core-to-core communication


Hi all,

I have to get data from a card in a PCIe slot to all the cores in my two-socket Sandy Bridge system. Would it be better to have the card send the data directly to every core, or to have it send the data to only one core and then have that core forward it to the remaining cores through core-to-core communication?

Doing it the first way involves several more PCIe transactions; doing it the second way relies on the performance of a single-producer-multiple-consumer queue.

Any thoughts on which might be faster?

Thanks!

Hussam Mousa (Intel):

Typically, core-to-core communication has far lower latency and higher bandwidth than any off-chip communication such as PCIe.

In fact, if you design your code so that the data the designated thread reads from the PCIe card fits in the last-level cache (LLC), the other cores can pick it up from there, and you can achieve fairly fast data transfer.

However, why isn't a single shared buffer appropriate for your needs?
