Reg: How to increase Data transfer speed between host and MIC

Hi,

I was trying to offload a portion of my code with OFFLOAD_REPORT=3 set. The report looks like this:

[Offload] [HOST]  [Tag 172] [CPU Time]        0.120382(seconds)
[Offload] [MIC 0] [Tag 172] [CPU->MIC Data]   58720296 (bytes)
[Offload] [MIC 0] [Tag 172] [MIC Time]        0.032349(seconds)
[Offload] [MIC 0] [Tag 172] [MIC->CPU Data]   58720256 (bytes)

[Offload] [MIC 0] [Tag 170] [State]   Gather copyout data
[Offload] [MIC 0] [Tag 170] [State]   MIC->CPU copyout data   0
[Offload] [HOST]  [Tag 170] [State]   Scatter copyout data
[Offload] [HOST]  [Tag 170] [CPU Time]        0.171873(seconds)
[Offload] [MIC 0] [Tag 170] [CPU->MIC Data]   75579486 (bytes)
[Offload] [MIC 0] [Tag 170] [MIC Time]        0.134238(seconds)
[Offload] [MIC 0] [Tag 170] [MIC->CPU Data]   28 (bytes)

For the same amount of data, the MIC->CPU transfer speed appears to be much higher (about 40 times) than the CPU->MIC speed. PCI Express bandwidth is 8 GB/s, but these timings are nowhere near my expectation. Can someone tell me how I can increase the data transfer speed?

Does this MIC Time include computation on the MIC as well, or just the data transfer time?

Also, I would like to know whether there is any concept like pinned memory (as on GPUs) for Xeon Phi.

Thanks

sivaramakrishna 

Hi Sivaramakrishna,

Using the data from your trace, I can’t reproduce your results.

For the first offload, Tag 172, I calculate a bandwidth of 1.33GB/s.

[total bytes] / ([CPU Time] - [MIC Time])
(0.0587GB + 0.0587GB) / (0.120s - 0.032s)

For the 2nd, Tag 170, I get 2.01GB/s.

0.0756GB / (0.172s - 0.134s)

What formula are you using? I also get a MIC-CPU / CPU-MIC ratio of 2.01/1.33, or about 1.5x, not 40x.
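
The arithmetic above can be checked with a short script. This is only a sketch: the function name is made up for illustration, 1 GB is taken as 1e9 bytes, and it assumes that the whole difference between CPU Time and MIC Time is data transfer, which ignores any other offload overhead.

```python
def offload_bandwidth_gbps(bytes_in, bytes_out, cpu_time_s, mic_time_s):
    """Estimate transfer bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: any part of CPU Time that is not MIC Time is data transfer.
    """
    transfer_time_s = cpu_time_s - mic_time_s
    if transfer_time_s <= 0:
        raise ValueError("CPU time must exceed MIC time")
    return (bytes_in + bytes_out) / transfer_time_s / 1e9


# Values copied from the OFFLOAD_REPORT output in the question.
tag_172 = offload_bandwidth_gbps(58720296, 58720256, 0.120382, 0.032349)
tag_170 = offload_bandwidth_gbps(75579486, 28, 0.171873, 0.134238)

print(f"Tag 172: {tag_172:.2f} GB/s")
print(f"Tag 170: {tag_170:.2f} GB/s")
print(f"Ratio:   {tag_170 / tag_172:.2f}x")
```

Working from the raw byte counts rather than rounded GB values avoids slipping a decimal place when converting by hand.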

Regards
---
Taylor

PS: In the first bandwidth measurement, did you perform a dummy offload (meaning an offload that does nothing) before your bandwidth test? The first offload includes overhead associated with setting up thread pools, etc. This can skew your results, making the transfer rates appear lower than they actually are.
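
To illustrate the measurement pattern only (the real offload would be written with the compiler's offload pragmas, not in Python), here is a minimal sketch. The names `time_after_warmup` and `fake_offload` are hypothetical, and the sleep is just a stand-in for one-time setup cost on the first call.

```python
import time


def time_after_warmup(operation, repeats=5):
    """Call `operation` once untimed, then return per-call times in seconds."""
    operation()  # warm-up call: absorbs one-time setup cost, not recorded
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical stand-in for an offload region with one-time setup cost.
calls = []


def fake_offload():
    if not calls:
        time.sleep(0.05)  # simulated first-call initialization
    calls.append(1)


timings = time_after_warmup(fake_offload, repeats=3)
print(f"best of {len(timings)}: {min(timings):.6f} s")
```

Because the warm-up call is made before the timer starts, the simulated setup cost never shows up in the recorded timings.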