I am working on an Intel Xeon Phi 5110P (60 cores, 1.053 GHz, 225 W) in OFFLOAD mode. I wrote a simple addition kernel using OpenMP, and I am maxing out at ~100 GB/s. However, the STREAM memory bandwidth test reports ~150 GB/s. The only difference I can see is that STREAM runs in NATIVE mode. My test is run using 59 threads. Am I missing something, or is the bandwidth significantly lower in offload mode? I use 64-byte alignment, and I have tried the same compile flags as listed on the Intel website page that describes the STREAM test. I can provide the code, but it's really straightforward. Any suggestions would be appreciated!
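For reference, the kind of kernel in question looks roughly like the sketch below. This is not the exact code: the offload directive is left out so that it builds with any C11 compiler (with or without OpenMP enabled), and the names `add_bytes_moved`, `alloc_aligned64`, `add_kernel`, and `add_bandwidth_gbs` are made up for illustration. Bandwidth is counted the way STREAM counts its Add kernel, as two reads plus one write per element.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Bytes counted for c[i] = a[i] + b[i]: two loads and one store per element. */
static double add_bytes_moved(size_t n) {
    return 3.0 * (double)n * (double)sizeof(double);
}

/* Allocate n doubles on a 64-byte boundary (size rounded up to a multiple of 64). */
static double *alloc_aligned64(size_t n) {
    size_t bytes = (n * sizeof(double) + 63) / 64 * 64;
    return (double *)aligned_alloc(64, bytes);
}

/* The addition kernel. The pragma is ignored if OpenMP is not enabled. */
static void add_kernel(double *restrict c, const double *restrict a,
                       const double *restrict b, size_t n) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        c[i] = a[i] + b[i];
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Best-of-reps bandwidth in GB/s (1 GB = 1e9 bytes); returns -1.0 on failure. */
static double add_bandwidth_gbs(size_t n, int reps) {
    if (n == 0) return -1.0;
    double *a = alloc_aligned64(n);
    double *b = alloc_aligned64(n);
    double *c = alloc_aligned64(n);
    double best = 0.0;
    if (!a || !b || !c) { free(a); free(b); free(c); return -1.0; }

    /* Initialize in parallel so pages are first touched by the threads that use them. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    for (int r = 0; r < reps; ++r) {
        double t0 = now_seconds();
        add_kernel(c, a, b, n);
        double dt = now_seconds() - t0;
        if (dt > 0.0) {
            double gbs = add_bytes_moved(n) / dt / 1e9;
            if (gbs > best) best = gbs;
        }
    }

    /* Read a result back so the stores cannot be optimized away. */
    if (c[n / 2] != 3.0) best = -1.0;
    free(a); free(b); free(c);
    return best;
}
```

Calling something like `add_bandwidth_gbs(64u * 1024u * 1024u, 10)` gives the best rate over ten passes; the arrays need to be much larger than the caches for the number to reflect memory bandwidth rather than cache bandwidth.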
Maximum offload bandwidth only 100 GB/s?