I am currently porting an application to the Xeon Phi that does streaming processing of large data files. I started by profiling native runs of the application on the Phi using VTune, and found that performing native file I/O on the Phi was the bottleneck.
As a workaround, I plan to use the SCIF API to stream data to the card. I have benchmarked SCIF RMA transfers on my cluster using the sample code provided in this post: https://software.intel.com/en-us/forums/topic/367865 (scif-test.cpp). The results are both very good (~6 GB/s) and very consistent across runs (images attached).
However, the real target system for my software is the socketed Knights Landing architecture, and I have a couple of concerns about using SCIF on that platform:
(1) Is the interface for the SCIF API stable? Will it remain the same for Knights Landing?
(2) Is native file I/O expected to be greatly improved on Knights Landing (thus rendering my work with SCIF pointless)?
P.S. In case it may be useful to others, I have attached my own "Hello, World!" program based on the SCIF benchmarking code mentioned above. The main difference between the two programs is how they are run. The benchmarking program requires two instances to be run simultaneously, one on the Phi and one on the host CPU, whereas the "Hello, World!" program only needs to be run on the host CPU. (The "Hello, World!" version spawns a task on the Phi using the asynchronous form of #pragma offload.)
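For anyone unfamiliar with asynchronous offload, here is a minimal sketch of the general pattern. It is not the attached program. It assumes the Intel compiler's offload extensions (the signal clause and #pragma offload_wait), and the names card_task and run_async_offload are hypothetical, chosen only for illustration. Compilers without offload support ignore the pragmas, so the block then simply runs on the host.

```cpp
// Mark functions for the coprocessor only when Intel offload is enabled;
// other compilers see an empty macro.
#ifdef __INTEL_OFFLOAD
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET
#endif

// Hypothetical stand-in for the work done on the card.
MIC_TARGET int card_task(int x) {
    return x * 2;
}

// Starts card_task on coprocessor 0 without blocking the host,
// then waits for it and returns its result.
int run_async_offload(int input) {
    int result = 0;
    char sig = 0;  // its address serves as the tag linking signal() and wait()

    // With signal(), control returns to the host as soon as the offload
    // has been started.
    #pragma offload target(mic:0) signal(&sig) in(input) out(result)
    {
        result = card_task(input);
    }

    // Assumption: this is where the host would do its own work while the
    // card-side task runs, e.g. setting up its end of a SCIF connection.

    // Block until the offloaded task has finished; result is valid afterwards.
    #pragma offload_wait target(mic:0) wait(&sig)

    return result;
}
```

Calling run_async_offload(21) from main returns 42 once the wait completes. The point of the signal/wait pair is that only a single host executable has to be launched, while the host thread remains free between the two pragmas.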