I wrote a simple ping-pong program in F# that uses Intel MPI. The measured latencies are great (around 10 µs for the smallest messages), but I need to move this functionality into a production system that is a large multi-threaded F# program, and I'm having great difficulty doing so.

I learned that I need to use the impimt.dll library instead of the usual impi.dll, and that I must initialize MPI using MPI_Init_thread instead of the usual MPI_Init. This works, but the performance is hundreds of thousands of times worse: I'm seeing four-second latencies!

Is this to be expected? If so, how should I use Intel MPI for my latency-critical multi-threaded program?

The best idea I have come up with so far is to implement a token ring using my ping-pong code, sending messages back and forth that may or may not contain data. This seems hugely wasteful, but I cannot see any other way to make it work.
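To make the token-ring idea concrete, here is a minimal sketch of what I have in mind. It is written in Python rather than F# and does not use MPI at all: two threads stand in for the two ranks, and `queue.Queue` objects stand in for the link between them. Every name in it (`comm_loop`, `next_payload`, `run_demo`, the message strings) is made up for illustration, and the blocking `get`/`put` calls are only placeholders for the receive and send in my real ping-pong code.

```python
import queue
import threading

EMPTY = None  # a token that carries no data


def next_payload(outbox):
    """Return queued data if a worker left any, otherwise an empty token."""
    try:
        return outbox.get_nowait()
    except queue.Empty:
        return EMPTY


def comm_loop(outbox, inbox, link_in, link_out, rounds, starts):
    """Only this thread touches the link, as only one thread would call MPI."""
    if starts:
        link_out.put(next_payload(outbox))   # first ping
    for _ in range(rounds):
        token = link_in.get()                # placeholder for a blocking receive
        if token is not EMPTY:
            inbox.put(token)                 # hand real data to the worker threads
        link_out.put(next_payload(outbox))   # placeholder for a send


def drain(q):
    """Collect everything currently in a queue into a list."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items


def run_demo(rounds=5):
    """Run two simulated ranks and return what each one received."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    out_a, in_a, out_b, in_b = (queue.Queue() for _ in range(4))
    for msg in ("a1", "a2"):                 # data queued by workers on side A
        out_a.put(msg)
    out_b.put("b1")                          # data queued by a worker on side B
    threads = [
        threading.Thread(target=comm_loop,
                         args=(out_a, in_a, b_to_a, a_to_b, rounds, True)),
        threading.Thread(target=comm_loop,
                         args=(out_b, in_b, a_to_b, b_to_a, rounds, False)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return drain(in_a), drain(in_b)


if __name__ == "__main__":
    print(run_demo())
```

The worker threads would only ever touch the outbox and inbox queues, so all communication stays on one thread per process. The cost is that the token keeps circulating even when nobody has anything to send, which is the waste I am worried about.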