Using Intel MPI from .NET

I wrote a simple ping-pong program in F# that uses Intel MPI. The measured latencies are great (around 10 µs for the smallest messages), but I need to transfer this functionality over to a production system, a large multi-threaded F# program, and I'm having great difficulty doing so.

I learned that I need to use the impimt.dll library instead of the usual impi.dll, and that I must initialize MPI using MPI_Init_thread instead of the usual MPI_Init. This works, but the performance is literally hundreds of thousands of times worse: I'm seeing four-second latencies!

Is this to be expected? If so, how should I use Intel MPI from my latency-critical multithreaded program? The best idea I have come up with so far is to implement a token ring using my ping-pong code, sending messages back and forth that may or may not contain data. This seems hugely wasteful, but I cannot see any other way to make it work.
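[For readers following along: a minimal C sketch of the multithreaded initialization being discussed. Note that MPI_Init_thread may grant a lower thread level than requested, so `provided` should always be checked; this won't compile without an MPI installation (e.g. Intel MPI).]

```c
/* Sketch only: requires mpi.h from an MPI installation to build. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided;
    /* Request MPI_THREAD_MULTIPLE: any thread may call MPI at any time. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* The library may grant less than was requested, so check. */
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "Warning: only thread level %d provided\n", provided);

    MPI_Finalize();
    return 0;
}
```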


Hi Jon,

Of course, this is unexpected behavior. It's very difficult to identify the reason for such performance degradation. We will try to reproduce the issue on our servers and understand the reason (if it's reproducible).
Have you tried using C# (or C++) instead of F#? Does F# use its own mechanism (library) to create threads?

Regards!
Dmitry

Hi Dmitry,
I have gathered some more information. The problem only manifests when using the MPI_THREAD_MULTIPLE setting (to allow arbitrarily multi-threaded programs) and not when the program only makes calls from a single thread.
However, I have worked around the problem by creating an MPI thread, initializing with MPI_THREAD_SERIALIZED, and going into an infinite loop sending messages back and forth between the two machines as fast as possible, feeding sends from a concurrent queue (sending dummy data if no real data is available) and posting received data back to my application. This way any thread can send data by enqueuing it, and incoming messages can be received in the same way. The performance is great in my test code; I just have to graft the new code into my production system now...
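[Editor's note: the workaround described above can be sketched as follows. This is Python rather than F#, and the actual MPI exchange is stubbed out by a hypothetical `fake_mpi_sendrecv` loopback, so only the queue-fed control flow of the single communication thread is shown; in the real program that call would be a matched MPI send/recv pair with the peer rank.]

```python
import queue
import threading

DUMMY = b""                  # empty payload stands in for "no real data"
send_q = queue.Queue()       # any application thread may enqueue here
recv_q = queue.Queue()       # real incoming data is posted back here

def fake_mpi_sendrecv(payload: bytes) -> bytes:
    """Stand-in for the paired MPI send/recv with the peer rank;
    here it simply echoes the payload back."""
    return payload

def comm_loop(iterations: int) -> None:
    """The single MPI thread: exchange messages as fast as possible,
    sending dummy data when nothing real is queued."""
    for _ in range(iterations):
        try:
            outgoing = send_q.get_nowait()   # real data, if any
        except queue.Empty:
            outgoing = DUMMY                 # otherwise keep the ring spinning
        incoming = fake_mpi_sendrecv(outgoing)
        if incoming != DUMMY:                # surface only real messages
            recv_q.put(incoming)

send_q.put(b"hello")
send_q.put(b"world")
t = threading.Thread(target=comm_loop, args=(10,))
t.start()
t.join()
results = [recv_q.get_nowait() for _ in range(recv_q.qsize())]
print(results)   # -> [b'hello', b'world']
```

Because only this one thread ever touches MPI, MPI_THREAD_SERIALIZED is sufficient, which is the point of the workaround.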
The F# standard library creates threads indirectly, but only using ordinary .NET calls. I was using asynchronous agents to send and receive data over MPI. Here's the code I was using:

    let send =
      let agent =
        new MailboxProcessor<_>(fun inbox -> async {
          initialize()
          while true do
            let! (buf : byte []) = inbox.Receive()
            if buf.Length >= maxSize then
              printfn "WARNING: %d-byte message is too long for MPI" buf.Length
            else
              let nativeArray = NativeInterop.PinnedArray.of_array buf
              let dst = if rank() = 0 then 1 else 0
              Internal.send(nativeArray.Ptr, buf.Length, Internal.MPI_BYTE, dst, 0,
                            Internal.MPI_COMM_WORLD) |> check
        })
      agent.Start()
      agent.Post

    let receive =
      let buf = Array.create maxSize 0uy
      let nativeBuf = NativeInterop.PinnedArray.of_array buf
      let status = [| Internal.MPI_Status() |]
      let nativeStatus = NativeInterop.PinnedArray.of_array status
      let queue = System.Collections.Concurrent.ConcurrentQueue<_>()
      async {
        initialize()
        while true do
          Internal.recv(nativeBuf.Ptr, buf.Length, Internal.MPI_BYTE,
                        Internal.MPI_ANY_SOURCE, Internal.MPI_ANY_TAG,
                        Internal.MPI_COMM_WORLD, nativeStatus.Ptr) |> check
          let buf = Array.sub buf 0 status.[0].count
          queue.Enqueue buf
      } |> Async.Start
      queue

My "send" is an asynchronous agent that serializes messages posted to it from any thread and sends them over MPI. My "receive" is an infinite loop that sits in a thread on the thread pool waiting for any message to be received. Note that this means there will be a call to "MPI_Recv" blocking one thread while another thread is calling "MPI_Send".

Cheers,
Jon.
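[Editor's note: a rough runnable analogue of this design, in Python with a socketpair standing in for the MPI link; the names `sender` and `receiver` are illustrative, not part of the original code. One thread serializes all sends by draining a queue (the role of the MailboxProcessor agent) while a second thread blocks in a receive loop, which is exactly the concurrent send-plus-recv pattern that requires MPI_THREAD_MULTIPLE in the real MPI version.]

```python
import queue
import socket
import threading

# A connected socket pair stands in for the MPI link between the two ranks.
a, b = socket.socketpair()
send_q = queue.Queue()   # messages posted here from any thread
recv_q = queue.Queue()   # received messages surface here

def sender(n: int) -> None:
    """Drains the queue and sends, serializing posts from all threads
    (the role of the F# agent calling MPI_Send)."""
    for _ in range(n):
        a.sendall(send_q.get() + b"\n")

def receiver(n: int) -> None:
    """Blocks waiting for any incoming message and enqueues it
    (the role of the loop blocking in MPI_Recv)."""
    f = b.makefile("rb")
    for _ in range(n):
        recv_q.put(f.readline().rstrip(b"\n"))

threads = [threading.Thread(target=sender, args=(2,)),
           threading.Thread(target=receiver, args=(2,))]
for t in threads:
    t.start()
send_q.put(b"ping")
send_q.put(b"pong")
for t in threads:
    t.join()
results = [recv_q.get() for _ in range(2)]
print(results)   # -> [b'ping', b'pong']
```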

Hi Jon,

We've checked the performance of the multithreaded (mt) version of the Intel MPI Library, and it's only slightly slower (<10% for some message sizes) than the single-threaded version.
You can check the performance yourself by running IMB (the Intel MPI Benchmarks) from the installation package.
If IMB shows comparable performance for the two libraries, that probably means something is wrong in the F# MPI wrappers.

Regards!
Dmitry

Hi Jon,

Could you please submit a tracker at premier.intel.com?
That way I'll be able to upload an engineering version of the Intel MPI Library for you to check the performance issue with.

Regards!
Dmitry
