5110P usage questions.

I have a dumb question about the Phi coprocessor. Our central business infrastructure team (who are looking to cut costs, as usual) has passed me the marketing material.

We have a single-threaded C++ process that performs double-precision floating-point calculations. Several of these run in parallel on a multi-core server and pick jobs from a queue. We run a fixed number of these processes per core on the server. Currently we're running out of cores and may need to buy more servers.

The Phi card has multiple cores, so a very naive reading of the marketing material is that putting the card in a server adds extra cores to that server. We could then run more single-threaded processes, with the load spread across the cores on the server and the Phi card.

I don't believe that this is the case for a variety of reasons such as:
1. The work has to be offloaded to the coprocessor programmatically. This does not just happen by magic.
2. The coprocessor would not have access to the server's 100GB of memory, only its own 8GB of local memory. So if one process uses 4GB, the card's memory could support only two processes rather than 30 or 40.
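
As a rough sanity check of the arithmetic in point 2, here is a minimal sketch. The helper name `max_processes` is made up for illustration, and it assumes all of the card's memory is available to our processes, which is optimistic because the card's own OS and runtime also need some of it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: upper bound on how many identical processes fit
// in a card's local memory. Assumes nothing else on the card uses memory.
std::uint64_t max_processes(std::uint64_t card_memory_gb,
                            std::uint64_t per_process_gb) {
    assert(per_process_gb > 0);  // avoid division by zero
    return card_memory_gb / per_process_gb;  // integer division rounds down
}
```

With the figures above, `max_processes(8, 4)` gives 2, so memory rather than core count would be the limit for us.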

Hence my belief is that this card cannot be used to add cores to an existing server, though with some code changes it may be possible to offload work to the coprocessor and run it in parallel. This is much easier to achieve on a Phi than with a GPU-style API.

It would be very helpful if someone could confirm that this is the case.
For my own interest, I have two other questions:
1) Is it possible to run multiple cards in one server?
2) If, for our sins, we are using Microsoft Visual Studio 2010 (for C++), does this have any implications for using Phi cards?

Thanks in advance!


The memory size of the Intel(R) Xeon Phi(TM) coprocessor definitely limits the number of processes, even when running a single job under MPI.  To get value from the platform, it's usually necessary to make each job both threaded and vectorized, e.g. with OpenMP or Cilk+, even when running multiple processes.
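
To illustrate what "threaded and vectorized" could look like, here is a minimal sketch using a standard OpenMP reduction. The `dot` function is a made-up stand-in for your real calculation, not anything taken from your code. The pragma only takes effect when OpenMP is enabled in the compiler, and the loop body is kept simple so that the compiler has a chance to vectorize it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: double-precision dot product.
// The iterations are split across threads, and each thread's partial
// sum is combined at the end by the reduction clause.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];  // simple, dependence-free body
    }
    return sum;
}
```

Without OpenMP enabled the pragma is ignored and the same code runs serially, so the change can be tried on the host before any coprocessor is involved.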

Typical server platforms designed for the purpose accommodate up to 4 coprocessor cards.  Several products are likely to support more than that.

You must recompile using an Intel compiler, which supports a high degree of interoperability with Microsoft compilers.  Windows host support has not yet been released.
