free binutils for MIC


I am developing a compiler for Xeon Phi, based on my own parallel language.

Are there free binutils that I could bundle my compiler with?




I'm not entirely sure why you want to distribute binutils with a compiler; compilers don't normally come with their own copies of binutils, after all. (GCC does not install ld or objdump...)

Bear in mind that I am not a lawyer, so you should check the licenses with your lawyer. That said, the k1om binutils that run on the host and handle KNC binaries (i.e. the ones in /usr/linux-k1om-4.7/x86_64-k1om-linux/bin) are certainly generated from the GPLed sources, and we provide the modified sources, so I imagine that you can distribute them if you want to. If you intend to enable compilation for KNC on arbitrary machines, though, you would likely also need to distribute header files and libraries, so it might be simpler to tell people to install the right package from Intel. That would also ensure that the tools are up to date and that you don't have to keep updating your package.

If free binutils are available for Windows and Linux, there is no need to bundle anything.

I have not checked the details of MIC development tools before.


Refer to Intel® Manycore Platform Software Stack (MPSS) for details on MPSS (packages, licenses, etc.) and a link to details about accessing the Beta version of Windows MPSS.

My compiler can emit assembly now.

 vector float [4] a;  float c;


vbroadcastf32x4 zmm0, [ rbp - 0x14 ]{4to8}

vaddps  zmm0, [ rbp - 0x10 ]

Is there a low-level API available that activates cores to run code?


De Zhi T. wrote:

Is there a low-level API available that activates cores to run code?

The simplest way to handle threads is to use pthreads. Whether you think that is a "low level API" is a matter of opinion :-)

If you want to go below pthreads you could, of course, use the clone system call with a suitable set of arguments. I wouldn't recommend it, though... The parallel runtimes with which I am familiar (OpenMP, TBB) use pthreads for thread creation.
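For readers who have not used pthreads, here is a minimal sketch of creating and joining threads. The names `worker`, `worker_arg`, `run_workers`, and `NUM_THREADS` are made up for illustration; only `pthread_create` and `pthread_join` come from the pthreads API.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4  /* illustrative value, not tied to the core count */

/* Per-thread argument: an id going in, a result coming out. */
struct worker_arg {
    int id;
    int result;
};

/* Thread entry point; pthreads requires the type void *(*)(void *). */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    arg->result = arg->id * arg->id;
    return NULL;
}

/* Start NUM_THREADS threads, wait for all of them, and sum their results. */
int run_workers(void)
{
    pthread_t threads[NUM_THREADS];
    struct worker_arg args[NUM_THREADS];
    int sum = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].id = i;
        args[i].result = 0;
        int rc = pthread_create(&threads[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);  /* result is safe to read after join */
        sum += args[i].result;
    }
    return sum;
}
```

Each thread writes only to its own `worker_arg`, so no locking is needed here. Build with `-pthread`.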

Are there any good tutorials for pthreads?

I am designing a language with built-in directives for parallelism.

There are doubtless many pthread tutorials available on the web. I don't have any specific recommendations.

Google is your friend...

P.S. I mentioned pthreads because you asked for a "low level" interface to threads, but it may actually be much more productive to use a high-level interface for parallelism such as TBB. If you could use TBB, that would save you a huge amount of effort.

Without knowing more about the design of your language it's impossible to know whether TBB would suit your needs, but if your language is high level (and avoids the temptation to expose threads to the user), building your runtime on TBB could save you a lot of time.

I like the bird...

TBB is a C++ template library, so it does not seem suitable.

I am going to implement something that turns directives into raw thread API calls, such as MPI function calls.

The fact that TBB is implemented using C++ via template calls does not prevent you from using it to create a set of interfaces that you can then call from your generated code. (We use TBB as the parallel under-pinning of our OpenCL implementation on Xeon Phi and Xeon, for instance).

MPI has nothing to do with threads... (did you mean pthreads?).

The problem with pthreads is that if you use them in the obvious way, you will very likely find that you need to reinvent a lot of code to get decent performance (pthread creation is very expensive, for instance, so you need to create your own persistent thread pool to manage the pthreads). TBB will do all of that for you, while providing a clean interface for executing chunks of work in parallel and load balancing them on the available hardware.
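To give a sense of how much code a "persistent thread pool" involves, here is a deliberately small sketch: the threads are created once and then pull tasks from a shared queue. All names (`pool_start`, `pool_submit`, `pool_wait`, `pool_stop`, `add_123`) and the sizes are hypothetical, and there is no load balancing or work stealing, which is part of what TBB would provide.

```c
#include <assert.h>
#include <pthread.h>

#define POOL_SIZE 4    /* hypothetical number of worker threads */
#define QUEUE_CAP 64   /* hypothetical fixed queue capacity */

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

static struct task queue[QUEUE_CAP];   /* ring buffer of waiting tasks */
static int q_head, q_count;
static int pending;                    /* tasks queued or still running */
static int shutting_down;
static pthread_t workers[POOL_SIZE];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;

static void *pool_worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (q_count == 0 && !shutting_down)
            pthread_cond_wait(&work_ready, &lock);
        if (q_count == 0)              /* shutting down and queue drained */
            break;
        struct task t = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_count--;
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                   /* run the task outside the lock */
        pthread_mutex_lock(&lock);
        if (--pending == 0)
            pthread_cond_broadcast(&all_done);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Create the worker threads once; they persist until pool_stop(). */
void pool_start(void)
{
    shutting_down = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        int rc = pthread_create(&workers[i], NULL, pool_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
}

/* Queue one task. Returns 0 on success, -1 if the queue is full. */
int pool_submit(task_fn fn, void *arg)
{
    pthread_mutex_lock(&lock);
    if (q_count == QUEUE_CAP) {
        pthread_mutex_unlock(&lock);
        return -1;
    }
    queue[(q_head + q_count) % QUEUE_CAP] = (struct task){ fn, arg };
    q_count++;
    pending++;
    pthread_cond_signal(&work_ready);
    pthread_mutex_unlock(&lock);
    return 0;
}

/* Block until every submitted task has finished. */
void pool_wait(void)
{
    pthread_mutex_lock(&lock);
    while (pending > 0)
        pthread_cond_wait(&all_done, &lock);
    pthread_mutex_unlock(&lock);
}

/* Wake all workers, let them drain the queue, and join them. */
void pool_stop(void)
{
    pthread_mutex_lock(&lock);
    shutting_down = 1;
    pthread_cond_broadcast(&work_ready);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(workers[i], NULL);
}

/* Example task: add 123 to the int that arg points at. */
void add_123(void *arg)
{
    *(int *)arg += 123;
}
```

A caller would run `pool_start()` once, then `pool_submit(add_123, &x)` for each piece of work, `pool_wait()` at the end of each parallel region, and `pool_stop()` at exit. Thread creation cost is then paid once rather than per region.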

Yes, I meant something like pthreads. If TBB works well, it would be nice to emit calls to it.

I need more information and sample code to study now.


The TBB web site has a lot of code and samples, and Google will no doubt find you more.

The critical thing to realise is that what you want to do is pass parallel tasks to TBB, and let it worry about threads. So your code-generation will generate functions that should be called in parallel, and your runtime will invoke TBB to do that.

Yes, the compilation comprises two stages.

First, eliminate the directives by turning them into parallel functions.

Second, compile the result together with the TBB functions, using the TBB headers and libraries.

That's not exactly what I was imagining. The flow I expected (since you're generating assembler code) was something like

  1. Compile source code to assembler
  2. Assemble to object file
  3. Invoke the linker to link object files with your runtime library (that was compiled previously and happens to be implemented with TBB)

So your runtime would have standard C ABI interfaces that allow you to pass closures that point at the code generated by your compiler and the relevant data. (In C this would likely look like a void (*)(void *) pointer and a data pointer). Then inside the runtime you'd invoke that code in the context of a TBB task.
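A rough sketch of what such a C ABI boundary might look like follows. The names `rt_closure`, `rt_run_all`, `region_data`, and `region_body` are hypothetical. The implementation shown uses one pthread per closure purely as a stand-in so that the example is plain C; a TBB-based runtime would presumably keep the same signature and run each closure as a task instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical runtime ABI: a closure is a code pointer plus a data pointer. */
typedef void (*rt_body_fn)(void *data);

struct rt_closure {
    rt_body_fn fn;   /* points at compiler-generated code */
    void *data;      /* points at the variables that code needs */
};

/* Adapts a closure to the pthread entry-point signature. */
static void *rt_trampoline(void *p)
{
    struct rt_closure *c = p;
    c->fn(c->data);
    return NULL;
}

/* Run all n closures and return once every one has finished.
 * Stand-in implementation: one pthread per closure, not TBB. */
void rt_run_all(struct rt_closure *closures, int n)
{
    if (n <= 0)
        return;
    pthread_t *threads = malloc((size_t)n * sizeof *threads);
    assert(threads != NULL);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&threads[i], NULL, rt_trampoline, &closures[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    free(threads);
}

/* What a compiler might emit for one parallel region (illustrative only):
 * the captured variables packed into a struct, and a body that unpacks them. */
struct region_data {
    float *a;
    float c;
    int index;
};

void region_body(void *data)
{
    struct region_data *d = data;
    d->a[d->index] += d->c;
}
```

The generated assembly then only needs to fill in `rt_closure` records and call `rt_run_all`, which is an ordinary C call. Nothing in the generated code depends on how the runtime schedules the work.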

There's no need to be compiling the runtime every time, or output code that includes TBB and gets compiled with the code generated by your compiler.

My compiler is not sophisticated enough to invoke TBB through a runtime at this moment.

I prefer to emit assembly code with TBB calls inside, so that I can examine the code sequence.

I am back to MIC development.

A simple example:

    for (i = 0; i < 16; i += 1)
        a[i] += 123;

Each core executes the statement a[i] += 123 with its own index, that is, a[core_idx] += 123:

    void func_000(int core_idx) {
        a[core_idx] += 123;
    }
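One way the outlined function could be driven is sketched below, assuming plain pthreads and a static split of the 16 iterations across a small number of worker threads, rather than one thread per iteration. `run_loop`, `run_chunk`, `struct chunk`, and `NUM_WORKERS` are invented for this sketch, and the element type of `a` is assumed to be `int`.

```c
#include <assert.h>
#include <pthread.h>

#define N 16           /* trip count of the loop in the post */
#define NUM_WORKERS 4  /* assumed worker count; N is a multiple of it here */

int a[N];              /* element type assumed to be int */

/* Outlined loop body, as in the post. */
void func_000(int core_idx)
{
    a[core_idx] += 123;
}

/* Half-open range of iterations [begin, end) given to one worker. */
struct chunk {
    int begin;
    int end;
};

static void *run_chunk(void *p)
{
    struct chunk *c = p;
    for (int i = c->begin; i < c->end; i++)
        func_000(i);
    return NULL;
}

/* Split the N iterations into equal contiguous chunks, one per worker. */
void run_loop(void)
{
    pthread_t threads[NUM_WORKERS];
    struct chunk chunks[NUM_WORKERS];
    int per_worker = N / NUM_WORKERS;

    for (int w = 0; w < NUM_WORKERS; w++) {
        chunks[w].begin = w * per_worker;
        chunks[w].end = chunks[w].begin + per_worker;
        int rc = pthread_create(&threads[w], NULL, run_chunk, &chunks[w]);
        assert(rc == 0);
        (void)rc;
    }
    for (int w = 0; w < NUM_WORKERS; w++)
        pthread_join(threads[w], NULL);
}
```

Because each index is touched by exactly one worker, the updates need no synchronization. Whether contiguous chunks or a one-iteration-per-core mapping is the better choice depends on the loop body and is left open here.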

