How to disassemble programs running on mic with objdump

Hi guys,

I've tried objdump on the x86 host as well as /usr/linux-k1om-4.7/bin/x86_64-k1om-linux-objdump to disassemble a program for Xeon Phi, with and without the option --architecture=l1om. However, the k1om-specific instructions cannot be resolved.

I also cross-compiled a binutils toolchain for Xeon Phi and ran that objdump on the mic itself (with or without --architecture=l1om), but the result was the same.

Can anyone help, please?
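For anyone hitting the same wall, a sketch of the invocations involved (./a.out is a placeholder binary name). The option is spelled --architecture, and as far as I know a stock x86 objdump cannot resolve k1om instructions no matter what architecture is forced, because its disassembler simply does not know that ISA; upstream binutils never merged the k1om port, Intel shipped a patched binutils with MPSS. A k1om-aware build should auto-detect the target from the ELF header with plain -d, and objdump -i lists which targets and architectures a given build actually supports:

```shell
# List what this objdump build actually supports (targets and arches):
/usr/linux-k1om-4.7/bin/x86_64-k1om-linux-objdump -i

# With a k1om-aware build, plain -d should pick up the ELF target
# automatically, no --architecture override needed:
/usr/linux-k1om-4.7/bin/x86_64-k1om-linux-objdump -d ./a.out
```

If -i does not list a k1om architecture, that objdump was built without the k1om patches and no command-line option will make it disassemble those instructions.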


Customize the uOS


For my master's thesis I am supposed to make the MIC runnable with a kernel version of 3.13 or above. This also includes using the mic_card kernel module on the MIC so that the Linux kernel mic_host and mic_card modules can communicate.

I already looked into porting the Intel Kmod Module, but this is not the goal.

Core pinning with pthread

Hi all, I did some follow-up on my last topic.

Long story short, I'm trying to emulate SIMD behavior in a MIC environment, and I decided to implement the offloaded segment in native code without OpenMP pragmas, which means I'm pinning threads to individual cores with pthread_attr_setaffinity_np.

The logic is all there from start to finish. Here's what I did:

MPI with hpcg application killed signal 9

I bought the Intel Parallel Studio XE 2015 Cluster edition to run the MPI framework.

After building for coprocessor-only mode with the -mmic option, I ran the hpcg application.

Command that I used: mpiexec.hydra -iface mic0 -host mic0 -n 200 ./xhpcg

mic0 mounts, over a network file system, the same directory that I run from.

But it just returned APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)

So, I used the -genv I_MPI_HYDRA_DEBUG option.

TSX results - please explain

I am using Roman Dementiev's code as a base and modifying it to determine if TSX operations are behaving according to expectations.

I am counting the number of times that xbegin() returns successfully, the number of times it aborts, and the number of times the fallback lock is used.

Why does Xeon Phi always get bad performance?

I tried to run a for loop 1,000,000,000 times on a Xeon E5 and a Xeon Phi and measured the time to compare their performance. I was surprised to get the following result:

On E5 (1 Thread): 41.563 Sec
On E5 (24 Threads): 22.788 Sec
Offload on Xeon Phi (240 Threads): 45.649 Sec

Why did I get such bad performance on the Xeon Phi? I do nothing in the for loop. If my Xeon Phi coprocessor doesn't have any problem, what kind of work gets good performance on it? Must it be vectorized? If not vectorization, is there anything I can do on the Xeon Phi that uses its threads to help me?

Intel® Xeon Phi™ Coprocessor enabled applications on Stampede

This is intended to be an informational post. I received the following announcement as I am on a TACC mailing list, and wanted to pass the word around that several Xeon Phi-enabled applications are readily available on their Stampede cluster. More below; what follows is their announcement in its entirety.


New Xeon Phi Applications on Stampede

Slowdown with OpenMP

I'm getting some pretty unusual results from using OpenMP on a fractional differential equations code written in Fortran. No matter where I use OpenMP in the code, whether on an initialization loop or a computational loop, I get a slowdown across the entire code. I can put OpenMP in one loop and it will slow down an unrelated one (timed separately)! The code is a bit unusual, as it initializes arrays starting at 0 (and some even negative). For example,
