Unix*

VTune multithreading on multicore

Hi guys,

I am trying to run 8 different threads on 8 different cores, so my CPU time is greater than my elapsed time. One of the threads takes much longer than the other 7, which all take almost the same time with only a minute difference between them. I have attached a screenshot of this. Can someone please tell me why this one particular thread takes the extra time?

Thanks!

Achieving peak on Xeon Phi

Hi,

I am on a Core i7 quad-core machine with an ASUS P9X79WS motherboard and a Xeon Phi 3120A card installed.

The operating system is RHEL 6.4, with MPSS 3.1 for the Phi and Parallel Studio 2013 SP1 installed.

Just for detail, the Phi card has 57 cores, with a capability of about 1003 GFlops in double precision.

I am seeing some performance issues that I don't understand.

When I time MKL's parallel DGEMM on the Phi card, it gets 300 GFlops, which is about 30% of peak.

Note that I am doing native execution.
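
For readers checking the numbers in this post, here is a minimal sketch of the peak arithmetic. Only the 57 cores, the roughly 1003 GFlops peak and the 300 GFlops DGEMM figure come from the post. The 1.1 GHz clock and the 16 double-precision flops per cycle per core (an 8-wide 512-bit vector unit with fused multiply-add) are assumptions about the 3120A, and the function name is made up for illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlops: cores x clock (GHz) x flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# 57 cores is from the post; 1.1 GHz and 16 DP flops/cycle/core are assumed.
PEAK = peak_gflops(cores=57, clock_ghz=1.1, flops_per_cycle=16)

# Measured DGEMM rate quoted in the post.
MEASURED = 300.0

if __name__ == "__main__":
    print(f"peak = {PEAK:.1f} GFlops, efficiency = {MEASURED / PEAK:.1%}")
```

Under those assumptions the computed peak matches the quoted figure of about 1003 GFlops, and 300 GFlops works out to just under 30% of it.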

Closed link for libraries using TBB offloading?

Hot on the heels of my standalone success (thanks, Kevin), I tried to integrate a test of TBB offloading into our modular framework. In a nutshell, the user writes modules in C++, which are compiled into shared libraries and loaded into the framework at runtime when a particular module is required. For obvious reasons (the module writer generally cannot re-link the executable), we do a closed link ("-Wl,--no-undefined" on Linux) when making the shared library that represents the module.

New article: Resource Guide for People Investigating the Intel® Xeon Phi™ Coprocessor

This article identifies resources for anyone investigating the value to their organization of the Intel® Xeon Phi™ coprocessor, which is based on the Intel® Many Integrated Core (Intel® MIC) architecture. It is one of three such guides, each for people in one of the following specific roles:

Problems when trying to run symmetric MPI jobs with MPSS 3.2, MLNX HCA and ofed-3.5.1-mic-beta1

Hi,

We have been struggling to get symmetric MPI jobs running on our cluster. MPI works fine host to host, and MIC-native MPI also works between compute nodes. Intra-node host <-> MIC communication works too, but inter-node communication just hangs: it never gets "PMI response: cmd=barrier_out". Is this supposed to work at all with this HW/SW combination?

We are running CentOS 6.5, MPSS 3.2, Slurm 2.6.7 and OFED 3.5.1.MIC.beta1, with a Mellanox ConnectX3 HCA, and mpxyd is running.

I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u

I_MPI_FABRICS=shm:dapl

14.0.2 can't do fallback for offload with no mics?

Hi,

I've seen a few posts on this kind of subject in the past, but nothing recent that seems relevant (unless I missed it). I tried to run what I thought was a basic TBB / offloading test based on one of the examples (see below). It works on a machine with attached and available Phis, but not otherwise.

Teaser: offload pragma:

#pragma offload target(mic) in(size) in(data:length(size)), out(result)

Compiled (clean) with:

. /opt/intel/composer_xe_2013_sp1.2.144/bin/iccvars.sh intel64

H.264 encoding

Hi,

I want to encode a YUV420 stream to an H.264 stream. As it is being streamed live to an RTMP server, I do not intend to use B frames. I want my encoder to give only P frames.

Can anyone tell me the parameter settings for the Main profile with level IDC 4.1?

I also want to know how to get the decoding time stamp for an encoded frame.

Thanks and regards.

Charan

unable to ssh to mic1

Hello,

I have a system with two MIC cards and I am trying to ssh into them from the Xeon host.

I can do ssh mic0, access mic0 and run a native application,

but I cannot access mic1 via ssh.

When I do ssh mic1, the screen just hangs.

When I run miccheck, it shows both mic0 and mic1 online, and micinfo also gives proper info for both cards.

I also started the MPSS service on both cards and cleared the firewall.

Thank you in advance for the help.

 
