I'm using an InterlockedCompareExchange to set a variable to my id, something like: while (0 != InterlockedCompareExchange(&var, myId, 0)) ::Sleep(100);
I would appreciate your thoughts on where I might have gone wrong using OpenMP. I parallelized this code in a pretty straightforward way, yet even with a single thread (i.e., calling omp_set_num_threads(1)) I get wrong results.
I have checked with Intel Inspector and I do not have a race condition, though the Inspector tool did warn that one thread might access another thread's stack (I get the same warning in other code of mine, which runs fine with OpenMP). I'm pretty sure this is not related to the problem.
I have been looking at task depend in OpenMP 4.0 but it looks like it is too limited for what I want to do.
To do what I want, the depend clause would need to accept a vector subscript in its array section.
My code would look something like this:
Xeon Phi has 60 cores and 4 threads per core. I am writing an experiment that will have 1 master thread on each core, and each of these will spawn 4 slave threads.
Looking at the manual https://software.intel.com/en-us/node/512835 it seems that I want to set these environment variables:
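Concretely, something like the following (a sketch with illustrative values, not verified against the linked page; the variable names are standard OpenMP / Intel runtime controls):

```shell
# One team of 60 masters, each nesting a team of 4 (60 cores x 4 threads):
export OMP_NESTED=true        # allow nested parallel regions
export OMP_NUM_THREADS=60,4   # 60 threads at level 1, 4 per team at level 2
export KMP_AFFINITY=compact   # Intel runtime: keep each 4-thread team on one core
```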
I'm getting some pretty unusual results from using OpenMP on a fractional differential equations code written in Fortran. No matter where I use OpenMP in the code, whether on an initialization loop or on a computational loop, I get a slowdown across the entire code. I can put OpenMP in one loop and it will slow down an unrelated one (timed separately)! The code is a bit unusual in that it initializes arrays starting at index 0 (and some even at negative indices). For example,
I am new to this forum. I want to implement parallel crawling on Intel Xeon Phi coprocessors for my project. Before buying equipment, installing software, and starting to learn about this platform, I want to know whether it is possible to connect to the network and fetch web URLs in parallel using this technology. (I don't want to create a cluster of CPUs to do this; I want to do it using a single card.)
Should I change other MPI environment variables, particularly any that would tune MPI for the MIC system architecture?
As a side question, has anyone written a tuning and tweaking guide for Intel MPI on the Phi? For example, which I_MPI variables could one use to help tune an app targeting 480 ranks across 8 Phis?
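For concreteness, a sketch of the kind of variables in question (these are real I_MPI names, but the values are illustrative, not a tuned recipe for 480 ranks):

```shell
export I_MPI_MIC=enable         # enable Xeon Phi (MIC) support in Intel MPI
export I_MPI_FABRICS=shm:dapl   # shared memory within a node, DAPL between nodes/cards
export I_MPI_PIN=on             # pin ranks to cores
export I_MPI_DEBUG=4            # print pinning/fabric info at startup to verify settings
```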
Everyone these days has to address multi-core issues, or vertical scaling, at least on the server-side of things. And there does not seem to be a general approach, so we end up re-architecting our applications every time we add cores. At the same time, the availability of many-core processors seems to be constrained by the lack of a reasonable software technology to make good use of them.
I was searching for a zlib-compatible compressor that is faster, and came across the paper describing igzip --
High Performance DEFLATE Compression on Intel Architecture Processors
igzip looks like exactly (!) what I am looking for. Compatible with zlib, but faster.
However, the downloadable source was for Linux, and I need it for a VS10 C++ project. I have successfully (I think) compiled and assembled the desired modules (common, crc, crc_utils, hufftables, hufftables_c.cpp, igzip0c_body, igzip0c_finish, init_stream) into a .lib.
I am curious about the differences between OpenCL and Intel Cilk Plus. They are both parallel programming paradigms that are receiving wide recognition, but technically speaking, is one better than the other, or are they simply different? Also, what yardstick should I use when choosing between the two for an embarrassingly parallel problem? Please, I need answers.