Threading on Intel® Parallel Architectures

Efficiently parallelizing the code in Fortran


I am trying to parallelize a section of my code, which is written in Fortran. The snippet looks like this:

do i = 1, array(K)
   j = K
   ! ... if conditions on K ...
   ! ... writes and reads on j ...
   ! ... do a lot of things ...
   K = K + 1
end do

So I tried to parallelize it using the code below, which was obviously not as it should have been:

do i = 1, 50
   j = K
   ! ... if conditions on K ...
   ! ... writes and reads on j ...
   ! ... do a lot of things ...
   K = K + 1
end do

Shared memory on Xeon


Here is an observation I have. Can you help me explain it?

Setup 1: A process updates shared memory allocated on the local node (0), writing to it constantly from core 3 on package 0, which is attached to that node. Another process constantly reads it from core 1 on the same package (0) and attached node (0). The read time I am measuring is around 70 clock cycles.

An M/M/n queuing model simulation



An M/M/n queuing model simulation with Object Pascal and my Thread Pool Engine - version 1.02

You can download it from:

Read more below...

Author: Amine Moulay Ramdane


It is harder, and sometimes impossible, to get analytical results for waiting times and queue lengths under general interarrival and service distributions, so it is important to be able to estimate these quantities by observing the results of a simulation.
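
To illustrate estimating a waiting time by simulation, here is a minimal sketch in Python. It is not the Object Pascal Thread Pool Engine code from this post. The function name `simulate_mmn` and all of its parameters are made up for the illustration, and it assumes exponential interarrival and service times with first-come, first-served service.

```python
import heapq
import random


def simulate_mmn(arrival_rate, service_rate, servers, customers, seed=0):
    """Estimate the mean waiting time in queue for an M/M/n system (FCFS).

    All names here are hypothetical; this is an illustration only.
    """
    rng = random.Random(seed)
    # Min-heap holding the time at which each server next becomes free.
    free_at = [0.0] * servers
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(customers):
        # Exponential interarrival time gives a Poisson arrival process.
        clock += rng.expovariate(arrival_rate)
        # The customer takes whichever server frees up first.
        earliest = heapq.heappop(free_at)
        # The customer waits only if every server is still busy on arrival.
        start = max(clock, earliest)
        total_wait += start - clock
        # Exponential service time; the server is busy until it finishes.
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / customers


if __name__ == "__main__":
    print(simulate_mmn(arrival_rate=0.5, service_rate=1.0, servers=1, customers=200000))
```

For the single-server case with arrival rate 0.5 and service rate 1.0, the analytic mean wait in queue is ρ/(μ − λ) = 1.0, so the estimate can be checked against a known value before changing the distributions to ones that have no closed-form result.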

Pointers defined in modules and OpenMP

I am working with a program (which I did not write) that has a pointer to a derived type in a module:

module X
type mytype
    integer x, y, z
end type mytype
type (mytype), pointer :: p_mt
end module X 


This module is used in a subroutine:

subroutine Loop
use X
p_mt  => GoGetOne()
p_mt % x = 7.0

So far, so good. However, subroutine Loop is called from within a parallel loop in another subroutine:


2 CPUs vs num_threads

I have 2 Xeon CPUs in the PC, each with 4 cores. However, I can only set num_threads to 4. If I set it to a number > 4, I get this message:

OMP: Error #136: Cannot create thread.
OMP: System error #8: Not enough storage is available to process this command.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.

Is it impossible to use all the cores in the system because they are distributed across 2 CPUs, or is something else causing this?

(Compiler: Intel C++ 13.0
OS: Windows server 2008 R2)


Hi there,

I dropped a piece of CPU-bound code into the Linux kernel with local interrupts disabled. The code is surrounded by RTM instructions. On average, the code commits successfully within around 100 tries. On abort, the reason reported by the PMU is RTM_RETIRED.ABORTED_MISC5. What could be the cause, given that local interrupts have been disabled?

PS. The description of RTM_RETIRED.ABORTED_MISC5: none of the previous 4 categories (e.g. interrupt).

Thanks in advance.


Le Guan

MultiThreading with MKL library nonlinear least square solver

Hello everybody,
I am using the Intel solution for the Nonlinear Least Squares Problem with Linear (Bound) Constraints.

Question: What do I need to do to run the optimizer in parallel?


I have an Intel i3 processor in my laptop. Though it has 2 cores, it can run 4 threads at a time. When I look at Task Manager, I see programs with 11 threads, or 40 threads. How are these threads scheduled? Is scheduling implemented in hardware or managed by the host OS?
