@Patrick The article for OpenMP was published.
I wonder if there is some way to make an MKL routine behave like #pragma omp for schedule(dynamic) in OpenMP.
I'm using Sparse BLAS to do SpMV, and for some reason I decided to reorder the rows of the sparse matrix by length. This leaves the workload of MKL's internal threads unbalanced, since SpMV in MKL is parallelized by row and is probably statically scheduled.
If there is some function controlling the scheduling strategy of the internal threads that I don't know about, that would be great.
Can anybody help me understand the following situation?
I am looking for ways to detect Hyper-Threading in a C++ program on Linux. I don't intend to read the /proc/cpuinfo file.
I have tried CPUID and checked the EDX value, but somehow it returns the same value on machines with HT and without HT.
It also seems some cpucounter programs on the web do not work for Intel 64.
I am using a 64-bit Linux machine with:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
I'm a student doing some research on Hyper-Threading. I'm a little confused about one feature: the L1 data cache context mode.
The architecture optimization manual (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia...) describes it this way:
"The first level cache can operate in two modes depending on a context-ID bit:" an adaptive mode and a shared mode.
Suppose I have a Fortran file that doesn't contain any OpenMP directives, but has a routine that will be called from within an OpenMP parallel region.
Is there any difference in compiling the file with one of these options?
If there is a difference, is there any downside to using them in combination (it simplifies some Makefiles)?
I know how to write the code to use the GPU, but I still can't figure out how to call the CUDA compiler from the Intel C/C++ compiler. My problem is this: once I have the program, well or badly coded, how do I compile it?
I'm using Intel C/C++ Composer Studio under Windows 7, on a CUDA-capable computer of course.
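As far as I know you can't feed .cu files to the Intel compiler at all; the usual pattern is separate compilation: nvcc compiles the CUDA part, the Intel compiler compiles the rest, and you link the objects together with the CUDA runtime library. A sketch (file names, the kernel, and the wrapper are made up):

```cuda
// kernel.cu -- compiled with nvcc, e.g.:  nvcc -c kernel.cu -o kernel.obj
#include <cuda_runtime.h>

__global__ void scale(float* v, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= a;
}

// extern "C" avoids name-mangling mismatches between the two compilers.
extern "C" void scale_on_gpu(float* dev_v, int n, float a) {
    scale<<<(n + 255) / 256, 256>>>(dev_v, n, a);
}

// main.cpp -- compiled with the Intel compiler:  icl /c main.cpp
//   extern "C" void scale_on_gpu(float* dev_v, int n, float a);
// then link everything:  icl main.obj kernel.obj cudart.lib
```

The host code never sees CUDA syntax, only the extern "C" declaration, so the Intel compiler is happy; only nvcc ever parses the .cu file.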
I have a multi-threaded application which runs 20% slower on my MacBook Pro with two threads than with one. I checked for blocking conditions and found that this is not the problem. The application is huge and accesses a huge in-memory database, so the cache doesn't have much effect on performance. So I figure the problem is that this machine does not have enough memory bandwidth to support two threads that both access a lot of memory.