after parallelism, time has increased... plz help, URGENT!!!

it's concerning my senior project, and i have to submit it in 2 days.
i've created a new project, then enabled OpenMP support from the property page -> C/C++ -> Language -> OpenMP Support (Generate Parallel Code (/openmp, equiv. to /Qopenmp))

i have a core 2 duo intel centrino.
i'm using Microsoft visual studio 2008.
OS: windows 7 professional.

this is a simple test program just to understand the major stuff in the parallel studio:

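#include <iostream>
#include <windows.h> // LARGE_INTEGER, QueryPerformanceCounter
#include <omp.h>     // omp_set_num_threads
using namespace std;

int main()
{
    omp_set_num_threads(2); // create 2 threads

    LARGE_INTEGER frequency; // ticks per second (for the timer)
    LARGE_INTEGER t1, t2;    // start and end ticks (for the timer)
    double elapsedTime;      // elapsed time in ms (for the timer)
    int A[1000], B[1000], z = 1;

    for (int i = 0; i < 1000; i++)
    {
        // ... (initialization loop body not shown)
    }

    // start timer
    QueryPerformanceFrequency(&frequency);
    QueryPerformanceCounter(&t1);

    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
    {
        // ... (loop body not shown; it overwrites A[i] and increments z)
    }

    // stop timer
    QueryPerformanceCounter(&t2);

    // compute and print the elapsed time in millisec
    elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
    cout << elapsedTime << " ms.\n";

    return 0;
}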

then, right click on the solution -> set startup solutions
then, right click on the solution -> intel parallel composer -> use intel C++...

this is the result of the program timer: 0.287799ms
this is the result of the parallel amplifier:
-Elapsed time:0.283s
-CPU time: 0.202s
-Unused CPU time: 0.274s
-core count: 2
-Threads Created: 2

when i run this program without parallelism, that is, with these 2 lines commented out:
omp_set_num_threads(2); -> //omp_set_num_threads(2);
#pragma omp parallel for -> //#pragma omp parallel for

this is the result of the program timer: 0.0069175ms
this is the result of the parallel amplifier:
-CPU time:
-Unused CPU time: 0.243s
-core count: 2
-Threads Created: 1

it's so weird... i guess i have to modify something in my compiler? plz help

i have another couple of questions:

1-can i parallelize a loop to read from a file?
for(int i=0;i<100;i++)

can i ?????

2-check the below image, this is a reading from my parallel amplifier.
when i double click on a function, it should take me to the line where it's spending the time shown, but it doesn't.
the arrows show the result of double clicking each function. these results are nothing, it just asks me for a location on my laptop... as for the remaining one, where no arrow is coming out, double clicking on it doesn't do anything.
what does all this mean?

thanks in advance :)
sorry for bothering :)



I am no expert on parallelism either, but there are a couple of things I noticed.

  • Don't specify the number of threads yourself. Let OpenMP handle this for you.
  • Your loop limit is 1000. That is a bit too low to notice any improvement.
  • The result of your parallelized for loop is going to be incorrect. Please use private variables in a shared memory model.
  • ntdll.dll and kernel32.dll are Windows system files. You cannot debug or inspect those, so you can pretty much ignore them. That is exactly why the column tells you which module a function call is coming from.
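
To make the first three points concrete, here is a minimal sketch. The function name `fill_squares` and the loop body are hypothetical stand-ins, since the body of the loop in the original post is not shown. It leaves the thread count to the OpenMP runtime, takes the loop size as a parameter so it can be made much larger than 1000, and keeps the temporary private by declaring it inside the loop.

```cpp
#include <vector>

// Hypothetical stand-in for the loop in the original post: each iteration
// writes only its own element, so no two threads touch the same data.
std::vector<long long> fill_squares(int n)
{
    std::vector<long long> a(n);

    // No omp_set_num_threads call here: the runtime picks the thread count.
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        // Declared inside the loop body, so every thread has its own copy.
        long long tmp = static_cast<long long>(i) * i;
        a[i] = tmp;
    }
    return a;
}
```

The loop index `i` is private automatically because it belongs to the parallelized for loop. If the pragma is ignored (OpenMP switched off), the function still gives the same result, just on one thread.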


10X for the reply vrenert :)

1-specifying the number of threads is not a big deal, i have a 2-core processor and i've set the number of threads to 2. OMP will set the number of threads according to the number of cores (in my case it's 2)

2-this is a test program, in the output, u have numbers to compare (timer). it doesn't matter if the loop has 1000 iterations or 10, because only the difference between the numbers will be bigger.

3-no need for private memory in such a program, because i actually want the program to over-write the value of A[i], and as for i it's set to private by default.

4-i don't know anything about this matter...

again, 10X for ur help, i really appreciate it, but it's not the answer that i was expecting though!!
did u try my program? did parallelizing it reduce the time? plz try it and lemme know :)

First of all, there's not a lot of work being done in the parallel for loop (mostly memory fetches to pull the operands and put the result). That in combination with the small loop count may suggest a reason why you're not seeing anything significant in the hot spot analysis.
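
One way to check whether a measurement like this is dominated by one-off start-up cost rather than by the loop itself is to time many repetitions after an untimed warm-up call. The sketch below assumes a C++11 compiler (so it will not build as-is on Visual Studio 2008), and `average_ms` is a hypothetical helper, not part of the original program.

```cpp
#include <chrono>

// Runs `work` once untimed (warm-up), then `repetitions` more times,
// and returns the average time per timed call in milliseconds.
template <typename F>
double average_ms(F work, int repetitions)
{
    work(); // warm-up: any one-off start-up cost is paid here, not in the timing

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; r++)
    {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / repetitions;
}
```

Passing the parallel loop and the serial loop to this helper in turn, with a few thousand repetitions each, gives numbers that are easier to compare than a single sub-millisecond run.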

What is the purpose of incrementing z in the parallel for? It doesn't serve any function in the loop, but is a point of contention between the threads, each trying to take ownership of its cache line as they increment it.
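
If a counter like z really is needed, a reduction is the usual way to avoid that contention. A minimal sketch, assuming the loop only has to count its iterations; the function name `count_with_reduction` and the `a[i] = b[i] + 1` body are hypothetical, since the original loop body is not shown.

```cpp
#include <vector>

// Counts loop iterations with an OpenMP reduction instead of a shared z++.
// Assumes b has at least as many elements as a.
int count_with_reduction(std::vector<int>& a, const std::vector<int>& b)
{
    int z = 0;
    const int n = static_cast<int>(a.size());

    #pragma omp parallel for reduction(+:z)
    for (int i = 0; i < n; i++)
    {
        a[i] = b[i] + 1; // hypothetical body: each iteration writes only a[i]
        z += 1;          // updates this thread's private copy of z
    }

    // The private copies are summed into z when the loop finishes.
    return z;
}
```

With the reduction clause each thread increments its own copy, so the threads no longer fight over one cache line, and the final value is well defined.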

10X to all those who replied :)
i will post a new thread, containing a much better program, vectors multiplication!
check it plz :)
thank you!
