Problem in analysing with Parallel Amplifier

I am implementing an H.264 codec in software using multithreading. The application runs successfully when built with the Intel C++ compiler and produces correct results, but when I analyse it for hotspots and concurrency, I have a problem collecting the required data. Can anyone suggest a solution to this problem?
Is multithreading on a single core different from multithreading on a multicore processor?

Quoting - coolsandyforyou

I have implemented multithreading using the Windows API (CreateThread() from windows.h), not OpenMP (omp.h).

Quoting - coolsandyforyou

Would you please elaborate on exactly what problem you faced? Maybe some screenshots and explanations...

If you run your multithreaded application on a single-core CPU, the concurrency level won't be higher than 1 anyway, because your threads are executed one at a time (the OS time-slices them). This is the main difference from running on a multicore CPU, where the maximum concurrency level can be as high as the number of available cores.
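To make the point above concrete, here is a minimal C++11 sketch (not from the thread) of the upper bound on concurrency: no more threads can run at the same moment than the machine has hardware threads. The function name max_concurrency is hypothetical. Note that std::thread::hardware_concurrency() is only a hint; it reports hardware threads and may return 0 when the count is unknown, which the sketch treats as 1.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Upper bound on how many of `threads` can actually execute at the same
// time on a machine with `cores` hardware threads. On a single-core CPU this
// is always 1: the threads are time-sliced rather than run simultaneously.
unsigned max_concurrency(unsigned threads, unsigned cores) {
    assert(threads > 0);
    if (cores == 0) cores = 1;  // hardware_concurrency() returns 0 if unknown
    return std::min(threads, cores);
}

// Convenience overload that asks the C++ runtime for the hardware thread count.
unsigned max_concurrency(unsigned threads) {
    return max_concurrency(threads, std::thread::hardware_concurrency());
}
```

On a Core 2 Duo (two cores, no Hyper-Threading), max_concurrency(2, 2) is 2, so two well-balanced threads can in principle both run at once; on a single core the same program still works, but max_concurrency(2, 1) is 1.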

-Vladimir

Quoting - coolsandyforyou

Let me explain my exact problem.
My application performs the following jobs:
1) Reads a raw YUV video file frame by frame.

2) Performs motion estimation (a process that enables compression of the video by exploiting temporal redundancy) between successive frames. This process involves searching the previous frame for the best match for each block of pixels. Because multiple searches can be done simultaneously, I have used multithreading (each thread calculates the best matches for all the blocks in a defined region).

3) Performs motion compensation.

4) Writes the motion-compensated frame values to the output video file.
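Step 2 can be sketched as follows. This is a minimal C++11 illustration under stated assumptions, not the poster's code: it uses std::thread instead of CreateThread, a single 8-bit luma plane, 16x16 blocks, a full search over a +/-range window, and the sum of absolute differences (SAD) as the matching cost. Frame, MotionVector, sad, search_block and estimate_motion are hypothetical names. Each thread takes a contiguous band of block rows (its "region") and writes only its own results, so the threads never need a lock.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <thread>
#include <vector>

// Assumed layout: one 8-bit luma plane stored row by row.
struct Frame {
    int width, height;
    std::vector<std::uint8_t> pixels;  // width * height samples
    int at(int x, int y) const { return pixels[y * width + x]; }
};

struct MotionVector { int dx, dy; };

const int kBlock = 16;  // assumed block size (16x16)

// Sum of absolute differences between the block at (bx, by) in `cur`
// and the block displaced by (dx, dy) in `ref`.
int sad(const Frame& cur, const Frame& ref, int bx, int by, int dx, int dy) {
    int sum = 0;
    for (int y = 0; y < kBlock; ++y)
        for (int x = 0; x < kBlock; ++x)
            sum += std::abs(cur.at(bx + x, by + y) - ref.at(bx + x + dx, by + y + dy));
    return sum;
}

// Full search over a +/-range window, skipping candidates outside the frame.
MotionVector search_block(const Frame& cur, const Frame& ref, int bx, int by, int range) {
    MotionVector best{0, 0};
    int best_sad = INT_MAX;
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + kBlock > ref.width || ry + kBlock > ref.height)
                continue;
            int s = sad(cur, ref, bx, by, dx, dy);
            if (s < best_sad) { best_sad = s; best = {dx, dy}; }
        }
    return best;
}

// Splits the block rows into `nthreads` contiguous bands; each thread writes
// only the motion vectors of its own band, so no locking is required.
std::vector<MotionVector> estimate_motion(const Frame& cur, const Frame& ref,
                                          int range, int nthreads) {
    assert(nthreads > 0);
    assert(cur.width == ref.width && cur.height == ref.height);
    assert(cur.width % kBlock == 0 && cur.height % kBlock == 0);
    const int cols = cur.width / kBlock, rows = cur.height / kBlock;
    std::vector<MotionVector> mvs(rows * cols);
    auto work = [&](int row_begin, int row_end) {
        for (int r = row_begin; r < row_end; ++r)
            for (int c = 0; c < cols; ++c)
                mvs[r * cols + c] = search_block(cur, ref, c * kBlock, r * kBlock, range);
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back(work, rows * t / nthreads, rows * (t + 1) / nthreads);
    for (std::thread& th : pool) th.join();
    return mvs;
}
```

Because each block's search does not depend on how many threads run, estimate_motion(cur, ref, 8, 1) and estimate_motion(cur, ref, 8, 2) should return identical vectors; only the wall-clock time should differ.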

Now, my problem is this:
The entire application gives perfect results when built with Visual C++ or the Intel compiler (multithreading is also working; I verified this with some printf statements). But when I tried to analyse it in Parallel Composer, the execution stops as soon as file reading is over, i.e. the motion estimation process never starts. I can't see any error messages either; the execution simply stops. When I close the command prompt window, it shows "Failure in collecting data".

My timing results with the Intel compiler on a Core 2 Duo processor:

Single thread: motion estimation takes 840 ms
2 threads: motion estimation takes 875 ms (why did the time increase? Is it using a single core instead of two?)
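When comparing the 1-thread and 2-thread numbers above, it helps to time only the motion-estimation step (not the file I/O), use a monotonic wall-clock timer, and take the best of several runs so that OS scheduling noise does not dominate a difference of a few tens of milliseconds. A minimal C++11 sketch; time_ms and best_time_ms are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock time of one call to `f`, in milliseconds. steady_clock is
// monotonic, and wall-clock (not CPU) time is what a second core should reduce.
double time_ms(const std::function<void()>& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Best of `repeats` runs, to reduce noise from scheduling and cold caches.
double best_time_ms(const std::function<void()>& f, int repeats) {
    assert(repeats > 0);
    double best = time_ms(f);
    for (int i = 1; i < repeats; ++i) best = std::min(best, time_ms(f));
    return best;
}
```

For example, best_time_ms([&] { run_motion_estimation(2); }, 5), where run_motion_estimation is a hypothetical stand-in for whatever function performs step 2 with the given thread count.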

Since the execution gets stuck under Parallel Composer, I can't provide any screenshots.

I also want to know:

Is the structure of a parallel program the same for single-core and multicore processors?

Please advise; the whole point of my project is to speed up the application. Only if I can get this job done in less than 500 ms (on the Core 2 Duo) can I go further.

Also, I got correct results when running your sample programs (matrix multiplication).

Quoting - Vladimir Tsymbal (Intel)

Could you give me your email address so that I can send you my files?

Quoting - coolsandyforyou

First of all, I think there is a misprint here. You meant Parallel Amplifier, right?

With respect to parallel programming, there is no difference in approach whether your program is going to run on a single core or on multiple cores. The program's parallel structure is the same (bearing in mind that the primary goal is performance improvement when moving to multicore).

The increased run time of the application in 2-thread mode may indicate an inefficient implementation of the threaded code, one that suffers from excessive synchronization or thread-management overhead.
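One common form of the excessive synchronization mentioned above is taking a shared lock for every small unit of work. Below is a minimal C++11 sketch contrasting that pattern with per-thread partial results that are merged once after join(). The per-item costs stand in for something like per-block SAD totals, and both function names are hypothetical. Both versions compute the same result; the first typically scales poorly because the threads serialize on the mutex, while the second does no locking inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Often slow: every work item takes the shared lock, so the threads spend
// much of their time waiting on each other instead of computing.
std::int64_t sum_locked(const std::vector<int>& costs, int nthreads) {
    assert(nthreads > 0);
    std::int64_t total = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    const int n = static_cast<int>(costs.size());
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&, t] {
            for (int i = n * t / nthreads; i < n * (t + 1) / nthreads; ++i) {
                std::lock_guard<std::mutex> lock(m);
                total += costs[i];
            }
        });
    for (std::thread& th : pool) th.join();
    return total;
}

// Each thread accumulates privately; the partial results are combined once
// after join(), so there is no synchronization inside the loop.
std::int64_t sum_partitioned(const std::vector<int>& costs, int nthreads) {
    assert(nthreads > 0);
    std::vector<std::int64_t> partial(static_cast<std::size_t>(nthreads), 0);
    std::vector<std::thread> pool;
    const int n = static_cast<int>(costs.size());
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&, t] {
            std::int64_t local = 0;
            for (int i = n * t / nthreads; i < n * (t + 1) / nthreads; ++i) local += costs[i];
            partial[t] = local;  // a single write per thread
        });
    for (std::thread& th : pool) th.join();
    std::int64_t total = 0;
    for (std::int64_t p : partial) total += p;
    return total;
}
```

The same reasoning applies to thread management: creating and destroying threads for every frame adds overhead that a 2-thread run pays and a 1-thread run does not.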

I'll create a private thread for you so that you can share your project with us and we can investigate the problem with Amplifier.

-Vladimir
