Run time accuracy

I am wondering how accurate the reported runtime is.

This morning I ran two tests: single-threaded (CPU usage: ~94%) with time X, and multithreaded (CPU usage: ~104%) with time Y. Y is more than twice as long as X. How do you measure the time? Is it possible to provide more accurate results?

Rock the bits!

I suppose the real (wall-clock) time of your multithreaded version is longer than that of the single-threaded one because of non-optimal thread creation and probably concurrent access to resources.
I think the time is measured with the Linux "time" command, but I may be wrong.
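To compare the two runs yourself, it can help to record wall-clock time and CPU time side by side. The following is a minimal sketch, not the contest's actual measurement method (which is not confirmed in this thread). The names `Timing`, `measure`, `cpu_usage_percent` and `busy_sum` are made up for illustration. It assumes a POSIX system, where `std::clock` reports the CPU time of the whole process summed over all threads; on some platforms it behaves differently.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Wall-clock and CPU time for one call, both in seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Runs work() once and records elapsed wall-clock time and process CPU time.
template <typename F>
Timing measure(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

// CPU time divided by wall time, in percent. A value near 100% means
// that, on average, only about one core was busy.
double cpu_usage_percent(const Timing& t) {
    return t.wall_seconds > 0.0 ? 100.0 * t.cpu_seconds / t.wall_seconds
                                : 0.0;
}

// Hypothetical workload standing in for the real computation.
unsigned long long busy_sum(unsigned long long n) {
    assert(n > 0);
    volatile unsigned long long sum = 0;  // volatile keeps the loop from being optimized away
    for (unsigned long long i = 0; i < n; ++i) sum = sum + i;
    return sum;
}
```

Calling `measure([] { busy_sum(100000000ULL); })` and passing the result to `cpu_usage_percent` gives the same kind of figure as the ~94% and ~104% quoted above. If the multithreaded run stays close to 100%, the threads are mostly not running at the same time.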

But how could I create them more optimally?
I use TBB for parallelization…

I'm setting the number of threads like this:
tbb::task_scheduler_init init(params.nb_threads);

I read that you usually let TBB handle the number of threads, but since we are required to stick to the parameter provided, I set it like this. This is the first time I have worked with TBB, so I just googled it and don't really know much about it. So, could that cause the overhead?

Rock the bits!

Quote:

Heye (aka. slevin7) wrote:

But how could I create them more optimally?
I use TBB for parallelization…

I'm setting the number of threads like this:
tbb::task_scheduler_init init(params.nb_threads);

I read that you usually let TBB handle the number of threads, but since we are required to stick to the parameter provided, I set it like this. This is the first time I have worked with TBB, so I just googled it and don't really know much about it. So, could that cause the overhead?

Yes, TBB creates overhead. Actually, because of thread synchronisation, you will always have overhead, no matter what technique you use.

Hmm... I use parallel_for. I arranged it so that all threads can run independently of each other.
There is no need for synchronization, but are they still being synchronized?
The loop runs once for each flight; are the chunks maybe too small?
The parallel portion takes about 2 seconds when run sequentially on CompileScale2. Would you suggest enabling parallelization only for bigger input files?

Rock the bits!

Quote:

Heye (aka. slevin7) wrote:

Hmm... I use parallel_for. I arranged it so that all threads can run independently of each other.
There is no need for synchronization, but are they still being synchronized?
The loop runs once for each flight; are the chunks maybe too small?
The parallel portion takes about 2 seconds when run sequentially on CompileScale2. Would you suggest enabling parallelization only for bigger input files?

Hi,

No matter what you use, there is some amount of overhead, and behind parallel_for there is some mechanism for task stealing and synchronization, which is also overhead. Actually (I'm not 100% sure, because I don't work with Cilk or TBB), the task scheduler is one thread which monitors tasks and maps them to threads. This thread is also overhead. Even if you manage to write a program without synchronisation (explicit or implicit), there is synchronization at the hardware level (cache and RAM synchronization, for example), which can't be avoided.

Also, if you want to parallelise serial code, there is no deterministic answer (for now) as to which technique is best (if there were, this contest would make no sense); there are only guides and patterns, which you can read about in the Intel courses. Good optimisation depends on the given algorithm, experience, knowledge of PC architecture, and a lot of trying and measuring.

I suggest you try and measure which chunk size is optimal, along with other parameters. I can't tell you what I would use in your place without analysing your code (which doesn't make sense, since this is a contest), and even if I could, it would probably not be optimal without some trial and measurement.

Regards,
Dusan.
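One way to experiment with chunk sizes and with a "parallelize only for bigger inputs" threshold is sketched below. It uses plain std::thread instead of TBB so that it is self-contained, and it does not reproduce how tbb::parallel_for schedules work. The function name `chunked_for` and the parameter `min_parallel_items` are hypothetical, and the sketch assumes each iteration is independent of the others, as described above for the per-flight loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs body(i) for every i in [0, n).
// Falls back to a plain serial loop when only one thread is requested or
// when the input is smaller than min_parallel_items, so that small inputs
// do not pay for thread creation.
template <typename Body>
void chunked_for(std::size_t n, unsigned nb_threads,
                 std::size_t min_parallel_items, Body body) {
    assert(nb_threads > 0);
    if (nb_threads == 1 || n < min_parallel_items) {
        for (std::size_t i = 0; i < n; ++i) body(i);
        return;
    }

    // One contiguous chunk per thread, rounded up so every index is covered.
    const std::size_t chunk = (n + nb_threads - 1) / nb_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        // body must be safe to call from several threads at once.
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& w : workers) w.join();
}
```

A call such as `chunked_for(flights.size(), params.nb_threads, 1000, process_one)` (where `flights`, `process_one` and the threshold 1000 are placeholders) can then be timed with different thresholds and thread counts to see where the parallel version starts to pay off on the target machine.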
