Intel® Threading Building Blocks

Image processing doesn't scale well

I'm writing a function to efficiently blend two images with premultiplied alpha.

I've optimized it for SSE and tried to make it cache-efficient, and now I'm trying to make it even faster by splitting the work between cores with TBB.

However, my code barely scales: it gets only 10-20% faster with 2 cores, and it scales even worse as more cores are added.

We do process several images at the same time, but there is a limit to how much we can do this, which is why I would like to make the actual "processing" scale better.
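A minimal sketch of a row-partitioned blend of the kind described above, assuming 8-bit RGBA pixels with premultiplied alpha and the standard "over" operator (out = src + dst × (1 − src alpha)). The pixel layout, the function names (`over_channel`, `blend_rows`, `blend_parallel`), and the use of `std::thread` in place of TBB are assumptions for illustration, not the poster's code. In TBB the same row split would normally be handed to `tbb::parallel_for` with a `tbb::blocked_range<std::size_t>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed layout: 8-bit RGBA, colour channels already multiplied by alpha.
struct Pixel { std::uint8_t r, g, b, a; };

// Premultiplied "over" for one channel: s + d * (255 - sa) / 255, rounded,
// clamped in case the input is not valid premultiplied data.
inline std::uint8_t over_channel(int s, int d, int sa) {
    const int v = s + (d * (255 - sa) + 127) / 255;
    return static_cast<std::uint8_t>(std::min(v, 255));
}

// Blend rows [row_begin, row_end) of src over dst, writing into dst.
void blend_rows(const Pixel* src, Pixel* dst, std::size_t width,
                std::size_t row_begin, std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y) {
        const Pixel* s = src + y * width;
        Pixel* d = dst + y * width;
        for (std::size_t x = 0; x < width; ++x) {
            const int sa = s[x].a;
            d[x].r = over_channel(s[x].r, d[x].r, sa);
            d[x].g = over_channel(s[x].g, d[x].g, sa);
            d[x].b = over_channel(s[x].b, d[x].b, sa);
            d[x].a = over_channel(sa, d[x].a, sa);
        }
    }
}

// Split the image into contiguous blocks of rows, one block per thread.
// (Hypothetical stand-in for tbb::parallel_for over a blocked_range of rows.)
void blend_parallel(const std::vector<Pixel>& src, std::vector<Pixel>& dst,
                    std::size_t width, std::size_t height, unsigned num_threads) {
    assert(src.size() == width * height && dst.size() == width * height);
    num_threads = std::max(1u, num_threads);
    const std::size_t chunk = (height + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < height; begin += chunk) {
        const std::size_t end = std::min(height, begin + chunk);
        workers.emplace_back(blend_rows, src.data(), dst.data(), width, begin, end);
    }
    for (auto& t : workers) t.join();
}
```

Each thread writes a disjoint, contiguous block of rows, so no locking is needed between workers.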

Scheduler bypass semantics?

I'm currently working on altering the scheduler's decisions, to experiment with other methods and contrast them with the work-stealing approach that is currently at the core of TBB. While enforcing specific task execution orders, I noticed that there are two places where scheduler decisions are effectively bypassed: spawn_and_wait_for_all, and task pointers returned from execute(). Given that the TBB interface doesn't actually specify a scheduling order (i.e. work-stealing is not guaranteed), are the scheduler bypasses semantically meaningful?

Pipeline vs task (vs raw thread?)

After reading several articles about pipeline, I have some questions.
Is there any advantage to using a pipeline over raw threads?
For example, I have divided my job into several raw threads that communicate with each other via queues (lock-free/concurrent, etc.).
Would there be any benefit from switching to the pipeline model?
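For reference, the raw-thread design described above (one dedicated thread per stage, connected by queues) can be sketched with only the standard library. The three stages (produce, square, sum) and the mutex/condition-variable `BlockingQueue` are hypothetical stand-ins, not the poster's code. In TBB, roughly the same chain would be written as filters built with `tbb::make_filter` and passed to `tbb::parallel_pipeline`, which runs the stages on the library's worker threads instead of on one fixed thread per stage.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Minimal blocking queue connecting two stages; close() marks end of stream.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one();
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Hypothetical three-stage chain, one thread per stage:
// produce 1..n -> square each item -> sum. Returns the final sum.
long long run_thread_per_stage(int n) {
    BlockingQueue<long long> produced, squared;
    long long total = 0;  // written only by the consumer thread
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) produced.push(i);
        produced.close();
    });
    std::thread transformer([&] {
        while (auto v = produced.pop()) squared.push(*v * *v);
        squared.close();
    });
    std::thread consumer([&] {
        while (auto v = squared.pop()) total += *v;
    });
    producer.join();
    transformer.join();
    consumer.join();
    return total;
}
```

In this version the number of threads is fixed by the number of stages rather than by the number of cores, which is one of the trade-offs the question is asking about.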

BTW, I think the tbb::task pattern is easy to understand, but can anyone tell me its advantages over raw threads?
Benchmark/test result comparisons are appreciated.

Design patterns for good IO performance in TBB

I had a program (see far-too-long post here), which I parallelized using a tbb::parallel_for loop. Each task had to do some disk reading and then some computation, but the tasks could be executed in any order. Compared to the sequential execution of the same program, this version got much worse performance out of the disk.

Problem: std::bad_alloc at memory location 0x0012fc4c

I have some applications using TBB. They work on my old computer, which has a dual-core CPU,
but some of them do not work on my new computer, which has eight cores.
The new computer has more memory than the old one.
I'm wondering how this memory problem arises.

"Main_iterate_fitting_parallel_128.exe: Microsoft C++ exception: std::bad_alloc at memory location 0x0012fc4c.."

Any suggestions are welcome.
