Intel® Threading Building Blocks

No speedup in pipeline

This is probably a hard question to be asking in the forums, since it deals with my specific implementation of a pipeline, but I figured I'd give it a try since I'm kinda stuck in my debugging and just need some more hints on where to look.

I have a problem that lends itself greatly to a pipeline. I have a file that contains a list of objects which I just read out serially. Then I need to calculate "attributes" of those objects which get saved into another file that is just a list of attributes.

Schematically, it looks like this:

Running sample application

Hello,


I have downloaded the TBB library, built a sample (Getting Started), and run it successfully on my machine. Now I want to try it on a machine with 4 CPUs. I copied the exe and tbb.dll onto that machine, but when I try to run it I get the error message "The system cannot execute the specified program."
With Process Monitor I can see that it loads my sample app, then ntdll.dll, and exits with status -1072365566.
Can anyone tell me what I am missing?



Thanks
Tadas

grain size = 1 gives better performance than larger grain sizes

Hi,
I am trying to parallelize an ITK application. The application has a very prominent loop with 50 iterations. I parallelized this loop using parallel_reduce() and ran the parallelized application on an 8-core machine. I noticed that a grain size of 1 gives better performance than larger grain sizes. I am puzzled by this and wonder how it is possible, or under which conditions it happens. Can anybody shed some light?

thanks,
fiju
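
One possible explanation can be sketched without TBB itself. The code below is a simplified model, not TBB's implementation: it assumes the range is halved at the midpoint for as long as its size exceeds the grain size (the divisibility rule of blocked_range), and that every divisible range is split all the way down, as with simple_partitioner. Other partitioners may stop splitting earlier. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of blocked_range splitting: a range is divisible while
// its size exceeds the grain size, and a split cuts it at the midpoint.
// Assumption: every divisible range is split all the way down
// (simple_partitioner-style behaviour).
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<std::size_t>& chunk_sizes) {
    assert(grain > 0 && begin <= end);
    const std::size_t size = end - begin;
    if (size <= grain) {  // not divisible any more: this becomes one task
        if (size > 0) chunk_sizes.push_back(size);
        return;
    }
    const std::size_t middle = begin + size / 2;
    split_range(begin, middle, grain, chunk_sizes);
    split_range(middle, end, grain, chunk_sizes);
}

// Hypothetical helper: sizes of the leaf chunks for the range [0, n).
std::vector<std::size_t> chunk_sizes_for(std::size_t n, std::size_t grain) {
    std::vector<std::size_t> sizes;
    split_range(0, n, grain, sizes);
    return sizes;
}
```

Under these assumptions, 50 iterations with a grain size of 13 end up as only 4 chunks, and with a grain size of 25 as only 2, so most of the 8 cores would have nothing to do. A grain size of 1 yields 50 chunks, which leaves the scheduler room to balance the load, especially if the iterations differ in cost.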

Question on CPU usage

Hi all,

I have a simple parallel_for loop that basically runs on blocked_range(0, TASKS, 1), or in other words, creates exactly TASKS tasks. To measure scalability, I tried varying the number of tasks from 2 to 8 on an 8-core machine (2x4). I always initialize without specifying the number of threads, so I assume 8 are created.

about the number of grains

I found that parallel_for always tries (if possible) to adjust the grainsize to a smaller value in order to split the whole range into 2^N chunks. Is there any reason it needs to work this way? As far as I know, a 4-core processor with Hyper-Threading theoretically has 8 threads available, but that does not mean a program can get all of those resources; it depends on the state of the operating system at the time. This causes no problem, I would just like to know. For simplicity, parallel_for seems to take a primitive approach and generate however many threads the program invokes.

scalable allocator problem (race conditions?)

Hi all, I have been evaluating the TBB library lately, and one of the things that really interests me is the scalable_allocator.

What I did was override the global new and delete operators with the scalable_allocator found in TBB. Actually, we developed our own allocator (which we have been using for years), so I simply replaced the calls to our allocator with calls to TBB.
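
For readers unfamiliar with the setup described here, this is roughly what replacing the global operators looks like. It is a minimal sketch, not the poster's code. To keep it compilable without TBB, the two backend functions are hypothetical stand-ins built on std::malloc and std::free; in the setup described above they would presumably forward to TBB's scalable_malloc and scalable_free instead.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for the real allocator backend. With TBB these
// would presumably call scalable_malloc / scalable_free instead.
static void* backend_alloc(std::size_t bytes) { return std::malloc(bytes); }
static void backend_free(void* ptr) { std::free(ptr); }

// Replacement for the global allocation function. The default array and
// nothrow forms forward to this one, so they are covered as well.
void* operator new(std::size_t bytes) {
    if (bytes == 0) bytes = 1;  // operator new must return a unique pointer
    if (void* ptr = backend_alloc(bytes)) return ptr;
    throw std::bad_alloc();
}

// Replacement for the global deallocation function. Every pointer that
// reaches it must have come from the replacement operator new above.
void operator delete(void* ptr) noexcept {
    if (ptr != nullptr) backend_free(ptr);
}
```

One thing to check in a setup like this is that allocation and deallocation always go through the same backend. Memory obtained from one allocator and released through another, for example across a DLL boundary, can produce failures that look like race conditions.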

TBB & Parallel patterns

I'm somewhat new to TBB, and as a question floating around in my head, I've been thinking about parallel patterns and how they map to TBB patterns. Some are easier than others, for instance:

Divide & Conquer maps well to parallel_for/while
Pipes - pipeline/filters

For languages like Erlang, and packages like MPI, message passing is the basis of the parallel architecture, which implies that there is a message pump between the "threads".
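
To make the "message pump" idea concrete, here is a minimal sketch of a mailbox between two threads, written with the C++ standard library only. The names Mailbox and sum_via_mailbox are made up for illustration; this is not a TBB facility, although TBB's concurrent queue containers might fill a similar role.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical mailbox: one thread sends messages, another receives them
// in FIFO order, blocking while the mailbox is empty.
template <typename T>
class Mailbox {
public:
    void send(T message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(message));
        }
        ready_.notify_one();  // wake one waiting receiver
    }

    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        T message = std::move(queue_.front());
        queue_.pop();
        return message;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};

// Hypothetical demo: a producer thread sends 1..n, the caller sums them.
int sum_via_mailbox(int n) {
    assert(n >= 0);
    Mailbox<int> box;
    std::thread producer([&box, n] {
        for (int i = 1; i <= n; ++i) box.send(i);
    });
    int total = 0;
    for (int i = 0; i < n; ++i) total += box.receive();
    producer.join();
    return total;
}
```

The design choice is that the two threads share nothing except the mailbox, which is the property that message-passing systems build on.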

So my question:
