This is probably a hard question to ask on the forums, since it deals with my specific implementation of a pipeline, but I figured I'd give it a try since I'm kinda stuck in my debugging and just need some more hints on where to look.
I have a problem that lends itself well to a pipeline. I have a file that contains a list of objects, which I read out serially. Then I need to calculate "attributes" of those objects, which get saved into another file that is just a list of attributes.
Schematically, it looks like this:
While not end of file
    Read next object
    Calculate attribute 1
    Calculate attribute 2
    ...
    Calculate attribute n
    Save attributes to a different file
Each attribute calculation depends only on local data and the (constant) object read in from the file. So I set this up as a pipeline where I pass a pointer to the object along the pipeline until it exits at the end. Obviously the Read step (filter) is serial, and the Save step is also serial (although it need not be; it's just easier to implement that way). All the other steps can be parallel.
My test machines are a dual-core AMD and a quad-core Mac. Neither machine showed any speedup when I used anywhere from 1 to 8 threads, and the result is also insensitive to the number of tokens in the pipeline (from 1 to 32).
You may say that I'm I/O bound on my two serial steps, but I did a bit of profiling on the serial version of this code and found that the Read step is 10% of the total time and the Save step is 7% of the total time. The remaining 83% of the time is consumed in the different parallel steps. So I'd expect a reasonable amount of speedup on a 2 or 4 core machine.
Are there any tweaks to TBB or things I should be on the lookout for? I find it odd that I'm seeing absolutely no speedup when I increase the number of threads.