Yesterday I wrote about a problem I was having where my pipeline didn't speed up like I thought it would:
http://software.intel.com/en-us/forums//topic/56687
Today I've been experimenting further and have found some more behavior that puzzles me. I was wondering if anyone had any hints.
I took yesterday's problem and transformed it from a pipeline to a parallel_while, as either works for me in this case. So my problem is now:
while not end of file
    read object from file
    calculate attributes on object
    save attributes
The calculate and save steps are the body of the while loop, and the read step is the pop_if_present function.
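To make the structure concrete, here is a minimal sketch of the same shape (serial input feeding a parallel body) written with plain std::thread rather than TBB. It is not the parallel_while API and not my actual code: the Object type, SerialSource class, calculate_attributes and process_all are hypothetical names for illustration, an in-memory vector stands in for the fread() calls, and object ids are assumed to be unique and to run from 0 to n-1.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real record type read from the file.
struct Object { int id; };

// Serial input stage: hands out one object at a time under a mutex,
// in the spirit of pop_if_present. A vector stands in for the file.
class SerialSource {
public:
    explicit SerialSource(std::vector<Object> items) : items_(std::move(items)) {}

    // Returns false once the input is exhausted.
    bool pop_if_present(Object& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (next_ == items_.size()) return false;
        out = items_[next_++];
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<Object> items_;
    std::size_t next_ = 0;
};

// Placeholder for "calculate attributes on object".
inline int calculate_attributes(const Object& o) { return o.id * o.id; }

// Each worker pulls an object (serialized by the mutex) and then
// calculates and saves its attributes (runs in parallel).
// Assumes ids are unique and in [0, n), so each result slot has one writer.
std::vector<int> process_all(std::vector<Object> items, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<int> results(items.size());
    SerialSource source(std::move(items));
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&source, &results] {
            Object obj{};
            while (source.pop_if_present(obj)) {
                results[static_cast<std::size_t>(obj.id)] = calculate_attributes(obj);
            }
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Calling process_all(objects, 2) would correspond to the 2-thread runs. In this sketch only the pop is serialized, so the body should scale as long as the time spent inside the lock is small compared with the calculation.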
Overall, the execution times for parallel_while and pipeline are the same, and again I'm getting no speedup when I use more threads. I know the I/O is a small portion of the time, but I did a quick test on a small subset of my data. I changed the parallel_while to something like this:
first read all of my objects outside the parallel_while
while not at the end of my object list
    calculate attributes on object
    save attributes
So this is the same as the previous loop, except that I read everything first and then process the list in memory (this only works for a small subset of my data, as normally I have multi-gigabyte files to process). This way I do get a speedup from multithreading.
Here are the timings (in ms) on a dual-core AMD running Windows.
For the first way, where the file I/O is in the parallel_while:
Serial: 1073
1 Thread: 1173
2 Threads: 1217
3 Threads: 1228
4 Threads: 1208
For the second way, where the file I/O is done first and then the parallel_while loops over the objects in memory:
Serial: 882 (file I/O accounts for the missing 191 ms from the above)
1 Thread: 955
2 Threads: 659
3 Threads: 680
4 Threads: 665
So from the timings, we see that the file I/O is a small portion of the total time (about 18% in the serial case, 191 of 1073 ms), and everything else should be perfectly parallel. But in the example where I read the file inside the parallel_while loop, I see no speedup. So the I/O looks suspicious to me. I'm using standard C buffered fread() for my I/O, and since the I/O is done in the pop_if_present routine, it should be serial and should not thrash as I increase the number of threads.
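As a rough sanity check on what I expected, here is a small Amdahl's-law estimate using the serial numbers above (191 ms of I/O, 882 ms of calculation). The function names are my own for illustration, and the model assumes the I/O is fully serial and never overlaps with the calculation, so it is only an approximation of what a well-behaved run might look like.

```cpp
// Amdahl's law: speedup on `threads` threads when a fraction
// `serial_fraction` of the work cannot be parallelized.
double amdahl_speedup(double serial_fraction, int threads) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads);
}

// Predicted wall time in ms: the serial part is paid in full,
// the parallel part is divided evenly across the threads.
double predicted_ms(double serial_ms, double parallel_ms, int threads) {
    return serial_ms + parallel_ms / threads;
}
```

For two threads, predicted_ms(191.0, 882.0, 2) gives 632 ms, a speedup of roughly 1.7x over the 1073 ms serial run. The measured 1217 ms is nowhere near that, which is why I don't think the serial fraction alone explains the result.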
Any hints?
Mike
Parallel work coming from serial input