I thought I was ready to dig into the scheduler using the tools I’ve built so far. But I was disturbed by some of the concurrency diagrams my current data collector and post-processing program generated. Here’s a typical case:
This latter rule seemed right at the time, but I discovered that it didn’t allow for the possibility of holes in the work flow within the processing of an outer index. I opted for a simpler rule: clear the holding buffer each time I write it out, so that the concurrency output is more responsive to worker inactivity. And because I expect to do more processing of these data once generated, I switched my Perl post-processing program to generate CSV rather than neat columns of text. A new version of the script is available (I had to change the file extension to post it, but it is a Perl script).
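To make the simpler rule concrete, here is a minimal sketch in Python rather than the Perl of the actual script. It assumes a hypothetical record format of `(time_from_start, thread_id)` pairs and a fixed reporting interval; neither comes from the real collector, and the function names are made up for illustration. The point is only that the holding buffer is emptied on every write, so an interval with no worker activity shows up as a zero.

```python
import csv
import io

def concurrency_rows(events, interval):
    """Yield (interval_start, active_count, thread_ids) for each time bucket.

    events: iterable of (time_from_start, thread_id) pairs. This record
    layout is an assumption, not the real collector's format.
    """
    holding = set()       # threads seen during the current bucket
    bucket_start = None
    for t, thread in sorted(events):
        if bucket_start is None:
            bucket_start = (t // interval) * interval
        # Write out every bucket that ended before this event.
        while t >= bucket_start + interval:
            yield bucket_start, len(holding), sorted(holding)
            holding.clear()   # the simpler rule: empty the buffer on each write
            bucket_start += interval
        holding.add(thread)
    if bucket_start is not None:
        yield bucket_start, len(holding), sorted(holding)

def to_csv(events, interval):
    """Render the buckets as CSV text, one row per interval."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["interval_start", "active_threads", "thread_ids"])
    for start, count, threads in concurrency_rows(events, interval):
        writer.writerow([start, count, " ".join(str(x) for x in threads)])
    return out.getvalue()
```

Because nothing is carried over between writes, a thread that goes quiet drops out of the count in the very next interval, which is the responsiveness to inactivity I was after.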
One final addition to the script: since I had the interval testing code already written for my previous outer index completion testing, I repurposed it to generate a population graph, inverting the data to show which workers complete which parts of the task.
Recapping, strange formations in the concurrency data led me to rethink the process and use a lighter-weight mechanism for contending threads, one that spits out a time-from-start that can be used to sort the concurrency data. Pumping my sample run through the pipe, sorting it, feeding it through the script, and then doing a little extra formatting in the spreadsheet, here’s what drops out at the back end.
The idle gaps I saw before are gone, at least from this run; I’ll need to make several more trials to see whether that holds from run to run. There are also activity gaps: thread 7 shows work done on outer index 34 and then a gap before finally completing it; likewise on threads 5 and 6.
I’m also using the October 19th open source release of TBB, so maybe the problem I was seeing before was due to the scheduler. More data collection should help answer that.
There’s also the population report that gets stuck at the bottom of the CSV file. It generates a big spreadsheet, so I can only show part of it here. There are actually two tables generated. Each displays a result for each thread and each outer index. The upper table marks the thread that received the outer parallel_for task for each outer index.
The lower table shows the percentage of the inner indices on each outer index that were processed by each thread. I’m losing some resolution here to fit it all in, but it should be clear enough how some threads process outer indices to completion while others get part of the task stolen by another worker. Or several. A simple conditional format in the spreadsheet adds the outer loop reservation data as a highlight in the work distribution table.
I’ve already got the test program instrumented to compare the results using different partitioners. I’ll take up that topic next time, job permitting. Hopefully it won’t be another two months.
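For anyone who wants to reproduce the two population tables described above, here is a rough sketch in Python, not the Perl of the actual script. It assumes hypothetical `(thread_id, outer_index, inner_index)` records plus an `owners` mapping from each outer index to the thread that received the outer task; both layouts and all function names are invented for illustration. The `*` suffix stands in for the spreadsheet highlight.

```python
from collections import defaultdict

def population_tables(records, owners):
    """Compute each thread's share of the inner indices per outer index.

    records: iterable of (thread_id, outer_index, inner_index) tuples.
    owners:  dict mapping outer_index -> thread that got the outer task.
    Both layouts are assumptions, not the real script's format.
    Returns (threads, outers, share) with share[outer][thread] in percent.
    """
    counts = defaultdict(lambda: defaultdict(int))
    threads = set(owners.values())
    for thread, outer, _inner in records:
        counts[outer][thread] += 1
        threads.add(thread)
    share = {}
    for outer, per_thread in counts.items():
        total = sum(per_thread.values())
        share[outer] = {t: 100.0 * n / total for t, n in per_thread.items()}
    return sorted(threads), sorted(counts), share

def render(records, owners):
    """Return CSV text: one row per outer index, one column per thread."""
    threads, outers, share = population_tables(records, owners)
    lines = ["outer," + ",".join(f"T{t}" for t in threads)]
    for outer in outers:
        cells = []
        for t in threads:
            pct = share[outer].get(t, 0.0)
            mark = "*" if owners.get(outer) == t else ""  # reservation highlight
            cells.append(f"{pct:.0f}{mark}")
        lines.append(f"{outer}," + ",".join(cells))
    return "\n".join(lines)
```

A row where the starred thread sits at 100 is an outer index processed to completion by the worker that reserved it; anything less means part of it was stolen.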

Josh Bancroft (Intel)