I've been running experiments on MTL over the last week, and I'm seeing some strange results. I ran similar experiments on MTL last summer and was able to obtain consistent performance measurements for operations on four concurrent data structures. This time, measuring the performance of six concurrent data structures, I'm having little luck obtaining good results (i.e., small standard deviations and few anomalies). I'm starting to wonder whether something systemic is going on.
My experiments were split into batch jobs by the fixed number of cores each algorithm could run on (i.e., a 1-core job, a 4-core job, then 8, 12, ..., up to a 32-core job). I've run my entire experimental suite three times. The first time, each algorithm performed well, scaling nicely except at 20, 24 and 32 cores, where performance plummeted to roughly single-core levels (see Figure 1 through Figure 3).
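For concreteness, the per-core-count job layout I'm describing looks roughly like this. This is only a sketch: `submit_job` stands in for the actual scheduler command on MTL, and `./run_suite` for my benchmark driver.

```shell
# Hypothetical sketch of the batch layout: one job per fixed core count.
# "submit_job" is a stand-in stub for the real scheduler submission command.
submit_job() { echo "would submit: $*"; }

for cores in 1 4 8 12 16 20 24 28 32; do
    submit_job --cores "$cores" ./run_suite --threads "$cores"
done
```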
The second time, the algorithms scaled well except at 16, 28 and 32 cores, where performance again plummeted to roughly single-core levels (see Figure 4).
Since the performance of the algorithms dropped to this uniformly poor level, the standard deviation of the runs behind those "bad" data points was very small, giving me little numeric information about which data were reliable. I attempted to address this by splitting the repeated trials for each experiment into two groups that were run at different times, so that if only one group were affected by this strange problem, the discrepancy would be more likely to show up in the standard deviation. I also split the jobs (originally about 4 hours each) into many smaller jobs (each under 1 minute) and re-submitted. This produced some well-behaved data (see Figure 5) and many graphs displaying inconsistent data (see Figure 6).
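The group-splitting check I have in mind amounts to comparing the two time-separated groups directly rather than pooling them. A minimal sketch (the throughput numbers and the 20% threshold are made up for illustration):

```python
import statistics

def flag_inconsistent(group_a, group_b, rel_tol=0.2):
    """Flag a data point when the two time-separated trial groups disagree.

    group_a, group_b: throughput samples (e.g., ops/sec) from the two groups.
    rel_tol: relative difference between group means above which we suspect
             one group was hit by the anomaly.
    """
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    rel_diff = abs(mean_a - mean_b) / max(mean_a, mean_b)
    return rel_diff > rel_tol

# Hypothetical samples: the second group looks like it hit the slowdown.
good = [10.1e6, 9.8e6, 10.3e6]
bad = [1.2e6, 1.1e6, 1.3e6]
print(flag_inconsistent(good, good))  # consistent groups -> False
print(flag_inconsistent(good, bad))   # suspicious data point -> True
```

The point of the comparison is that a pooled standard deviation over all trials can stay small when every trial is uniformly slow, whereas a between-group disagreement stands out even then.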
For reference, when I run the same experimental suite on a 16-core Sun machine, and on a 4-core Intel Q9450, I get very consistent performance (albeit with a different ranking of algorithms on the Sun machine).
Any thoughts? Thank you for your time and attention.
Figure 1: Part of experimental suite 1, showing anomaly at 20 cores.
Figure 2: Part of experimental suite 1, showing anomaly at 24 cores.
Figure 3: Part of experimental suite 1, showing anomaly at 32 cores.
Figure 4: Part of experimental suite 2, showing anomalies at 16, 28 and 32 cores.
Figure 5: Part of experimental suite 3, showing good results.
Figure 6: Part of experimental suite 3, showing inconsistent results.