It is the last day of the contest, and it has been pretty intensive. I would like to share some thoughts about the benchmarks offered in the contest, with the Intel staff as well as with the community.
The design of the benchmarks (and I am talking about benchmarks for massive parallelization here) is a bit misleading, in my opinion. This primarily concerns the benchmark released yesterday.
We entered the contest as a parallelization contest and, as such, focused on the key aspects of parallelization at multiple levels. This includes what is called thread-level parallelism (balancing work across multiple physical cores, using techniques like pthreads, OpenMP, Intel TBB, etc.) and instruction-level parallelism (SIMD instructions, vectorization techniques). To test our results and measure their performance, we assumed that the target of the optimization is to scale by leveraging parallelism. In the benchmarks, this was reflected by a growing input size _TOGETHER_ with a larger number of available parallel resources (number of allowed threads / physical cores).
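To make concrete what we mean by the two levels, here is a minimal sketch; the saxpy kernel and the build flags are illustrative stand-ins of our own, not taken from the contest:

```c
#include <stddef.h>
#include <stdio.h>

/* One loop, two levels of parallelism: "parallel for" spreads the
 * iterations across physical cores (thread-level parallelism), and
 * the "simd" clause asks the compiler to vectorize each thread's
 * chunk (instruction-level parallelism).  Build: gcc -O2 -fopenmp */
static void saxpy(float *y, const float *x, float a, size_t n)
{
    #pragma omp parallel for simd schedule(static)
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    enum { N = 1 << 20 };
    static float x[N], y[N];
    for (size_t i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy(y, x, 3.0f, N);
    printf("y[0] = %.1f\n", y[0]);   /* expect 5.0 */
    return 0;
}
```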
However, the last benchmark seems to go in a different direction. In that case, increasing the input size WITHOUT a corresponding increase in parallel resources (e.g., the number of cores/threads) does not motivate parallel algorithms; it motivates algorithms with minimal ASYMPTOTIC COMPLEXITY. In other terms, when the input size grows but the number of parallel resources does not, the best parallel algorithm is inferior to the worst implementation of a single-threaded algorithm if the latter offers better asymptotic complexity.
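A quick back-of-envelope comparison makes the point; the constants (p, m) and the O(n*m) naive search are invented by us for illustration, not official benchmark parameters:

```c
/* Toy illustration with invented constants: a naive O(n*m) search
 * spread perfectly over p cores versus a serial O(n+m) suffix-tree
 * lookup.  With p fixed, growing n alone decides the race in favor
 * of the better complexity class, regardless of parallel speedup. */
#include <stdio.h>

int main(void)
{
    const double p = 32;      /* fixed number of cores (assumption) */
    const double m = 1000;    /* pattern length (assumption)        */
    for (double n = 1e4; n <= 1e8; n *= 100)
        printf("n=%8.0e  naive/p=%10.2e ops  suffix-tree=%10.2e ops\n",
               n, n * m / p, n + m);
    return 0;
}
```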
This benchmark makes it hard to compare results or "grade" solutions, because, to be honest, it feels odd when non-parallel algorithms outperform highly parallel algorithms in a contest like this. It undermines the idea of comparing results based on real (wall-clock) time when the scaling focuses so much more on the size of the input than on the number of available parallel resources.
On the other hand, CPU% calculated as user time / wall time is a bad idea in many cases, too, as spinning threads (often found in OpenMP implementations) can show 100% usage while effectively idle (busy waiting, persistent keep-alive thread pools). In this case, the algorithm does not really scale, but user time grows linearly with the number of threads, even when the input size remains fixed.
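A minimal sketch of the effect, assuming an OpenMP runtime with an active wait policy (the loop bound is arbitrary):

```c
/* Sketch: only thread 0 does real work; the remaining threads hit
 * the implicit barrier at the end of the parallel region and, with
 * OMP_WAIT_POLICY=ACTIVE, spin at 100% CPU while contributing
 * nothing.  User time then grows linearly with OMP_NUM_THREADS,
 * wall time does not. */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    double t0 = omp_get_wtime();
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0) {
            volatile double s = 0.0;          /* the only real work */
            for (long i = 0; i < 500000000L; ++i)
                s += (double)i;
        }
        /* all other threads busy-wait here until thread 0 is done */
    }
    printf("wall time: %.2f s\n", omp_get_wtime() - t0);
    return 0;
}
```

Running this under time(1) with OMP_NUM_THREADS=1 and then OMP_NUM_THREADS=8 gives roughly the same wall time but roughly eight times the user time in the second case, so the user/wall ratio reports near-perfect utilization for a computation that is entirely serial.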
I do not know how you will evaluate this benchmark, or what the thoughts of the other competitors are. But with the late addition of yesterday's benchmark, most of us do not have enough time to change our algorithm to an O(n+m) one (suffix trees, hardly parallelizable), or would feel rather forced to do so. This feels a bit against the spirit of the competition, in our opinion.
This just reflects my opinion; others are valid, too. Maybe the organizers can keep this in mind for the next competition. All in all, it is a great idea for a contest, and apart from this late surprise, it has been real fun to take part in.
Technical University, Munich (TUM)