After the competition will be over how will you test our solutions?
I assume there will be some private benchmarks along the ones we were given in the contest period.
Regarding the benchmarks I found out that the special condition for the SUBSTRING problem according to the reference file is _NOT_ checked by any of the benchmarks.
With the special condition I mean the following: The reference implementation computes diagonals, but besides checking for larger sequences on the diagonal itself, ich has to check neighour-fields (horizontally and vertically). It is accomplished by checking L[i+1][j] and L[i][j+1]. Because of parallelization issues, this step was made independent from the rest of the algorithm in our solution. Disabling this checking in our implementation passes all benchmarks when submitted via AYC upload. This is just because this case was not checked for in the benchmark suite.
However, when using custom generated data (e.g., the pimped version *g*), there cases makes a difference. I I think this might be a bit unlucky not to be contained in the benchmark suite using the upload feature.
I agree with you. They ought to test that case, too, because the reference algorithm contains these ifs and this shifting behavior, and doing this correctly in a parallel algorithm is difficult.
I've spent quite a bit of time on the neighbor field issue. The diagonal algorithm isn't very difficult but correctly checking the neighbor diagonals is, if you want to do it fast.
We ended up using a small queue for the left, current and right diagonal to check the neighbors. I also had to adapt the way we split the input into subranges (for parallelization) so that a "repeated sequence" (AAAAAAAAAA) is not split because it would break our merging algorithm:On reading a sequence, I create a (naturally sorted) array of ranges that contain duplicated letters and then I use a binary search with the start and end position of the relevant input segment to make sure that that it doesn't start or end in a series of duplicated letters and adapt the range if it does so.
This costs a lot of time.
And it resulted in the biggest facepalm of my life ^^After the deadline, when I woke up, I looked at the code and thought to myself: "why do I keep an array and why do I binary search at all. I could just check the previous and current letter at the borders and if they are equal, move to the right..."It took me 3 minutes to take out the old code and put in that approach...: works like a charm and I got 30% better performance in all my benchmarks. But instead I spent the whole night from Tuesday to Wednesday microoptimizing shit and not knowing what else to do; we would have been on par with candreolli and andreas86 with this change :(