I've attached larger input file. Parameters for generation: number of segments = 400'000, coords in rage 0..400, max segment length in each dimention = 40. My program finds 12648 intersections in 10 seconds.
I've attached larger input file. Parameters for generation: number of segments = 400'000, coords in rage 0..400, max segment length in each dimention = 40.
I couldn't unzip this file. Can you check to see if it is OK when you try to unzip it?
I couldn't unzip this file. Can you check to see if it is OK when you try to unzip it?
Sorry for that, I used 7z's zip+bzip2 compression method. Reattached file with plain old zip+deflate compression method (now it is slightly bigger - 4.1MB versus 3.6MB).
Here is my performance on the 400,000 row dataset on a 2.2GHz AMD Opteron 8354:
cores runtime speedup speedup/cores
1 25.59s 1 1
2 12.82s 1.99 .995
4 6.50s 3.94 .985
8 3.34s 7.67 .959
12 2.34s 10.96 .913
14 2.02s 12.67 .905
16 1.98s 12.92 .808
c++ 23.08s 1.11 (the C++ code with no Cilk++ constructs)
The machine is busy running some other code, so only about 13 or 14 cores are available, hence the falloff in speedup/cores at the top end.
My 1.6GHz laptop is somewhat slower (about 42.9s on two cores). The clock rate differnce is much less than the speed difference, so I think this says the 16-core machine has a better memory architecture than my laptop. (What kind of machine did you run on?)
The C++ code is about 11% faster than the Cilk++ code on one processor. I'm still looking into that.
I've attached larger input file. Parameters for generation: number of segments = 400'000, coords in rage 0..400, max segment length in each dimention = 40. My program finds 12648 intersections in 10 seconds.
I get 12648 in 41.1 sec. Too many segments in a small domain leads sweeping algorithm inefficiently. 8040132448 calls to judge intersections that I think is not much less than brutal search. Only 81332496 of them pass box test. The length of potential list even reach to 23843. As O(logn) structure is not used in my code, I have to compare all the segments in the list before insert a new segment.
But I found Clay will use a different method to generate data which result in uniform distribution for endpoints. It can make the average length of the list to 0.
I personally hope Clay could use different distribution patterns.
I get 12648 in 41.1 sec. Too many segments in a small domain leads sweeping algorithm inefficiently. 8040132448 calls to judge intersections that I think is not much less than brutal search. Only 81332496 of them pass box test. The length of potential list even reach to 23843. As O(logn) structure is not used in my code, I have to compare all the segments in the list before insert a new segment.
But I found Clay will use a different method to generate data which result in uniform distribution for endpoints. It can make the average length of the list to 0.
I personally hope Clay could use different distribution patterns.
I think algorithm which Clay described is worser for both algorithms. I generated input data according to Clay's description. My implementation take 4 seconds for 20'000 segments. And for 100'000 segments I aborted computation. If somebody interested I may upload the input data.
The interesting thing is that there is 0 intersections for 20'000 segments. However in 2D case (projection to XY) there is ~10^7 intersections (~4% of all possible intersections). You use projection to some 2D plane in your sweeping line, right? If so, you will have to handle that large number of potential intersections, and processing of each intersection is several logN operations on linked data structures.
I think algorithm which Clay described is worser for both algorithms. I generated input data according to Clay's description. My implementation take 4 seconds for 20'000 segments. And for 100'000 segments I aborted computation. If somebody interested I may upload the input data.
The interesting thing is that there is 0 intersections for 20'000 segments. However in 2D case (projection to XY) there is ~10^7 intersections (~4% of all possible intersections). You use projection to some 2D plane in your sweeping line, right? If so, you will have to handle that large number of potential intersections, and processing of each intersection is several logN operations on linked data structures.
Btw, bounding box is very unefficient here too.
Interesting! Could you upload the input file?
I agree that box-test is not as useful as before. Maybe I should test whether the pairs are in the same plane first in this case.
I agree that Dmitriy's numbers are impressive. I'd like to know what people's techniques are. Attached is my submission so you can see my tricks. Also attached is my string matching solution.
I agree that Dmitriy's numbers are impressive. I'd like to know what people's techniques are. Attached is my submission so you can see my tricks. Also attached is my string matching solution.
My write-ups for Line Intersection and String matching are attached. They are not quite as elegant as yours but I think they do the job. I'm new to writing long descriptions, feedback on the same will be appreciated.
My write-ups for Line Intersection and String matching are attached. They are not quite as elegant as yours but I think they do the job. I'm new to writing long descriptions, feedback on the same will be appreciated.
Your writeups seem nice. I especially like the line segment writeup.
I think one main difference between your algorithm and mine is that I did the categorization recursively (I.e. subdivide each category), but looking at your performance it seems like that may have been more work than was really needed. Breaking 400,000 lines into 27 categories gives an average of about 15,000 per category, wihch may be small enough to be efficient.
I expressed my categorization 1 dimension at a time, whereas you did all three dimensions at once, but I don't see that as an essential difference.
On the string matching problem, I think that my implementation Rabin-Karp string matching also seems like overkill for the case where the data is real DNA. I can see why your code is faster, since looking at the first few (or maybe 32) characters will quickly weed out false matches. I like the trick of shifting the same data by 2 bits to avoid shifting later in the runtime.
Your writeups seem nice. I especially like the line segment writeup.
I think one main difference between your algorithm and mine is that I did the categorization recursively (I.e. subdivide each category), but looking at your performance it seems like that may have been more work than was really needed. Breaking 400,000 lines into 27 categories gives an average of about 15,000 per category, wihch may be small enough to be efficient.
I expressed my categorization 1 dimension at a time, whereas you did all three dimensions at once, but I don't see that as an essential difference.
On the string matching problem, I think that my implementation Rabin-Karp string matching also seems like overkill for the case where the data is real DNA. I can see why your code is faster, since looking at the first few (or maybe 32) characters will quickly weed out false matches. I like the trick of shifting the same data by 2 bits to avoid shifting later in the runtime.
Nice job.
-Bradley
Thanks Bradley!
I know my documentation skills need work. But, the fact that the documents weren't hard to understand, is encouraging.