Loading...
You are not logged-in Login/Register





  • Posts   Search Threads
  • Dmitriy VyukovJuly 1, 2009 4:57 AM PDT   
    Larger Test Data

    I've attached larger input file. Parameters for generation: number of segments = 400'000, coords in rage 0..400, max segment length in each dimention = 40.
    My program finds 12648 intersections in 10 seconds.


    ---------------------------------------------
    All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
    www.1024cores.net
     Attachments 

    BradleyKuszmaulJuly 1, 2009 5:12 AM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - Dmitriy Vyukov
    I've attached larger input file. Parameters for generation: number of segments = 400'000, coords in rage 0..400, max segment length in each dimention = 40.
    I couldn't unzip this file.  Can you check to see if it is OK when you try to unzip it?



    Dmitriy VyukovJuly 1, 2009 5:33 AM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - BradleyKuszmaul
    I couldn't unzip this file.  Can you check to see if it is OK when you try to unzip it?


    Sorry for that, I used 7z's zip+bzip2 compression method.
    Reattached file with plain old zip+deflate compression method (now it is slightly bigger - 4.1MB versus 3.6MB).



    ---------------------------------------------
    All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
    www.1024cores.net
     Attachments 

    BradleyKuszmaulJuly 1, 2009 4:10 PM PDT
    Rate
     
    Re: Larger Test Data

    Here is my performance on the 400,000 row dataset on a 2.2GHz AMD Opteron 8354:

    cores  runtime  speedup  speedup/cores
     1     25.59s     1        1
     2     12.82s     1.99     .995
     4      6.50s     3.94     .985
     8      3.34s     7.67     .959
    12      2.34s    10.96     .913
    14      2.02s    12.67     .905
    16      1.98s    12.92     .808
    c++    23.08s     1.11          (the C++ code with no Cilk++ constructs)
    

    The machine is busy running some other code, so only about 13 or 14 cores are available, hence the falloff in speedup/cores at the top end.

    My 1.6GHz laptop is somewhat slower (about 42.9s on two cores). The clock rate differnce is much less than the speed difference, so I think this says the 16-core machine has a better memory architecture than my laptop.  (What kind of machine did you run on?)

    The C++ code is about 11% faster than the Cilk++ code on one processor. I'm still looking into that.



    Dmitriy VyukovJuly 1, 2009 4:36 PM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - BradleyKuszmaul

    What kind of machine did you run on?


    Intel dual core P9500 2.5GHz, 1GHz memory bus, 6MB L2$.
    What number of intersections do your program find? (my - 12648)



    ---------------------------------------------
    All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
    www.1024cores.net

    BradleyKuszmaulJuly 1, 2009 4:48 PM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - Dmitriy Vyukov
    What number of intersections do your program find? (my - 12648)
    Yes 12648


    haojnJuly 2, 2009 2:24 AM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - Dmitriy Vyukov
    I've attached larger input file. Parameters for generation: number of segments = 400'000, coords in rage 0..400, max segment length in each dimention = 40.
    My program finds 12648 intersections in 10 seconds.

    I get 12648 in 41.1 sec.
    Too many segments in a small domain leads sweeping algorithm inefficiently. 8040132448 calls to judge intersections that I think is not much less than brutal search. Only 81332496 of them pass box test. The length of potential list even reach to 23843. As O(logn) structure is not used in my code, I have to compare all the segments in the list before insert a new segment.

    But I found Clay will use a different method to generate data which result in uniform distribution for endpoints. It can make the average length of the list to 0.

    I personally hope Clay could use different distribution patterns.

    Dmitriy VyukovJuly 2, 2009 2:59 AM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - haojn
    I get 12648 in 41.1 sec.
    Too many segments in a small domain leads sweeping algorithm inefficiently. 8040132448 calls to judge intersections that I think is not much less than brutal search. Only 81332496 of them pass box test. The length of potential list even reach to 23843. As O(logn) structure is not used in my code, I have to compare all the segments in the list before insert a new segment.

    But I found Clay will use a different method to generate data which result in uniform distribution for endpoints. It can make the average length of the list to 0.

    I personally hope Clay could use different distribution patterns.


    I think algorithm which Clay described is worser for both algorithms.
    I generated input data according to Clay's description. My implementation take 4 seconds for 20'000 segments. And for 100'000 segments I aborted computation.
    If somebody interested I may upload the input data.

    The interesting thing is that there is 0 intersections for 20'000 segments. However in 2D case (projection to XY) there is ~10^7 intersections (~4% of all possible intersections). You use projection to some 2D plane in your sweeping line, right? If so, you will have to handle that large number of potential intersections, and processing of each intersection is several logN operations on linked data structures.

    Btw, bounding box is very unefficient here too.



    ---------------------------------------------
    All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
    www.1024cores.net

    haojnJuly 2, 2009 3:30 AM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - Dmitriy Vyukov


    I think algorithm which Clay described is worser for both algorithms.
    I generated input data according to Clay's description. My implementation take 4 seconds for 20'000 segments. And for 100'000 segments I aborted computation.
    If somebody interested I may upload the input data.

    The interesting thing is that there is 0 intersections for 20'000 segments. However in 2D case (projection to XY) there is ~10^7 intersections (~4% of all possible intersections). You use projection to some 2D plane in your sweeping line, right? If so, you will have to handle that large number of potential intersections, and processing of each intersection is several logN operations on linked data structures.

    Btw, bounding box is very unefficient here too.


    Interesting! Could you upload the input file?

    I agree that box-test is not as useful as before. Maybe I should test whether the pairs are in the same plane first in this case.

    Dmitriy VyukovJuly 2, 2009 4:13 AM PDT
    Rate
     
    Re: Larger Test Data

    Quoting - haojn
    Interesting! Could you upload the input file?

    I agree that box-test is not as useful as before. Maybe I should test whether the pairs are in the same plane first in this case.

    Here it is (N=30'000).
    My program finds 0 intersections in 8.6 seconds.



    ---------------------------------------------
    All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
    www.1024cores.net
     Attachments 

    haojnJuly 2, 2009 4:53 AM PDT
    Rate
     
    Re: Larger Test Data

    I got 0 too in 4.2 sec (single thread). This pattern benefits my code.


Forum jump:  

Intel Software Network Forums Statistics

17,025 users have contributed to 48,317 threads and 172,754 posts to date.

In the past 24 hours, we have 9 new thread(s) 56 new posts(s), and 52 new user(s).

In the past 3 days, the most popular thread for everyone has been How to manage rounding by IVF ?? The most posts were made to Most likely, the issue is that The post with the most views is Optimalization of sine function\'s taylor expansion

Please welcome our newest member redfruit83


For more complete information about compiler optimizations, see our Optimization Notice.