Comparing our results

Hello !

During the contest last year, an important aspect was sharing the results of our codes.
It boosts the competition and lets everyone get the best out of their algorithm.

I propose to share our results on the recently updated large example given here : http://software.intel.com/fr-fr/forums/showthread.php?t=104659&o=a&s=lr (threshold = 22)
on :
1 thread
2 threads
16 threads
with the time command.

I will update my own results as soon as I have a first version working without bugs...

VinceRev

We can compare results based on refseq.txt and input.txt until we complete the "greedy" part with **L :)
Is every submission tested on a 12-core machine, or, if the code is fast enough, will it be tested on a machine with more cores?

These results won't be comparable because they are executed on different machines. If you wish to compare results, it's much better to share benchmark results with the appropriate benchmark ID and the benchmark server name (e.g. 12-core machine).

Best regards,
Nenad

Ok, here is my best result so far :-) What do you have on this machine?

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
	real:1.43	user:10.36	sys:0.69	CPU:772.727%

Is this 1.43 seconds to compare the large sequence recently provided on the forum ?

No, this is an Intel benchmark. We don't know the file size corresponding to this benchmark; we just have the unique key. (You will understand all of this once you are registered :) )

After optimizing the code and using OpenMP, the result is this:

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
real:0.18 user:2.06 sys:0 CPU:1144.44%

People might be reserved when speaking about times; that's why I think this thread isn't popular. But I am curious about the theoretical complexities of the algorithms used. For example, I've used some preprocessing of O(N log^2 N) time complexity (which with some effort can also be reduced to O(N)) that helps me skip some pairs of potential matches, but in the worst case the algorithm I am using is still O(N^2), although it performs better in practice. I remember the organizers advised us to focus on parallelism and not only on algorithms, but I am curious: has anyone come up with an algorithm that has guaranteed o(N^2) (lower than Theta(N^2)) time complexity? I might be wrong, but I think this is impossible, given that the output itself might be O(N^2) in size if the minMatchingTreshold is small.

Does your preprocessing have a complexity of O(N * log(N) * log(N)), or did I misinterpret log^2N?

I recommend reading about the longest common subsequence problem on Wikipedia. The problem can't be solved in less than O(N^2), if I read it correctly.

The problem you are referring to is about subsequences, not substrings. It is important that we are dealing with substrings in this problem.

You may find more details here:
http://en.wikipedia.org/wiki/Subsequence#Substring_vs._subsequence

As yvanko said, our case is about substrings, and the longest common substring can also be found in O(N) (again, Wikipedia). But our problem isn't about the longest common substring but about maximal substrings (those which cannot be extended), and I was wondering what the lower bound on the complexity is. I couldn't find this on Wikipedia :)
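To make the "maximal substring" notion concrete, here is a minimal sketch of the brute-force O(N*M) approach: walk every diagonal of the comparison matrix and report each run of equal characters that is at least as long as the threshold. The function name `maximal_matches` and the `(ref_start, inp_start, length)` output format are assumptions for illustration only, not the contest's specification or anyone's actual submission.

```python
def maximal_matches(ref: str, inp: str, min_len: int):
    """Return (ref_start, inp_start, length) for every maximal exact match.

    A match is maximal when it cannot be extended to the left or right.
    Brute force: O(len(ref) * len(inp)) character comparisons.
    """
    matches = []
    n, m = len(ref), len(inp)
    # Diagonal d = i - j pairs ref[i] with inp[j]. Runs of equal characters
    # along one diagonal are exactly the maximal matches.
    for d in range(-(m - 1), n):
        i, j = max(d, 0), max(-d, 0)
        run = 0
        while i < n and j < m:
            if ref[i] == inp[j]:
                run += 1
            else:
                if run >= min_len:
                    matches.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may also end at the border of either string.
        if run >= min_len:
            matches.append((i - run, j - run, run))
    return matches


print(maximal_matches("abcde", "xbcdy", 2))
print(len(maximal_matches("a1a2a3", "a.a,a;", 1)))
```

The second call illustrates the point about output size: with a threshold of 1, every pair of isolated 'a' characters is its own maximal match, so the number of reported matches grows with the product of the two lengths.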

i finally made it to the last machine *yeah*

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187:
real:22.6 user:320.1 sys:0.38 CPU:1418.05%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
real:14.51 user:328.61 sys:0.63 CPU:2269.06%

there is still so much work to do ^^ ... unfortunately there is so little time left :/

Would you be so kind as to share your results on other benchmarks (e.g. 12 cores) so we can know approximately what boundary we need to cross to advance to the 40-core machine?
I personally have not finished my algorithm yet (and my current results are a little embarrassing :) ), but when I do, I will share my results too.

Regards,
Nemanja

here you are :-) i still don't know why cpu usage is so small on the first test though :(

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187:
	real:21.34	user:308.35	sys:0.38	CPU:1446.72%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
	real:13.78	user:308.91	sys:0.39	CPU:2244.56%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584:
	real:9.55	user:156.6	sys:0.08	CPU:1640.63%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584:
	real:6.06	user:156.9	sys:0.08	CPU:2590.43%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584:
	real:23.49	user:131.74	sys:0.05	CPU:561.047%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584:
	real:13.48	user:147.2	sys:0.07	CPU:1092.51%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584:
	real:12.48	user:242.81	sys:0.06	CPU:1946.07%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132:
	real:0.41	user:1.38	sys:0	CPU:336.585%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132:
	real:0.21	user:1.38	sys:0	CPU:657.143%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915:
	real:0.2	user:0.26	sys:0	CPU:130%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
	real:0.2	user:0.43	sys:0	CPU:215%

Nice post, very appreciated. Very courageous of you to post your results :)

Best regards,
Nenad

Thank you very much dieter84. Now we know what to expect,
and how much we need to improve our code :)

Regards,
Nemanja

Pretty good. Well, this is one for discussion: our time on, for example, benchmark ..187 is better than yours, BUT we scale much worse.

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187: real:8.46 user:59.21 sys:0.43 CPU:704.965%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187: real:7.21 user:83.24 sys:0.47 CPU:1161.03%

From 1 thread to 2 or 3 we get 1/2 or 1/3 of the time, but after that it does not get much faster, so on 20 or 40 threads we have not improved our time much...
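As a rough way to put a number on "we scale much worse", the two real times quoted just above (8.46 s on 20 threads, 7.21 s on 40 threads) can be turned into a relative speedup and efficiency. This is only a sketch; the helper names `speedup` and `relative_efficiency` are made up for illustration.

```python
def speedup(real_base: float, real_more: float) -> float:
    """How many times faster the run with more threads finished."""
    return real_base / real_more


def relative_efficiency(real_base: float, threads_base: int,
                        real_more: float, threads_more: int) -> float:
    """Speedup divided by the factor by which the thread count grew.

    1.0 means perfect scaling between the two runs.
    """
    return speedup(real_base, real_more) / (threads_more / threads_base)


# Real times from the 20-thread and 40-thread runs quoted above.
s = speedup(8.46, 7.21)
e = relative_efficiency(8.46, 20, 7.21, 40)
print(f"speedup {s:.2f}x, relative efficiency {e:.0%}")
```

Doubling the threads here buys well under a 2x improvement, which matches the poster's own description.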

We're having nearly exactly the same results. We have more CPU, but the same real time. I like that, the race is getting close ;-) I am really excited now!

Which time is the most important now? real or user?

real is the important time. It is the real physical execution time. user is the accumulated time (over all active cores) spent in user mode.

real time is also called wall-clock time (the time your program takes as measured with a stopwatch or wall clock) and, from the user's perspective, it is the "most important".

user time is the accumulated time spent by all processors in user mode, i.e., it roughly measures how long the code would take on a single-core CPU. The ratio given as CPU utilization is calculated by dividing the CPU time (user plus sys) by the real time, and thereby shows how efficiently your code uses a multi-core CPU.
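The CPU figure in the result lines posted in this thread is consistent with (user + sys) / real, expressed as a percentage. For example, real:1.43 user:10.36 sys:0.69 gives 772.727%. The sketch below recomputes it from a result line; the regular expression and the function name `cpu_percent` are assumptions for illustration and not part of the benchmark tooling.

```python
import re

# Matches both "real:1.43 user:10.36 sys:0.69" and the variant without
# spaces between the fields ("real:1.43user:10.36sys:0.69").
RESULT = re.compile(r"real:\s*([\d.]+)\s*user:\s*([\d.]+)\s*sys:\s*([\d.]+)")


def cpu_percent(line: str) -> float:
    """Recompute the CPU percentage from one benchmark result line."""
    m = RESULT.search(line)
    if m is None:
        raise ValueError(f"not a result line: {line!r}")
    real, user, sys_time = map(float, m.groups())
    if real == 0:
        # The benchmark output shows -nan% or inf% in this case, because
        # the measured real time rounds to zero.
        return float("nan") if user + sys_time == 0 else float("inf")
    return 100.0 * (user + sys_time) / real


print(round(cpu_percent("real:1.43 user:10.36 sys:0.69 CPU:772.727%"), 3))
```

This also explains the CPU:inf% and CPU:-nan% entries that appear for the smallest benchmarks: the real time is reported as 0, so the ratio is a division by zero.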

Last (and least), the system time is quite unimportant; it just shows the amount of time the system was busy doing something other than executing the user's program (e.g. interleaved programs, other stuff). The only case in which this time matters is if it is really high and thus distorts the shown real time.

Cheers,
Jakob / Team Zubrowka

The system time is not unimportant. Your program consumes system time when it uses system calls (for memory allocation, file I/O, and other stuff).

Are you using suffix trees or dynamic programming?

I doubt that anyone gets good parallel performance with suffix trees. I don't see how to parallelize this algorithm efficiently. Still, if I am wrong, I will be happy to study the solution after the contest.

I wonder, are you really a student when you have that much free time? I have a lot of projects, tests, an Android app to develop, and no time to check all the algorithms and solutions and switch from one idea to another. Just asking, no hard feelings.

well it's different from country to country. in russia they have tests at the end of may, in germany in most universities our semester just began one month ago. and sure in some uni-s u have less to do than in others + different priorities, for a month I don't really do anything for the uni stuff...

Finally got to 40 cores :), but it still needs to be improved, and there is so little time left. Here are the results:

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187:
real:39.51 user:716.51 sys:0.22 CPU:1814.05%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
real:23.04 user:747.87 sys:0.25 CPU:3247.05%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584:
real:14.77 user:270.16 sys:0.05 CPU:1829.45%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584:
real:10.06 user:303.01 sys:0.13 CPU:3013.32%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584:
real:36.71 user:209.64 sys:0.04 CPU:571.18%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584:
real:18.82 user:209.75 sys:0.04 CPU:1114.72%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584: real:13.13 user:282.34 sys:0.04 CPU:2150.65%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.48 user:2.83 sys:0 CPU:589.583%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132: real:0.3 user:3.47 sys:0 CPU:1156.67%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915: real:0.09 user:0.55 sys:0 CPU:611.111%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915: real:0.05 user:0.53 sys:0 CPU:1060%

cool thank you :-) how the f do you get so small times at the first tests? :P

You are welcome, but it's not as fast as you think :)
Take a look at this thread http://software.intel.com/fr-fr/forums/showthread.php?t=105122&o=a&s=lr
I would like to know how they get zero time on the first test...

Regards,
Nemanja

Well, their performance on the first test is not so important. Scalability and performance on larger tests are more useful for measuring the efficiency of your code.

You are right. As we already discussed in the mentioned thread, the bigger examples are much more important, because just there scalability issues come into play. And looking at your results, you manage to get the 40 cores running, don't you? ;-)

Since the contest is nearly over... our scores are:

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187:
real:17.32 user:313.36 sys:0.74 CPU:1813.51%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
real:9.84 user:318.87 sys:0.95 CPU:3250.2%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584:
real:9.04 user:165.92 sys:0.68 CPU:1842.92%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584:
real:5.22 user:172.38 sys:0.98 CPU:3321.07%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584:
real:21.58 user:127.02 sys:0.37 CPU:590.315%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584:
real:11.98 user:136.72 sys:0.37 CPU:1144.32%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584:
real:9.58 user:212.51 sys:0.56 CPU:2224.11%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.4 user:1.29 sys:0 CPU:322.5%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.2 user:1.31 sys:0 CPU:655%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915:
real:0.2 user:0.25 sys:0 CPU:125%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
real:0.2 user:0.25 sys:0 CPU:125%

Hey Grigore, I have good news and bad news for you. The bad news is that our solution seems to be faster:

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584:
real:6.65 user:95.93 sys:6.51 CPU:1540.45%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584:
real:4.91 user:136.96 sys:12.49 CPU:3043.79%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584:
real:12.82 user:57.49 sys:1.7 CPU:461.7%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584:
real:8.72 user:72.3 sys:2.12 CPU:853.44%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584: real:6.86 user:123.63 sys:2.49 CPU:1838.48%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132: real:0.04 user:0.18 sys:0.02 CPU:500%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132: real:0.03 user:0.21 sys:0.02 CPU:766.667%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915: real:0.01 user:0.05 sys:0 CPU:500%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915: real:0.01 user:0.06 sys:0.02 CPU:800%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-7570939485803595530:
valid submission

The good news is that it fails on the 87 benchmark and we haven't been able to make it work yet.

error on a 40-cores HT machine :
error during benchmark: unexpected error during runtime.

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
invalid benchmark

I assume the same O(n*m), but more likely coupled with some better heuristics... Also, I feel SSE could still be better adjusted in my case (I only use full SSE_16 above a minMatchLength threshold) - more details in the article to come... Anyway, gg :)

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584:
real:4.18 user:36.62 sys:0.09 CPU:878.23%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584: real:3.14 user:45.65 sys:0.12 CPU:1457.64%

sure not the best, our russian colleagues are MUCH faster. i mean really much.

Our times surely will not change much this evening, so I am sharing them with you:
on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-13105960671904317035:
real:5.46 user:89.4 sys:0.36 CPU:1643.96%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-13105960671904317035:
real:4.01 user:116.21 sys:0.43 CPU:2908.73%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187:
real:1.87 user:22.48 sys:0.46 CPU:1226.74%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
real:2.63 user:73.71 sys:0.51 CPU:2822.05%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584: real:8.6 user:34.99 sys:3.45 CPU:446.977%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584: real:9.46 user:62.59 sys:4.41 CPU:708.245%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584: real:7.65 user:15.91 sys:3.32 CPU:251.373%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584: real:7.14 user:18.8 sys:4.85 CPU:331.232%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584: real:6.97 user:24.47 sys:7.72 CPU:461.836%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132: real:0.08 user:0.12 sys:0 CPU:150%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132: real:0.08 user:0.16 sys:0 CPU:200%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915: real:0.03 user:0.06 sys:0 CPU:200%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915: real:0.03 user:0.08 sys:0 CPU:266.667%

Don't look at our times on the 84! ^^

Maybe they have different benchmarks :p.

you get worse time with more threads, but WOW !

We spent a week checking the 84 benchmark, and we agree that we have some parallelism problems with some benchmarks, but no time to do better. No time to balance our threads correctly either. (And our Russian friends already have better times than us on the 35 benchmark :) ).

And mainly, not enough data to do it :). On my 8-core computer, using the generator from the forum, we got better parallelization.
I think some people will get better times than us.

But but but, don't be afraid of our scalability problems, and please share your times with us too :-).

We are not afraid of your problems, but of your results ! :D

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-13105960671904317035: real:13.83 user:108.38 sys:0.45 CPU:786.913%

=(

I know some people are keeping their times to themselves until the end. They are right. I am convinced we will have some surprises tomorrow. I am waiting impatiently.

We just decided to publish the times of one of our last benchmarks. Maybe it's a mistake; you never know what some people are able to achieve. There are still 10 hours left before the end of the contest. We won't touch our code until tomorrow morning. We are only working on the documentation.

Does anybody have some insight into benchmark 87, like how big would be the reference sequence and how does it compare to the input sequences? We just passed it and didn't have time to do "black-box" testing for such things (but for the 84 one I can tell that N>60M and M

Okay, I call it a night. I've spent way too much time trying different hash functions etc etc and I think I probably have to change our algorithm to get better results.

Here are our results:

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-13105960671904317035:
real:1.64 user:4.15 sys:0.55 CPU:286.585%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-13105960671904317035: real:1.63 user:4.24 sys:0.98 CPU:320.245%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187: real:1.39 user:4.69 sys:0.48 CPU:371.942%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187: real:1.36 user:4.64 sys:0.51 CPU:378.676%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584: real:0.59 user:1.34 sys:0.09 CPU:242.373%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584: real:0.61 user:1.32 sys:0.11 CPU:234.426%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584: real:0.54 user:1.12 sys:0.07 CPU:220.37%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584: real:0.48 user:1.13 sys:0.06 CPU:247.917%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584: real:0.48 user:1.44 sys:0.12 CPU:325%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.01 user:0.01 sys:0 CPU:100%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132: real:0.01 user:0.01 sys:0.01 CPU:200%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915:
real:0 user:0 sys:0 CPU:-nan%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915: real:0 user:0 sys:0.02 CPU:inf%

Cheers,
Andreas

Better results? These are the best I've seen.

Here are our results. Good performance and scalability on all benchmarks but the last one - the big, terrible, unexpected and scary benchmark - which uses a big reference input file. I hacked some stuff two nights ago to be able to pass the test, but I guess that we can't do much better with an O(n*m) algorithm. (Too bad we did not think of this kind of input... I would have coded a real suffix tree version for very big inputs... for all other inputs, the O(n*m) algorithm was more interesting because it scaled very well with the number of threads...)

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-13105960671904317035:
real:221.32 user:4422.5 sys:0.5 CPU:1998.46%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-13105960671904317035:
real:110.8 user:4422.17 sys:0.99 CPU:3992.02%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187:
real:2.48 user:47.97 sys:0.29 CPU:1945.97%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
real:1.33 user:47.97 sys:0.19 CPU:3621.05%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584:
real:1.05 user:20.72 sys:0.04 CPU:1977.14%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584:
real:0.63 user:23.44 sys:0.05 CPU:3728.57%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584:
real:2.49 user:14.9 sys:0.02 CPU:599.197%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584:
real:1.3 user:15.5 sys:0.04 CPU:1195.38%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584:
real:1.39 user:33.25 sys:0.03 CPU:2394.24%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.03 user:0.19 sys:0 CPU:633.333%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.02 user:0.2 sys:0 CPU:1000%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915:
real:0.01 user:0.02 sys:0 CPU:200%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
real:0 user:0.04 sys:0 CPU:inf%

Note that we received two mails for this single submission (the last one with an "error on a 40-cores HT machine : Can't read file" error, which I have mentioned in another topic). I hope that Intel will not use these buggy mails to rank people (if somebody @Intel could confirm that, it would be a huge relief :) )

These are our results. Sorry for posting them so late, but we tested until the last minute. The last submission was at 2012-05-16 08:58:30 CET :D.

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-13105960671904317035:
real:4.24 user:61.75 sys:0.66 CPU:1471.93%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-13105960671904317035:
real:3.3 user:60.63 sys:6.67 CPU:2039.39%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187:
real:1.24 user:7.74 sys:0.83 CPU:691.129%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187:
real:1.22 user:5.41 sys:7.87 CPU:1088.52%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584:
real:2.4 user:22.99 sys:3.98 CPU:1123.75%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584:
real:2.4 user:28.55 sys:7.03 CPU:1482.5%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584:
real:2.17 user:9.75 sys:0.98 CPU:494.47%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584:
real:1.56 user:10.9 sys:1.66 CPU:805.128%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584:
real:1.49 user:14.9 sys:3.19 CPU:1214.09%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.06 user:0.22 sys:0.07 CPU:483.333%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132:
real:0.05 user:0.24 sys:0.12 CPU:720%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915:
real:0.02 user:0.05 sys:0.02 CPU:350%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
real:0.01 user:0.06 sys:0.02 CPU:800%

Good luck to everyone, this will be a tough call :)

Best regards,
Nenad

Here are ours. I'm late too because I added my complexity test this morning. I only slept 4 hours :)

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-13105960671904317035: real:1.36 user:18.35 sys:2.01 CPU:1497.06%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-13105960671904317035: real:1.45 user:36.01 sys:3.45 CPU:2721.38%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187: real:0.83 user:7.4 sys:2.9 CPU:1240.96%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187: real:0.8 user:10.91 sys:4.08 CPU:1873.75%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584: real:0.39 user:5.2 sys:0.59 CPU:1484.62%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584: real:0.27 user:1.98 sys:1.4 CPU:1251.85%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584: real:0.19 user:0.48 sys:0.04 CPU:273.684%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584: real:0.18 user:0.6 sys:0.08 CPU:377.778%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584: real:0.17 user:0.78 sys:0.12 CPU:529.412%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132: real:0 user:0.01 sys:0 CPU:inf%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132:
real:0 user:0.02 sys:0 CPU:inf%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915: real:0 user:0 sys:0 CPU:-nan%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
real:0 user:0.01 sys:0 CPU:inf%

I'm sorry to point this out to you:
The problem is symmetric, so you don't have to care which is your reference sequence and which your input.
Ie if input < ref, you can simply swap the two, if your algorithm prefers that.
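A minimal sketch of that swap, assuming a hypothetical matcher `find_matches(a, b)` that prefers its first argument to be the shorter sequence and yields `(a_start, b_start, length)` tuples. Neither the name nor the tuple layout comes from the contest; the point is only that the coordinates have to be swapped back after the call.

```python
def match_with_shorter_first(ref, inp, find_matches):
    """Call find_matches with the shorter sequence first.

    find_matches(a, b) is a hypothetical matcher yielding
    (a_start, b_start, length). The result is always reported as
    (ref_start, inp_start, length), whichever order was used internally.
    """
    if len(inp) < len(ref):
        # Swapped call: tuples arrive as (inp_start, ref_start, length).
        return [(r, i, n) for (i, r, n) in find_matches(inp, ref)]
    return list(find_matches(ref, inp))
```

If the required output has to be ordered by reference position, the swapped branch may also need a sort, since the matcher will have walked the sequences in the other order.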

I think I might have even written about that before because I had exactly that problem at the beginning: I assumed that ref << input and test_input_1 was the other way around and wouldn't finish on my workstation.

Cheers,
Andreas

I think there is nothing to add. Your results are just amazing ...

Here is our last benchmark, which we got at 11 o'clock after dropping the parallelization of the index construction.

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-13105960671904317035:
real:5.51 user:89.86 sys:0.37 CPU:1637.57%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-13105960671904317035: real:3.93 user:108.6 sys:0.41 CPU:2773.79%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-9510636300373457187: real:1.85 user:21.48 sys:0.45 CPU:1185.41%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-9510636300373457187: real:1.71 user:35.03 sys:0.48 CPU:2076.61%

on a 40-cores HT machine, using 20 worker threads, running benchmark AE12CB-14929312560780125584: real:4.72 user:14.89 sys:0.3 CPU:321.822%

on a 40-cores HT machine, using 40 worker threads, running benchmark AE12CB-14929312560780125584: real:5.66 user:40.75 sys:0.32 CPU:725.618%

on a 12-cores HT machine, using 6 worker threads, running benchmark AE12CB-14929312560780125584: real:4.5 user:3.78 sys:0.17 CPU:87.7778%

on a 12-cores HT machine, using 12 worker threads, running benchmark AE12CB-14929312560780125584: real:3.42 user:3.78 sys:0.18 CPU:115.789%

on a 12-cores HT machine, using 24 worker threads, running benchmark AE12CB-14929312560780125584: real:3.42 user:4.18 sys:0.19 CPU:127.778%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-10353053912364647132:
real:1.05 user:0.08 sys:0 CPU:7.61905%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-10353053912364647132: real:0.05 user:0.12 sys:0 CPU:240%

on a 12-cores machine, using 6 worker threads, running benchmark AE12CB-16325737234926730915: real:1.01 user:0.04 sys:0 CPU:3.9604%

on a 12-cores machine, using 12 worker threads, running benchmark AE12CB-16325737234926730915:
real:0.01 user:0.06 sys:0 CPU:600%

For the benchmarks with 6 threads, we had to add a sleep of one second to pass scenario 84 and unblock 87.
