New input files

112 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

It seems that you are not finding the good result for one of the cities in the play hard file.

A large CPU usage means that you are working on different CPU. Are you using threads without trying to limit the number of threads used ? You absolutely need to take the nbthreads parameter in consideration. For the final examination, we will check that you are not using more threads that you are allowed to.

Quote:

Cédric ANDREOLLI (Intel) wrote:

It seems that you are not finding the good result for one of the cities in the play hard file.

A large CPU usage means that you are working on different CPU. Are you using threads without trying to limit the number of threads used ? You absolutely need to take the nbthreads parameter in consideration. For the final examination, we will check that you are not using more threads that you are allowed to.

Hi!

This does not seem right to me - you should limit the number of threads by other means, not check them manually (as this does not guarantee anything). I am sure everybody here is able to obfuscate thread creation in order to make it impossible to determine the maximum number of threads they use. Moreover, sometimes may be useful to create more threads than the physical limit as this does not necessarily means all of them are running at the same time.

Andrei

I probably miss explained.

I was of course talking about the number of running threads. You can do almost what ever you want but you have to respect the number of running threads. But that's the way this contest has always worked.

Quote:

Cédric ANDREOLLI (Intel) wrote:

I probably miss explained.

I was of course talking about the number of running threads. You can do almost what ever you want but you have to respect the number of running threads. But that's the way this contest has always worked.

So you are saying that if we receive from command line number of threads=4 we are not allowed to create/run more than 4 threads in our program?

If nbthreads=4 you are not allowed to have more that 4 running threads at a time.

Hi,

how many threads are there on the "CompileScale5" please?

thanks!

Axel Shaïta, U.F.R Sciences Reims

6 on compileScale4
12 on compileScale5

Hi!

When can we expect bigger tests?

Thanks,
Andrei

I generated one for you: 288 Mo flights.txt, 5 000 000 flights, 100 airports, 500 airlines, 200 alliances, 50 vacation airports.
I haven't tested it yet, so there might be two or more flights with same cost, take_off_time, land_time, and discount from the same airport to the same airport, but as long as you don't compare it with reference solution it should work... ;)

If you want to make it smaller you can probably just remove flights from the file if your algorithm allows unreachable home/conference/vacations.
$ grep "1;" flights.txt > flights2.txt

Attachments: 

AttachmentSize
Downloadapplication/zip scenario13.zip63.69 MB
Rock the bits!

Hi!

Many thanks, will test it now and provide feedback! :D

Andrei

Hi slevin7,

Hardcore! Thanks.

Andrei

I generated one for you: 288 Mo flights.txt, 5 000 000 flights, 100 airports, 500 airlines, 200 alliances, 50 vacation airports.
I haven't tested it yet, so there might be two or more flights with same cost, take_off_time, land_time, and discount from the same airport to the same airport, but as long as you don't compare it with reference solution it should work... ;)

If you want to make it smaller you can probably just remove flights from the file if your algorithm allows unreachable home/conference/vacations.
$ grep "1;" flights.txt > flights2.txt

It's awesome! But why are you using incorrect format of dates?

We think you must to edit 00000000100000 to 01012012100000 for example else judge's program can't parse dates correctly.

As far as I tested it the date doesn't really matter, doing it like you suggested is just a nicer way of putting it. I am still using the reference programs parse and output functions, so it didn't make any difference.
I used this format, because just the last 6 digits significance is descending towards the end and therefore the numbers are easier to generate.
You can replace the leading zeros by correct dates and upload the file again (with an editor that can handle large files, or simply use sed(1)).
I also just noticed that the vacation_time_max is actually greater than departure_time_min and therefore he can never be on vacation for the maximum amount of time. Which may cause problems for some algorithms. But that's easy to change in the parameters of script.sh.
If it doesn't make it über complex I might edit the script to use correct values for the days and stuff in a later version (maybe producing files that contain > 10 Mio flights which would be as many as there are in one year in the real world, use correct values for the dates and nicer city/airline names) but I think it's just cosmetics...

Rock the bits!

Hello,
Is it possible to obtain the data/validation programs used for ComplieUnitTest ? I can't go beyond this step and I don't know why...

No. The output is compared with the solution using diff(it compares byte-by-byte) and you should not adapt your program to the solution. But you can ask Cédric to look into the diff. Maybe it's just a tiny loss of precision that makes your fligths cost a little less or more than they should.

Rock the bits!

I don't want to adapt my program to any solution. I just would like to be able to understand why I have a segmentation fault on a set of data wheras my program works perfectly with all the example scenarios provided. I'm not asking for CompileScale 1 to 5. Just CompileUnitTest.
It is very frustrating to work 3 weeks full time on a program just to see your code refused for a reason you can't know.

Most probably Cedric will help you with that test, so either wait until he come to this topic, or write an email to him personally. As he mentioned several times, the goal of contest is to improve performance, so organizers will help you with such problems wherever it's possible.

Hello Cyrille,
I will give you his email address, even if he will certainly contact you because of your problem :
cedric.andreolli@intel.com

Best regards,

Timothé Viot,
Engineer student, Insa Rennes 1
France

Thanks everyone ! Sorry for the loss of tamper, it is always very frustrating when you can't see the output of your program.
Cedric has been already helping me and have been very helpfull. I posted on teh forum though because I know that may people are also breaking their teeth on the first stage of the benchmark and coud also really use the data to figure out what's wrong.

I answered you by email.

Hi,
I have an other question.

I'm just opening the newinput3 and I see that in flights.txt we have a long hexadecimal with 64 digits.
Is there a limit for these id or will we have to consider an unlimited id size ? I'm pretty sure that yes, but I would like to have a confirmation... x)

Regards,

Timothé Viot,
Engineer student, Insa Rennes 1
France

Quote:

Tim wrote:

Hi,
I have an other question.

I'm just opening the newinput3 and I see that in flights.txt we have a long hexadecimal with 64 digits.
Is there a limit for these id or will we have to consider an unlimited id size ? I'm pretty sure that yes, but I would like to have a confirmation... x)

Judges' program use string for id. We think you have to do the same way.

But if they ensure, that it will only contain hexadecimal characters you can save RAM because you only need to store 4 bits instead of 8. So on 20 000 000 flights you save effectively 4 * 8 * 20Mio bytes. If you're short on RAM it might matter.

Rock the bits!

You should use strings. We don't know right now what we will use for the last benchmarks.

And about that, I will add new benchmarks next monday or tuesday.

Will new benchmarks be added in next few hours, or only tomorrow? I can't decide either to wait or to go sleep =)

Go to sleep :)

I will try to generate them and add them tomorrow (around this time)

What is about new scenarios?

Hi,

I finally (probably) finished serial algorithm, but there is one problem. In newinput3, in play_hard solution for MEXICO CITY, the cheapest trip is:

“Play Hard” Proposition 3 : MEXICO CITY
Price : 2026.01
Cheap Airlines-7a3e7508126da742f5fa409d24a20a05bd03804c88cf3d5a79e376dce106e972-LONDON HEATHROW (10/14 20h45min)/MIAMI (10/15 5h40min)-474.37$-70%
Cheap Airlines-9b72bdf6539d2a5be93354cd54d81041293d526435f1b2d5fbcd80bb27a22f31-MIAMI (10/15 8h0min)/GUATEMALA (10/15 10h2min)-132.47$-70%
MEXICO Airlines-b07b02877ee070d1b3a36dafa1ac110631504341aee2395df9f086a8a8b6abaa-GUATEMALA (10/15 19h25min)/MEXICO CITY (10/15 22h5min)-262.71$-70%
MEXICO Airlines-b07b02877ee070d1b3a36dafa1ac1106d06b6acba6e17a7b09d43f42069a6a5f-MEXICO CITY (11/1 20h20min)/GUATEMALA (11/1 23h0min)-268.56$-70%
Cheap Airlines-87dd4f1ae493cc2637e12a6d837f0ab2697b0569bdb2a156c09d17ce66dc0d9a-GUATEMALA (11/2 8h0min)/MIAMI (11/2 10h2min)-131$-80%
UNITED STATES Airlines-a7a11526753a4488110699260c84f58b0e8ccf5926a2a59bcc4de4ee77a5fe8a-MIAMI (11/2 18h20min)/BARCELONA (11/3 3h21min)-718.36$-80%
Cheap Airlines-a7a11526753a4488110699260c84f58b0846e263a43d74ce0084f391c4b0e11c-BARCELONA (11/3 10h0min)/NEW YORK JOHN F. KENNEDY (11/3 17h19min)-415.22$-70%
Cheap Airlines-579c5c46ddae1db6a75e976096ce2e1e2a099e2e4d78071e587aec8c520f4fe7-NEW YORK JOHN F. KENNEDY (11/11 18h35min)/LONDON HEATHROW (11/12 1h32min)-370.27$-70%

But, my program got:
Price: 2016.26
Cheap Airlines-7a3e7508126da742f5fa409d24a20a05e4a54d03c181c10fdd10d8da74163e14-LONDON HEATHROW (10 15 2012 20 45 00) //MIAMI (10 16 2012 05 40 00)-472.67$-70%
Cheap Airlines-87dd4f1ae493cc2637e12a6d837f0ab24ef3720eabd9475e32c4c3b75f3c3426-MIAMI (10 16 2012 08 00 00) //GUATEMALA (10 16 2012 10 02 00)-137.37$-70%
MEXICO Airlines-b07b02877ee070d1b3a36dafa1ac11064b145ca25775ebc6761088a5f7e77e62-GUATEMALA (10 16 2012 19 25 00) //MEXICO CITY (10 16 2012 22 05 00)-247.63$-80%
Cheap Airlines-b07b02877ee070d1b3a36dafa1ac1106bf5fc11920229743ced9ca3f2c3c71d9-MEXICO CITY (11 01 2012 06 15 00) //GUATEMALA (11 01 2012 08 55 00)-157.04$-80%
UNITED STATES Airlines-9b72bdf6539d2a5be93354cd54d810410b806f2fd4972c064b4cee5e4be8d89e-GUATEMALA (11 01 2012 09 00 00) //NEWARK (11 01 2012 13 07 00)-332.21$-80%
SPAIN Airlines-a7a11526753a4488110699260c84f58bafc1b7e3e6036d0c3f286f8a7f0ecab0-NEWARK (11 01 2012 19 05 00) //BARCELONA (11 02 2012 02 25 00)-562.15$-80%
Cheap Airlines-a7a11526753a4488110699260c84f58b0df029f17553eaa0d003051c3ba3fc02-BARCELONA (11 02 2012 11 10 00) //NEW YORK JOHN F. KENNEDY (11 02 2012 18 29 00)-415.46$-70%
Cheap Airlines-579c5c46ddae1db6a75e976096ce2e1e2a099e2e4d78071e587aec8c520f4fe7-NEW YORK JOHN F. KENNEDY (11 11 2012 18 35 00) //LONDON HEATHROW (11 12 2012 01 32 00)-370.27$-70%

Parameters are from given script.txt:
time ./run -nb_threads 6 -from LONDON\ HEATHROW -to NEW\ YORK\ JOHN\ F\.\ KENNEDY -departure_time_min 10252012000000 -departure_time_max 11042012000000 -arrival_time_min 11052012000000 -arrival_time_max 11142012000000 -max_layover 34400 -vacation_time_min 700000 -vacation_time_max 904800 -vacation_airports SEATTLE NASHVILLE MEXICO\ CITY MIAMI ISTANBUL ATLANTA GUATEMALA SPOKANE\ INTL -flights flights.txt -alliances alliances.txt -work_hard_file work_hard.txt -play_hard_file play_hard.txt

I manually checked that flights exist in flights.txt, and that it satisfies restrictions (discounts, times, etc).

Maybe I'm tired and missed something, so I'll check tomorrow. Until then, what to do if this is actually cheapest flight?

I think I figured out what your problem is.

I let you check but as the minimum vacation time is 700000 seconds, it's equal to 8 days, 9h, 33mn, 20s.
If you substract this amount of time to the 10252012000000 , you got 10/16/2012 21h33mn20s. So you can't be in a plane after this time. It seams that you arrive in Mexico at 10 16 2012 22 05 00. It seems to be too late.

I'm maybe wrong because I checked manually.

I'm still trying to generate a new benchmark but it's quite complicated to generate a benchmark that fit in time with the best solutions. For the moment, even the best solutions can't give a response on the files I'm working on.

I will try to post the new benchmark tomorrow because it's quite late now.

Quote:

Cédric ANDREOLLI (Intel) wrote:

I think I figured out what your problem is.

I let you check but as the minimum vacation time is 700000 seconds, it's equal to 8 days, 9h, 33mn, 20s.
If you substract this amount of time to the 10252012000000 , you got 10/16/2012 21h33mn20s. So you can't be in a plane after this time. It seams that you arrive in Mexico at 10 16 2012 22 05 00. It seems to be too late.

I'm maybe wrong because I checked manually.

I'm still trying to generate a new benchmark but it's quite complicated to generate a benchmark that fit in time with the best solutions. For the moment, even the best solutions can't give a response on the files I'm working on.

I will try to post the new benchmark tomorrow because it's quite late now.

Hmmm, I just tried with epochToSeconds ( original function from given source, just renamed).

cout << epochToSeconds("10252012000000") << " " << epochToSeconds("10252012000000") - epochToSeconds("10162012220500") << endl;

gives:
1351123200 698100.

Just checked manually, 700 000s = 8d * 24*60*60 + 2h * 60*60 + 26m * 60 + 40s.

Anyway, I'll recheck tomorrow my program.

Well I copied the wrong part of my sheet, 700 000s = 8d * 24*60*60 + 2h * 60*60 + 26m * 60 + 40s is what I got too.

So the rest of my post is still valid. And your test shows it. 698100 is not enough, it should be > 700000.

Quote:

Cédric ANDREOLLI (Intel) wrote:

Well I copied the wrong part of my sheet, 700 000s = 8d * 24*60*60 + 2h * 60*60 + 26m * 60 + 40s is what I got too.

So the rest of my post is still valid. And your test shows it. 698100 is not enough, it should be > 700000.


Yes, you're right. Since condition for comparing if the flight in good interval is simple, I've found error pretty quickly, but it's last thing that I would suspect to.

For some reason, epochToSeconds() return me sometimes unlogic values. For example, when I do cout<< epochToSeconds(date) << epochToSeconds(date), they are two different values :/. Does it have any relation with cygwin, because I'm on windows?

Should we expect new test today?

A new benchmark has been added.

It's quite bigger than the previous files (117mo). The timeout is set to 200 seconds.

If you are having some unexpected results, contact me on cedric.andreolli@intel.com

Hi,

how many threads are there on the new benchmark please?

thanks!

Axel Shaïta, U.F.R Sciences Reims

12 threads

Quote:

Cédric ANDREOLLI (Intel) wrote:

12 threads


Hi,

is always number of hardware threads(cores) on cpu assigned to our process less than number of logical threads?

For the current benchmarks yes. We can't use the full machine right now for a single submission.

This will probably test that later.

Can that be the reason for CPU usages like these?

CompileScale6 … (CPU usage:97%)
CompileScale5 … (CPU usage:84%)
CompileScale4 … (CPU usage:87%)
CompileScale3 … (CPU usage:57%)
CompileScale2 … (CPU usage:29%)

(All singlethreaded)

Rock the bits!

Probably it's because ratio parsing_time/computing_time is decreasing. And while you parse, CPU isn't fully loaded.

Quote:

eviltosha wrote:

Probably it's because ratio parsing_time/computing_time is decreasing. And while you parse, CPU isn't fully loaded.

We tried to submit only reading program and had "CPU usage:85%" :) You can repeat this experiment.

It probably depends what you do when you read data.

Loading the file through DMA don't use the CPU. Parsing the file is quite different cause you evaluate conditions and in this case, you use the CPU.

So 85% of CPU while just reading the file seems quite normal regarding what you really do and how you load the file into central memory.

Yeah, I first map the file to memory, then I parse it. But I didn't expect that it takes so long to map the file.

Rock the bits!

Hello,

In previous competitions, mapping the file into memory proved to be the among the fastest solutions, so I don't think you'll get better results with other methods.

Best regards,
Nenad

Cedric, could you please generate new input file with size similar to CompileScale6 (or even bigger)? It's hard to test efficiency of optimization, when the same code differs in run time about 10% from run to run.

Yes new input files with that size would be very great. We stuck in testing, too.

scenario14: 10 000 000 Flights, >700 Mo
scenario15: 2 500 000 Flights, 177 Mo
scenario15: 1 750 000 Flights, 124 Mo (benchmark 6 is about 128 Mo I think)

This time with correct timestamps, all flights in 2013!
I noticed that the scenarios I generate take a little longer (10 times as long) to run, even though they have about the same sizes as the provided ones.

If you want smaller or bigger ones let me know ;)

Attachments: 

AttachmentSize
Downloadapplication/zip scenario14.zip172.4 MB
Downloadapplication/zip scenario15.zip43.06 MB
Downloadapplication/zip scenario16.zip30.12 MB
Rock the bits!

I also uploaded a file with a lot of flights if you want to try it. I don't provide the call this time but you should be able to find some interesting ones.

ful zip added

Quote:

Heye (aka. slevin7) wrote:

scenario14: 10 000 000 Flights, >700 Mo

Thanks, but we have a problem while opening scenario14.zip. The archive is damaged.

Pages

Leave a Comment

Please sign in to add a comment. Not a member? Join today