Correctness checks

Correctness checks

Imagen de Cristian K.

It seems to me that the testing cluster does not include enough correctness checks. Effectively, a user could cheat and dramatically improve the performance of this application, at the expense of producing incorrect results in some corner cases. For example, for work_hard, one can merge the cheapest inbound flight with the cheapest outbound flight, without testing whether another combination might lead to a smaller total cost, due to discounts.

I have included a simple test-case in this post to illustrate this.

Regards,
Cristian.

AdjuntoTamaño
Descargar correctness1.zip1.09 KB
publicaciones de 11 / 0 nuevos
Último envío
Para obtener más información sobre las optimizaciones del compilador, consulte el aviso sobre la optimización.
Imagen de eviltosha

I thought about creating similar topic but with more complicated checks, as I found some more clever (but no so effective) ways to cheat. Probably I'll add my tests later.

Imagen de Cristian K.

The above way of cheating can produce a 15 fold speed-up of CompileScale5. I know this, because I had a bug in my code, which would do something similar. :D Given the number of threads, this is more than one could gain by carefully parallelizing the code, as the contestants are supposed to.

Imagen de Tim

Yes, it won't work.
I made a mistake like that in order to improve the algorithm (but it was a more complicated one) and I think that with CompileScale4, it didn't work. Even newinput2, I think there is a problem.

It will be very hazardous to try things like that with big files. And it Will fail most of the cases (with big datas).
I can certify that it won't work and if you do that, you will lose the contest because some outputs you give are wrong. x)

Moreover, you have 30% between some flights and it is very complicated to cheat on that :
1 : A to B with Company X : 100
2 : B to A with Company Y : 200
3 : B to A with Company X : 250
Result : You take flight 1 and then flight 2 because without discounts it's better. But the best travel is flight 1 then 3.

It will even fail there. And in my mind, I think that newInput2 provide this kind of case (with Toronto, but I might be wrong and there is surely some more case where it will fail).

You can't ignore discount and I am pretty sure that if you do it, you will 100% fail the contest. But if there is some people who wants to try that, free to them. x)

I hope it answered ! ;)
Regards,

Edit : Okay, the case you sent was similar. But still a bad idea to do like that. ^^

Timothé Viot, Engineer student, Insa Rennes 1 France
Imagen de Cristian K.

Tim, I had a bug in my code, which passed all tests up to CompileScale5, but not the one I have posted above. The bug was a little bit more "discreet" than what I suggested. At any rate, anybody can make mistakes (not necessarily cheat). Hence, for insuring fairness among competitors, it is critical to set up more throughout correctness tests.

I would appreciate, if other people could post other, simple correctness tests.

Imagen de Tim

Yes, for sure, there is some more discreet errors you can do (i did not saw mine before CompileScale4) ! I didn't say that I suspected you to cheat, I did this kind of same mistake and this problem with discounts is very hard to treat because we sometimes don't think about all particular cases and it's normal.

But as we can't adapt our program because input are secret, we should maybe think that if we pass all CompileScaleX, we could say that our program is correct ! :)

Anyway, let's wait an official version of the answer ! ^^
I thought about a thing to be sure that it will fail if we miss something with discounts by having a very high expensive flight still less expensive with discounts than an other more expensive (Something like 100 000 for the first one and 150 000 for the other one but it seems complicated to do something like that with many inputs... so I will end my answer there because I haven't time to search for more complicated mistakes we could do)

But this kind of problem is still very interesting ! :)

Regards,

Timothé Viot, Engineer student, Insa Rennes 1 France
Imagen de eviltosha

It looks like you have never participated in contests (like TopCoder), where you get your scores after end of contest, accordingly to tests passed. There you can't test your solution on anything, provided by organizers, so you must test your code yourself. But in those contests judges will test your (and everyone's) code with almost every tricky scenario possible, and with many different structures of big inputs, so you can be sure that no cheating will be accepted. You can never be sure about correctness of your solution by passing 5-6 tests (or even 15-20).

So I think purpose of original post is very good. I hope organizers will consider suggested tests and won't accept cheating (or buggy) solutions.

Imagen de Cédric ANDREOLLI (Intel)

This is very risky to try to exploit a bug because you actually don't know what we will use for the final benchmarks.

I already wrote that the cluster is here to help you check the program scalability, not its correctness. You have to check the correctness on your side. We will check it on our side too but we can't afford using the cluster to check your program correctness.

Imagen de eviltosha

Here's another very simple test against finding cheapest set of flights on different parts of flight and then merging.

Adjuntos: 

AdjuntoTamaño
Descargar scenario00.tar.gz722 bytes
Imagen de eviltosha

And one more.

Adjuntos: 

AdjuntoTamaño
Descargar scenario00.tar.gz588 bytes
Imagen de Cristian K.

Thanks eviltosha!

Inicie sesión para dejar un comentario.