greedy problem

greedy problem

Can anybody give some tips about solving greedy problem ? Hash maps is good becouse we can store multiple (in our case - the same) strings at one key, but how to know which one is the same? We must use **L orvector< vector > to mark letters. This problem is well known on the internet, but all algorithms are for small inputs, max legth = 10000. None of them, and none of their optimizations can handle string of 32Mb...Thanks :)

16 posts / 0 nouveau(x)
Dernière contribution
Reportez-vous à notre Notice d'optimisation pour plus d'informations sur les choix et l'optimisation des performances dans les produits logiciels Intel.

I also am right now trying to solve this problem... I would guess we need to find a way to only keep the information where the L matrix has numbers larger than a minimum.
Of course i cant tell my whole idea to make it :-)

I think so, but you must allocate memory for L (**L), and my program crashes there... It's a really big piece of memory :) We must avoid that somehow :/

Indeed, you don't need to store the result of the comparisons. There are way too much datas to memorize if you do something like that.For example, le 65mo file contains 126 000 000 characters. If the input sequence is 1000 characters, you will need :126 000 000 000 x 2 bytes to store the L array if you store your datas into char. 252Gb to store the array. That's just not possible :)You must find a way to get the expected result without saving thoses datas.

You can use a hash table, it is a faster solution

So don't allocate long arrays

instead of allocating 100 array of size 1000000, allocate 1000000 array of size 100.

the memory usage will be the same...

We obviously must use STL map which is muuuuuuuuch slower solution than an array. I have done it and my program with refseq.txt and input.txt lasts more than 4 seconds in release mode without storing zeros. Thats terrible...

Yes it will be the same, but you will not need all of this memory to be in sequence.

The STL map is a not the best solution, but you need to use adata structure where you can allocate it before using it.

instead of L you could use a hash , to store only the answers, that have at least minmatch size.

I do not see a big problem in using a data structure which you cannot allocate in advance. If you are using the STL vector, for example, I think it will allocate chunks of memory in advance so it does not have to allocate memory every time when you are appending a value. Anyways, if you are not extending your data structure too often, I think you will be fine with a more dynamic solution (actually, I hope that, because we are doing that ;-) ). Currently, we are accumulating our matching results to achieve less allocations.

A personal advice that may help you :If you cannot allocate enough memory, then do not use any! ;-)

Hi,

we are using a multiset, that stores only the positions of substrings in the sequences rather than the substrings themself.

Maybe this helps.

We are using combination of arrays and map because of the nature of our implementation... Map collects only correct solutions.

How did you solve the problem of exceeding space, if you are using arrays? I think you do not store all possible diagonals in the arrays, do you?

Laisser un commentaire

Veuillez ouvrir une session pour ajouter un commentaire. Pas encore membre ? Rejoignez-nous dès aujourd’hui