Pimped data generator

I used the file generator from http://software.intel.com/fr-fr/forums/showthread.php?t=104505&o=a&s=lr as a basis and pimped it a bit for our needs.

I've attached the source code to this post.

The main differences are:
* only the data is generated randomly
* the number of sequences and the number of files are no longer random; you pass them as arguments
* you can specify a prefix for data files instead of a directory
* you can specify the probability with which substrings from the reference string are copied
* lots of tiny clean-ups under the hood

Example call:

./datagen 500 4000 3 2 dataset1 0.6

* a reference sequence with 500 bases
* 2 files, dataset1_input_0.txt and dataset1_input_1.txt, with 3 sequences of 4000 bases each
* substrings from the reference are copied with a probability of 0.6
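
For anyone who wants the gist without opening the zip, here is a rough Python sketch of what a generator with these six parameters could look like. It is not the attached code: the function names, the "ACGT" alphabet, the chunk-wise copying and the `max_chunk` limit are all my own assumptions for illustration. Only the parameter order and the `<prefix>_input_<n>.txt` file names come from the example call above. It keeps everything in memory instead of writing files, since the file format is not described here.

```python
import random

BASES = "ACGT"  # assumed alphabet; the post only speaks of "bases"


def random_bases(rng, n):
    """Return n uniformly random bases."""
    return "".join(rng.choice(BASES) for _ in range(n))


def make_sequence(rng, ref, length, p_copy, max_chunk=50):
    """Build one sequence chunk by chunk (hypothetical scheme).

    Each chunk is, with probability p_copy, a substring copied from
    the reference; otherwise it is filled with random bases.
    """
    parts = []
    remaining = length
    while remaining > 0:
        chunk = min(remaining, rng.randint(1, max_chunk))
        if ref and rng.random() < p_copy:
            chunk = min(chunk, len(ref))
            start = rng.randint(0, len(ref) - chunk)
            parts.append(ref[start:start + chunk])
        else:
            parts.append(random_bases(rng, chunk))
        remaining -= chunk
    return "".join(parts)


def generate(ref_len, seq_len, n_seqs, n_files, prefix, p_copy, seed=0):
    """Same argument order as the example call; returns data in memory."""
    rng = random.Random(seed)
    ref = random_bases(rng, ref_len)
    files = {
        f"{prefix}_input_{i}.txt": [
            make_sequence(rng, ref, seq_len, p_copy) for _ in range(n_seqs)
        ]
        for i in range(n_files)
    }
    return ref, files


# Mirrors: ./datagen 500 4000 3 2 dataset1 0.6
ref, files = generate(500, 4000, 3, 2, "dataset1", 0.6)
```

With a probability of 0.0 the sequences are purely random, and with 1.0 they consist only of pieces of the reference, which is handy for building easy and hard test scenarios.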

Hope this proves as useful to people as candreolli's original code was to us.


Attachment: pimped-datagen.zip (application/zip, 3.62 KB)

Thanks for the improvements. Actually, I didn't want to spend too much time on this program (I will have my exams soon). Thanks for sharing back :)

Yes it was, thank you for sharing that code.
We have found it very helpful for generating some coherent scenarios.

You're welcome :) (If I'm the target of your post ^^)
