Life Sciences

ABySS for Intel® Xeon® Processors

Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short-read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble the vast amounts of data generated by large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, ABySS (Assembly By Short Sequences), a de novo, parallel, paired-end sequence assembler, was designed and developed for short reads. The single-node version is useful for assembling genomes up to 100 Mbases in size. A parallel version of ABySS, implemented using MPI, is capable of assembling larger genomes. The abyss-pe script runs a more comprehensive set of tools to process paired-end data.
  • Linux*
  • C/C++
  • Intermediate
  • Life Sciences
  • DNA Sequencing
  • Cluster computing
  • Parallel computing

    The switch() statement isn't really evil, right?

    In my current position, I work to optimize and parallelize codes that deal with genomic data such as DNA, RNA, and proteins. To be universally available, many of the input files holding DNA samples (called reads) are text files full of the characters 'A', 'C', 'G', and 'T'.
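
    As a minimal sketch of the kind of code the title alludes to, the C function below tallies the characters of a single read with a switch statement. The BaseCounts struct and the count_bases function are hypothetical names chosen for illustration; they are not taken from the post, and the sketch makes no claim about how the post itself handles reads.

```c
#include <stddef.h>

/* Hypothetical helper type: one tally per base, plus a catch-all. */
typedef struct {
    size_t a, c, g, t, other;
} BaseCounts;

/* Count the bases in a NUL-terminated read by switching on each character. */
BaseCounts count_bases(const char *read)
{
    BaseCounts n = {0, 0, 0, 0, 0};

    for (const char *p = read; *p != '\0'; ++p) {
        switch (*p) {
        case 'A': n.a++; break;
        case 'C': n.c++; break;
        case 'G': n.g++; break;
        case 'T': n.t++; break;
        default:  n.other++; break;  /* anything else, e.g. 'N' or lowercase */
        }
    }
    return n;
}
```

    Calling count_bases("ACGTTN") yields one 'A', one 'C', one 'G', two 'T's, and one character in the catch-all tally.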

    Intel® Summary Statistics Library: how to detect outliers in datasets?

    Earlier I computed various statistical estimates, such as the mean or the variance-covariance matrix, using Intel® Summary Statistics Library. In those cases I knew for sure that my datasets did not contain “bad” observations (points that do not belong to the distribution I observed), or outliers. However, in some cases we need to deal with datasets that are contaminated with outliers.
