In my current position, I work to optimize and parallelize codes that deal with genomic data, e.g., DNA, RNA, proteins, etc. To be universally available, many of the input files holding DNA samples (called reads) are text files full of the characters 'A', 'C', 'G', and 'T'.
Earlier I computed various statistical estimates like mean or variance-covariance matrix using Intel® Summary Statistics Library. In those cases I knew for sure that my datasets did not contain “bad” observations (points which do not belong to the distribution which I observed) or outliers. However, in some cases we need to deal with datasets which are contaminated with outliers.