Parallel I/O

About a week ago a bigger test was posted. The reference sequence was about 50-60 MB, and it takes some time to read it and put it in the buffer. What do you think about parallel I/O? There are lots of possibilities with MPI, but what about OpenMP or the Intel stuff? The data will probably be on one hard disk, so I fear we don't have many options (because there is only one head). What do you think? What do the organizers think? Do you want us to do the I/O in parallel?

You can parallelize the parsing of the file, but not the loading or mapping into memory, as far as I'm aware. It's probably possible to parallelize the reading for the current problem (we didn't find an interesting way to do it), but the main idea is:
-> use mmap or fread for a quick load
-> once that's done, your file is accessible through a pointer, so you're free to do what you want (parallel parsing, for example)
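The two steps above can be sketched as follows. This is a minimal illustration in Python, not the poster's code: the helper name `count_byte_parallel`, the worker count, and the "count one byte value" task are assumptions chosen to keep the example short. Python's `mmap` module wraps the same OS facility as the C call.

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor


def count_byte_parallel(path, target, workers=4):
    """Map the file once, then let each worker scan its own slice.

    `target` is a single byte (e.g. b"\n"), so a match can never
    straddle the boundary between two slices.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be memory-mapped

    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        chunk = -(-size // workers)  # ceiling division
        bounds = [(start, min(start + chunk, size))
                  for start in range(0, size, chunk)]

        def scan(bound):
            start, end = bound
            # Slicing the map copies that range into a bytes object.
            return buf[start:end].count(target)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(scan, bounds))
```

In a standard CPython build the threads share one interpreter lock, so this shows the partitioning rather than a real speed-up. The same split into per-thread ranges is what an OpenMP loop over the mapped pointer would do in C.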

I think we can't parallelize reading from the file because there is only one data bus in the system.

I'm sure that is not the problem. Most multiprocessor systems have switches or Ethernet or even InfiniBand. It is more the problem of the single hard disk, and what the wise man before you said. But I keep thinking about it... =)

You could also do the I/O in parallel with other processing. For instance, while the I/O for the second file is happening, you would already have started processing the strings in the first position in the map.
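A rough sketch of that overlap, assuming one background reader thread and a small bounded queue. The names `process_overlapped`, `process`, and `depth` are made up for this illustration and do not come from the contest code.

```python
import queue
import threading


def process_overlapped(paths, process, depth=2):
    """Read files on a background thread while earlier ones are processed.

    `process` is any function taking the file contents as bytes.
    Results come back in the same order as `paths`.
    """
    q = queue.Queue(maxsize=depth)  # reader stays at most `depth` files ahead
    done = object()                 # sentinel marking the end of input

    def reader():
        try:
            for p in paths:
                with open(p, "rb") as f:
                    q.put(f.read())
        except Exception as exc:
            q.put(exc)              # hand the error to the consumer
            return
        q.put(done)

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    results = []
    while True:
        item = q.get()
        if item is done:
            break
        if isinstance(item, Exception):
            raise item
        results.append(process(item))  # runs while the next file is being read
    t.join()
    return results
```

The disk still serves one sequential read at a time, so this does not fight over the head; it only hides the read time behind the computation.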

The question with reading one file from parallel threads is whether it slows the process down. If we assume a drive that is not an SSD, multithreaded reading could lead to slower read times, because the disk has to jump from one location in the file to another whenever another thread wants to read from a different position. That takes time, and just reading the file sequentially from the disk would be faster. Of course, this problem does not arise if the disks in the benchmark servers are SSDs. Do we have any information about this? If we have SSDs, I think reading one file in parallel is a good idea.
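One way to check this on a given machine is to time both strategies. The sketch below is only a measuring aid under some assumptions: the function names are hypothetical, `os.pread` is available on Unix-like systems only, and the operating system's page cache will serve repeated reads of the same file from memory, so only a first read of a large file says anything about the drive itself.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def read_sequential(path):
    """Read the whole file front to back in one call."""
    with open(path, "rb") as f:
        return f.read()


def read_parallel(path, workers=4):
    """Read the file as `workers` ranges, each fetched by its own thread."""
    size = os.path.getsize(path)
    if size == 0:
        return b""
    chunk = -(-size // workers)  # ceiling division
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_at(offset):
            want = min(chunk, size - offset)
            parts = []
            # pread may return fewer bytes than requested, so loop.
            while want > 0:
                part = os.pread(fd, want, offset)
                if not part:
                    break
                parts.append(part)
                offset += len(part)
                want -= len(part)
            return b"".join(parts)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return b"".join(pool.map(read_at, range(0, size, chunk)))
    finally:
        os.close(fd)


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(read_sequential, path)` and `timed(read_parallel, path, 4)` on the reference sequence file would give the two numbers to compare.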

Basti / Team Zubrowka (TU Munich)

From my point of view, mapping the file into memory and accessing it in parallel is your best shot, i.e. letting the OS handle all the mess. But I doubt that you can improve HDD reads through access by multiple cores, and I also doubt that SSDs are being used. Overall, parsing in parallel is a good approach if done right. I personally had little to no success speeding up HDD I/O in a multicore environment.

Grigore, I think you are absolutely right. After doing some more research, I think parallel access is not a suitable option for speeding up I/O, because the drive is still the bottleneck in the whole setup. But if anyone manages to speed up I/O by parallelizing it, I would be interested in it (not only for the contest, I would also appreciate that information afterwards).
