The fastest input

Hi, I was thinking about what the fastest input method is. I am using C++ and I use cin most of the time, but sometimes I use scanf because I noticed that it is much faster than cin. Now I am interested in whether there is any faster solution than these two... I was also wondering whether it is possible to parallelize the input of one file?


Hi :)
You can map files directly into memory.
Here's a short reference from a book (it's about the mmap() function):
http://my.safaribooksonline.com/book/operating-systems-and-server-admini...

Or on wikipedia:
http://en.wikipedia.org/wiki/Mmap

Let me know if it helps you :)

Regards,
andrei

Thank you a lot :D I will test these 3 methods using large files (up to 100 MB) and I will tell you
the results ;)

There are numerous ways of reading the input (cin, scanf, fread, mmap, etc.). You should explore their capabilities and test them to find out which one best suits your needs. I don't think anyone can tell you the BEST way.

Best regards,
Nenad

thx :D I have never heard of fread before... but does anyone have an idea how to parallelize the input of a single file?

I think it's easier to parallelize with mmap, because you can read from multiple places easily (as if the file were in RAM).

Can you please explain that? The only way of parallelizing the input that I know of is to read all the files in a for loop and then parallelize the for loop, but it would be much better if there were some way to parallelize the input of only one file.

Quoting v.kristijan
Can you please explain that? The only way of parallelizing the input that I know of is to read all the files in a for loop and then parallelize the for loop, but it would be much better if there were some way to parallelize the input of only one file.

I think building one particular sequence is inherently a serial operation.

That said, if the file is mapped/loaded to memory and you have figured out where each sequence starts and ends, you might be able to parallelize the processing of each one.

thx nickraptis :D I understand now :D :D :D

You could also try reading blocks with fscanf. There is a thread on the "fscanf vs mmap" matter here: http://stackoverflow.com/questions/45972/mmap-vs-reading-blocks , and, as you can see, opinions differ on which one is best, but fscanf is clearly easier to implement.

I think that fread is a better choice than fscanf; it is as simple as fscanf but faster.

I don't think that changing the input is worth the effort. I changed it to one using ifstream::read (which reads data in blocks) and there was no substantial gain whatsoever.

I first thought that there could be a big improvement in changing char-by-char reading to block reading because, in MATLAB, that change is a huge improvement, but it isn't the case in C++, at least not for this problem.

I think you're wrong, because I got a huge speed improvement when I optimized my input... I am using getc() and fgets(), and my program reads 2 input files (homo_sapiens....intel and input.fa) in about 3000 milliseconds, while fstream needs about 40 000 ms. On my machine the improvement is huge, but I didn't test it on Intel's clusters because I didn't manage to solve the L array problem :)

What I'm saying is that in THIS problem the reading is not a critical spot.
It is not that you can't get a good improvement relative to the input time of the sample code, but that, when comparing the time for input with the time for comparing the sequences, the input time is not that much of a problem.

Try doing this: use the input from here http://software.intel.com/fr-fr/forums/showthread.php?t=104659&o=a&s=lr on the original sample code, and you will see that the time spent reading the sequences from the files is nothing compared to the time spent iterating over the sequences.

You should really take into consideration the time your program needs to parse the file. The given method is (I guess deliberately) bad. There are already some threads on the forum about the reading process, but you should take a look at fread or mmap. Those functions are really easy to use and can give really good results. It's really important to keep in mind that each part of your algorithm that can be optimized should be optimized, including the reading part. :)

I agree with that, but you should use every optimization you can to improve your program's execution speed. I told you in my previous post that I got almost 12 times faster input reading. If nothing else, that improves my work on a "greedy" part of the sample code because I don't have to wait for more than 40 seconds.

Have you tried to solve that problem, and have you tested the speed of that program with a large reference sequence?

Improving the file input makes sense once the program is fast and parallelized, because the file input can't be parallelized.

I think both bdrung and candreolli are right. You should leave your advanced input optimizations for the end, when your core code is well optimized and parallelized, because with large inputs and without good parallelization the time needed for input is quite small compared to the time needed for sequence processing. Once you've improved your processing code, input optimizations will be worth the time you spend on them.
On the other hand, the given method can be greatly improved, so at least some essential optimizations are in order even before optimizing the rest of the program.

Best regards,
Nenad
