May 6, 2009 2:25 PM PDT
Final info about keysearch file format
Hi all,
I read tons of information about file format but I don't find an answer. Correct me if I'm wrong but in that contest, we don't count the input database file reading. We only count the "keysearch" file reading, the search, and the "result" file outputing.
My question: is the keysearch.txt file a "windows like" or "linux like" file? I mean, is each file line exactly 17 bytes (15 char key + 0x0d + 0x0a) or 16bytes (15 char key + 0x0d)?
So if I want to do windows contest submission, could I rely on a 16 or 17 bytes per line in the "keysearch.txt" file?
I read tons of information about file format but I don't find an answer. Correct me if I'm wrong but in that contest, we don't count the input database file reading. We only count the "keysearch" file reading, the search, and the "result" file outputing.
My question: is the keysearch.txt file a "windows like" or "linux like" file? I mean, is each file line exactly 17 bytes (15 char key + 0x0d + 0x0a) or 16bytes (15 char key + 0x0d)?
So if I want to do windows contest submission, could I rely on a 16 or 17 bytes per line in the "keysearch.txt" file?
Thanks in advance Arnaud
You should assume that the input data files were generated with a text editor or a program compiled and run on the target OS. For a windows submission, I guess that would mean 17 bytes per line.
You should assume that the input data files were generated with a text editor or a program compiled and run on the target OS. For a windows submission, I guess that would mean 17 bytes per line.
--clay
Thanks Clay. So now it's sure linux entries will have a great advantage on windows entry (because of 16 bytes per line). In this challenge, if the files are big (and they should if you want to do "almost" 10 seconds processing), then the reading and writing result file is not small. (about the same time as the search in my first try, with a test file of 1000000 entry and 300000 search keys).
Too bad linux competitors will have such a great advantage over windows one! :-( (they can read the keysearch file with some large binary chunk and just patch "0" byte at the end of each string)
Thanks Clay. So now it's sure linux entries will have a great advantage on windows entry (because of 16 bytes per line). In this challenge, if the files are big (and they should if you want to do "almost" 10 seconds processing), then the reading and writing result file is not small. (about the same time as the search in my first try, with a test file of 1000000 entry and 300000 search keys).
Too bad linux competitors will have such a great advantage over windows one! :-( (they can read the keysearch file with some large binary chunk and just patch "0" byte at the end of each string)
Please correct me if I'm wrong. Here's the way I understand this: What you're saying is that linux users can read the input file into memory in large chunks, thus, cutting down on the input time, then replace every 16th byte with '\0' and use a formula like i*16 to get the ith search string. But I fail to see why you can't do the same on a windows machine using i*17 instead...
Please correct me if I'm wrong. Here's the way I understand this: What you're saying is that linux users can read the input file into memory in large chunks, thus, cutting down on the input time, then replace every 16th byte with '\0' and use a formula like i*16 to get the ith search string. But I fail to see why you can't do the same on a windows machine using i*17 instead...
Well, 16 will easily align into a cache line since it is a multiple of 4 (or 8), while 17 would end up acrossing two cache lines... I guess.
Please correct me if I'm wrong. Here's the way I understand this: What you're saying is that linux users can read the input file into memory in large chunks, thus, cutting down on the input time, then replace every 16th byte with '\0' and use a formula like i*16 to get the ith search string. But I fail to see why you can't do the same on a windows machine using i*17 instead...
Please correct me if I'm wrong. Here's the way I understand this: What you're saying is that linux users can read the input file into memory in large chunks, thus, cutting down on the input time, then replace every 16th byte with '\0' and use a formula like i*16 to get the ith search string. But I fail to see why you can't do the same on a windows machine using i*17 instead...
For windows users input files will be slightly bigger - additional 64MB for 1GB input file. Also linux users may use single MOVNTDQ to read input data, while windows users will have to use 2 unaligned 64-bit MOVs.
For windows users input files will be slightly bigger - additional 64MB for 1GB input file. Also linux users may use single MOVNTDQ to read input data, while windows users will have to use 2 unaligned 64-bit MOVs.
Appreciate the response Dmitriy. Although, I'm afraid, that's ASM lingo that I don't quite understand (yet)... :)
Goes to show I still have a long way to go when it comes to optimizing code. I'll definitely google MOVNTDQ and respond if I can figure it out.
This competition is turning out to be quite a learning experience. :)
Appreciate the response Dmitriy. Although, I'm afraid, that's ASM lingo that I don't quite understand (yet)... :)
Goes to show I still have a long way to go when it comes to optimizing code. I'll definitely google MOVNTDQ and respond if I can figure it out.
This competition is turning out to be quite a learning experience. :)
If I understand this right, MOVNTDQ is used to move 16 bytes from memory to an XMM register and vice-versa. I guess this would come into the picture while performing comparisons (please feel free to correct me). Theoretically, this should be possible irrespective of the operating system so long as the compiler supports SSE2. On windows, even if a line is 17 bytes long, the strings will still be 16 bytes long (including the null terminating character).
My question, I guess, is why would a windows based solution require 2 instructions? Does the vc++ compiler not support SSE2?
If I understand this right, MOVNTDQ is used to move 16 bytes from memory to an XMM register and vice-versa. I guess this would come into the picture while performing comparisons (please feel free to correct me). Theoretically, this should be possible irrespective of the operating system so long as the compiler supports SSE2. On windows, even if a line is 17 bytes long, the strings will still be 16 bytes long (including the null terminating character).
My question, I guess, is why would a windows based solution require 2 instructions? Does the vc++ compiler not support SSE2?
MSVC does support SSE2. And there is always an option to use asm, however MSVC does not support asm in 64-bit mode, though ICC does support. But that's not the point, the point is that aligned moves are always faster, and on Windows platform the data is not aligned.
the point is that aligned moves are always faster, and on Windows platform the data is not aligned.
Hi Dmitriv. Totatlly agree with you, aligned data are faster (and so there is two different opcodes, load aligned and load unaligned).
Btw did you read a paper about Core I7, saying that now (on core i7) unaligned read is as fast as aligned read. I read this on a kind of "marketing" powerpoint. As I read somewhere else that thread-contest entries are run on a coreI7 machine, maybe the 17bytes windows alignment is not a problem.
What's your opinion? Do you think it's only "marketing" stuff (i can't believe un-aligned is as fast as aligned myself)
Hi Dmitriv. Totatlly agree with you, aligned data are faster (and so there is two different opcodes, load aligned and load unaligned).
Btw did you read a paper about Core I7, saying that now (on core i7) unaligned read is as fast as aligned read. I read this on a kind of "marketing" powerpoint. As I read somewhere else that thread-contest entries are run on a coreI7 machine, maybe the 17bytes windows alignment is not a problem.
What's your opinion? Do you think it's only "marketing" stuff (i can't believe un-aligned is as fast as aligned myself)
Arnaud
It's only a marketing play on words. They say "unaligned accesses are as fast as aligned", but mean "unaligned accesses to aligned locations are as fast as aligned". Google gives a plenty of references by "i7 aligned unaligned".
My question: is the keysearch.txt file a "windows like" or "linux like" file? I mean, is each file line exactly 17 bytes (15 char key + 0x0d + 0x0a) or 16bytes (15 char key + 0x0d)?
So if I want to do windows contest submission, could I rely on a 16 or 17 bytes per line in the "keysearch.txt" file?
Btw, Clay, what about providing an option for contestant to say "I am ready and prefer to read unix line endings", in which you case you will provide files with unix line endings as input. IMHO it would be nice (fair?).
Btw, Clay, what about providing an option for contestant to say "I am ready and prefer to read unix line endings", in which you case you will provide files with unix line endings as input. IMHO it would be nice (fair?).
Yes, if there are instructions included in a README file, in the entry text file or some other obvious place (not the write-up) that requests both input files have UNIX/Linux end of lines, the judges will accomodate that request.
(This is what I get for trying to be clever without thinking things through. I should have specified 18- or 13-bit keys.)
Yes, if there are instructions included in a README file, in the entry text file or some other obvious place (not the write-up) that requests both input files have UNIX/Linux end of lines, the judges will accomodate that request.
(This is what I get for trying to be clever without thinking things through. I should have specified 18- or 13-bit keys.)
--clay
thanks clay this is a nice feature !
btw I suppose we could generated "unix" style file at output? Am I right?
Seriously ... assembly ... do you realize the sort of choking cough I would develop trying to dust off my copy of MASM ?!
Have you heard something about compiler intrinsics? Here is what one may do with ICC/MSVC compiler intrinsics (GCC has them in slightly different form):
Have you heard something about compiler intrinsics? Here is what one may do with ICC/MSVC compiler intrinsics (GCC has them in slightly different form):
I refuse to use any compiler feature that is actually easier to understand when written in assembly :-D
The problem I have with ICC is it awful at compiling my C#.
Last year I used C/C++ for the contest and by its end I was ready to chew my own fingers off to stop the pain :-D
This year I'm using C# and while I don't expect it to really compete with highly optimized C (or C++ or C++ w/intrinsics or C/C++ w/assembly) I am curious to see how it compares with such solutions. I'm also enjoying learning the MS (beta) parallel library. My goal is learning and hoping that I can get my solutions pretty close in speed to the solutions that are "cut closer to the bone".
I refuse to use any compiler feature that is actually easier to understand when written in assembly :-D
IMVHO, intrinsics is not one of such features. Intrinsics is a nice way to connect C/C++ and assembly. Though if one need full control, assembly is still the only answer.
The problem I have with ICC is it awful at compiling my C#.
ICC even worser in compiling your C# than MASM? I've mentioned ICC only because you've mentioned MASM. I've only wanted to say that today you don't have to use MASM to get access to asm.
Last year I used C/C++ for the contest and by its end I was ready to chew my own fingers off to stop the pain :-D
This year I'm using C# and while I don't expect it to really compete with highly optimized C (or C++ or C++ w/intrinsics or C/C++ w/assembly) I am curious to see how it compares with such solutions. I'm also enjoying learning the MS (beta) parallel library. My goal is learning and hoping that I can get my solutions pretty close in speed to the solutions that are "cut closer to the bone".
Btw, have you considered usage of C++/CLI? It's a natural choice for such type of task. It will allow you to get advantages of both worlds. For example you may write high-level code which uses TPL/PLINQ in "pure CLI" (CLI subset of C++/CLI, or possibly C# or F#), then intermidiate C++/CLI level will pin down pointers, convert data types, etc; then pure C/C++ code will schred cabbage with a help of templates, asm, intrinsics, Intel auto-parallelization auto-vectorizing processor-model aware C++ compiler, etc. Last level may be added on-demand if you will have time/desire/idea/need.
IMVHO, intrinsics is not one of such features. Intrinsics is a nice way to connect C/C++ and assembly. Though if one need full control, assembly is still the only answer.
The problem I have with ICC is it awful at compiling my C#.
ICC even worser in compiling your C# than MASM? I've mentioned ICC only because you've mentioned MASM. I've only wanted to say that today you don't have to use MASM to get access to asm.
Last year I used C/C++ for the contest and by its end I was ready to chew my own fingers off to stop the pain :-D
This year I'm using C# and while I don't expect it to really compete with highly optimized C (or C++ or C++ w/intrinsics or C/C++ w/assembly) I am curious to see how it compares with such solutions. I'm also enjoying learning the MS (beta) parallel library. My goal is learning and hoping that I can get my solutions pretty close in speed to the solutions that are "cut closer to the bone".
Btw, have you considered usage of C++/CLI? It's a natural choice for such type of task. It will allow you to get advantages of both worlds. For example you may write high-level code which uses TPL/PLINQ in "pure CLI" (CLI subset of C++/CLI, or possibly C# or F#), then intermidiate C++/CLI level will pin down pointers, convert data types, etc; then pure C/C++ code will schred cabbage with a help of templates, asm, intrinsics, Intel auto-parallelization auto-vectorizing processor-model aware C++ compiler, etc. Last level may be added on-demand if you will have time/desire/idea/need.
Well let me say that my post was meant to be "tongue in cheek" and hope that it was received as such.
I have no real issues with intrinsics or ICC or MASM. I don't like C++ because it's become a rather strange and disordered dumping ground of features. I swear you can put 'const' between every compiler token and it will compile (*joking*). If it were a language that I used all the time, I'm sure I would feel more forgiving toward it, but I don't. I have thought about C++/CLI (still C++ syntax) and would love to find more time to play about with F# (immutable data great for threading, not so much for performance/memory). I have little doubt that good C (or great C++) can be coded much closer to the CPU bone than any managed language.
There is an old adage:
There is no design problem that can't be fixed by adding a layer of abstraction ... and no performance problem that can't be fixed by removing a layer of abstraction.