First of all, well done on an excellent tool! I have just a couple of suggestions:
- My first comment concerns the tool's C vs. assembly orientation. The documentation seems to suggest using IACA with Visual C via the IACA_START/IACA_END macros. That is all well and good; however, although the corresponding code markers for assembly code can be found in the header file, they are not documented in the manual. I think they should be, given that the tool is just as useful (and arguably more so) for assembly code.
- On the examples (e.g. the throughput example in section 4.1), I think it would be useful if the original source code were available (I'm assuming it's .asm, as I don't see how you could get this level of control over the instructions otherwise), or, failing that, at least a high-level explanation of the program. For example, it seems that the addresses of the input arrays are in rax and rcx and the output address is in rdx, but it takes a fair bit of poring over to figure out what's going on...
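To illustrate the first point: as far as I can tell from the commented assembly section of iacaMarks.h, the start marker is `mov ebx, 111` and the end marker is `mov ebx, 222`, each followed by the raw bytes 0x64, 0x67, 0x90. Below is a minimal C sketch, assuming that layout (please check it against your own header version), that builds the marker bytes and locates them in a code buffer, e.g. to confirm an assembler emitted them where intended. The names `iaca_marker_bytes` and `find_iaca_marker` are my own and are not part of IACA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed IACA marker layout for hand-written assembly (MASM syntax):
 *   mov ebx, 111          ; START marker id
 *   db 0x64, 0x67, 0x90
 *   ...                   ; code to analyze
 *   mov ebx, 222          ; END marker id
 *   db 0x64, 0x67, 0x90
 */
#define IACA_MARKER_LEN 8

enum { IACA_START_ID = 111, IACA_END_ID = 222 };

/* Write the machine-code bytes of the marker with the given id into out. */
static void iaca_marker_bytes(uint32_t id, uint8_t out[IACA_MARKER_LEN]) {
    out[0] = 0xBB;                          /* mov ebx, imm32 (opcode B8 + reg 3) */
    out[1] = (uint8_t)(id & 0xFF);          /* imm32, little-endian */
    out[2] = (uint8_t)((id >> 8) & 0xFF);
    out[3] = (uint8_t)((id >> 16) & 0xFF);
    out[4] = (uint8_t)((id >> 24) & 0xFF);
    out[5] = 0x64;                          /* FS segment-override prefix */
    out[6] = 0x67;                          /* address-size override prefix */
    out[7] = 0x90;                          /* nop */
}

/* Return the offset of the first marker with the given id in buf, or -1. */
static long find_iaca_marker(const uint8_t *buf, size_t len, uint32_t id) {
    uint8_t pat[IACA_MARKER_LEN];
    assert(buf != NULL || len == 0);
    iaca_marker_bytes(id, pat);
    for (size_t i = 0; i + IACA_MARKER_LEN <= len; ++i) {
        if (memcmp(buf + i, pat, IACA_MARKER_LEN) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

For example, a 19-byte buffer holding the start marker, `add rax, rcx` (bytes 48 01 C8), and the end marker gives offsets 0 and 11 for the two markers. Note that, because the marker clobbers ebx, hand-written code should not rely on ebx across it.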