Challenge
Locate code in an assembly language listing that corresponds to a specific passage of source code. Interpreting compiler-produced assembly language matters because high-level code that looks perfectly reasonable on the surface may translate into something quite unexpected: code that executes more slowly than expected or that is unnecessarily large. Such problems are relatively easy to spot in the associated assembly code.
Assembly-language listings produced by compilers often look confusing, and it is easy to become disoriented without the source code as a guide. Frequently, the only recognizable items in the text are the names of global procedures and variables. Consider a small C language module containing just one function, a simple checksum, shown in the following source code:
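As a rough illustration, a single-function checksum module of this kind might look like the sketch below. The parameter names buf and count, the types, and the byte-summing algorithm are assumptions made for illustration and should not be read as the exact listing under discussion. Only the count-- loop control, the two arguments, and the single return value are taken from the discussion in the Solution section.

```c
#include <stddef.h>

/* Hypothetical stand-in for a small checksum module.
   Sums `count` bytes starting at `buf`; names and types are assumptions. */
unsigned int checksum(const unsigned char *buf, size_t count)
{
    unsigned int sum = 0;

    /* Simple counted loop driven by count--, the kind of loop a compiler
       can map onto a hardware loop counter. */
    while (count--) {
        sum += *buf++;
    }
    return sum;
}
```

For example, calling checksum on the four bytes 1, 2, 3, 4 returns 10.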
With the -S command-line switch, the compiler generates an assembly language listing of the C/C++ source code:
ecl -S checksum.c
The assembly language file produced by the command line above has 128 lines, most of them standard startup declarations, which are omitted here for clarity. Notice that no optimization switches were used. The core assembly code for this function is shown in the following compiler-generated assembly language listing:
Solution
Analyze the assembly language in terms of global names and line numbers, placement of alloc and br.ret instructions, loop controls, register rotation and predication, and registers r32 and r8. The assembly code given in the Challenge section above demonstrates most of the important details involved in interpreting compiler output:
- Global Names & Line Numbers: Compilers usually do not preserve the names of local or static variables and functions in assembly-language listings, but they must generally keep the names of global variables and functions so that other modules can access them. Another way to locate code in an assembly-language listing is to use the line numbers at the end of the comment fields. The compiler attempts to match each line of functional assembly code (as opposed to NOPs) to a line in the original source code. The following line, for instance, shows the return value of the function being loaded into register r8, and its comment correctly associates it with the tenth line of the source code, the return statement:
mov r8=r34 //0: 10
- alloc and br.ret: Like bookends, most assembly language procedures start with an alloc instruction to set up local and rotating registers, and end with a br.ret instruction to return to the caller. This is yet another route to locating code.
- Loop Controls: Most software loops are controlled by one of the following five branch completers:
ctop wtop cexit wexit cloop
When looking for an inner loop, try looking for these completers. All but the cloop completer also control register rotation, as well as predication. Loops that run off a simple counter variable, like the count-- variable in the example, usually use the processor's loop counter register, ar.lc, to count the loop iterations. Therefore, the routine must save the old ar.lc value at startup, load it with a new value, and restore it before exit. There are usually at least three references to the ar.lc register in procedures with counted loops.
- Register Rotation and Predication: Register rotation can usually be detected by finding references to the epilogue count register (ar.ec), references to the rotating predicates (pr.rot), or use of any of the first four loop completers listed above. The compiler very frequently uses predicated register rotation on loops, as it did in the example. Together, predication and register rotation make loops more efficient by allowing them to be modulo scheduled without separate prologue and epilogue code.
- Registers r32 and r8: Arguments passed to a procedure are generally placed in contiguous registers starting at r32. In this example, the assembly code uses the two passed arguments in r32 and r33 to initialize itself. Return values are usually passed back in r8, as also demonstrated in the sample code.
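The landmarks listed above can also be searched for mechanically. The following is a minimal sketch in C, assuming the listing is read one line at a time. The function names landmarks and source_line and the LM_ flags are hypothetical helpers invented for illustration, not part of any compiler or tool. The matching is a naive substring search, so a symbol name that happens to contain "alloc", for instance, would produce a false positive.

```c
#include <stdlib.h>
#include <string.h>

/* Landmark flags; illustrative names, not part of any real tool. */
enum {
    LM_ALLOC  = 1 << 0,  /* alloc: usually the start of a procedure   */
    LM_RETURN = 1 << 1,  /* br.ret: return to the caller              */
    LM_LOOP   = 1 << 2,  /* branch carrying a loop completer          */
    LM_ARLC   = 1 << 3,  /* reference to the loop counter, ar.lc      */
    LM_ROTATE = 1 << 4   /* evidence of register rotation             */
};

/* Return the landmark flags found in one line of a listing. */
int landmarks(const char *line)
{
    /* The first four completers also rotate registers; cloop does not. */
    static const char *loops[] = { ".ctop", ".wtop", ".cexit", ".wexit", ".cloop" };
    int flags = 0;
    size_t i;

    if (strstr(line, "alloc"))  flags |= LM_ALLOC;
    if (strstr(line, "br.ret")) flags |= LM_RETURN;
    if (strstr(line, "ar.lc"))  flags |= LM_ARLC;
    if (strstr(line, "ar.ec") || strstr(line, "pr.rot")) flags |= LM_ROTATE;

    for (i = 0; i < sizeof loops / sizeof loops[0]; i++) {
        if (strstr(line, loops[i])) {
            flags |= LM_LOOP;
            if (i < 4) flags |= LM_ROTATE;
        }
    }
    return flags;
}

/* Return the number after the last ':' in a "//" comment, or -1 if absent.
   For "mov r8=r34 //0: 10" this is the source line number, 10. */
int source_line(const char *line)
{
    const char *comment = strstr(line, "//");
    const char *colon;
    char *end;
    long n;

    if (!comment) return -1;
    colon = strrchr(comment, ':');
    if (!colon) return -1;
    n = strtol(colon + 1, &end, 10);
    if (end == colon + 1) return -1;  /* no digits after the colon */
    return (int)n;
}
```

Applied to the line mov r8=r34 //0: 10 quoted earlier, source_line returns 10. A line containing a br.ctop branch would be flagged as both a loop and a sign of register rotation, whereas a br.cloop branch would be flagged as a loop only.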
Source
Recognizing Efficient Use of Caches in Code for the Itanium® Processor Family
