ecc -tpp (target processor) option

We noticed that -tpp2 (target Itanium 2 processor) is the default for the ecc and efc compilers on all Itanium-based systems. We are trying to decide whether to change the default to -tpp1 on our Itanium 1 cluster by adding it to the configuration files (efc.cfg, ecc.cfg).

Is there any empirical data that show the -tpp1 option typically produces faster code for the Itanium 1 processor? Also, is there data that show the -tpp1 code will run slower on the Itanium 2 processor?

Dave McWilliams
National Center for Supercomputing Applications
University of Illinois

There's no doubt about it. -tpp2 schedules pipelined loops with lower latencies than the Itanium 1 can support. For example, most floating point reads from L2 cache would stall the pipeline. Also, when the -tpp2 code attempts more memory accesses per cycle than the Itanium 1 supports, you get a pipeline stall which the compiler didn't expect. I've run a few tests that way, and switching to -tpp1 typically doubled the speed of pipelined loops. Since I had to move my office, and the value of the Itanium 1 didn't justify the freight, I can no longer try that.

-tpp1 code sometimes exceeds the speed of -tpp2 code when run on the Itanium 2. Where this happens, it may indicate either bank stalls or L2 misses, where the extra latency allowance of the -tpp1 scheduling happens to be enough to alleviate the situation. The compiler team works hard to fix situations like that. Also, -tpp1 doesn't attempt to take advantage of the additional bandwidth of the Itanium 2. There have been studies of this; there were a significant number of applications (many of those where Itanium 1 performance is fairly good) where -tpp2 was less than 15% faster than -tpp1. Interest in this has waned, and that figure may no longer be valid.