My application contains a sparse linear system solver which relies
heavily on the fabs() function. One of the biggest surprises I
have gotten from VTune on Linux is how significant a fraction of
my runtime is spent in __fabs. In a recent profile run I found
that 1 billion calls were made to this function with a Self Time
of 126 million. This is about as much time as is being spent
doing other things in the linear solve, which I find outrageous.
I should point out that I have profiled this code on many other
platforms, e.g. SGI, and __fabs has never shown up on my radar.
My only recollection is that a colleague once told me that he
had seen fabs show up on an NT, but I don't remember how bad it
was. I am using GCC 3.2 and my app is C++.
Can anyone help me understand this? Why is __fabs even showing
up as a function call? Would it be unreasonable to assume that
this could be in-lined? If my NT recollection is correct, then
this raises the question whether this is an x86 thing. Do MIPS
chips have some kind of fabs hardware which Intel lacks?
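
To make the question concrete, here is a minimal sketch of the kind of
call pattern in question, assuming a pivot search over a column of
doubles. The names find_pivot and inline_abs are hypothetical and are
not taken from the actual solver. It also shows a hand-written
absolute value that a compiler can trivially inline, as one possible
thing to compare against fabs() in the profiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for fabs(): a plain inline function with no
// library call. Unlike fabs(), it does not clear the sign of -0.0 or
// of a NaN, which does not matter for magnitude comparisons.
inline double inline_abs(double x) {
    return x < 0.0 ? -x : x;
}

// Hypothetical pivot search: returns the index of the entry with the
// largest magnitude. fabs() is called once per entry, so a solver that
// does this in its inner loop makes a very large number of calls.
std::size_t find_pivot(const double* col, std::size_t n) {
    assert(col != nullptr || n == 0);
    std::size_t best = 0;
    double best_mag = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double mag = std::fabs(col[i]);  // swap in inline_abs(col[i]) to compare
        if (mag > best_mag) {
            best_mag = mag;
            best = i;
        }
    }
    return best;
}
```

Profiling the loop once with std::fabs and once with inline_abs would
show whether the cost is the call itself or the operation.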
Thanks for any tips.