Not all arithmetic operations take the same amount of time. Divides and square roots, both performed by the DIV unit (the execution port that hosts it depends on the microarchitecture), take considerably longer than integer or floating-point addition, subtraction, or multiplication.
The DIV unit is active for a significant portion of execution time. Locate the hot long-latency operations and try to eliminate them. For example, if dividing by a constant, consider multiplying by the reciprocal of the constant instead; for floating-point values the result can differ in the last bit unless the reciprocal is exactly representable. If dividing an unsigned integer by a power of two, a right shift gives the same result.
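As a minimal sketch of the two rewrites suggested above, the C fragment below hoists a single divide out of a loop and replaces an unsigned power-of-two divide with a shift. The function names `scale_by_reciprocal` and `div_pow2` are hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: divide every element by the same divisor.
   The reciprocal is computed once, so the loop body issues a multiply
   per element instead of a divide per element. */
void scale_by_reciprocal(float *data, size_t n, float divisor)
{
    const float inv = 1.0f / divisor;   /* the only divide, outside the loop */
    for (size_t i = 0; i < n; ++i) {
        /* May differ from data[i] / divisor in the last bit when
           inv is not exactly representable. */
        data[i] *= inv;
    }
}

/* Hypothetical helper: unsigned divide by 2^shift done as a right shift.
   Assumes shift < 32; signed operands would need extra handling because
   a shift rounds toward negative infinity, not toward zero. */
uint32_t div_pow2(uint32_t x, unsigned shift)
{
    return x >> shift;
}
```

Optimizing compilers usually perform the shift substitution on their own when the divisor is a compile-time constant, so the manual form tends to matter mostly when the power of two is known only at run time. The reciprocal substitution for floating-point values is typically not applied automatically under strict floating-point settings, because it can change the rounded result.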