cblas_drotg vs dlartg vs dlartgp vs my own using sqrt

Hello,

I have an application that relies heavily on these primitives to generate Givens rotations. In my experiments on an AVX-enabled machine with Intel MKL 11 beta update 2, I observed the points listed below. I called these primitives hoping that MKL does something really smart and gives a speedup over a plain sqrt version. Why is that not the case? Is there any documentation on the number of flops or, better, the number of cycles these routines need?

  • cblas_drotg leads to non-convergence of my algorithm (too many rounding errors). I haven't tried setting CBWR to COMPATIBLE yet; I still need to try that.
  • dlartg is slow.
  • dlartgp is faster than dlartg. This puzzled me, since dlartgp gives stronger guarantees, namely positiveness of the diagonal elements.
  • my own plain sqrt version (see below) outperforms all of the above, shows no such errors, and also gives positiveness of the diagonal elements (needed for updating a Cholesky decomposition => I need positiveness of the trace, i.e. of the eigenvalues, to compute the log of the trace).

My own version:

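#include <math.h>

/* Generate a Givens rotation (c, s) such that [c s; -s c] * [x; y] = [h; 0]. */
inline void genrot_sqrt(double *x, double *y, double *c, double *s, double *d) {
    double h = sqrt((*x)*(*x) + (*y)*(*y));  /* h >= 0; assumes x and y are not both zero */
    *c = (*x) / h;
    *s = (*y) / h;
    /* d is not used in this snippet */
}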

TIA,
Best regards,
Giovanni Azua
