Any plans to inline LSAME?

This issue mostly concerns small problems. In the case I encountered, 70% of the time in DTRSM (a 3x3 matrix with 36 right-hand-side vectors to solve) was spent in the input error checking, and it probably affects most MKL functions for small problems.

The input error checking code calls LSAME many times with different parameters, so the branch predictor has no chance. Inlining it would prevent that, and it could probably be written with no conditional branches at all, since all the testing for non-standard (non-ASCII) character codings can be eliminated.
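As a rough illustration of the idea, here is a minimal sketch of an ASCII-only, inlinable comparison. The name lsame_ascii is hypothetical (it is not an MKL or reference BLAS symbol), and the sketch assumes an ASCII character set. Whether the compiler really emits branch-free code for it depends on the compiler and flags, so check the generated assembly.

```c
#include <assert.h>

/* Hypothetical ASCII-only stand-in for LSAME (assumption: ASCII coding).
   Returns 1 if *ca and *cb are the same letter regardless of case. */
static inline int lsame_ascii(const char *ca, const char *cb)
{
    assert(ca && cb);   /* debug-only check, compiled out under NDEBUG */

    unsigned a = (unsigned char)*ca;
    unsigned b = (unsigned char)*cb;

    /* (x - 'a') < 26 is 1 exactly for 'a'..'z' (unsigned wrap-around
       handles x < 'a'). Shifting that 0/1 by 5 gives 0 or 32, the
       distance between lower and upper case in ASCII. */
    a -= (unsigned)((a - 'a') < 26u) << 5;
    b -= (unsigned)((b - 'a') < 26u) << 5;

    return a == b;
}
```

With this, a call such as lsame_ascii(diag, "N") has no loop and no character-set detection, and each call site gets its own copy of the comparison once it is inlined.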

You are welcome to compile the public source code of whichever functions you choose, and to experiment with inlining and the like. Inlining by itself isn't likely to help branch prediction. Straightening out your favored paths with PGO (profile-guided optimization), or simply by using your knowledge of which branches are taken in your cases, may help. BLAS isn't generally suited to high performance with such small matrices.
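One way to encode "knowledge of branches taken" by hand is a compiler branch hint. The sketch below assumes GCC, Clang, or another compiler that supports the __builtin_expect extension; the macro UNLIKELY and the function check_dims are hypothetical names used only for illustration. The return codes 5 and 6 follow the info values used for the m and n checks in DTRSM.

```c
/* __builtin_expect is a GCC/Clang extension; fall back to a plain
   expression elsewhere. UNLIKELY is a hypothetical helper macro. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical dimension check: the error paths are marked as unlikely
   so the compiler can lay out the no-error path as the fall-through. */
static int check_dims(int m, int n)
{
    if (UNLIKELY(m < 0))
        return 5;
    if (UNLIKELY(n < 0))
        return 6;
    return 0;
}
```

This only influences code layout and static prediction; it does not remove the checks, and PGO would normally derive the same information from a profile run.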

3x3 is not a typical size for BLAS, but that is what PARDISO uses, so this issue has some relevance for some MKL users. Since in many cases most of these functions are called with the same input types, inlining would give 100% correct prediction for this code without much effort:

nounit = lsame_(diag, "N");
upper = lsame_(uplo, "U");
info = 0;
if (! lside && ! lsame_(side, "R")) {
    info = 1;
} else if (! upper && ! lsame_(uplo, "L")) {
    info = 2;
} else if (! lsame_(transa, "N") && ! lsame_(transa, "T")
           && ! lsame_(transa, "C")) {
    info = 3;
} else if (! lsame_(diag, "U") && ! lsame_(diag, "N")) {
    info = 4;
} else if (*m < 0) {
    info = 5;
} else if (*n < 0) {
    info = 6;

I think it is worth considering.

OK, you're looking for dead branch code elimination. I suspect it won't happen in this case, even with in-lining. You could easily find out, by comparing with a version which you simplify manually. Then, if you believe it's important for the compiler to improve its optimization, you could file a problem report/feature request. Profile guided optimization ought to accomplish the job, but you and I probably agree it's not the cleanest way in this situation.

Of course 3x3 matrices were not in the thinking of the developers of LAPACK, and because of the generality of the software there needs to be quite a bit of parameter evaluation. It's not surprising that you are spending more time in lsame than in the computations.

I have to agree with Tim on the limits of what inlining can do.

If the processing is always the same (i.e., the result of each lsame call is always the same), you might want to strip out all the unnecessary parts of dtrsm and compile it for your application.

Bruce

Message Edited by bsgreer on 11-30-2005 03:44 PM
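For reference, a stripped-down solve of the kind suggested above might look like the following sketch. It is not the MKL or reference DTRSM code. It assumes one fixed parameter combination (side = left, lower triangular, no transpose, non-unit diagonal) and column-major storage, since the thread does not say which combination PARDISO uses; the function name trsm_llnn_small is hypothetical. The only argument check is a debug assert, which is compiled out when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical specialised triangular solve: A * X = alpha * B, with
   A lower triangular, non-unit diagonal, m x m; B is m x n and is
   overwritten by X. Column-major storage, as in Fortran BLAS. */
static void trsm_llnn_small(int m, int n, double alpha,
                            const double *a, int lda,
                            double *b, int ldb)
{
    /* Debug-only sanity check; no LSAME calls, no info codes. */
    assert(m >= 0 && n >= 0 && lda >= m && ldb >= m);

    for (int j = 0; j < n; ++j) {
        double *bj = b + (size_t)j * ldb;    /* column j of B */

        for (int i = 0; i < m; ++i)
            bj[i] *= alpha;

        /* Forward substitution down the column. */
        for (int k = 0; k < m; ++k) {
            bj[k] /= a[k + (size_t)k * lda];
            for (int i = k + 1; i < m; ++i)
                bj[i] -= bj[k] * a[i + (size_t)k * lda];
        }
    }
}
```

For the case in this thread it would be called with m = 3 and n = 36. With m known at compile time, the inner loops could be unrolled further, but timing it against the library call is the only way to know whether that pays off.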
