long double data types in BLAS

Hello. I wonder if there is a good reason to not include 'long double' data types in BLAS. I mean, in addition to:

real, single precision
complex, single precision
real, double precision
complex, double precision            

Why not include "real, long double precision" and "complex, long double precision" (as defined in ISO C99) equivalent functions? 

Many thanks.

Hector.

You could read earlier posts on this subject.
You're welcome to build BLAS yourself with the compiler and data types of your choice. Hand optimization such as MKL provides would have little to offer for long double.
C99 says only that long double may be the same as double, as it is in Visual Studio. There are multiple versions of long double implemented in widely used linux compilers, so this seems to encompass a wider variety of cases than you imply.
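A quick way to see which variant a given compiler and target provide is to inspect `<float.h>`. The following is a minimal sketch; the helper names are hypothetical and used only for illustration, and the bit counts in the comments assume a binary floating-point format (`FLT_RADIX == 2`).

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Significand bits of long double on this compiler/target.
   Typical values: 53 (same as double), 64 (x87 80-bit extended),
   113 (IEEE binary128). Other formats exist as well. */
int long_double_mantissa_bits(void) { return LDBL_MANT_DIG; }

/* Nonzero if long double carries more precision than double here. */
int long_double_is_wider(void) { return LDBL_MANT_DIG > DBL_MANT_DIG; }

/* Hypothetical helper: writes a one-line summary into buf and
   returns the value reported by snprintf. */
int describe_long_double(char *buf, size_t n) {
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "sizeof=%zu mant_bits=%d epsilon=%Lg",
                    sizeof(long double), LDBL_MANT_DIG, LDBL_EPSILON);
}
```

If `long_double_is_wider()` returns 0, a "long double" BLAS built with that compiler would give the same results as the double-precision one.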

Dear Tim,

Thank you very much for your useful and polite reply.

Quote:

TimP (Intel) wrote:
You could read earlier posts on this subject.

I could and I did. The most promising one was the following, but it was not very helpful:
http://software.intel.com/en-us/forums/topic/285712

Did I miss another relevant post about this issue?

Quote:

TimP (Intel) wrote:
You're welcome to build BLAS yourself with the compiler and data types of your choice. Hand optimization such as MKL provides would have little to offer for long double.

Thanks, this is very interesting. Then, could I modify and build MKL/BLAS to add extended and quad precision support?

Quote:

TimP (Intel) wrote:
C99 says only that long double may be the same as double, as it is in Visual Studio. There are multiple versions of long double implemented in widely used linux compilers, so this seems to encompass a wider variety of cases than you imply.

Of course, C99 says only that long double may be the same as double. In my humble opinion, though, a developer who takes the trouble to add the specifier 'long' is, most of the time, asking for higher precision than double, even though a C99-compliant compiler is not required to provide it. In the case of quadruple precision, I can hardly believe that a developer would use this data type and accept the compiler 'downgrading' the request to double precision.

Thanks again.
Hector

Hi Hector,

long double can mean different things to different people ... simply “double” on some systems, mapped to the 80-bit x87 floating point type by the Intel compilers on Linux/Windows when certain switches are applied, or even the IEEE Quad 128-bit floating point type. Presumably you're most interested in Quad BLAS support.

Quad support for BLAS is on our longer term to-do list, but quite low in priority. It is generally understood that as problem sizes grow and computational speeds increase, the need for additional accuracy follows the same trend. The main challenge associated with high-performance (or MKL-suitable) Quad BLAS support is essentially the speed of the underlying Quad basic floating point operations, which at this time are implemented in software because there is no direct hardware support for them. So the performance impact of moving from double to Quad would likely be significant.

One thing that you could try is to port the Netlib Fortran BLAS implementation using a compiler that can automatically map doubles (or real*8) to real*16, and likewise for the complex types. That way you could see the performance implications for yourself. I don’t know if there is a corresponding implementation of the BLAS in C that would allow you to do a similar experiment with the C99 types.
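As a rough illustration of that kind of experiment in C, here is an unoptimized, reference-style matrix multiply using `long double`. It is only a sketch: the name `ref_ldgemm_nn` is hypothetical (it is not a routine in Netlib BLAS or MKL), it handles only the no-transpose case, and it assumes column-major storage as in the Fortran BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name, not a real BLAS routine.
   Computes C := alpha*A*B + beta*C in column-major storage,
   without transposition.
   A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m). */
void ref_ldgemm_nn(size_t m, size_t n, size_t k, long double alpha,
                   const long double *a, size_t lda,
                   const long double *b, size_t ldb,
                   long double beta, long double *c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        /* Scale column j of C; when beta is zero, C is overwritten
           and its previous contents are not read. */
        for (size_t i = 0; i < m; ++i)
            c[i + j * ldc] = (beta == 0.0L) ? 0.0L : beta * c[i + j * ldc];
        /* Accumulate alpha * A(:,p) * B(p,j) into column j of C. */
        for (size_t p = 0; p < k; ++p) {
            long double t = alpha * b[p + j * ldb];
            for (size_t i = 0; i < m; ++i)
                c[i + j * ldc] += t * a[i + p * lda];
        }
    }
}
```

Timing this against the same loop written with `double` gives a first impression of the cost of the wider type on a given compiler, before any hand optimization.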

Our current focus areas are optimizations for the latest/upcoming Xeon processors, optimizations for the new Intel® Xeon Phi™ coprocessor, and conditional numerical reproducibility.

-Shane

Hi Shane,

Thank you very much indeed for your comments, suggestions and even roadmap of the optimizations.

After some preliminary testing with zgemm and zgemm3m, I am afraid the precision achieved in the results is not enough for my specific application. As performance is another goal (the final hardware target is a supercomputer), I cannot afford a generalized use of quad precision because of the lack of hardware support that you pointed out. Anyway, I will try your suggestion but, in the worst case, I would develop a tailored solution.

Thanks,
Hector
