DGEEV gives inconsistent results

DGEEV gives inconsistent results

Hi,

Ilink mkl 9.1.022from Visual Studio 2005.

I use DGEEV to compute eigen values.
It gives me the results.
BUT if I runthe same code for multiple times, the resulting eigen valuesare inconsistent in last few digits.
For example, a particular valuein the first time might be
285.21513152591569,
and in the second time, it becomes
285.21513152591456.

I think DGEEV should be deterministic.
Anyone know how to solve it?

Thank you.

Here is how I use DGEEV.

iSize = iOrder*iOrder;
companionMtx = new double[iSize];
...
eigenReal = new double[iOrder];
...
eigenImag = new double[iOrder];
...
int lwork = iOrder >= 3 ? iSize : iOrder*3;
workspace = new double[lwork];
...
char job = 'N';
int ldv = 1;

DGEEV(&job, &job, &iOrder, companionMtx, &iOrder, eigenReal, eigenImag, NULL, &ldv, NULL, &ldv, workspace, &lwork, &info);

39 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

That looks like a single bit variation. I hope you recognize that double precision gives you 15 to 17 decimal digits accuracy. If you want to avoid differences due to threaded reduction issues, you can use the serial version (MKL9.1, sequential in MKL 10.0), or experiment simply by SET MKL_SERIAL or OMP_NUM_THREADS=1.

hi,

here is an excerpt from page 8-1 of the MKL user's guide:

With a given Intel MKL version, the outputs will be
bit-for-bit identical provided all the following conditions are met:
the outputs are obtained on the same platform;
the inputs are bit-for-bit identical;
the input arrays are aligned identically at 16-byte boundaries.
Unlike the first two conditions, which are under users' control, the alignment of arrays, by
default, is not. For instance, arrays dynamically allocated using malloc are aligned at
8-byte boundaries, but not at 16-byte. If you need the numerically stable output, to get
the correctly aligned addresses, you may use the appropriate code fragment below:

...

maybe related?

Yes, the order of additions in sum reduction differs between arrays placed at odd or even multiples of 8 bytes. You must be running on 32-bit linux.

tim18, you seem to be in the know. why do they do this to us? we have just switched to MKL (from netlib), and now are more-than-half wishing we hadn't.

we can no longer debug the way we used to. outputs from LAPACK calls no longer depend on just the inputs, but now on the memory alignment of inputs. this alignment can change depending on whether we are running in the IDE, vs running the EXE standalone. the alignment can change by altering runtime error-checking switches. the alignment can even be different running two identical snapshotsof the projects that were built in different directories.

this is going to make it very difficult to reproduce, in the IDE, some of thebugs that might come in from users.

further, imagine that a bug has newly been introduced into our source. one option to debug it would be to rebuild a snapshot of sourcecode fromthe last clean-build that didn't have the bug, and run it side-by-side against the buggy code in two IDE instances, looking for discrepancies. we cannot do this now that we have switched to MKL, unless we are always very careful to 16-byte align all of our inputs to the MKL calls.

it seems like a great many LAPACK calls are affected. in the last few hours, i have realigned inputs to DGEQRF, DSYEV, DORGQR, DDOT in efforts to block MKL's wanton obstruction to my attempts to debug something else. it seems i still have more to do...

You would see similar alignment dependencies, if you compile the netlib with a vectorizing compiler, using full optimizations. Current Intel compilers avoid the alignment dependent optimizations when -fp-model precise is set. gnu compilers also have options to control those optimizations. The vectorized sum reduction usually gains accuracy, the problem being the lack of control over how much is gained.
MKL and netlib don't generally use maximum accuracy as a criterion in choice of algorithms, so the alignment dependencies are generally smaller than other roundoff errors.
The alignment situation isn't nearly so bad with 64-bit OS as it is with 32-bit, since malloc() and related allocations are 16-byte aligned. It's difficult for me to imagine that a large fraction of new development with MKL is taking place in 32-bit mode, now that no 32-bit hardware platforms are manufactured, although I doubt anyone has attempted a census.

Hi,

Thanks for the new info. One question though: I infer that 64-bit operating system allocates 16-byte aligned and 32-bit allocates 8-byte aligned. Maybe this doesn't help so much when the Fortran 'array descriptor' is taken into account.

According to this help page for Intel Fortran "Handling Arrays and Fortran Array Descriptors", the array descriptor currently takes 108 bytes for IA-32 and twice that for EM64T and Itanium. Neigher of these sizes are a multiple of 16. Just guessing here, but I expect that the 64-bit OS would 16-byte-align the total memory allocation requirement of array descriptor and data together?So depending how this all works inside, the OS may end up 16-byte aligning the array descriptor, and then follow that by the [non-16-byte-aligned] array elements?

No, the 64-bit OS actually aligns the first data element of an allocated array at a 16-byte boundary. Among the reasons for this are to permit the use of vectorization without alignment adjustment, e.g. by use of VECTOR ALIGNED directive, in C functions using SSE intrinsics, or by gcc vectorized functions.

So, down to the crunch line: what you're saying is that MKL results become deterministic on a single machine, if that machine is runninga 64-bit OS, even without doing all the alignment stuff that is described on pages 8-1 and 8-2 of the User Guide?

We appear to having the same issues when using MKL - the results depend on the data alignment. Will using fp:precise definately get around this problem with any need to manually align the data? Also, is there any difference between precise and source (see extract from help below- source and precise seem to do the same thing).

Thank you!

Tony

keyword
Specifies the semantics to be used. Possible values are:

precise
Enables value-safe optimizations on floating-point data and rounds intermediate results to source-defined precision.

fast[=1|2]
Enables more aggressive optimizations on floating-point data.

strict
Enables precise and except, disables contractions, enables the property that allows modification of the floating-point environment.

source
Enables value-safe optimizations on floating-point data and rounds intermediate results to source-defined precision.

-fp:precise would eliminate many alignment-dependent optimizations in compilation of your own source, but it will not affect the pre-compiled code in MKL.

Thank you! We will do the alignment when calling MKL routines. The manual says you need to align the arrays - does this mean both integer and real (and scalars do not need to be aligned).

Tony

I don't see that DGEEV uses any integer arrays. As you mentioned, scalars don't need special alignment. The numerical differences you mentioned originally apply only to vectorizing optimizations on floating point arrays, where 16-byte alignments are recommended.

Thanks Tim - we didnt originally report the DGEEV differences - but are using other MKL routines (LAPACK and BLAS) where integer work arrays are passed in. But anyway, you are saying that we need only be concerned about double arrays.

Thanks

Tony

Alignment is one issue but it isn't helpful if the eigenvalue problem is ill-conditioned. By switching to ?geevx you compute the condition numbers of the eigenvalues and if these are OK then alignment is worth a shot, otherwise all bets are off that it's going to help.

Gerry

Hi,

Sorry to labor this issue, butnow that we understand the full impact of the requirement to 16byte align all double precision arrays (scalars and integers do not matter), I can only say that it is a great dissapointment to us and is causing us to back out use of MKLin some of our code.

The reason is that we are working with large f77 codes, where memory passed to MKL routines is paritioned fromlarge f77 arrays at the top level. Even though the first element of the large array is aligned, there is no guarantee that the first element of partitioned arrays are also aligned (a quick test reveals that even if the first element is alignment, every even element is NOT). To get around this problem by NOT changing all the partitioning code would require dynamically allocating local 16byte aligned arrays around every MKL call (if we are unlucky that the partition isnt aligned). Butthis allocation and copy costis too expensive for when we are calling the BLAS routines (but probably acceptable, but stilla nusiance, for other more numerically intensive MKL routines).

My question is: is this alignment only required for 32bit (i.e. it is NOT needed for 64bit - the documentation is not clear about this) and why did Intel enforce it anyway to get stable numerics?It is a requirement that could give a lot of users, like us, a lot of grief!

Thanks you.

Tony

Unless I misunderstand you, the only relevant difference between 32- and 64-bit OS is that the latter has better default alignment for dynamic memory allocation.
If you have a packed array of 64-bit double precision, pairs of data must necessarily have consistent 16-byte alignment.
If you are harking back to legacy Fortran, it was standard practice to place the arrays requiring the largest alignment at the beginning of a COMMON block. Any data item offset a multiple of 16 bytes from there (e.g. every other double precision element) would also be 16-byte aligned. We do have real applications which continue to use COMMON to assure 16-byte alignment.
Years ago, before SSE, Lahey made a Fortran release which changed the alignment of COMMON. When I reported it, they agreed it was a bug, and fixed it promptly.
I believe there has been a problem in linux in making 16-byte alignments possible in the main program, and significant effort has gone into fixing it.
In the MKL docs, the 16-byte alignment is not a requirement, it is a suggestion for maximizing performance and getting repeatable results. In level 2 and 3 BLAS, it might be necessary to select leading dimensions to make each column aligned. As the previous poster pointed out, it could be a concern if your algorithm is unstable enough to be destroyed by changes in order of addition provoked by changes in alignment.

>>>> In the MKL docs, the 16-byte alignment is not a requirement, it is a
suggestion for maximizing performance and getting repeatable results.

======================

Tim, I am quite disappointed and frustrated by the response to the questions raised on this thread..... It shows utter disrespect for the essence of numerical computation.

One can understand the differences between debug and optimized code but how can you expect a scientific computation giving different answers at run-time?

Abhi

If your objective is identical results regardless of alignment, at the expense of performance, compile the public source code with appropriate options (-fp-model precise, for Intel linux compilers).
Intel designed Itanium with none of these alignment effects on numerics; the market didn't buy it.

Hi Tim,

I have a followup question about the alignment requirement please. Focusing on the MKL LAPACK routines, presumably these routines internally call some of the MKL BLAS routines (as does the Fortran code version of LAPACK). Has Intel made sure in their MKL LAPACK implementation that any calls to BLAS from LAPACK have 16byte alignment also? This is obviously crucial to ensure that the alignment is carried through LAPACK to BLAS.

Thank you!

Tony

As far as I know, MKL lapack calls the BLAS functions the same way as the public source code. In the few cases I have tried, it is possible to mix Fortran compilation of public LAPACK source with MKL BLAS, and get the same results as with the MKL LAPACK.

Thanks Tim for the quick reply. If MKL lapack calls the BLAS functions in the same way as the public source code, then even if the double arrays passed into MKL lapack are 16 byte aligned, then partitions of them will be passed to the BLAS and the partitions will not necessarily be aligned (unless one is lucky that the partition starts at an odd index into the array).

What this means is that it is IMPOSSIBLE to achieve numerical stability for MKL lapack andsection 8 of the user guide isfalse (forlapack, and probably many other MKL numerical routines too that internally use the BLAS).

Can you confirm that this is the case or refer usto an MKL developer please who worked on the actual MKL code implementation. We really need a definative answer to this, as do your MKL user base as a whole. We dont want to waste our resources trying to alignment all our arrays passed into MKL LAPACK if the internal BLAS calls are not aligned.

BTW, does the alignment issue applies to both windows AND linux 32bit platforms?

Thank you!

Tony

Hi Tim,

Sorry to bug you again - we are very keen to get a response to my previous post. Have you managed to get a definative answer to the question please?

Thanks

Tony

I don't think I can answer all of your questions.
As to one of them. the alignment problem for Windows 32-bit can be more troublesome even than linux 32-bit. For example, malloc() for 32-bit Windows guarantees only 4-byte alignment, and not all C doubles are guaranteed to be 8-byte aligned, except with appropriate declspec (where 16-byte alignment may be specified). There are mm_malloc() and aligned_malloc() non-standard facilities with 16-byte alignments for 32-bit OS.
The idea of 16-byte alignment of Fortran local arrays of 16 bytes or more seems accepted as a goal, but I don't know how much confidence can be placed in it.
64-bit integers in 32-bit OS are handled with combinations of 32-bit operations, so alignment to more than 4 bytes isn't considered important.
64-bit Windows should handle alignment similar to 64-bit linux. I don't have 100% confidence in either.

Thanks Tim. What do you suggest we do to get a definate answer on this from Intel? Would us entering a premier support ticket help us?

Tony

The premier ticket looks like your best bet for further expert attention.

Tony,

I think that you misunderstand something. I think you're worried that because LAPACK will call BLAS functions on sub-arrays that may not be 16-byte aligned that they will suffer from instability from run to run. AmI summarizing you correctly?

It is not as though arrays that are offset by 8 bytes from 16-byte alignment will be unstable. It is the fact that memory allocation from run to run may or may not be 16-byte aligned that causes the instability. Or put another way, Intel MKL is deterministic and it is the instability of memory alignment that causes the instability in results.

So to your example, it's not as bad as you fear. If you always align your memory to 16-byte boundaries, then you will get stability out of the BLAS.Sayan LAPACKfunction calls a BLAS function on a sub matrix which is not 16-byte aligned. If you always 16-byte align your input matrix, then that same submatrix will always be offset from the 16-byte boundary by the same amount. That consistency will ensure the numerical consistency from run to run.

Hope this is clear.

Todd

Thank you Todd - you have answered my question perfectly! I did a test yesterday that confirmed what you are saying, but of course just because it works on a single test case doesnt mean it is true all the time! So its good to hear your opinion confirms what I was beginning to slowly realise.

thank you so much

Tony

Hello,
I have had similar issues with IMK (although we use C#). By aligning to 16 byte boundaries we found the eigen solver and pardiso to be deterministic on a given machine. We still find it annoying that different machines give different answers. Is there anything we can do about that?

Thanks

I'm afraid not. We do take accuracy very seriously and do all we can to keep the rounding error (that is inevitable in these sorts of floating point operations) to a minimum. But our primary goal is to deliver great performance. If a new processor provides new means to improve performance, we'll use those means and that will translate into small differences in performance.

Todd

tim18:
If your objective is identical results regardless of alignment, at the expense of performance, compile the public source code with appropriate options (-fp-model precise, for Intel linux compilers).

This should be written more clearly in the advertising and documentation for MKL. Also, it should be pointed out what the implications are, of results not being repeatible. Perhaps the least surprising is thatthe same code on two identical machines (eg. customer's vs developer's) may give different results.

What is not obvious from of the documentation is that the same code with slightly different compiler settings may give different results. If you want the same results when building "debug" and "release" configuration of your app, for example, bad luck.

But that's still not the full extent of it: the same code, completely identical in all respects except for being compiled in a different directory location,can still give different results. Too bad if you are trying to debug a problem by stepping through two versions of your app in two IDEs side-by-side, and looking for discrepancies - you probably will get discrepancies even if you are looking at exactly the same sourcecode in both IDEs.

I wonder whyIntel could not provide an MKL version that canproducible "repeatible" results for those customers which need that(which of course will be a bit slower, but surely could be made faster than the public source code).

Debug compilations have optimizations removed to improve compilation speed and ease of debugging. You can set the same optimizations with debug symbols as your release configuration.
The biggest advantages of MKL over ifort compilation of public source come in the threaded functions (without many zero data elements), as it is rather difficult to optimize threaded performance with portable source. (A generalization which I may have cause to regret).
BLAS always has presented a trade-off between conditionally skipping operations on zeroes and making it easy for compilers to optimize for non-sparse matrices. That choice is yours to influence when you use public source code.
For what little it's worth,performance optimization by use of alignment-dependent dot products is not as powerful on the recent and future high memory bandwidth machines as it has been in the past. I don't mean that statement to refer to MKL; I have no insight into how MKL will optimize for coming CPU architectures.

I agree that the numerical stability / 16btye alignment issue needs to be better documented and not burieddown at chapter 8 of the user guide.

But Grant: re:

"But that's still not the full extent of it: the same code, completely identical in all respects except for being compiled in a different directory location,can still give different results"

this wont be the case if you align to 16 btye boundaries (something my team has spent days doing because we didnt read chapter 8 beforehand), right?

Hi tgarratt,

Yes, if you align to 16-byte boundaries, Isuspect that you won't encounter that behaviour. I was referring to a comment by Tim18, whichstarts off "If your objective is identical results regardless of alignment...".

I meant for my whole post to be taken as feedback froma guy who is frustrated at having to align everything, and at the extent ofwhat happens if he doesn't.

Cheers, Grant

I do agree with you- I find the 16 btye boundary requirement very nasty and painful and it has cost my company at lot of resources to fix up our code after switching to MKL (just because we didnt happen to look at chapter 8 of the user guide up front).

I also dont think Intel have taken the consequences of not aligningseriously enough and they seem to be waving their hands saying"if your problem is numerically too sensitive, what do you expect?". However, from personal experience there are tough numerical problems that arise from real life models of engineering applications that are hard to solve (e.g. huge condition numbers of matrices from models of chemical distillation columns), so one cant just blame the problem being passed to the solver - the solver needs to be reliable and determinstic, especially the later.

Anyway, frustration aside, the motto is "one must align"! and this needs to go right at the top of the MKL documentation in bold! And the documentation on this issue also needs to be expanded to clarify the situation (the responses and answers on this forum topic can form a good basis for the updated documentation).

Tony

Some more info for the updated documentation (some is probably ifort-specific though):

Q. Do "work" arrays need to be aligned, or just input arrays? What about arrays that are passed to a routine in order to receive the results?

A. All arrays. [Issue467703]

---

Q. When a Fortran95 wrapper omits various array arguments, presumably the wrapper is allocating/deallocating internally. Are those arrays being 16-byte aligned for us?

A. Yes. [Issue467703]

---

Q. When I am passing in a subarray that is made up of non-contiguous memory, eg: A(2:4,2:4), in this case an "array temporary" is created during the call. As far as I'm aware, there is no way possible that I can have control over its alignment?

A. An array temporary is always aligned to 16-byte boundaries by the compiler. [Issue467438]

---

Q. When I am passing in a subarray that is made up of contiguous memory (so that no array temporary gets created) do I need to align it to 16-byte, or is it fine just to align the 'parent' array (to make sure my my subarray is consistently 16-byte or non-16-byte aligned)?

A Just align the 'parent' array. Consistently non-aligned will still give you reproducible results. [Issue467219] Another option is to use an undocumented compiler switch (/switch:fe_not_contig) that will force an array temporary to be created even when the subarray is contiguous, and such an array temporary is always 16-byte-aligned by the compiler [Issue467438].

I hope the updated documentation will also give definitive answers on the 64-bit OS vs 32-bit OS issue (is that even mentioned in the current documentation?). And that it mentions the currently-availablecompiler directives related to alignment, and states when and howthey can be applied to this problem. For example, when is it OK to just do the following, instead of using the alignment code snippet in the MKL documentation?

!dec$ attributes align:16 A

The dgeev routine from MKL is giving me different results depending on the chip that executes the attached code. The included README has a description of the flags used during compilation and the hardware on which it was run. Also included is sample output. The differences are limited to the 'left' eigenvalues.

Is this expected behavior?

The tarball is located here
http://software.intel.com/file/28673

Thanks
Dan

Attachments: 

AttachmentSize
Download dgeev.tar80 KB

Hello Dan,

What is the inconsistent results? Are you expected bit to bit correspondingly output?at the first sight at the result you sent, the biggest difference is
between

-1.22534679413879

and

-1.22534679413878

--Gennady

The largest difference is -5.54223e-13 in the sample output between

6.934683413947925E-00 (harp)
6.934683413948480E-00 (neha)

Am I crazy to expect bit-for-bit consistent output when running the same executable on all x86_64 Intel hardware?

Is the MKL library hardware aware in the sense that it changes the calculation to fit available cache? We ran into this sort of situation once before when going from R10K to R12K chips. The library would ask the kernel for the available L2 size and change it's blocking to fit.

Dan

Leave a Comment

Please sign in to add a comment. Not a member? Join today