Streaming SIMD Extensions anyone?

I have just read about Streaming SIMD Extensions (SSE) on Intel CPUs.

Are these usable within the CVF compiler?

Thanks, TimH

Message edited on 12-09-2005 10:10 AM

No. The only SSE instructions CVF uses are those for data prefetch.


Retired 12/31/2016

What is data prefetch?


Prefetch is a way for the compiler to tell the processor, "In a little while, I'm going to touch this particular memory address, so why don't you start loading it into the cache for me if it's not already there?" This reduces the effect of memory latency and can give a 10-20% boost in performance for some applications.

The compiler looks at memory reference patterns, for example, stepping through an array, and automatically inserts prefetch instructions ahead of when the data will be used, improving the chance that the data will be in the cache when needed.
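To make the idea concrete, here is a hand-written sketch in C of roughly what compiler-inserted prefetching amounts to when stepping through an array. It is only an illustration, not what CVF actually generates: `sum_with_prefetch` and `PREFETCH_DISTANCE` are made-up names, the distance of 16 elements is an arbitrary guess rather than a tuned value, and `__builtin_prefetch` is a GCC/Clang builtin (on x86 targets with SSE it normally becomes one of the SSE prefetch instructions).

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current one
   to request. A real compiler derives this from its memory model. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting upcoming elements into the cache. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = prefetch for read,
               3 = high temporal locality (keep in cache). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing changes. Whether it helps at all depends on the processor, as the next paragraph explains.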

This is a big help on Pentium III, but not so much on Pentium 4 where the processor itself tries to predict memory use patterns and does its own prefetching. We found, for example, that applying a Pentium III prefetch model to Pentium 4 actually made performance worse! CVF 6.6 uses a more appropriate memory system model for Pentium 4, resulting in fewer prefetch instructions issued, and better performance.

We were able to add this to CVF 6.5 because our optimizer already knew how to do prefetching for Alpha, so it was just a matter of tuning the memory model.


Thanks, that is a useful answer.

1. Are you saying that I don't need to do anything to use prefetch as long as the exe is run on a P4?
Or do I at least need to turn on a P4 switch at compile time?

2. Do you anticipate the ability to utilize this feature on Intel Fxx anytime soon?


In CVF, you have to compile with /arch:pn4 for Pentium 4, or /arch:host if you want to run on the same computer you're compiling on.

Intel Fortran has /QX and /QAX switches to select the architecture. The /QAX variant generates code that automatically detects the running CPU type and dispatches to the appropriate code set.
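For anyone wondering what "detects the running CPU type and dispatches" looks like in practice, here is a rough hand-written sketch in C. It is an assumption-laden illustration and not the code Intel Fortran emits: `sum_generic`, `sum_tuned`, and `select_sum` are hypothetical names, the "tuned" routine is just an unrolled loop standing in for a processor-specific version, and the CPU check uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available on x86 targets.

```c
#include <stddef.h>

typedef double (*sum_fn)(const double *, size_t);

/* Stands in for the generic code path that runs on any CPU. */
double sum_generic(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Stands in for a processor-specific path. Here it is only a
   2-way unrolled loop; a real compiler would emit SSE code. */
double sum_tuned(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)
        s0 += a[i];          /* leftover element when n is odd */
    return s0 + s1;
}

/* Choose an implementation based on the CPU the program is running on. */
sum_fn select_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return sum_tuned;
#endif
    return sum_generic;      /* safe fallback on anything else */
}
```

A caller would do `sum_fn f = select_sum();` once at startup and then call `f(a, n)`, so the same executable runs on older processors and still takes the faster path where it is available.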