I have a question regarding the behavior of _mm_stream_ps (movntps) when the target address is already in the cache.
Does it simply write into the cache, or does it still issue a non-temporal store?
The issue arises when doing in-place processing on large amounts of data.
To hide memory latency as much as possible, I'm using _mm_prefetch to load the data into the cache asynchronously. But since the processing is done in place, I'm wondering whether I should use streaming or regular stores when writing back the new data. I know that movntps is typically used to reduce cache pollution when writing uncached data, but what if the memory was already brought into the cache by prefetching? What type of stores should be used when doing non-temporal in-place processing?
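For concreteness, here is a minimal sketch of the kind of loop being described, assuming a simple in-place scale as a stand-in for the real processing. The function `scale_in_place`, the `PREFETCH_DISTANCE` value and the choice of the `_MM_HINT_NTA` prefetch hint are hypothetical, not taken from the original code; the `use_stream` flag switches between the two store types the question is about (movntps via `_mm_stream_ps` versus a regular movaps via `_mm_store_ps`). It is x86-only and requires SSE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: _mm_load_ps, _mm_store_ps, _mm_stream_ps, _mm_prefetch, _mm_sfence */

/* Hypothetical prefetch distance in floats (256 floats = 1 KiB ahead).
   The best value is machine-dependent and would need tuning. */
#define PREFETCH_DISTANCE 256

/* Hypothetical stand-in for the real processing: data[i] *= factor, in place.
   Requires data to be 16-byte aligned and n to be a multiple of 4.
   use_stream != 0 writes results with movntps (_mm_stream_ps);
   use_stream == 0 writes them with a regular movaps (_mm_store_ps). */
static void scale_in_place(float *data, size_t n, float factor, int use_stream)
{
    assert(((uintptr_t)data & 15u) == 0);
    assert(n % 4 == 0);

    const __m128 f = _mm_set1_ps(factor);
    for (size_t i = 0; i < n; i += 4) {
        /* Prefetch data needed later. The NTA hint is an assumption;
           _MM_HINT_T0/T1/T2 are the alternatives. Stay within the array. */
        if (i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)(data + i + PREFETCH_DISTANCE), _MM_HINT_NTA);

        __m128 v = _mm_mul_ps(_mm_load_ps(data + i), f);

        if (use_stream)
            _mm_stream_ps(data + i, v); /* non-temporal store (movntps) */
        else
            _mm_store_ps(data + i, v);  /* regular store through the cache (movaps) */
    }

    /* Non-temporal stores are weakly ordered; sfence orders them
       before any stores issued after this point. */
    if (use_stream)
        _mm_sfence();
}
```

Both variants compute the same result; the open question is only how the final write interacts with a line that the prefetch has already brought into the cache.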
I would expect movntps to evict the cache line holding the previous data and write the new data directly to memory, but is that really what it does?
In this situation, are there any advantages to using non-temporal stores over regular stores?
Any differences between PIII and P4?