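All modern CPUs have caches to speed up processing, and many also provide prefetch instructions that ask the CPU to load data into the cache before the program needs it.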
I wrote some simple benchmarks to measure the impact of prefetching. Surprisingly, the programs became much faster under high CPU load but even slower under low CPU load.
I think the main problem is the placement of the prefetch instruction relative to the actual use of the data.
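To make the placement question concrete, here is a minimal illustrative sketch in C. It is not the benchmark described above. It assumes GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` builtin emits a prefetch hint. The function name `gather_sum` and the parameter `dist` (how many iterations ahead to prefetch) are hypothetical. It uses an indirect access pattern because hardware prefetchers usually already handle plain sequential scans, so software prefetching tends to matter most when addresses are not predictable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum data[idx[0]] ... data[idx[n-1]].
   While processing element i, prefetch the element that will be
   needed `dist` iterations later (dist is the tuning knob). */
long long gather_sum(const int *data, const size_t *idx, size_t n, size_t dist)
{
    assert(n == 0 || (data != NULL && idx != NULL));

    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses that lie inside the index array. */
        if (i + dist < n) {
            /* GCC/Clang builtin: 0 = prefetch for reading,
               3 = high temporal locality (keep in all cache levels). */
            __builtin_prefetch(&data[idx[i + dist]], 0, 3);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same for any `dist`; only the timing changes.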
Does anybody have experience with a good method for filling the cache?
There must be some distance between the prefetch instruction and the actual use of the data. If the distance is too large, the data is evicted from the cache before it is used and has to be loaded again when it is actually needed. If the distance is too small, the program reaches the data before the prefetch has completed and still has to wait for the cache to be filled.
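A common rule of thumb captures the two limits described above. The lower bound is that the prefetch must be issued at least ceil(memory latency / time per iteration) iterations ahead. The upper bound is that the data fetched ahead must still fit in the cache. The C sketch below encodes both limits. The function names, the use of CPU cycles as the unit, and the choice of spending at most half the cache on data in flight are all assumptions for illustration. Real values would have to be measured on the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Lower bound (hypothetical helper): prefetch far enough ahead to hide
   the memory latency, i.e. ceil(mem_latency_cycles / cycles_per_iter). */
size_t prefetch_distance(size_t mem_latency_cycles, size_t cycles_per_iter)
{
    assert(cycles_per_iter > 0);
    return (mem_latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}

/* Upper bound (hypothetical helper): the bytes prefetched ahead
   (dist * bytes_per_iter) must stay in the cache until used.
   ASSUMPTION: at most half of the cache may hold data in flight. */
size_t max_prefetch_distance(size_t cache_bytes, size_t bytes_per_iter)
{
    assert(bytes_per_iter > 0);
    return (cache_bytes / 2) / bytes_per_iter;
}
```

For example, with an assumed memory latency of 200 cycles and 10 cycles of work per iteration, `prefetch_distance(200, 10)` gives 20 iterations. With an assumed 32 KiB cache and 64 bytes consumed per iteration, `max_prefetch_distance(32768, 64)` gives 256. Any distance between the two bounds avoids both failure cases in this simplified model.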