Loop unroll heuristic

Is it fair to say, as a "rough cut" heuristic, that loops in ArBB may be profitably unrolled up to the maximum number of cores (or hyper-threads) typically available? Where do diminishing returns set in?
- paul

Hello Paul,

Although a loop in Intel ArBB does not express parallelism, it may be subject to vectorization. Unrolling a loop means repeating the loop body a number of times (the "unroll factor"). In Intel ArBB, this also widens the scope of code fusion within the loop body. In general, loop unrolling can involve complex heuristics, e.g. ones that depend on register pressure. There is no simple "rough cut" heuristic, but if you would like to keep using your rule, consider replacing "number of cores" with "SIMD width". That in turn shows the complexity, because the SIMD width in elements depends on the width of the types involved.

Note that loops in Intel ArBB are already subject to loop unrolling. Moreover, using a regular for-loop inside a _for-loop exploits the run-time generation of code (JIT) and can express loop unrolling driven by user code. Examples are given here (user guide, "Run-time Specialization Using Closure Capture") and here (forum thread).
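
As a rough illustration of the "SIMD width" rule, here is a minimal sketch in plain C++ (not ArBB code). It assumes 16-byte vector registers, and the names `kSimdBytes`, `simd_lanes`, and `axpy_unrolled` are made up for this example. The point is only that the unroll factor follows from the element type, not from the core count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 16-byte (128-bit) vector registers. Adjust for the actual target.
constexpr std::size_t kSimdBytes = 16;

// Number of elements of type T that fit into one vector register.
template <typename T>
constexpr std::size_t simd_lanes() {
    return kSimdBytes / sizeof(T);
}

// Computes y[i] += a * x[i], repeating the loop body Unroll times per trip.
// By default the unroll factor is the lane count for T.
template <typename T, std::size_t Unroll = simd_lanes<T>()>
void axpy_unrolled(T a, const std::vector<T>& x, std::vector<T>& y) {
    static_assert(Unroll > 0, "element type is wider than the assumed register");
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::size_t i = 0;
    // Main loop: handles Unroll elements per trip. The inner loop has a
    // compile-time trip count, so a compiler can flatten it completely.
    for (; i + Unroll <= n; i += Unroll) {
        for (std::size_t k = 0; k < Unroll; ++k) {
            y[i + k] += a * x[i + k];
        }
    }
    // Remainder loop: handles the elements left over when n % Unroll != 0.
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

With these assumptions, `float` gives an unroll factor of 4 and `double` gives 2, which is the type dependency mentioned above. The constant-trip inner loop is only loosely analogous to a regular for-loop inside a _for-loop: in both cases the repetition count is fixed before the generated loop runs.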

Hans

Fair enough.

In the case of map(), is a sequence of (perhaps unrolled) calls to the same mapped operation potentially fused?
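
To make the question concrete, here is a plain C++ sketch (not ArBB code) of what fusing two applications of the same element-wise operation would mean. `scale_and_shift` and the two `apply_twice_*` functions are hypothetical names for illustration only; the sketch says nothing about how ArBB itself would implement this.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical element-wise operation standing in for a mapped function.
inline float scale_and_shift(float v) { return 2.0f * v + 1.0f; }

// Unfused: two separate passes over the data with a temporary in between.
std::vector<float> apply_twice_unfused(const std::vector<float>& in) {
    std::vector<float> tmp(in.size());
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), tmp.begin(), scale_and_shift);
    std::transform(tmp.begin(), tmp.end(), out.begin(), scale_and_shift);
    return out;
}

// Fused: a single pass that applies the operation twice per element,
// so no temporary container is written or re-read.
std::vector<float> apply_twice_fused(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [](float v) { return scale_and_shift(scale_and_shift(v)); });
    return out;
}
```

Both versions produce the same result; the fused one simply avoids the intermediate pass over memory.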

- paul

Hi Paul,

We already have something that does what you suggest, but it is not enabled yet. Your suggestion is taken seriously and will not be lost. Moreover, you said "calls to the same [...]", which is quite right with respect to "call" (in contrast to "inline").

Hans
