I've been trying to understand what the __sec_implicit_index intrinsic is intended for. It's tricky to get adequate performance from it, and apparently not possible in some of the more obvious contexts (unless the goal is only to get a positive vectorization report).
It does seem competitive for setting up an identity matrix.
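As a sketch of that usage: the array-notation one-liner below (which requires a Cilk Plus capable compiler) compares the row and column implicit indices, and the plain-C function shows what it amounts to. The function name and N are my own, for illustration.

```c
#define N 4

/* Scalar equivalent of the array-notation identity-matrix idiom
 *   a[0:N][0:N] = (__sec_implicit_index(0) == __sec_implicit_index(1));
 * (the one-liner needs a Cilk Plus capable compiler). */
static void identity(double a[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (i == j); /* 1.0 on the diagonal, 0.0 elsewhere */
}
```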
In the context of dividing its result by 2, different treatments are required on MIC and host:
a[2:i__2-1] = b[2:i__2-1] + c[((unsigned)__sec_implicit_index(0)>>1)+1] * d__[2:i__2-1];  /* MIC */
a[2:i__2-1] = b[2:i__2-1] + c[__sec_implicit_index(0)/2+1] * d__[2:i__2-1];               /* host */
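For reference, here is the scalar loop those array-notation statements correspond to, assuming the section syntax is [start:length], so the implicit index k runs from 0 to length-1 regardless of the section start. The function name is mine; n stands in for the section length i__2-1.

```c
/* Scalar equivalent of
 *   a[2:n] = b[2:n] + c[__sec_implicit_index(0)/2 + 1] * d[2:n];
 * The implicit index k counts 0..n-1 from the start of the section. */
static void saxpy_half_index(int n, float *a, const float *b,
                             const float *c, const float *d)
{
    for (int k = 0; k < n; k++)
        a[2 + k] = b[2 + k] + c[((unsigned)k >> 1) + 1] * d[2 + k];
        /* (unsigned)k >> 1 and k/2 agree here since k >= 0 */
}
```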
That is, the unsigned right shift is several times as fast as the divide on MIC (and not much slower than plain C code), while the signed divide by 2 is up to 60% faster than the shift on host (though still not as fast as C code).
The only advantage it offers seems to be the elimination of a for() loop, if that is in fact considered an advantage.
I didn't see it documented anywhere that the result is of int type, although the opt-report shows it. I can't see how it could be anything other than a nonnegative integer, so the (unsigned) cast seems valid. I guess >>1U would have the same effect and take up less space than the (unsigned) cast. The notation is already cryptic enough from my point of view.
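One caveat on the >>1U idea, as far as I understand C semantics: the result type of a shift comes from the promoted left operand, not from the shift count, so i >> 1U is still a signed shift; it is the cast on the index, not a suffix on the constant, that makes the shift unsigned. For a nonnegative value all three forms agree anyway, which a small check (function name mine) can confirm:

```c
/* For a nonnegative int i, these all yield the same value:
 *   i / 2,  i >> 1,  (unsigned)i >> 1
 * Note i >> 1U is semantically identical to i >> 1: the U suffix on
 * the shift count does not make the shift itself unsigned. */
static int same_halves(int i)
{
    return ((i / 2) == (i >> 1)) &&
           ((i >> 1) == (int)((unsigned)i >> 1));
}
```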