loop not vectorized


Problem version: 
_cproc_p_11.0.074_intel64

Environment : 
RHEL5, Intel® 64

Problem :
With the latest version of the Intel compiler, the loop in the following code is not vectorized. Inspecting the code shows that there are no loop-carried dependencies. The compiler should have resolved the apparent dependencies through interprocedural (IP) optimization and/or by inlining the BoxMuller function, and then vectorized the loop, but it does not. I tried the vectorization options, #pragma ivdep, and #pragma vector always. None of them helped.

/** Original Code **/
#define MT_COUNT 4096
#define N_RNG 2
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PI 3.14159265358979f
/* Box-Muller transform: overwrites u1 and u2 in place */
void BoxMuller(float& u1, float& u2)
{
    float r = sqrtf(-2.0f * logf(u1));
    float phi = 2 * PI * u2;
    u1 = r * cosf(phi);
    u2 = r * sinf(phi);
}

int main()
{
    int i;
    float *hR = (float*)malloc(MT_COUNT*N_RNG*sizeof(float));
    /* this loop is not vectorized */
    for(i = 0; i < MT_COUNT * N_RNG; i += 2)
        BoxMuller(hR[i + 0], hR[i + 1]);
    return 0;
}
/****************/

When I manually inline the BoxMuller function, the compiler vectorizes the loop.

/** Rewritten code **/
#define MT_COUNT 4096
#define N_RNG 2
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PI_2 (2*3.14159265358979f)
int main()
{
    int i;
    float *hR = (float*)malloc(MT_COUNT*N_RNG*sizeof(float));
    /* BoxMuller body inlined by hand; this loop is vectorized */
    for(i = 0; i < MT_COUNT * N_RNG; i += 2)
    {
        float r = sqrtf(-2.0f * logf(hR[i + 0]));
        float phi = PI_2 * hR[i + 1];
        hR[i + 0] = r * cosf(phi);
        hR[i + 1] = r * sinf(phi);
    }
    return 0;
}
/**************/
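
Before relying on the manual rewrite, it can be useful to confirm that it computes the same values as the function-call version. The sketch below is one way to do that, not part of the original report. The names BoxMullerInline, TransformViaCall, TransformInlined, MakeUniformInput and MaxDifference are hypothetical. Marking the helper static inline is only a hint to the compiler, and whether it changes the vectorization outcome on 11.0.074 has not been verified here.

```cpp
#include <assert.h>
#include <math.h>
#include <vector>

static const float TWO_PI = 2.0f * 3.14159265358979f;

// Same transform as BoxMuller in the report. "static inline" keeps the
// definition local to this translation unit; it is a hint only, and the
// compiler may still decline to inline the call or vectorize the loop.
static inline void BoxMullerInline(float& u1, float& u2)
{
    float r = sqrtf(-2.0f * logf(u1));
    float phi = TWO_PI * u2;
    u1 = r * cosf(phi);
    u2 = r * sinf(phi);
}

// Variant A: loop that calls the helper (the shape from the original code).
void TransformViaCall(float* data, int n)
{
    assert(n % 2 == 0);  // values are consumed in pairs
    for (int i = 0; i < n; i += 2)
        BoxMullerInline(data[i + 0], data[i + 1]);
}

// Variant B: helper body inlined by hand (the shape from the rewritten code).
void TransformInlined(float* data, int n)
{
    assert(n % 2 == 0);  // values are consumed in pairs
    for (int i = 0; i < n; i += 2) {
        float r = sqrtf(-2.0f * logf(data[i + 0]));
        float phi = TWO_PI * data[i + 1];
        data[i + 0] = r * cosf(phi);
        data[i + 1] = r * sinf(phi);
    }
}

// Deterministic inputs strictly inside (0, 1), so logf() is finite and negative.
std::vector<float> MakeUniformInput(int n)
{
    std::vector<float> v(n);
    for (int i = 0; i < n; ++i)
        v[i] = (i + 1.0f) / (n + 1.0f);
    return v;
}

// Largest absolute difference between the two variants on identical input.
float MaxDifference(int n)
{
    std::vector<float> a = MakeUniformInput(n);
    std::vector<float> b = a;
    TransformViaCall(a.data(), n);
    TransformInlined(b.data(), n);
    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float d = fabsf(a[i] - b[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling MaxDifference(MT_COUNT * N_RNG) should return a value at or near zero. A small nonzero difference is possible if one loop is vectorized with a vector math library and the other is not, so compare against a tolerance rather than testing for exact equality.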

Resolution : 
This bug will be fixed in later major versions.

For complete information about compiler optimizations, see the Optimization Notice.