for loop works faster than cilk_for

Zvi Vered

Hello,

I wrote the attached code and built it using MSDEV 2008.

My PC has a Core 2 Duo (E8400) and runs Windows 7 Pro, 32-bit.

For some reason the for loop runs faster (0.049946 sec) than the cilk_for loop (0.067103 sec).

Can I be sure that both cores are executing the for loop?

Thanks,

Zvika

Attachments:

cilk1.cpp (1.45 KB)
cilk1.h (117 bytes)
Arch D. Robison (Intel)

The example streams through 240,000,000 bytes of data (10,000,000 doubles * 8 bytes/double * 3 arrays). That's much larger than the outer-level cache. The benchmark also has a high ratio of memory accesses to floating-point operations (three memory accesses for each floating-point operation), so it is really measuring how fast the memory system can feed the processors. A single core is likely capable of using the full memory bandwidth for this benchmark.

The Cilk code may be slower because the Cilk run-time takes some time to start up the first time Cilk is invoked. (After that, the Cilk threads are parked so that they can be woken up instead of created from scratch.) One way to see whether the initial startup is part of the issue is to repeat the two benchmark loops several times and see if the Cilk times improve the second time around.
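A minimal sketch of that repeated measurement follows. The attached cilk1.cpp is not reproduced in this thread, so the kernel `c[i] = a[i] + b[i]` is only an assumption chosen to match "three memory accesses per floating-point operation". The names `working_set_bytes`, `serial_add`, `cilk_add`, `time_once`, and `run_trials` are hypothetical. The sketch also assumes that a Cilk-enabled compiler defines the `__cilk` macro; without it, `cilk_for` falls back to a plain serial `for` so the code still compiles.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Assumption: Cilk-enabled compilers define __cilk. Without it, cilk_for
// degrades to a serial for loop so this sketch still builds (and then the
// two timings measure the same serial code).
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

typedef std::vector<double> Vec;

// Bytes touched by one pass over three arrays of n doubles.
std::size_t working_set_bytes(std::size_t n) {
    return n * sizeof(double) * 3;
}

// Assumed kernel: two loads, one store, one floating-point add per element.
void serial_add(const Vec& a, const Vec& b, Vec& c) {
    const std::size_t n = c.size();
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Same kernel, written with cilk_for.
void cilk_add(const Vec& a, const Vec& b, Vec& c) {
    const std::size_t n = c.size();
    cilk_for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Wall-clock seconds for a single call of kernel(a, b, c).
double time_once(void (*kernel)(const Vec&, const Vec&, Vec&),
                 const Vec& a, const Vec& b, Vec& c) {
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    kernel(a, b, c);
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return dt.count();
}

// Times both loops `trials` times on arrays of n doubles, prints one line
// per trial, and returns the cilk_for timings so trial 0 (which includes
// any run-time startup) can be compared with the later trials.
std::vector<double> run_trials(std::size_t n, int trials) {
    Vec a(n, 1.0), b(n, 2.0), c(n, 0.0);
    std::vector<double> cilk_times;
    for (int t = 0; t < trials; ++t) {
        double ts = time_once(serial_add, a, b, c);
        double tc = time_once(cilk_add, a, b, c);
        assert(n == 0 || c[n - 1] == 3.0);  // sanity check: 1.0 + 2.0
        std::printf("trial %d: for %.6f s, cilk_for %.6f s\n", t, ts, tc);
        cilk_times.push_back(tc);
    }
    return cilk_times;
}
```

Calling `run_trials(10000000, 5)` from `main` would reproduce the 240,000,000-byte case, and `run_trials(1000, 5)` a case that fits in cache. If startup cost is the culprit, the first cilk_for timing should stand out from the later ones.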

Zvi Vered

Dear Mr. Robison,

You are right!

On the second iteration, with 1000 elements (smaller than the outer-level cache), cilk_for was faster.

Best regards,

Zvika
