for loop works faster than cilk_for

Hello,

I wrote the attached code and built it using MSDEV 2008.

My PC has a Core 2 Duo (E8400) and runs 32-bit Windows 7 Pro.

For some reason the for loop runs faster (0.049946 sec) than the cilk_for loop (0.067103 sec).

Can I be sure that both cores are executing the for loop?

Thanks,

Zvika

Attachments:

cilk1.cpp (1.45 KB)
cilk1.h (117 bytes)

The example walks through 240,000,000 bytes of data (10,000,000 doubles * 8 bytes/double * 3 arrays). That is much larger than the outer-level cache. The benchmark also has a high ratio of memory accesses to flops (three memory accesses for each floating-point operation), so it is really measuring how fast the memory system can feed the processors. A single core is likely capable of using the full memory bandwidth for this benchmark.

The Cilk code may be slower because the Cilk run-time takes some time to start up the first time Cilk is invoked. (After that, the Cilk threads are parked, so they can be woken up instead of being created from scratch.) One way to see whether the initial startup is part of the issue is to repeat the two benchmark loops several times and check whether the Cilk times improve the second time around.
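The repeated-measurement idea could be sketched roughly as follows. This is not the attached cilk1.cpp: the kernel `c[i] = a[i] + b[i]`, the `USE_CILK` build flag, the `PARALLEL_FOR` macro, and the names `time_seconds`, `PassTimes`, and `run_benchmark` are all assumptions made for illustration. It also uses C++11 features, so it needs a newer compiler than the one mentioned in the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// USE_CILK is a hypothetical build flag for this sketch. Define it only when
// building with a Cilk-enabled compiler; otherwise the loop runs serially.
#ifdef USE_CILK
#include <cilk/cilk.h>
#define PARALLEL_FOR cilk_for
#else
#define PARALLEL_FOR for
#endif

// Assumed kernel: one addition per three memory accesses (two loads, one store).
void serial_add(const std::vector<double>& a, const std::vector<double>& b,
                std::vector<double>& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    for (std::size_t i = 0; i < c.size(); ++i) c[i] = a[i] + b[i];
}

// Same kernel, with the loop handed to cilk_for when USE_CILK is defined.
void parallel_add(const std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    const std::size_t n = c.size();
    PARALLEL_FOR (std::size_t i = 0; i < n; ++i) { c[i] = a[i] + b[i]; }
}

// Run a callable once and return the elapsed wall-clock time in seconds.
template <typename Work>
double time_seconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

struct PassTimes {
    double serial_s;
    double parallel_s;
};

// Time both loops `reps` times so that any one-time start-up cost shows up
// only in the first pass and can be compared against the later passes.
std::vector<PassTimes> run_benchmark(std::size_t n, int reps) {
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    std::vector<PassTimes> times;
    for (int rep = 0; rep < reps; ++rep) {
        PassTimes t;
        t.serial_s = time_seconds([&] { serial_add(a, b, c); });
        t.parallel_s = time_seconds([&] { parallel_add(a, b, c); });
        std::printf("pass %d: for %.6f s, cilk_for %.6f s\n",
                    rep, t.serial_s, t.parallel_s);
        times.push_back(t);
    }
    return times;
}
```

Calling `run_benchmark(10000000, 3)` would reproduce the array size discussed above. If the cilk_for time drops noticeably after pass 0 while the for time stays flat, run-time start-up is probably part of the gap.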

Dear Mr. Robison,

You are right!

On the second iteration, with 1000 elements (smaller than the outer-level cache), cilk_for was faster.
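For reference, the working-set arithmetic behind the two array sizes can be checked with a small sketch. The 6 MB cache size is an assumption (the figure the E8400 is generally listed with, not something stated in this thread), and the helper names are hypothetical.

```cpp
#include <cstddef>

static_assert(sizeof(double) == 8, "the thread assumes 8-byte doubles");

// Bytes touched by one pass over three arrays of n doubles.
constexpr std::size_t working_set_bytes(std::size_t n) {
    return 3 * n * sizeof(double);
}

// Assumed outer-level cache size (6 MB); not stated in the thread.
constexpr std::size_t kAssumedCacheBytes = 6u * 1024u * 1024u;

// True when one pass over all three arrays fits in the assumed cache.
constexpr bool fits_in_cache(std::size_t n) {
    return working_set_bytes(n) <= kAssumedCacheBytes;
}
```

With 1000 elements the three arrays occupy 24,000 bytes, which fits easily in the cache. With 10,000,000 elements they occupy 240,000,000 bytes, which does not, so the loop is limited by memory bandwidth instead.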

Best regards,

Zvika
