I'm using the Cilk SDK from Intel and testing the Bzip2 example. It compiles fine, but the runtime is
strange. I'm testing on an 8-core (dual quad-core) Intel Xeon Linux
machine with a 1GB input file.
$ bzip2-cilkstatic -z -f -k -p /vg00/lv00/tmp/bzip.in
CILK_NPROC=1 results in 35 seconds runtime.
CILK_NPROC=2 results in 1 minute 5 seconds runtime.
CILK_NPROC=4 results in 1 minute 5 seconds runtime.
CILK_NPROC=8 results in 1 minute 5 seconds runtime.
According to top, the 2-worker run used 200% CPU, the 4-worker run 400%, and
the 8-worker run 800%, but the runtimes are all the same and about 2x worse
than running on 1 CPU.
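In case anyone wants to reproduce the numbers, here is a rough sketch of how the runs can be timed in one go. It assumes CILK_NPROC is read from the environment, as in the runs above; the function name time_workers is just something made up for this post, and the timing only has one-second resolution.

```shell
#!/bin/sh
# Sketch: run the same command once per worker count and print elapsed
# wall-clock seconds. time_workers is a hypothetical helper name.
# Assumption: the Cilk runtime picks up CILK_NPROC from the environment.
time_workers() {
    for n in 1 2 4 8; do
        start=$(date +%s)
        # Set CILK_NPROC for this one invocation only.
        CILK_NPROC=$n "$@"
        end=$(date +%s)
        echo "CILK_NPROC=$n elapsed=$((end - start))s"
    done
}
```

It would be called as: time_workers bzip2-cilkstatic -z -f -k -p /vg00/lv00/tmp/bzip.in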
If I use '-P' instead of '-p', I get good results and good speedup (almost linear), which is what I expect since the docs say -P is better. But the three running times with '-p' that are identical regardless of CPU usage are weird and do not make much sense.
Why would this be happening? A bug in the BZIP example?
$ cilk++ --version
cilk++ (GCC) 4.2.4 (Cilk Arts build 8503)