parallelism degree?

Dear all,

I am trying to determine how many instructions execute in parallel in an OpenCL kernel for a given set of launch parameters. For instance, on a 4-core Xeon I launch 8 workgroups of 32 threads each (1 workgroup per hardware thread). The overall degree of parallelism is then 8 x the degree of parallelism of a single workgroup.

What is the degree of parallelism of a workgroup? I know that the code is scalarized and then vectorized to fit the width of the xmm registers, and that pipelining must also be taken into account.

Any idea?

Regards, Michael

Thanks Michael for your question.

You are definitely right. Each workgroup is implemented as a loop over its work-items. The loop is then unrolled to the "float" SIMD width of the CPU, so double-precision operations need 2 SIMD registers for each argument. This gives a parallelism of 8 on today's CPUs (4 for doubles).
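To make the loop picture concrete, here is a rough sketch in plain C of what such a transformation could look like. It is only an illustration, not the code the OpenCL compiler actually generates: the names `work_item`, `run_workgroup`, and `data_parallel_width` are hypothetical, the kernel body is made up, and the workgroup size of 32 and SIMD width of 8 floats are simply the numbers from this thread. The inner loop over lanes stands in for a single vector instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration: 32 work-items per workgroup and
   8 float lanes per SIMD register (the figures quoted in this thread). */
enum { WG_SIZE = 32, SIMD_WIDTH = 8 };

/* Hypothetical kernel body: what a single work-item computes. */
static float work_item(float a, float b)
{
    return a * b + 1.0f;
}

/* One workgroup run as a loop over its work-items, unrolled by SIMD_WIDTH.
   The inner lane loop stands in for one vector instruction. */
static void run_workgroup(const float *a, const float *b, float *out,
                          size_t group_id)
{
    assert(WG_SIZE % SIMD_WIDTH == 0);   /* no remainder loop in this sketch */
    size_t base = group_id * WG_SIZE;    /* first global id of this group */

    for (size_t local = 0; local < WG_SIZE; local += SIMD_WIDTH) {
        for (size_t lane = 0; lane < SIMD_WIDTH; ++lane) {
            size_t gid = base + local + lane;
            out[gid] = work_item(a[gid], b[gid]);
        }
    }
}

/* Rough upper bound on work-items processed per vector step when each
   hardware thread runs one workgroup. It ignores the fact that hardware
   threads on the same core share execution units. */
static size_t data_parallel_width(size_t hw_threads)
{
    return hw_threads * SIMD_WIDTH;
}
```

With 8 hardware threads, `data_parallel_width(8)` gives 64 float lanes as an upper bound, and each workgroup of 32 work-items takes 4 trips through the outer loop.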

In addition, each CPU core can issue multiple different instructions per cycle. The level of instruction-level parallelism depends on the CPU model (generation), on the combination of instructions ready to execute in any given cycle, and on the CPU resources available in that clock cycle. However, the OpenCL compiler does not expose such parallelism through additional loop unrolling.

Arik
