TBB3: number of threads are different on acano01 and batch-system?

Hello

I was just testing my program on the MTL with TBB3 and noticed that the number of threads differs depending on the machine I run it on. The method task_scheduler_init::default_num_threads() returns 64 on acano01 and 32 on the batch system.

I thought that both systems were identical from a hardware point of view.

Could someone tell me whether the hardware is actually different, and/or whether there is a switch to enable the remaining 32 threads on the batch system?

Thank you.

Best regards,
Michael

The login system (acano01) has HT enabled (32 cores, 2 threads per core).
The batch system has HT disabled (32 cores, 1 thread per core).

Jim Dempsey

www.quickthreadprogramming.com

Understood.

However, it would be interesting to know how the system scales when HT is used. Is there any experience so far? If I use 64 threads (or TBB tasks), what is the expected maximum (ideal) speed-up? It should be somewhere between 32 and 64, but is it closer to 32 or to 64? If I remember correctly, I read that Intel states HT increases performance by about 30%. Would that mean I can expect at most a speed-up of about 40 with HT enabled, or am I completely wrong?

Any comments?

Best regards,
Michael

Michael,

The degree of performance boost from HT is highly dependent on the application and on the programmer's ability to coordinate the threads sharing the available caches. On the MTL system you have 4 processors, 4 L3 caches, and 32 L2 and 32 L1 caches (8 of each per processor). On the system with HT enabled:

nThreads=64
nL3=4
nThreadsPerL3=16
CacheSize_L3=25165824
CacheLineSize_L3=64
nL2=32
nThreadsPerL2=2
CacheSize_L2=262144
CacheLineSize_L2=64
nL1=32
nThreadsPerL1=2
CacheSize_L1=32768
CacheLineSize_L1=64
L3(0) = {0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60}
L3(1) = {1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61}
L3(2) = {2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62}
L3(3) = {3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63}
L2(0) = {0,32}
L2(1) = {1,33}
L2(2) = {2,34}
L2(3) = {3,35}
L2(4) = {4,36}
L2(5) = {5,37}
L2(6) = {6,38}
L2(7) = {7,39}
L2(8) = {8,40}
L2(9) = {9,41}
L2(10) = {10,42}
L2(11) = {11,43}
L2(12) = {12,44}
L2(13) = {13,45}
L2(14) = {14,46}
L2(15) = {15,47}
L2(16) = {16,48}
L2(17) = {17,49}
L2(18) = {18,50}
L2(19) = {19,51}
L2(20) = {20,52}
L2(21) = {21,53}
L2(22) = {22,54}
L2(23) = {23,55}
L2(24) = {24,56}
L2(25) = {25,57}
L2(26) = {26,58}
L2(27) = {27,59}
L2(28) = {28,60}
L2(29) = {29,61}
L2(30) = {30,62}
L2(31) = {31,63}
L1(0) = {0,32}
L1(1) = {1,33}
L1(2) = {2,34}
L1(3) = {3,35}
L1(4) = {4,36}
L1(5) = {5,37}
L1(6) = {6,38}
L1(7) = {7,39}
L1(8) = {8,40}
L1(9) = {9,41}
L1(10) = {10,42}
L1(11) = {11,43}
L1(12) = {12,44}
L1(13) = {13,45}
L1(14) = {14,46}
L1(15) = {15,47}
L1(16) = {16,48}
L1(17) = {17,49}
L1(18) = {18,50}
L1(19) = {19,51}
L1(20) = {20,52}
L1(21) = {21,53}
L1(22) = {22,54}
L1(23) = {23,55}
L1(24) = {24,56}
L1(25) = {25,57}
L1(26) = {26,58}
L1(27) = {27,59}
L1(28) = {28,60}
L1(29) = {29,61}
L1(30) = {30,62}
L1(31) = {31,63}
 

When HT is disabled, logical processors 32:63 are omitted.

When HT is disabled, each L3 has eight logical processors sharing it, and no logical processors share an L2 or an L1.

When HT is enabled, you have sixteen logical processors sharing each L3, two logical processors sharing each L2 and two logical processors sharing each L1.

The key to optimal performance is the programmer's skill at thread team coordination within or across available caches.

As to 30% performance boost for HT, some algorithms can drive this up by an order of magnitude or more. See http://software.intel.com/en-us/articles/superscalar-programming-101-mat...

Jim Dempsey

www.quickthreadprogramming.com

Hi Michael,

You can find an example of TBB scalability in my blog: http://software.intel.com/en-us/blogs/2010/06/11/intel-tbb-30-in-intel-manycore-testing-lab/

--Vladimir