parallel studio

TBB initialization, termination, and resource management details, juicy and gory.

Well, maybe more essential than juicy, and treacherous rather than gory. As I noted in my previous blog, which introduced a major task scheduler extension (support for task and task group priorities), TBB has been steadily evolving ever since its inception.

TBB 3.0 and processor affinity

A week ago I started describing a couple of helpful new features in the TBB 3.0 Update 4 task scheduler, and we talked about the support for processor groups, an extension of the Win32 API available in the 64-bit edition of Windows 7. The main purpose of processor groups is to extend Win32 capabilities so that applications can work with more than 64 logical CPUs.

TBB 3.0, high end many-cores, and Windows processor groups

Though I wrote my previous TBB task scheduler blog just a few days after TBB 3.0 Update 4 had been released, I ignored that remarkable event and instead delved into a past more than two years old. So today I’m going to make amends for that slight and talk about a couple of small but quite useful improvements in the TBB scheduler behavior made in the aforementioned update.

Parallel Studio at IDF - catch up on-line if you weren't with us at IDF

If you weren't able to join us in San Francisco at IDF this week, here are three talks about Intel Parallel Studio 2011.

Geoff gave a great overview in a "Technical Insight talk":
Parallel Programming on Intel Architecture with Intel Parallel Studio

Geoff Lowney, Intel Fellow, Software and Services Group, Intel

I did my version of an overview:

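Some notes on performance tuning of Cilk™ Plus parallel programs

Performance tuning matters just as much when you parallelize a program with Cilk™ Plus.

The work-stealing scheduling algorithm helps a Cilk™ Plus program distribute chunks of work effectively across the processors (cores) and so use processor resources efficiently. But if the algorithm is not designed carefully, and the whole job ends up split into either a few large chunks or a great many small ones, the result is disappointing: in the first case there is not enough parallelism to keep all processors busy, and in the second the frequent task scheduling adds a lot of extra overhead. In particular, when you use cilk_spawn, take care not to spawn a large number of small tasks.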
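One common way to avoid spawning a flood of tiny tasks is a serial cutoff: fork only near the top of the recursion and run small subproblems serially. The sketch below is a stand-in written in standard C++ with `std::async`, not Cilk™ Plus code, and `std::async` starts threads rather than feeding a work-stealing scheduler. The names `fib_serial`, `fib_parallel`, and the `depth` parameter are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <future>

// Plain serial version, used once the cutoff is reached.
long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Hypothetical helper: forks only while `depth` levels remain,
// so at most 2^depth - 1 asynchronous calls are created.
long fib_parallel(int n, int depth) {
    assert(n >= 0 && depth >= 0);
    if (n < 2) return n;
    if (depth == 0) return fib_serial(n);  // chunk is small enough: stop forking

    // Stand-in for `x = cilk_spawn fib(n - 1);`
    std::future<long> x =
        std::async(std::launch::async, fib_parallel, n - 1, depth - 1);
    // The caller continues with the other half of the work.
    long y = fib_parallel(n - 2, depth - 1);
    // Stand-in for `cilk_sync;`
    return x.get() + y;
}
```

Calling `fib_parallel(20, 3)` forks at most seven times and computes everything below that level serially. A larger `depth` exposes more parallelism at the cost of more scheduling overhead, which is the trade-off described above.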

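In general, the common performance pitfalls in Cilk™ Plus programs are roughly the following:
1) The grain size (GrainSize) setting of cilk_for
The Intel compiler and runtime system use a formula to compute the default grain size. You can also tune performance by experimenting with different grain sizes.

2) Lock contention
Using locks usually reduces the parallelism of a program and hurts performance.

3) Cache efficiency and memory bandwidth
Contention among multiple cores for bus bandwidth limits the rate at which data can move between memory and the processors. So when you design and implement a Cilk™ Plus parallel program, take cache efficiency and data/spatial locality into account.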
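The three points above can be seen together in one small sketch. It is written in standard C++ with `std::thread` as a stand-in, not with cilk_for, and the function name `chunked_sum` and its `grain` parameter are hypothetical. Each chunk of `grain` consecutive elements is summed by one worker, which walks its slice contiguously, accumulates into a local variable, and writes its result once to its own slot, so no lock is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` in contiguous chunks of `grain` elements,
// one worker thread per chunk.
long long chunked_sum(const std::vector<int>& data, std::size_t grain) {
    assert(grain > 0);
    const std::size_t chunks = (data.size() + grain - 1) / grain;
    std::vector<long long> partial(chunks, 0);  // one slot per chunk, no shared lock
    std::vector<std::thread> workers;
    workers.reserve(chunks);

    for (std::size_t c = 0; c < chunks; ++c) {
        workers.emplace_back([&data, &partial, grain, c] {
            const std::size_t begin = c * grain;
            const std::size_t end = std::min(begin + grain, data.size());
            long long local = 0;  // accumulate locally, publish once
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[c] = local;
        });
    }
    for (std::thread& t : workers) t.join();

    // Combine the per-chunk results serially.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

With a very small `grain` this version creates one thread per handful of elements and the overhead dominates; with a very large `grain` only a few workers have anything to do. Trying several values, as suggested in point 1, is the practical way to find a good setting.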

TBB 3.0 task scheduler improves composability of TBB based solutions. Part 2.

Master thread isolation, described in the first part of this blog, was not the only change in the TBB 3.0 scheduler that improves the composability of code parallelized with TBB. Another tightening of the scheduler's guarantees improves a popular usage model described in the TBB Reference Manual as “Letting main thread work while child tasks run”. Here is a short example of what it looks like:
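The listing from the post is not part of this excerpt. As a rough stand-in only, and not the TBB code the post refers to, the general shape of the pattern can be sketched in standard C++: the launching thread starts a child, keeps doing its own work, and waits for the child only at the end. The names `Sums` and `overlap_work` and the `split` parameter are hypothetical; `std::async` plays the role of spawning a child task, and `get()` plays the role of waiting for it.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical result type: what the child and the launching thread computed.
struct Sums {
    long long child;
    long long main;
};

// Hypothetical helper: the child sums v[0, split), the caller sums v[split, size).
Sums overlap_work(const std::vector<int>& v, std::size_t split) {
    assert(split <= v.size());
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(split);

    // Start the child; it runs while the caller continues below.
    std::future<long long> child = std::async(std::launch::async, [&v, mid] {
        return std::accumulate(v.begin(), mid, 0LL);
    });

    // The launching thread does useful work instead of blocking right away.
    const long long mine = std::accumulate(mid, v.end(), 0LL);

    // Only now wait for the child to finish.
    return Sums{child.get(), mine};
}
```

For a vector of ten ones and `split = 4`, the child returns 4 and the launching thread computes 6. The point of the pattern is the ordering: the wait comes after the launching thread's own work, not before it.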

