At the moment I am testing some parallel scheduling libraries such as TBB, OpenMP, XKAAPI and so on. To get a quick first impression I implemented a naive matrix-matrix multiplication, first for floating-point entries, then for uint64 entries. I run the tests on a NUMA machine with 4 nodes, each with 8 cores of an Intel Xeon CPU E5-4620 0 @ 2.20GHz. Each node has 96 GB of RAM.
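For readers who want a concrete picture of such a test kernel, here is a minimal sketch of a naive matrix-matrix multiplication with the rows split across threads. It is not the poster's code: plain std::thread stands in for the scheduling libraries being compared, and the names Matrix, multiply_rows and naive_matmul as well as the flat row-major layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Row-major n x n matrix stored in a flat vector (illustrative layout).
template <typename T>
using Matrix = std::vector<T>;

// Naive triple loop computing rows [row_begin, row_end) of c = a * b.
template <typename T>
void multiply_rows(const Matrix<T>& a, const Matrix<T>& b, Matrix<T>& c,
                   std::size_t n, std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin; i < row_end; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            T sum = T();
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

// Splits the rows into contiguous chunks, one per thread.
// std::thread is only a stand-in for TBB, OpenMP or XKAAPI here.
template <typename T>
Matrix<T> naive_matmul(const Matrix<T>& a, const Matrix<T>& b,
                       std::size_t n, unsigned num_threads) {
    assert(a.size() == n * n && b.size() == n * n);
    if (num_threads == 0) num_threads = 1;
    Matrix<T> c(n * n);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // more threads than rows
        // Each worker writes a disjoint set of rows of c, so no locking is needed.
        workers.emplace_back([&a, &b, &c, n, begin, end] {
            multiply_rows(a, b, c, n, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    return c;
}
```

The same function template covers both the floating-point and the uint64 case, for example naive_matmul<double>(a, b, n, 8) or naive_matmul<std::uint64_t>(a, b, n, 8). It makes no attempt at NUMA-aware placement or cache blocking.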
Intel® Threading Building Blocks
TBB 4.1 update 3 release is available on our OSS site.
TBB 4.1 Update 3 stable release is available for download on our site - tbb41_20130314oss
Changes (w.r.t. Intel TBB 4.1 Update 2):
Proposing the boost range interface for parallel constructs
I want to know what you guys think about the following additions to the interface of some parallel constructs.
I think almost everybody agrees that the interface of the std:: algorithms is a bit verbose and that the boost::range algorithms together with their adaptors have a much cleaner interface.
std::vector<int> vec{3, 2, 1};
boost::for_each(vec, some_lambda);
boost::sort(vec);
We could introduce these interfaces to tbb, too.
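To make the proposal concrete, here is a rough sketch of what a range-style overload could look like. It is not TBB's API: the namespace range_sketch and the functions range_for_each and range_sort are hypothetical, and the sequential std::for_each and std::sort stand in for the iterator-pair parallel algorithms that a real overload would forward to.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace range_sketch {  // hypothetical namespace, not part of TBB or Boost

// Range overload: takes the container itself and forwards its
// begin/end iterators to an iterator-pair algorithm.
// std::for_each is a sequential stand-in for a parallel algorithm.
template <typename Range, typename Body>
void range_for_each(Range& range, Body body) {
    std::for_each(std::begin(range), std::end(range), body);
}

// Same idea for sorting; std::sort is again only a stand-in.
template <typename Range>
void range_sort(Range& range) {
    std::sort(std::begin(range), std::end(range));
}

}  // namespace range_sketch
```

With such overloads a call site shrinks to range_sketch::range_sort(vec), mirroring the boost::sort(vec) line above.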
ERROR: enumerable_thread_specific prematurely deleted
I encountered a strange problem with enumerable_thread_specific: an enumerable_thread_specific object got deleted prematurely when a root task using it was spawned from a variadic template function (which I added for convenience to a base of the task). See attached code. Tested using gcc 4.7.0 and 4.8 only.
size of task_list
I wonder whether there is a (legal) way to obtain the size of a tbb::task_list.
I would like to write a little template class that implements a continuation task, using CRTP to do the actual work. The simplest way to do this seems to be to get the size of a tbb::task_list for setting the correct ref_count, see attached sample code.
Btw, wouldn't it be nice if tbb came with little helper classes like this?
Use TBB in Linux Kernel Space
Hi, I am trying to write a Linux device driver. I want to know whether the TBB libraries can be used in kernel space as well as in user space.
Thanks,
Paul
enumerable_thread_specific object creation C++11 improvement
Hello everyone,
When using enumerable_thread_specific, it can be useful to specify the parameters that will be used to construct a new object when a thread requests one.
Two similar possibilities exist in the current implementation:
TBB on WinRT (ARM)
Hi, guys!
I'm looking for instructions for building TBB for WinRT on ARM. I know Win8 support is quite experimental at the moment, but maybe you have some (or can suggest any related posts)? Please help me.
Two questions about tbb::memory_pool< tbb::scalable_allocator<char> >
Consider tbb::memory_pool< tbb::scalable_allocator<char> > shared_memory_pool_. Am I correct that it preallocates a chunk of memory to avoid malloc system calls at runtime? For example, after we call shared_memory_pool_.malloc(15000000), would it not call the system malloc again but just allocate from the preallocated memory until that is exhausted (over 15000000) and the pool needs to be extended?
Possible concurrent_queue improvement
Hi,
I was wondering why the concurrent queue uses compare-and-swap to get the next ticket instead of fetch-and-increment (in concurrent_queue_base_v3::internal_pop_if_present and in concurrent_queue_base_v3::internal_push_if_not_full).
Calling compare and swap may harm the performance of the queue under high contention - the reasons for this are best explained by Dave Dice here: https://blogs.oracle.com/dave/entry/atomic_fetch_and_add_vs
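The two ways of taking a ticket can be sketched with std::atomic as follows. This is not the code of concurrent_queue_base_v3: the Ticket type and the functions next_ticket_cas and next_ticket_faa are hypothetical and only illustrate the difference between the two primitives.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical ticket type, standing in for the queue's internal counter.
using Ticket = std::uint64_t;

// Compare-and-swap: read the counter, then try to install value + 1.
// If another thread changed the counter in between, the exchange fails,
// `seen` is refreshed with the current value, and the loop retries.
inline Ticket next_ticket_cas(std::atomic<Ticket>& counter) {
    Ticket seen = counter.load();
    while (!counter.compare_exchange_weak(seen, seen + 1)) {
        // retry with the refreshed value in `seen`
    }
    return seen;
}

// Fetch-and-add: a single atomic read-modify-write that returns the
// previous value, with no retry loop in user code.
inline Ticket next_ticket_faa(std::atomic<Ticket>& counter) {
    return counter.fetch_add(1);
}
```

Both functions hand out each ticket exactly once. One design point to keep in mind: a CAS loop can make the increment conditional on the value it has just read (for instance, give up when the queue looks empty), which an unconditional fetch-and-add cannot do. Whether that is the reason for the choice in the two functions named above is not established here.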
