Intel® Threading Building Blocks

Poor performance on Linux?

Hi all,
I have this program using TBB that runs 4 time faster than its serial version on my Windows Vista. But when I compiled it on linux, it runs slower than its serial version on the same linux machine. Both windows and linux machines have 8 cores and 32G memory. I just wonder what could have caused the big difference here?

assert failed - thread has not activated a task_scheduler_init object


I got the assertion failure mentioned in the title when calling parallel_for from a separate thread. The parallel_for loop is inside a DLL that initializes a task_scheduler_init object, and a .NET application calls processing routines from this DLL using the .NET thread pool.
Do I need to initialize a task_scheduler_init object for every thread I create? If someone can enlighten me on this topic, I would appreciate it :)

Best regards

Mailbox attachment problem

I have run the seismic example from the examples directory.
I found that during execution the same mailbox is repeatedly attached to the same thread.
Does this mean that the mapping from mailboxes to threads could be altered during execution?
Or could a thread's my_affinity_id change during execution?



#include "tbb\spin_mutex.h" breaks C++?

Very noob question here:

When I #include "tbb\spin_mutex.h" in my header file, I get a whole flood of unhelpful C++ errors that make very little sense (the C++ compiler very rarely emits useful errors). Commenting it out fixes the problem. I am including "tbb\tbb_stddef.h" right before "tbb\spin_mutex.h".

I'm using the Intel C++ Compiler 11.0.066 IA32 target on an x64 Windows Vista system.

Very odd, any thoughts before I start going through line-by-line?

Task Execution Related Problem


I have some questions about task execution.
Can all the tasks in the deque or the mailbox be executed interchangeably?
I want to steal a task from another thread's mailbox and execute it right away under some special circumstances.
Is there anything I should be aware of?

I found that tasks' execute() methods are only called from within wait_for_all().
Does that mean only the wait_for_all() method can cause a task to be executed?
I am trying to keep track of all the currently running tasks and record their affinity_ids.

For-loop performance, what's wrong?


I'm new to TBB and just started experimenting with it using tutorials. My first attempt is to test the performance of a simple loop over a big array of floats, once using TBB and once without. Comparing the time required for each approach, the result was surprising. Check for yourself and correct me if I'm doing something wrong:

#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
#include "tbb/tick_count.h"

using namespace tbb;

#define BIGARRSIZE 100000

float big_arr[BIGARRSIZE];

GPU + TBB: Scheduler Patch

Hi all,

The editor ate my last message. Lesson: don't try to attach files.

I'm working on combining CPU and GPU parallelism. I have a global data structure that stores special tasks which can run on either the GPU or the CPU. I have a function called pbb::execute_data_task_cpu() that 1) returns true if a task was found and executed, and 2) returns false if the global GPU/CPU task data structure was empty. Note that regular tbb::tasks are still there as always; the tasks I'm talking about are special and different.

Subscribe to Intel® Threading Building Blocks