Question on Concurrent_queue

I am new to Intel TBB. I am using concurrent_queue to achieve fine-grained parallelism in my project, and I have a few questions. This is how I am implementing it:

thread_fun(arguments) {
          operation on v;   // each thread executes its own operation (separate v for each thread)
}

main() {
          fill the concurrent_queue;
          create win32 threads and pass concurrent_queue as an argument;   // using _beginthreadex
}

1. I am explicitly specifying the number of threads. I read that TBB creates threads based on the processor core count. How can I achieve that, so that I don't need to create threads explicitly with the _beginthreadex function?

2. Am I achieving fine-grained parallelism by using concurrent_queue?

3. What is meant by task-level parallelism, and how do you achieve it with Intel TBB? I am popping elements from the queue. Is a pop operation considered a task, so that every pop is a different task? I am popping 8 elements at a time with 8 threads. Does that mean I am achieving task-level parallelism?

4. If I increase the number of threads to 32 on a quad-core processor (which supports 8 hardware threads), how does concurrent_queue behave? Do only 8 threads operate on the queue concurrently, or do all 32 threads run concurrently?

I changed my code to use parallel_for as suggested in the comment by robert-reed:

void parallel_relax( Map *m, std::vector<Vertex *> verList ) {
          tbb::parallel_for (blocked_range<int>(0, verList.size()), [=](const blocked_range<Vertex *>& r) {
                    for(Vertex *vit = r.begin(); vit != r.end(); ++vit) {
                             Vertex *v = vit;
                    }
          });
}

main() {
          Map *m1;
          parallel_relax(m1, verList);
}


Please help me.



4 posts / 0 nouveau(x)
Dernière contribution
Reportez-vous à notre Notice d'optimisation pour plus d'informations sur les choix et l'optimisation des performances dans les produits logiciels Intel.

I see a number of issues with this code sample. To start, it fills a concurrent_queue serially, which means it pays the heavier overhead of a concurrent queue without deriving any benefit from it, at least as expressed by this pseudocode. This is followed by the explicit construction of a pool of threads (_beginthreadex) that will all contend for the queue and be further serialized at the try_pop(). The idea of task-level parallelism is to hand work to the library, which dispatches it on the thread pool created by Intel TBB, rather than spawning a thread per task whenever a task is needed. An approach more in keeping with the philosophy behind Intel Threading Building Blocks would be to accumulate the contents of the concurrent_queue used above into a simple array, v, and use an Intel TBB parallel construct to execute the function on different elements in parallel, something like this:

    tbb::parallel_for(tbb::blocked_range<int>(0, n), [&](const tbb::blocked_range<int> &r) {
        for (int i = r.begin(); i != r.end(); ++i) {
            // execute operations on v[i]; blocked_range is half-open, so (0, n) covers v[0]..v[n-1]
        }
    });

In this scenario, the element type of v holds whatever data is needed to carry out the "per thread" activities, and the parallel work is managed by parallel_for, using the thread pool that Intel TBB establishes. The blocked_range describes the index space of the simple array v, which is what lets parallel_for divide the work among a set of worker threads. The first thread to execute takes the whole range, splits it in half, leaves one half available for stealing by another thread, and then repeats the process on the half it kept. As a result, multiple threads work in parallel and keep subdividing the range until the pieces fall below a threshold, at which point the "execute operations" code is invoked by multiple threads, each on its own subrange of elements.

I would recommend that you take a close look at the Intel Threading Building Blocks User Guide and tutorial document that is included with the distribution (or the online copy). This document covers the basic concepts behind Intel Threading Building Blocks and explains task-level parallelism.

Hi Robert-reed,

I changed the implementation to use parallel_for and updated the code in the main question. Is that code correct? I am running this application in Visual Studio 2010 Express edition, and I am getting syntax errors for parallel_for and r.begin(). What should I do?

No, the code I supplied is just more pseudocode, meant to suggest a solution in broad strokes rather than give a specific implementation. I did not run my snippet through a compiler before replying, so there could be some errors in it. Once again, I suggest you take some time to go through the User Guide to learn the general principles behind Intel Threading Building Blocks. I have not used VS 2010 Express, but I assume it handles current C++ syntax. Perhaps you need a compiler switch to enable C++11 extensions? The example pseudocode I provided uses a lambda expression, which may require a compiler command-line switch; otherwise I would expect the compiler to complain about syntax.