Parallel for giving inconsistent results


I am using parallel_for to measure the performance gain relative to a simple for-loop version, but I get the correct result only when I use simple_partitioner with grainsize 1, and then it takes twice as long.

When I don't explicitly provide a partitioner and grainsize, it gives me the correct expected value of count up to n = 70; beyond that, it starts giving random values across different runs. I tried removing the inner loop as well, but that didn't help either. Can anyone tell me what I am missing here?

#include "tbb/tbb.h"
#include <iostream>
#include <string>
//#include <chrono>
#include <sstream>
#include <ctime>
#include <cstdio>
#include <atomic>
#include <utility>

using namespace tbb;
using namespace std;

// Global counter; written as ::count below to avoid clashing with std::count
std::atomic<int> count(0);

// Note: this body ignores `range` and always does 10000 iterations per call
void foo(const tbb::blocked_range<int>& range) {
    for (int i = 0; i < 10000; ++i) {
        string l_czTempStr;
        std::ostringstream oss;
        oss << "Test data1";
        oss << "Test data2";
        oss << "Test data3";
        l_czTempStr = oss.str();
        ::count.fetch_add(1, memory_order_release);
    }
}

int main()
{
    cout << "hello" << std::endl;
    int n = 1000;
    clock_t tStart = clock(); // clock start time
    tick_count t0 = tick_count::now();
    for (int j = 1; j <= n; j++) {   // j is the grainsize
        ::count = 0;
        tick_count t2 = tick_count::now();
        tbb::parallel_for(tbb::blocked_range<int>(0, n, j), [&](const tbb::blocked_range<int>& range) {
            foo(range);
        });
        tick_count t3 = tick_count::now();
        cout << "grainsize: " << j << " count:" << ::count << " time: " << (t3 - t2).seconds() << endl;
    }
    cout << "gs done" << endl;
    //  parallel_for<size_t>( 1, 10, 1, foo );

    tick_count t1 = tick_count::now();
    printf("work took %g seconds\n", (t1 - t0).seconds());
    // clock() reports processor time (on POSIX, summed over all threads), not wall time; printed in ms
    cout << (double)(clock() - tStart) / CLOCKS_PER_SEC * 1000 << endl;
    cout << "count - " << ::count << endl;
    cout << "is lock free - " << ::count.is_lock_free() << endl;

    return 0;
}



Can anyone please help me with this?


You do not know how many times foo() is called, because of dynamic load balancing. Each call of foo() adds 10000 to count regardless of the range it receives, so for N calls of foo() you get count = N*10000.

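Usually, if you need the same result every time, you need to actually use the blocked_range, not just declare it as a parameter. Instead of

void foo(const tbb::blocked_range<int>& range) {
    for (int i = 0; i < 10000; ++i) {
        // ... loop body ...
    }
}

write

void foo(const tbb::blocked_range<int>& range) {
    for (int i = range.begin(); i < range.end(); ++i) {
        // ... same loop body ...
    }
}

Then each index of [0, n) is processed exactly once, however the range is split, and count always ends up equal to n.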

