concurrent_unordered_map doesn't iterate in parallel

Hi,

I am tuning my application with VTune and have found that the part of my code that traverses a concurrent_unordered_map in parallel shows zero parallelism. After that part of the code completes, I get a high amount of concurrency.

In particular, I use a parallel_for with tbb::concurrent_unordered_map::range() to do some work for each (key, value) entry in the map. I suspect that the lack of concurrency is due to internal waits or some other form of synchronization inside the implementation. It is also possible that range() does not provide enough work for the worker threads to participate.

Any suggestions for (or experience with) slow traversal via range()?

Here is a VTune-friendly example of the problem. I compile this with:

g++ -g -O3 -DTBB_USE_THREADING_TOOLS=1 -I /opt/intel/vtune_amplifier_xe_2013/include/ --std=c++0x play.cpp -ltbb -L /opt/intel/vtune_amplifier_xe_2013/lib64/ -littnotify


// NOTE: the header names and template arguments were stripped when this code
// was posted. The ones below are reconstructed from how the code uses them
// and are assumptions, not the original text.
#include <iostream>
#include <string>

#include "tbb/concurrent_unordered_map.h"
#include "tbb/concurrent_vector.h"
#include "tbb/parallel_for.h"

#include "ittnotify.h"

int main()
{
        // Assumed key/value types: unsigned int, matching the lambda parameter below.
        tbb::concurrent_unordered_map<unsigned int, unsigned int> testmap;
        const int worksize = 10000000;
        __itt_domain* vt_domain = __itt_domain_create("sample");

        const std::string event_name_build_table = "build_table";
        __itt_event       event_build_table = __itt_event_create( event_name_build_table.c_str(), event_name_build_table.size() );

        const std::string event_name_traverse_table = "traverse_table";
        __itt_event       event_traverse_table = __itt_event_create( event_name_traverse_table.c_str(), event_name_traverse_table.size() );

        // BUILD THE TABLE
        __itt_event_start(event_build_table);
        tbb::parallel_for(0, worksize, [&](unsigned int x) { testmap[ x ] = x; } );
        __itt_event_end(event_build_table);

        // TRAVERSE THE TABLE
        __itt_event_start(event_traverse_table);
        // Assumed element type: same as the map's mapped type.
        tbb::concurrent_vector<unsigned int> interesting_numbers;
        tbb::parallel_for( testmap.range(),
        [&]( decltype( testmap)::range_type& r)
        {
                for ( auto curr_entry = r.begin(); curr_entry != r.end(); ++curr_entry)
                {
                        // We are going to do something a bit tricky / expensive
                        auto my_num = curr_entry->second;
                        auto counter = my_num;
                        for ( int i = 2; i < 100; ++i )
                        {
                                if ( i % 2  == 0 )
                                {
                                        counter = counter * i;
                                }
                                else
                                {
                                        counter = 3 * counter - i;
                                }
                        }

                        if ( counter < my_num )
                        {
                                // Shared container: every worker thread appends here
                                interesting_numbers.push_back(curr_entry->second);
                        }
                }
        }
        );
        __itt_event_end(event_traverse_table);

        // Now, just do something silly to prevent optimizer from messing with us
        std::cout << "Junk: " << interesting_numbers.size() << std::endl;
}

Sigh. Looks like posting C++ code doesn't work as well as you would want. I'm attaching a text file instead.

Attachments: play.txt (1.89 KB)

Can you rule out contention on interesting_numbers by omitting it or by combining it from TLS? Maybe it's just a red herring, but...
