  • san_isn, March 10, 2011 11:46 AM PST
    Regarding scalability with number of threads of a simple CnC program

    Hi,
      I wrote a simple program that simply generates tags and processes them (attached below). However, its performance degrades when I increase the number of threads. I would very much appreciate any insight into why this happens and how to address the scalability issue.
    Thanks
    Sandeep
    //Stack.cnc
    
    <int l1stack>;
    <int l2stack>;
    <l1stack> ::  (l1compute);
    <l2stack> ::  (l2compute);
    env-><l1stack>;
    (l1compute)-><l2stack>;
    //Stack.cpp
    #include <stdlib.h>
    #include <time.h>
    #include "stack.h"
    #include <iostream>
    // Create an instance of the context class which defines the graph
    stack_context c;
    int ctr = 0;
    int main(int argc, char** argv)
    {
      clock_t start, end;
      double elapsed;
      start = clock();
      // Put 4 * 3,000,000 distinct tags into l1stack
      for(int j = 0; j < 4; ++j)
        {
          for(int i = 0; i < 3000000; ++i)
            {
              c.l1stack.put(j*3000000 + i);
            }
        }
      // Wait until all steps have finished before stopping the clock
      c.wait();
      end = clock();
      elapsed = ((double) (end-start))/CLOCKS_PER_SEC;
      std::cout<<"Elapsed "<<elapsed<<std::endl;
    }
    int l1compute::execute(const int & t, stack_context & c ) const
    {
      c.l2stack.put(t);
      return CnC::CNC_Success;
    }
    int l2compute::execute(const int & t, stack_context & c ) const
    {
      return CnC::CNC_Success;
    }
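    One thing worth checking in the timing code above: `clock()` reports processor time, and on many platforms (Linux, for example) that is the CPU time of the whole process summed over all of its threads. The reported figure can therefore grow with the thread count even if the wall-clock time does not. Below is a minimal sketch of wall-clock timing with `std::chrono::steady_clock` (C++11). `time_wall_seconds` and `dummy_work` are hypothetical names used only for illustration; they are not part of CnC.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds (not CPU time).
template <typename Work>
double time_wall_seconds(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Stand-in workload (an assumption for this sketch). In the CnC program
// the timed region would be the put loop followed by c.wait().
inline long long dummy_work(int n)
{
    assert(n >= 0);
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}
```

    A call such as `double s = time_wall_seconds([&] { result = dummy_work(1000000); });` times only the enclosed work. Comparing this number with the `clock()` result for the same run shows whether the slowdown is in elapsed time or only in accumulated CPU time.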


    Frank Schlimbach (Intel), March 10, 2011 11:05 PM PST

    Hi Sandeep,
    I am not sure why performance goes down, but if the steps are doing nothing, then the scalability bottleneck is obviously the concurrent use of the tag-collection. How many threads are you using?

    There is one feature in the API (but not yet in the spec syntax) to reduce the overhead of CnC. We call it tag-ranges (even though the better term would be tag-sets). It is similar in spirit to TBB's ranges. Instead of putting individual tags, the API also accepts putting a bunch of tags at once and takes care of partitioning the tag-space internally.
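    To illustrate the idea of partitioning a tag-space, here is a minimal stand-alone sketch of recursive range splitting in the spirit of TBB's ranges. It is not the CnC or TBB implementation; `TagRange`, `split_range`, `partition_tags` and the grain size are hypothetical names and parameters chosen for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical half-open tag range [begin, end), loosely modeled on the
// idea behind tbb::blocked_range. Not the TBB or CnC type.
struct TagRange {
    int begin;
    int end;
    int size() const { return end - begin; }
};

// Recursively halve `r` until each piece holds at most `grain` tags,
// appending the non-empty leaf pieces to `out` in ascending order.
inline void split_range(TagRange r, int grain, std::vector<TagRange>& out)
{
    assert(grain > 0);
    if (r.size() <= grain) {
        if (r.size() > 0) out.push_back(r);
        return;
    }
    const int mid = r.begin + r.size() / 2;
    split_range(TagRange{r.begin, mid}, grain, out);
    split_range(TagRange{mid, r.end}, grain, out);
}

// A single "put" of a range yields a handful of chunks that workers could
// claim, instead of one shared-collection operation per tag.
inline std::vector<TagRange> partition_tags(int begin, int end, int grain)
{
    std::vector<TagRange> chunks;
    split_range(TagRange{begin, end}, grain, chunks);
    return chunks;
}
```

    With an assumed grain of 1,500,000, `partition_tags(0, 12000000, 1500000)` yields 8 contiguous chunks covering all 12,000,000 tags, so the shared structure would be touched a few times rather than once per tag. How CnC actually partitions a range internally is not described in this thread.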

    In your example, you could do the following:

    1. When declaring the tag_collection:
    typedef tbb::blocked_range< int > int_range;
    CnC::tag_collection< int, int_range > l1stack;

    2. In the env-code, replace your nested loop with

    for(int j = 0; j < 4; ++j)
    {
        c.l1stack.put_range( int_range( j*3000000, j*3000000+3000000 ) );
    }

    or, even better, with just

    c.l1stack.put_range( int_range( 0, 4*3000000 ) );


    This will reduce the pressure on l1stack, but not on l2stack. A slightly more advanced version of tag-ranges could be used to also reduce the overhead on l2stack. But first, please let us know whether the above would be a feasible approach in general.

     



    san_isn, March 11, 2011 2:46 PM PST

    Hi Frank,
    Thanks for taking a look. Unfortunately, loading tags in bulk wouldn't be an option. I thought maybe the tags were being generated too fast for the synchronization to keep up.
    Therefore I changed the code to read the tags in random order from a vector (to add memory latency). Unfortunately, this didn't help either.
    I have now tried on two different CPUs. I guess I will have to wait for newer hardware from Intel for this code to scale.
    -Sandeep

