Using CnC with a large number of nodes

yvdriess

Hello, I generate CnC++ code with a very large number of step, item, and tag collections. The compilation times have grown unusably high. I am currently measuring compilation time in different artificial scenarios; so far I have only tested a deeply pipelined graph, i.e. a long chain of produce/consume steps. I made some simple measurements of gcc compilation time (/usr/bin/time -f "%e %M" make):

 depth  time(s)  memory(KB)
    50     8.30      963504
   100    16.04     1230800
   150    24.36     1462288
   200    32.78     2030192
   250    47.17     2106112
   300    51.39     2989632
   350    61.87     3055904
   400    74.20     3138368
   450    82.55     3411264
   500    93.81     4678272
  1000   236.80     7983872
  1500   411.79    12790112
  2000   626.52    14383056
  2500   940.05    21012192
where the number of nodes is thus roughly 3*depth. I'm currently running more tests at larger depths, but the trend seems to continue. The current CnC++ does not seem to scale well with the number of nodes. Most of the blame probably lies with the gigantic context object, and perhaps g++'s template expansion is to blame too. Does anyone know of any hacks or workarounds that would let me keep scaling the number of nodes?
Frank Schlimbach (Intel)

Hi,
thanks for sending the code to us.

7500 collections is quite sizable...

Due to the heavy use of templates, the compilation time will grow with an increasing number of prescription relations. You might consider merging several step-collections into one and using a combined tag. This is not ideal, but where possible it should address the issue.

I'll look a bit more into your code. We'll try to find a better solution.

frank

Frank Schlimbach (Intel)

The key issue is reducing the number of types. If different steps (or tuners) only differ in a "static" value/attribute, you can reduce the number of step-collection types by parametrizing a single step-collection type: give the step an immutable attribute that differentiates between the "types". When creating the step-collection instances, just pass a step instance constructed with the right parameter as an argument to the constructor of the step-collection. This removes static differences by turning them into runtime options. It should have no significant effect on runtime performance. Here's a sample code snippet:

struct my_step
{
    int my_arg;
    my_step( int a ) : my_arg( a ) {}
    int execute(...)
    {
        // do the work dependent on my_arg
        return CnC::CNC_Success;
    }
};

struct my_context : public CnC::context< my_context >
{
    CnC::step_collection< my_step > m_sc1;
    CnC::step_collection< my_step > m_sc2;
    my_context()
        : m_sc1( *this, "step1", my_step( 1 ) ),
          m_sc2( *this, "step2", my_step( 2 ) )
    {
        // do all the graph wiring etc.
    }
};

This is in the new API syntax. I recommend switching to the new release (0.7).

You can do a similar thing with the tuner argument to the step-collection constructor. The difference is that the optional tuner argument must persist over the lifetime of the step-collection, while the step instance is copied when the collection is constructed.

Another trick is to embed the parameter in the tag. While this also reduces the number of collection types, it duplicates the data/information in each tag. In some cases it might be easier to implement, though.

Does this help?

frank

yvdriess

The change to 0.7 has solved this issue for me, thanks! I am now using the above suggestion of putting parameters in step objects. This also allowed me to put step objects and tuner objects into a simple std::vector. Combined with putting tag/item collections in std::vectors as well, this has dramatically cut down compilation time: the old 6.30h compile now finishes in under a minute with -O3 enabled. I think it is safe to assume this topic's issue is resolved :p PS. I also had great results with the vector_tuner; memory use was cut in half in my particular use-case.

Frank Schlimbach (Intel)

Excellent!

It's particularly nice to hear that the vector-based item-collection leads to such a great performance improvement.

Thanks for sharing your experience with us.

frank
