flow graph: limiter_node in v4.1 update1 doesn't work as in previous versions

Hi there!

In the transition from TBB "4.1" to "4.1 Update 1" something seems to have gone wrong.

Attached is a simple test case (inspired by http://software.intel.com/en-us/blogs/2011/09/14/how-to-make-a-pipeline-with-an-intel-threading-building-blocks-flow-graph).

The program is supposed to generate 100 numbers, let them be squared, and then print the square. The limiter node is set to an arbitrary limit of 7, to limit the number of numbers that are processed in parallel.

I use Visual Studio 2012 to compile this. With TBB V4.1 (20120718) and earlier versions, everything works fine: 100 numbers and their squares are printed. With TBB V4.1 Update 1 (20121003) the program stops after 8 numbers or so. It just terminates. This can be influenced by adjusting the limiter_node limit (7 in the example).

I have traced the problem back to the include file tbb/flow_graph.h: I can use both versions of the TBB DLL, but only the 20120718 version of tbb/flow_graph.h will produce the desired result.

For me, this is a showstopper. I have no idea how to work around this. Hope I've provided enough info! Thanks for any help or input on this - maybe I've just misunderstood the whole concept of a limiter_node :-)

Michael

Hi Michael,

>>...tbb20121003-flow-graph-bug.cpp

Thanks for the test case; I wish everybody did the same. I'll run it against a slightly older TBB version, 4.0 Update 3 ( Commercial-Aligned Release ), to see how it executes your test case. I hope that it will behave correctly.

Best regards,
Sergey

Original posting confirmed (on Linux, for tbb41_20120718oss_src.tgz and tbb41_20121003oss_src.tgz to be explicit).

This is a quick follow up...

Hi Michael,

>>>>...tbb20121003-flow-graph-bug.cpp
>>
>>Thanks for the test-case and I wish everybody does the same. I'll take a look at a little bit older TBB version 4.0 Update 3
>>( Commercial-Aligned Release ) in order to see how it executes your test-case. I hope that it will be done in a right way.

It works in a right way ( ! ) and also it works in a wrong way ( ! ). Here are correct results:

[ This is an example of correct output ]
...
** Manually Ordered ( Edited ) output ** ( Passed )

Sending number : 0.0 -> Received number: 0.0
Sending number : 1.0 -> Received number: 1.0
Sending number : 2.0 -> Received number: 4.0
Sending number : 3.0 -> Received number: 9.0
Sending number : 4.0 -> Received number: 16.0
Sending number : 5.0 -> Received number: 25.0
Sending number : 6.0 -> Received number: 36.0
Sending number : 7.0 -> Received number: 49.0
Sending number : 8.0 -> Received number: 64.0
Sending number : 9.0 -> Received number: 81.0
Sending number : 10.0 -> Received number: 100.0
Sending number : 11.0 -> Received number: 121.0
Sending number : 12.0 -> Received number: 144.0
Sending number : 13.0 -> Received number: 169.0
...

I'll continue some time later and technical details will be provided.

>>...It works in a right way ( ! )...

[ Test-Case 1 - 'printf' was used instead of 'std::cout << ... << std::endl' ]

** Unordered output ** ( Passed )

Sending number : 0.0
Sending number : 1.0
Sending number : 2.0
Received number: 0.0
Sending number : 3.0
Received number: 1.0
Sending number : 4.0
Received number: 4.0
Received number: 9.0
Sending number : 5.0
Received number: 16.0
Sending number : 6.0
Received number: 25.0
Sending number : 7.0
Received number: 36.0
Sending number : 8.0
Received number: 49.0
Sending number : 9.0
Received number: 64.0
Sending number : 10.0
Received number: 81.0
Sending number : 11.0
Received number: 100.0
Sending number : 12.0
Sending number : 13.0
Received number: 121.0
Received number: 144.0
Received number: 169.0

** Manually Ordered output ** ( Passed )

Sending number : 0.0 -> Received number: 0.0
Sending number : 1.0 -> Received number: 1.0
Sending number : 2.0 -> Received number: 4.0
Sending number : 3.0 -> Received number: 9.0
Sending number : 4.0 -> Received number: 16.0
Sending number : 5.0 -> Received number: 25.0
Sending number : 6.0 -> Received number: 36.0
Sending number : 7.0 -> Received number: 49.0
Sending number : 8.0 -> Received number: 64.0
Sending number : 9.0 -> Received number: 81.0
Sending number : 10.0 -> Received number: 100.0
Sending number : 11.0 -> Received number: 121.0
Sending number : 12.0 -> Received number: 144.0
Sending number : 13.0 -> Received number: 169.0

>>...also it works in a wrong way ( ! )...

[ Test-Case 2 - 'std::cout << ... << std::endl' was used ]

** Unordered output ** ( Output is broken )

Sending number : 0
Sending number : Received number: 01
Sending number :
Received number: 2
Sending number : 1
Received number: 3
Sending number : 4
Received number: 4
Sending number : 9
Received number: 5
Sending number : 16
Received number: 6
Sending number : 25
Received number: 7
Sending number : 36
Received number: 8
Sending number : 49
Received number: 9
Sending number : 64
Received number: 10
Sending number : 81
Received number: 11
Sending number : 100
Received number: 12
Sending number : 121
Received number: 13
144
Received number: 169

** Manually Ordered output ** ( Output is broken / Sending and Received numbers are flipped )

Sending number : 0
Sending number : 1
Sending number : 1
Sending number : 4
Sending number : 9
Sending number : 16
Sending number : 25
Sending number : 36
Sending number : 49
Sending number : 64
Sending number : 81
Sending number : 100
Sending number : 121
Sending number : 144

Received number: 0
Received number: 2
Received number: 3
Received number: 4
Received number: 5
Received number: 6
Received number: 7
Received number: 8
Received number: 9
Received number: 10
Received number: 11
Received number: 12
Received number: 13
Received number: 169

Can you see that the output is broken?

>>...
>>Sending number : 0
>>Sending number : Received number: 01
>>Sending number :
>>Received number: 2
>>...

It is clearly a problem with 'std::cout << ... << std::endl' and it looks like it doesn't have a critical section that controls access to a CONSOLE device. So, if you save your data in an array everything should be correct.

Do you need modified sources?

Best regards,
Sergey

Note: As I already mentioned all tests are done with older TBB version 4.0 Update 3 ( Commercial-Aligned Release ) and Visual Studio 2005 in Debug and Release configurations.

Just one more note...

>>...So, if you save your data in an array everything should be correct.

Please take into account that you will need a synchronization object in order to control access to that 'magic'-array of calculated values.

Hi Sergey,

thanks a lot for taking the time to check the test case so far. I am aware that the actual output would need proper synchronizing / ordering in a real application. However, that was not the issue I wanted to report. (I deliberately kept the output statements in the test case in this simple way.)

The actual problem is that TBB stops execution of the flow graph prematurely, but this happens only in V4.1 Update 1. The version you were using (4.0 Update 3, commercial-aligned release) works fine in that respect.

Hi Michael,

>>The actual problem is that TBB stops execution of the flow graph prematurely, but this happens only in V4.1 Update 1...

I won't be able to investigate your test case with TBB v4.1 Update 1, but I would suggest comparing the Flow Graph source files of the two versions with Microsoft's WinDiff utility. I think this is the simplest way to detect what was changed. Once you see the differences, you can debug to find out how the new version of Flow Graph works and why it stops execution.

Hi Michael,

>>...In the transition from TBB "4.1" to "4.1 Update 1"...

Could you post an example of the output ( possibly partial... ) when calculating squares for 14 or 28 numbers, please?

Hi Sergey,

> Could you post an example of the output ( possibly partial... ) when calculating squares for 14 or 28 numbers, please?

Sure. I'm attaching an updated test case where the output is done with printf(), and the number of numbers is configurable through the NUMNUMBERS #define. (edit: I also changed the squaring-node concurrency from unlimited to serial, but that doesn't affect the issue).

Attached is the output of the program for the two cases NUMNUMBERS=14 and NUMNUMBERS=28, respectively, for the "good" TBB V4.1 (20120718).

Also attached is the output of the program for the two cases NUMNUMBERS=14 and NUMNUMBERS=28, respectively, for the "bad" TBB V4.1 Update 1 (20121003). You can clearly see that the flow-graph execution terminates prematurely, independent of the actual setting of NUMNUMBERS.

Thank you.

>>...
>>Attached is the output of the program for the two cases NUMNUMBERS=14 and NUMNUMBERS=28, respectively, for the "good"
>>TBB V4.1 (20120718).
>>
>>Also attached is the output of the program for the two cases NUMNUMBERS=14 and NUMNUMBERS=28, respectively, for the "bad"
>>TBB V4.1 Update 1 (20121003). You can clearly see that the flow-graph execution terminates prematurely...

I can see it, and it is interesting that only 9 numbers are sent in both "bad" cases. Did you try to compare the flow graph sources?

> Did you try to compare the flow graph sources?

Yes, I did. The difference is not a simple one-liner, however - I don't see how to quickly fix it myself. I don't fully understand the flow-graph internals yet.

>>...The difference is not a simple one-liner, however - I don't see how to quickly fix it myself. I don't fully understand the flow-graph
>>internals yet...

I hope to hear from the TBB software developers regarding the problem with TBB v4.1 Update 1. I'm currently investigating another issue related to memory leaks in the latest version of TBB. It looks like the update created a set of fresh issues / problems.

@Sergey, can you give some insight into what TBB does? I am very new to it... Thanks

Abhishek Nandy

Hi,

>>...can you put an insight what does TBB does? I am very new to it...

A very short answer: it is a highly portable library of C++ template classes that lets you parallelize some processing.

Please take a look at: http://www.threadingbuildingblocks.org and http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks

If you are really interested in parallel programming, I recommend downloading a recent version of TBB, for example, version 4.1. The library comes with many examples that demonstrate different techniques for parallelizing processing.

Best regards,
Sergey

Hi Michael,

>>>>Did you try to compare the flow graph sources?
>>
>>Yes, I did. The difference is not a simple one-liner, however - I don't see how to quickly fix it myself. I don't fully understand
>>the flow-graph internals yet.

As I already mentioned, I'm still using TBB 4.0 Update 3 ( Commercial-Aligned Release ), but I will be able to test your test case with:

TBB: VERSION 4.1
TBB: INTERFACE VERSION 6101
TBB: BUILD_DATE Wed, 3 Oct 2012 13:45:05 UTC
TBB: BUILD_HOST FXEOWIN16
TBB: BUILD_OS

I'll post results of my investigation later.

> TBB: INTERFACE VERSION 6101

Yes, that's exactly the version where the bug occurs. Thanks for taking care of this.

Any progress, Sergey?

I had already tried to pinpoint the problem, myself, but, because of the substantial changes (not mentioned in CHANGES!), I almost immediately concluded (not unlike Michael) that I would prefer to first have Intel's TBB team have another look, instead.

Any progress, TBB team?

Hi everybody,

>>Any progress, Sergey?

I'll be able to look on Monday and I'll keep you informed.

I'll be looking forward to your solution on Monday, then... :-)

Update...

I downloaded the sources for TBB 4.1 Update 1, and after a set of simple tests I couldn't reproduce the problem described by Michael with TBB 4.1 Update 1 on a computer running Windows XP SP3 ( 32-bit ).

Here is the complete version information of the 'tbb_debug.dll' dynamic library I used for verification:

[ TBB 4.1 Update 1 ]

TBB: VERSION 4.1
TBB: INTERFACE VERSION 6101
TBB: BUILD_DATE Wed, 3 Oct 2012 13:45:05 UTC
TBB: BUILD_HOST FXEOWIN16
TBB: BUILD_OS Microsoft Windows [Version 5.2.3790]
TBB: BUILD_CL Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
TBB: BUILD_COMPILER Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.6.369 Build 20120821
TBB: BUILD_TARGET ia32
TBB: BUILD_COMMAND icl /nologo /Qvc10 /MDd /Od /Ob0 /Zi /EHsc /GR /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG
/D__TBB_LIB_NAME=tbb_debug.lib /GS /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE
/D_WIN32_WINNT=0x0501 /Qstd=c++0x /D_TBB_CPP0X /D__TBB_BUILD=1 /W3 /WX
TBB: TBB_USE_DEBUG 1

Verified cases are as follows ( for numbers ): 14, 28, 56, 112, 224, 448 and 896 ( ALL tests completed without any problems ).

Also, I did additional verification with TBB 4.0 Update 3.

[ TBB 4.0 Update 3 ]

TBB: VERSION 4.0
TBB: INTERFACE VERSION 6003
TBB: BUILD_DATE Unknown
EmptyTBB: TBB_USE_DEBUG undefined
TBB: TBB_USE_ASSERT undefined
TBB: DO_ITT_NOTIFY 1

Verified cases are as follows ( for numbers ): 14, 28, 56, 112, 224, 448 and 896 ( ALL tests completed without any problems ).

Do you want me to submit all outputs as a proof that everything worked?

Did you re-compile the test case using the corresponding TBB header files? Just exchanging tbb_debug.dll doesn't trigger the bug.

>>...Did you re-compile the test case using the corresponding TBB header files? Just exchanging tbb_debug.dll doesn't trigger the bug.

I will do additional verification ( unfortunately not today... ). Thanks for the note.

>>...I use Visual Studio 2012 to compile...

What edition of Visual Studio do you use?

[ To TBB developers ]

Are there any updates regarding the problem?

> What edition of Visual Studio do you use?

Visual Studio Professional 2012, Version 11.0.51106.01 Update 1
Visual C++ 2012

[ A statement from Raf ]
>>...I had already tried to pinpoint the problem, myself, but, because of the substantial changes (not mentioned in CHANGES!), I almost
>>immediately concluded (not unlike Michael) that I would prefer to first have Intel's TBB team have another look, instead.

Michael, I have put on hold any attempts to understand what is wrong until we hear something from the Intel TBB team. I downloaded the TBB v4.1 Update 1 sources and ran into too many little issues / problems from the beginning. Unfortunately, I cannot commit more time to fixing linker-related errors, etc. that are not related to programming, and I support Raf's point of view.

Note: I have been using TBB v4.0 Update 3 for a long time, and I see that the decision to use that version was right. Take a look at my thread related to unresolved externals in TBB v4.1 Update 1 when the sources are built with VS 2008 Express Edition. Even though I have VS 2008 Professional Edition, I don't have any more time for this.

Dear all, is there any news regarding this issue?

BTW, I am not sure about the status of this forum - is this part of the open-source TBB community, or is it the official channel through which to report TBB bugs to Intel? Also, I was wondering whether there is a source repository for TBB where I can track who made what changes to the open-source version.

I have no idea why the TBB team haven't picked this up yet, as they are evidently following the forum.

If you follow threadingbuildingblocks.org's Site Map (link at the bottom of the home page) through "Contribute"/"Contact Us", you will find the text "Please use the Intel TBB forum to discuss issues, for technical support, or to report bugs." and a few recommendations on how to report a bug. There does not seem to be a publicly accessible source repository, but you can download several recent versions.

>>...BTW, I am not sure about the status of this forum - is this part of the open-source TBB community, or is it the official channel
>>through which to report TBB bugs to Intel?

Yes, and you see how it works, unfortunately.

Also, you could compare the quality of support ( that is, the number of posts ) on the Intel TBB forum with that on the Intel Fortran forum. Take a look, please ( just for 30 seconds... ).

>>In the transition from TBB "4.1" to "4.1 Update 1" something seems to have gone wrong...

Hi Michael,

As you can see, there are no responses from the Intel TBB team. I suggest that you roll back to a working version of TBB, that is, TBB v4.1 or TBB v4.0 Update 3.

Best regards,
Sergey

One more time...

[ To TBB developers ]

Are there any updates regarding the problem?

@Sergey, I have downloaded Intel Parallel Studio XE 2013. Is TBB available by default with it? How can I access TBB from Intel Parallel Studio?

Abhishek Nandy

>>...I have downloaded Intel Parallel Studio XE 2013 ,Is TBB avaialble as default with it?...

It should be included; please take a look at http://software.intel.com/en-us/intel-xe-product-comparison for more details.

Hello, Michael,

Thank you very much for the report, and especially for the test case.  The transition to scheduler bypass for flow::graph occurred between 4.1 and 4.1 update 1.  There was a bug in the conversion of the code in continue_receiver.  The fix is to apply the following patch to include/tbb/flow_graph.h:

Index: flow_graph.h
===================================================================
--- flow_graph.h        (revision 9803)
+++ flow_graph.h        (working copy)
@@ -234,6 +234,7 @@
                 my_current_count = 0;
         }
         task * res = execute();
+        if(!res) return SUCCESSFULLY_ENQUEUED;
         return res;
     }
 

I have attached the patch file to this message.  (The ".h" at the end was to make it an allowable file type to upload.)

Again, thank you for the report.  I am very happy that people are using the code.

Regards,

Chris Huson

Hello Chris,

thanks a lot for the patch. Now the flow graph works flawlessly again!

Michael

hello,

the fix is also available in the 4.1 update 2

--Vladimir

Hi Vladimir,

no, I just looked. The patch posted by Christopher Huson has not made it into tbb41_20130116_oss_src.tgz :-( Sorry...

- Michael

Oops, sorry, my mistake. The build was done on 01/16, but the fix is dated 01/18.

I see the fix has made it into V4.1 Update 3 (tbb41_20130314) - thanks!
