I should mention first that this is my first experience with parallelization, and I chose TBB for it.

I wrote a small MFC/C++ application that computes light interference. I recently tried to parallelize the calculations using TBB, and this works well, with a strong speed improvement. However, when I activate the GUI refresh (a CProgressCtrl and a CStatic) inside a TBB thread, my application freezes after a few steps.

Please note that my TBB "loop" is a simple parallel_for.

I tried to use a TBB mutex around the GUI refresh code (perhaps MFC does not like GUI refreshes from other threads?), but this does not work any better...

Any ideas? Any solution?

Best Regards ...

You could try a separate thread for the GUI, with a dispatch loop and events for update requests. You can find an example in {tbb}/examples/common/gui/gdivideo.cpp, which serves all the graphical examples of TBB.

OK, thanks for your help, I will try. Just one small question: is there a "conceptual" reason why this should not work?

Best Regards

I'm not an expert in MFC, so I have only general thoughts:

  • The TBB threads were not initialized to use graphics.
  • You protected only the TBB threads from concurrent calls to the GUI, but you did not protect the main dispatch loop from collisions with calls from the TBB threads.
  • Graphics refresh code can block a thread that holds the mutex, so a deadlock is possible here.

It looks like you are using the following simple pattern:

1) Main application thread starts MFC dispatch loop
2) In one of the message handlers (let's call it MyMsgHandler) you start tbb::parallel_for
3) I did not understand what activation means, but anyway in the body of parallel_for you
3.1) either do something with an existing GUI control (like enable/disable it or change its text)
3.2) or create a new control instance with an existing window (like a frame or view) as its parent.

If this is so, then the problem is caused by the fact that any operation mentioned in 3.1 and 3.2 results in sending messages to both the control and its parent. Since the parent (and in case of 3.1 the control as well) was created by the main GUI thread, the messages will be routed by Windows into this thread. And the sender thread will be blocked until the message is processed.

But the main GUI thread can process those messages only when it gets back into its dispatch loop, and it cannot get there until MyMsgHandler returns, which in turn cannot return until tbb::parallel_for finishes. But tbb::parallel_for will never finish, because its worker threads are blocked in SendMessage calls as described above. (SendMessage is used internally by the control implementations.)

Thus we have a classic deadlock caused by sending window messages using blocking style between threads.
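For readers who want to see the cycle in isolation, here is a minimal portable sketch. It is only a model: `BlockingQueue`, `send`, `pump` and `handler_blocks_until_worker_done` are hypothetical names, standard C++ threads stand in for the TBB workers, and a timeout is added so that the demonstration terminates instead of hanging the way the real SendMessage call would.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a window's message queue (not a Win32 API).
struct BlockingQueue {
    std::mutex m;
    std::queue<std::promise<void>> pending;  // one promise per "sent" message

    // Models a cross-thread SendMessage: enqueue, then wait until the owner
    // thread has handled the message. Returns false if it was not handled
    // within the timeout (the real call has no such timeout).
    bool send(std::chrono::milliseconds timeout) {
        std::promise<void> done;
        std::future<void> handled = done.get_future();
        {
            std::lock_guard<std::mutex> lock(m);
            pending.push(std::move(done));
        }
        return handled.wait_for(timeout) == std::future_status::ready;
    }

    // Models the dispatch loop; only the owner ("GUI") thread calls this.
    void pump() {
        std::lock_guard<std::mutex> lock(m);
        while (!pending.empty()) {
            pending.front().set_value();
            pending.pop();
        }
    }
};

// Models MyMsgHandler: the "GUI" thread waits for the worker before it can
// get back to pump(), so the worker's send cannot be answered in time.
bool handler_blocks_until_worker_done() {
    BlockingQueue q;
    bool answered = true;
    std::thread worker([&] { answered = q.send(std::chrono::milliseconds(50)); });
    worker.join();  // like waiting for parallel_for to finish
    assert(!worker.joinable());
    q.pump();       // too late: the worker has already given up
    return answered;
}
```

In this model the function returns false, because the only thread that could answer the message is the one waiting for the sender to finish.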

Thanks a lot for your very clear answer. In my previous mail, "activate" means "uncomment the line in my code that asks the progress bar to change its value".

Thanks again !

Glad that my description helped. In case anyone ever wonders how to work around a problem of this sort, the simplest way is to post a custom (user) message containing the necessary data to the GUI thread by means of PostMessage. In contrast to SendMessage, PostMessage does not block the sender thread; it acts in a fire-and-forget manner.

The handler of such a custom message will be invoked in the GUI thread (or, in general, in the thread owning the window the message has been posted to). It extracts the information from the message and applies it to the same or any other window (by using SendMessage or one of the numerous windowing API calls).

Jeffrey Richter's Advanced Windows has a wonderful chapter discussing how Windows messaging works and how it can be used in a multithreaded environment.
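The fire-and-forget idea can be sketched without any Windows headers. What follows is a portable model and not the Win32 API: `Message`, `MessageQueue`, `kMsgProgress` and `run_workers_and_sum_progress` are made-up names, and the queue plays the role that the window's message queue plays for PostMessage.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical message record; in Win32 this would be a message ID plus
// WPARAM/LPARAM values.
struct Message {
    unsigned id;
    std::intptr_t value;
};

constexpr unsigned kMsgProgress = 1;  // assumed custom message ID

class MessageQueue {
public:
    // Models PostMessage: enqueue and return immediately, never wait.
    void post(Message msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(msg);
    }

    // Called by the owner thread only: handle everything queued so far.
    template <typename Handler>
    int dispatch_all(Handler handle) {
        int handled = 0;
        for (;;) {
            Message msg;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (queue_.empty()) return handled;
                msg = queue_.front();
                queue_.pop();
            }
            handle(msg);  // runs outside the lock, in the owner thread
            ++handled;
        }
    }

private:
    std::mutex mutex_;
    std::queue<Message> queue_;
};

// Four workers each post one progress message; the owner thread applies them.
int run_workers_and_sum_progress() {
    MessageQueue gui_queue;
    std::vector<std::thread> workers;
    for (int i = 1; i <= 4; ++i)
        workers.emplace_back([&gui_queue, i] { gui_queue.post({kMsgProgress, i * 10}); });
    for (auto& w : workers) w.join();

    std::intptr_t progress = 0;  // stands in for the progress bar state
    int handled = gui_queue.dispatch_all([&](const Message& m) {
        assert(m.id == kMsgProgress);
        progress += m.value;
    });
    assert(handled == 4);
    return static_cast<int>(progress);
}
```

The point of the design is that only the owner thread ever touches the "control" state, while the workers do nothing more than enqueue small values.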

I have just tried the solution you described in your previous message. The good news is that the application no longer freezes. I call "mDialog->PostMessageA" from the TBB thread with a custom message number and data, and I added this custom message to the dialog's message map. This works "not so badly": the application no longer freezes and the custom messages reach the dialog... but the dialog receives them only once the TBB calculation has finished.

I analyzed the program with the debugger and saw that the application runs with only 2 threads (which seems normal on a Core2Duo): the application's main thread and a new one. I suppose that is the reason why the messages are received only after the calculation has finished: the event loop cannot run while the main thread is being used for the TBB calculation.

So... what kind of solution can I use to receive the messages "in real time"? Should I add a timer event to the event loop (seems dirty)? May I create a new thread for the dialog only (and if so, can I mix TBB and Windows threads)? Something else?

Thanks again,


Oh, sure, it was my fault that I mentioned only part of the standard pattern. Your last idea (about a separate thread) is very close. Usually GUI apps perform all GUI-related activities in the main GUI thread, and when they need to do lengthy data processing, they offload it to a separate background thread.

In your case you could start such a background thread in your initial MyMsgHandler and return from the handler immediately. Meanwhile the background thread will initialize TBB and start parallel_for. Since the GUI thread is not blocked anymore it will process all the status messages from the TBB worker threads. When the background thread finishes its work it can notify the GUI thread by posting or sending another custom message.
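As an illustration of this pattern, here is a portable sketch under stated assumptions: `GuiQueue`, `Msg`, `compute` and `gui_thread_main` are invented names, plain std::thread objects stand in for both the background thread and the TBB workers, and `wait_next` plays the role of the blocking dispatch loop.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

enum class Msg { Progress, Done };  // hypothetical custom messages

class GuiQueue {
public:
    // Models PostMessage: enqueue, wake the owner thread, return at once.
    void post(Msg m) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(m);
        }
        ready_.notify_one();
    }

    // Models the blocking part of a dispatch loop: wait for the next message.
    Msg wait_next() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        Msg m = queue_.front();
        queue_.pop();
        return m;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Msg> queue_;
};

// Runs in the background thread. The inner threads stand in for the
// workers that a parallel_for would use.
void compute(GuiQueue& gui, int steps) {
    std::vector<std::thread> workers;
    for (int i = 0; i < steps; ++i)
        workers.emplace_back([&gui] { /* heavy work here */ gui.post(Msg::Progress); });
    for (auto& w : workers) w.join();
    gui.post(Msg::Done);  // posted only after every progress message
}

// Models the GUI thread: the handler only starts the background thread and
// returns, so the loop keeps dispatching while the computation runs.
int gui_thread_main(int steps) {
    assert(steps >= 0);
    GuiQueue gui;
    std::thread background(compute, std::ref(gui), steps);
    int progress_seen = 0;
    while (gui.wait_next() == Msg::Progress) ++progress_seen;  // dispatch loop
    background.join();
    return progress_seen;
}
```

Because the final message is posted only after all workers have been joined, the loop sees every progress message before it sees the completion message.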

By the way, now that you start the TBB algorithm from a background thread, using SendMessage becomes safe. But PostMessage is still preferable from the performance standpoint, because SendMessage would serialize the TBB workers (the convoying problem).

Another caveat is that the approach I described may result in oversubscription if you construct tbb::task_scheduler_init object with default settings. The degree of oversubscription depends on how much processing the GUI thread does. And if it is constantly doing heavy UI updates (e.g. involving fancy graphics) you could be better off by initializing TBB scheduler in the following way:
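The snippet that belonged here is not preserved in the thread. Judging from the next paragraph (losing almost half of the computing resources on a dual core), it presumably requested one thread fewer than the default. The sketch below only computes such a thread count in portable C++; the function names are hypothetical, and the TBB call in the closing comment is an assumption about what the original looked like, not a quotation.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Number of threads to request so that one hardware thread stays free for
// the GUI. `hardware_threads` is what the machine reports; 0 means
// "unknown", which std::thread::hardware_concurrency() is allowed to return.
int workers_leaving_one_for_gui(unsigned hardware_threads) {
    if (hardware_threads == 0) return 1;  // unknown: be conservative
    return std::max(1, static_cast<int>(hardware_threads) - 1);
}

int default_worker_count() {
    int n = workers_leaving_one_for_gui(std::thread::hardware_concurrency());
    assert(n >= 1);
    return n;
}

// With the classic TBB scheduler object discussed in this thread, the result
// would presumably be passed to the constructor, along the lines of
//   tbb::task_scheduler_init init(default_worker_count());
// This line is left as a comment so the sketch has no TBB dependency.
```

On a dual core machine this yields a single thread for the computation, which is exactly the trade-off weighed in the next paragraph.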


Yet on a dual core system I would probably tolerate some degree of oversubscription rather than potentially lose almost half of the computing resources. This is one of the cases where explicit thread priority control is safe and beneficial. You can boost the GUI thread's priority so that it remains responsive even in case of oversubscription.

Finally, if your app uses TBB algorithms repeatedly, you could save a bit by initializing the TBB scheduler only once at app startup (and terminating it before exit), though the saving may not be that critical for a GUI application (users will not notice a delay of a few microseconds).

Thanks to your advice, my application now works perfectly. The only thing I have to refresh during the calculation is a progress bar, so the TBB default init works very well on a Core2Duo.

I have one last question: may I wrap the "PostMessageA" calls in a scoped_lock? I tried with and without scoped_lock; the program's behaviour does not seem to change, except that the calculation takes 10% to 15% longer with the lock.

Thanks once more, all that helped me a lot.

You are always welcome on this forum!

Regarding your question, it's a good one. Not so long ago DDJ published Herb Sutter's article "Avoid Calling Unknown Code While Inside a Critical Section". It's both short and well written, so you definitely won't waste your time if you take a look. As the spin mutex is just a variant of critical section the advice embodied in the article's title applies to your case as well.

To recap the whole matter shortly, the most serious issues with calling 3rd party APIs from inside the locked regions of your code are:

  1. Potential for a deadlock if that API uses callbacks into your application
  2. If that API call takes a long time to execute, the probability that other threads will have to wait for your lock to be released dramatically increases.
  3. If neither 1) nor 2) is the case, there is still some probability that the next version of that 3rd party API will do 1) or 2). That is, calling foreign functions from your locked code will never stop being a delayed-action mine.

Issue 2 is probably what you observed in your case. Contention on the lock results in lock convoying (one or more threads waiting while the first one finishes its work in the locked region), and in the case of more universal locks (like a critical section or mutex on Windows) it causes latecomer threads to relinquish their time slices and go to the OS kernel (quite an expensive operation) to sleep.

In any case, calling PostMessage from inside a lock makes little sense, because you normally use it in one of two ways:

  1. To pass some simple data that can be safely copied by value
  2. To pass a pointer to shared data, and there is a guarantee that these shared data will not be invalidated until the message is processed

If you pass a pointer to shared data, and there is NO guarantee that the data will stay intact until processed by the message recipient, then you are in trouble. Placing the PostMessage call inside a lock won't help in this case, because it will return well before the message is processed by the destination window proc.

Very interesting article. I'm not very familiar with deadlocks, and this is now much clearer in my mind.
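One common way to obtain the lifetime guarantee from the second case is to hand ownership of the data over together with the message, so that no lock is needed around the post. The sketch below models this in portable C++; `StatusPayload`, `PayloadQueue` and `post_then_handle_later` are hypothetical names, and the queue again only stands in for the window's message queue.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical data that is too large to pass by value in a message.
struct StatusPayload {
    int step;
    std::string text;
};

class PayloadQueue {
public:
    // The sender gives up ownership here, so nothing it does afterwards can
    // invalidate the payload before the receiver handles it.
    void post(std::unique_ptr<StatusPayload> p) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(p));
    }

    // Called by the receiver; returns nullptr when the queue is empty.
    std::unique_ptr<StatusPayload> take() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return nullptr;
        std::unique_ptr<StatusPayload> p = std::move(queue_.front());
        queue_.pop();
        return p;
    }

private:
    std::mutex mutex_;
    std::queue<std::unique_ptr<StatusPayload>> queue_;
};

std::string post_then_handle_later() {
    PayloadQueue q;
    std::thread worker([&q] {
        auto p = std::make_unique<StatusPayload>();
        p->step = 3;
        p->text = "step 3 done";
        q.post(std::move(p));  // the worker no longer owns the payload
    });
    worker.join();  // the worker and its local state are gone at this point

    std::unique_ptr<StatusPayload> received = q.take();  // receiver owns it now
    assert(received != nullptr);
    return received->text;
}
```

With the raw Win32 call the same idea is usually expressed by allocating the data on the heap, passing the pointer in the message, and letting the handler free it; that variant is not shown here.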

Nevertheless, my question is more about the idea of "threadable" and "non-threadable" libraries. For example, some APIs use internal static or global variables, so it is dangerous to make two parallel calls to such APIs without wrapping them in a lock (the article shows that using locks is dangerous too, so I'm a little bit disappointed ;) ). For example, I think (am I right?) that memory allocation from multiple threads may be dangerous. So where can I find this kind of information? Are all Windows / MFC APIs threadable? Is the "PostMessage" call threadable? Is there something better than "try and pray"?

Thanks again !

Now that's another good question. The most obvious answer is that the API provider should document its thread safety properties (that's probably what you mean when you say "threadable"). The most important part of a function's thread safety characteristics is whether it is safe to call it concurrently.

Unfortunately MFC, probably because it is mainly just a thin wrapper over the Win32 API, does not provide such a specification. Even worse, the descriptions of the corresponding Win32 API functions do not state thread safety properties directly either, though some of them imply that they should work safely in a multithreaded environment.

The lack of explicitness in the reference documentation is compensated for by technical articles like "Multiple Threads in the User Interface" (this one is as old as multithreading in Windows), and by various books like the already mentioned "Advanced Windows" by Jeffrey Richter.

To put it briefly, the windowing APIs in Win32 and beyond have been designed from the very beginning to work safely in multithreaded applications. So you do not need any extra locks to "help" them.

A notable exception to this rule is stated explicitly in MSDN (though a bit unexpectedly in the "DLLs, Processes, and Threads" section). Quotation:

"To enhance performance, access to graphics device interface (GDI) objects (such
as palettes, device contexts, regions, and the like) is not serialized. This
creates a potential danger for processes that have multiple threads sharing
these objects. For example, if one thread deletes a GDI object while another
thread is using it, the results are unpredictable. This danger can be avoided
simply by not sharing GDI objects. If sharing is unavoidable (or desirable), the
application must provide its own mechanisms for synchronizing access."

From my past experience (though I have not dealt closely with Windows GUI stuff for some 5 years already, so don't take the following as absolute truth), even some GDI objects can be accessed concurrently without ruinous consequences. For example, to avoid serialization you could probably draw in different parts of the same DC from multiple threads. But while doing so you should not change the context attributes (for example by selecting different pens, brushes, etc.). In any case this is exactly one of the "try and pray" cases, as you aptly put it.

Finally, since thread safety rarely comes cheap, many windowing API functions use locks internally or may even dive into the kernel. This means that their frequent use by worker threads will serialize your parallel work and may seriously impact its scalability.
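One way to keep such calls infrequent is to notify the GUI only when the visible value actually changes. The sketch below is one possible approach, not something prescribed in this thread: `ProgressThrottle` and `count_notifications` are hypothetical names, and an atomic counter replaces any lock around the progress bookkeeping.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class ProgressThrottle {
public:
    explicit ProgressThrottle(int total_items) : total_(total_items) {
        assert(total_items > 0);
    }

    // Called by any worker after finishing one item. Returns the new
    // percentage if it reached a new whole percent, otherwise -1.
    int item_done() {
        long long done = done_.fetch_add(1, std::memory_order_relaxed) + 1;
        int before = static_cast<int>((done - 1) * 100 / total_);
        int after = static_cast<int>(done * 100 / total_);
        return after != before ? after : -1;
    }

private:
    const int total_;
    std::atomic<int> done_{0};
};

// Counts how many notifications four workers would post for `total` items.
int count_notifications(int total) {
    ProgressThrottle throttle(total);
    std::atomic<int> posted{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&, t] {
            for (int i = t; i < total; i += 4) {
                if (throttle.item_done() >= 0) {
                    ++posted;  // a PostMessage-style call would go here
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return posted.load();
}
```

With 10000 items this produces 100 notifications instead of 10000, so the workers touch the windowing layer only once per percent of progress.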

OK, thanks again for your answer. So, as in many situations with Windows, "try and pray" seems to be the only solution ;)

Nevertheless, if I assume that "PostMessage" is thread-safe, it is a good solution for a GUI with a low CPU cost: it avoids using my own locks and gives me a GUI message queue in the main app thread. Thus, I don't have to worry about the thread safety of all the other APIs.

I think I will continue to use this kind of application design, up to my next problems ... and next questions ! ;)

Thanks again, all that helped me a lot !
