TBB on linux segfaulting

I have a cross-platform program I've written that uses either TBB or OpenMP (not at the same time!). The code is in every respect cross-platform, but I initially developed it in Visual Studio.

The program is a framework for developing n-body models.

On Windows it works perfectly, compiling with no errors or warnings, and runs as expected. On Linux (Ubuntu 64 bit) it compiles well, with just one warning that wouldn't affect the TBB code, but when I run it, as soon as it tries to begin the first parallel_for class, it segfaults.
I tried using OpenMP, and the code ran perfectly. Note that the OpenMP and TBB code do not interfere with each other; using them at the same time isn't possible.

Since the parallel code is in my Runge Kutta 4th Order Integrator, I decided to comment out the first parallel_for block and see if that had a bug in it (replacing it with equivalent serial code), but instead it just segfaults on running the next one it encounters. I deduce from this, and the fact that the same code runs perfectly on Windows, that it is the act of using TBB which is causing the segfault, not the code itself.

On Linux I've compiled it using GCC, with the project managed by Code::Blocks. I've linked tbb, and initialised tbb with a call to
tbb::task_scheduler_init init;

Have I missed something about using TBB on Linux?

I haven't posted lots of code because I rather think the issue is one of setting up TBB, rather than a problem in my code. I can do though, but the project is rather large, so it would be a fair bit to go through.

I appear to have solved my own problem. When initialising TBB I was doing this

tbb::task_scheduler_init init;

When I should have been doing this:

tbb::task_scheduler_init automatic;

The two lines are equivalent; the only difference is the name of the variable that controls lifetime of the TBB scheduler. And either should work just fine.
If you did not misprint anything in the above post, the problem might have been some naming conflict.

Also, in TBB 2.2 the scheduler can be initialized lazily on first use without creating a single task_scheduler_init instance.

That's interesting. I didn't realise they were equivalent; I'd just followed a tutorial, and I'm new to TBB.
I didn't mis-type anything, so it may have been some other issue that I wasn't aware of.

I'm using the TBB version that comes with Ubuntu 9.10 64 bit. It may be that there is an issue with their build for the 64 bit platform.

Ok, not so good (and not solved either after all, how do I undo that?)

I fired up my PC today, ran the compiler again, and ... The program is back to segfaulting. I retrieved a binary that worked many times over yesterday, and that is also segfaulting.

This is really confusing.

I ran some tbb examples (seismic and polygon overlay), and they work fine, so it's not tbb at fault.

The exact same code works perfectly on Windows (VC8), and has done for some time.

It's here: http://www.politespider.com/moody/releases/moody-src-v1.1-all-18.11.2009...

With the Linux build achieved via Code::Blocks. I'm working on a CMake version, which is having the same segfault problem, so it's not Code::Blocks.

I'm running 64-bit Linux (Ubuntu). I can't see why that would be the problem though, since the tbb examples run just fine, as I said.

I removed the initialisation code - no difference.

I'm not using global namespaces; everything is explicitly named (std:: tbb:: and so on).

I stepped through the program in the Code::Blocks debugger, and that wasn't terribly informative.

All I know for certain is that as soon as the program actually launches a parallel_for thread, I get a segfault.

I prepare my parallel_for code this way, in a separate .h file from the class that uses it:
(This is the simplest of my parallel_for classes)

    class UpdateExternalReferenceVectors {
        Particle *const lp;
    public:
        void operator()( const tbb::blocked_range<size_t>& r ) const {
            Particle *localParticles = lp;
            for( size_t i=r.begin(); i!=r.end(); ++i ) {
                localParticles[i].setExternalReferenceVector();
            }
        }
        UpdateExternalReferenceVectors( Particle localParticles[] ) :
            lp(localParticles)
        {}
    };

Then to call it I have the following wrapper in the class that uses it

    /**
     * local method that calls the TBB wrapper for the setting of external reference vectors
     */
    void Capsule::copyCurrentPositionToExternalReferenceVectors(Particle localParticles[], int numParticles) {
        tbb::parallel_for(tbb::blocked_range<size_t>(0, (size_t)this->particleSetLength), UpdateExternalReferenceVectors(localParticles));
    }
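For readers unfamiliar with the pattern, here is a minimal sketch of how a body object like the one above gets driven with sub-ranges. It deliberately does not use TBB: IndexRange, forEachChunk, AddOne and sumAfterAddOne are hypothetical stand-ins invented for this illustration, and the driver runs serially where a real scheduler would hand the pieces to worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a half-open index range [first, last); not the TBB class.
struct IndexRange {
    std::size_t first;
    std::size_t last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// Hypothetical driver: halves the range until a piece is at most `grain` long,
// then calls the body on that piece. It runs serially here; a real scheduler
// would hand the pieces to worker threads.
template <typename Body>
void forEachChunk(IndexRange r, std::size_t grain, const Body& body) {
    assert(grain > 0);  // a zero grain size would never stop splitting
    if (r.end() - r.begin() <= grain) {
        body(r);
        return;
    }
    std::size_t mid = r.begin() + (r.end() - r.begin()) / 2;
    forEachChunk(IndexRange{r.begin(), mid}, grain, body);
    forEachChunk(IndexRange{mid, r.end()}, grain, body);
}

// Body object in the same shape as UpdateExternalReferenceVectors: it stores a
// pointer to the data and has a const operator() that takes a range.
class AddOne {
    int* const data;
public:
    explicit AddOne(int* d) : data(d) {}
    void operator()(const IndexRange& r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i) {
            data[i] += 1;
        }
    }
};

// Applies AddOne to every element of a vector of n zeros and returns the sum.
int sumAfterAddOne(std::size_t n, std::size_t grain) {
    std::vector<int> v(n, 0);
    forEachChunk(IndexRange{0, n}, grain, AddOne(v.data()));
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}
```

The point of the shape is that the body only ever touches indices inside the range it is handed, so the pieces can be processed in any order.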

If I comment out this particular parallel_for, which is the first called, and replace it with a serial loop, the program segfaults on meeting the next.

I'm seriously going round in circles here. I can't figure out what on earth is going wrong.

No core dump to investigate? What happens if you run the program under gdb?

I ran it in the debugger for Code::Blocks (gdb, I do believe), and got this:

#0 ( task_group_context(this=0x7fffffffe540, relation_with_parent=tbb::task_group_context::bound) (/usr/include/tbb/task.h:284)
#1 0x409c76 tbb::internal::start_for, UpdateExternalReferenceVectors, tbb::simple_partitioner>::run(range=..., body=..., partitioner=...) (/usr/include/tbb/parallel_for.h:78)
#2 0x409a23 tbb::parallel_for, UpdateExternalReferenceVectors>(range=..., body=..., partitioner=...) (/usr/include/tbb/parallel_for.h:130)
#3 0x407c97 Capsule::copyCurrentPositionToExternalReferenceVectors(this=0x631010, localParticles=0x6cf4a8, numParticles=60) (/home/carey/moody/capsule/Capsule.cpp:156)
#4 0x407a14 Capsule::moveParticlesRK4_TBB(this=0x631010, distance=60) (/home/carey/moody/capsule/Capsule.cpp:93)
#5 0x407b91 Capsule::iterateRK4_TBB(this=0x631010, steps=1) (/home/carey/moody/capsule/Capsule.cpp:141)
#6 0x4049e9 main(argc=1, argv=0x7fffffffe8e8) (/home/carey/moody/TestMain.cpp:65)

The top entry being the last made. After that I got the message 'Program received signal SIGSEGV, Segmentation fault.'

It seems that the crash does indeed happen inside tbb when the parallel_for starts, if I'm reading it correctly.

What TBB version?

libtbb2 (= 2.1r017-1)

Karmic Koala Ubuntu x64 package

I've compared with the sources, but I fail to see anything obvious, so I would recommend turning your computer off and back on (just kidding), maybe using a more recent TBB (no use trying to figure out something that may already have changed), perhaps building TBB yourself from the source distribution if you aren't doing that already (but maybe I'm just paranoid), and trying to single-step through the code to pinpoint exactly what is happening (from a breakpoint on "init();").

(Added) Perhaps to make sure: is the task_scheduler_init instance still alive in the thread that calls parallel_for, assuming that this happens in a thread you created yourself and not inside a TBB task? If the task_scheduler_init instance is destroyed before that time because it goes out of scope somehow, it has no use. (I haven't kept track of which TBB version waived the requirement on additional user threads to have their own task_scheduler_init instance, if that should apply here.)

(Added) But it works on Windows, so...

Thank you very much for trying.

I've decided that I'm going to re-install Linux anyway. I upgraded to karmic koala rather than installing it from scratch, and it killed my sound and video, and keeps telling me disks that I know to be healthy are about to die. I'm thinking it might be as simple as my Linux install is borked.

When I do this I'll try the Ubuntu package again one time, then if I still have problems, I'll compile my own and use that.

edit: I tried single stepping through, but reached the same point. To be honest the output didn't mean much, as I was deep in the tbb code when the crash happened, and not really getting what I was seeing.

It may well be that the Windows build of TBB is newer than the one in Linux. I *think* it is.

However, the tbb examples ran in my Linux install too, so that means something works. I can't see anything substantially different in the example code from my own in the way I use tbb. Not yet anyway.

When I thought I'd fixed it before it was the task scheduler I altered, although I was then told the alteration shouldn't have been meaningful. The issue does seem linked to the library not starting properly in Linux for my app (I think anyway).

Some re-organisation of code may be required, but the re-install first, since that might fix everything, cause the sun to shine, birds to sing etc...

The symptoms you describe are consistent with not having a task_scheduler_init object defined at the time your system encounters the first parallel_for. Having an older TBB version on your Linux platform is also consistent with a possible scenario that also fits all the symptoms. The issue you had earlier (tbb::task_scheduler_init init vs tbb::task_scheduler_init automatic) is that init and automatic are in the position of the object name in the C++ syntax (nothing particular about TBB), so that code names the task_scheduler_init object by one of those identifiers. And the first creation of such an object also initializes the TBB run-time system. Subsequent creations of task_scheduler_init objects from other threads will bump a reference count but will not recreate the thread pool and the rest of the mechanism that occurs with the creation of that first object.

That set of TBB run-time mechanisms will hang around as long as one of those task_scheduler_init objects still lives. The symptoms you describe sound like what happens when a thread calls parallel_for without the run-time mechanism initialized. Now, with TBB 2.2, an automatic mechanism for assuring the initialization of the run-time system has been added, so the requirement to create a task_scheduler_init object before the first parallel_something call has been relaxed. So the symptoms you describe are consistent with a program that either doesn't keep a task_scheduler_init object around long enough OR calls the parallel_for from a thread that has not created its own task_scheduler_init object before making the call. And still consistent considering the possibility that your Linux implementation is using a TBB version requiring this diligence but the Windows implementation using 2.2.
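The reference-counting behaviour described here can be sketched without TBB. In the block below, FakeRuntime and RuntimeGuard are hypothetical names invented for illustration; they only mimic the described lifetime rules (first guard starts the runtime, later guards bump a count, last guard out shuts it down) and are not how the library is implemented. The counters are plain ints, so this sketch is single-threaded only.

```cpp
#include <cassert>

// Hypothetical stand-in for a shared run-time system; this is not TBB code.
struct FakeRuntime {
    static int refCount;   // how many guard objects are currently alive
    static int startups;   // how many times the runtime was actually started
    static bool running() { return refCount > 0; }
};
int FakeRuntime::refCount = 0;
int FakeRuntime::startups = 0;

// Plays the role described for task_scheduler_init: constructing the first
// guard starts the runtime, later guards only bump the count, and the
// runtime stops when the last guard is destroyed.
class RuntimeGuard {
public:
    RuntimeGuard() {
        if (FakeRuntime::refCount++ == 0) {
            ++FakeRuntime::startups;
        }
    }
    ~RuntimeGuard() {
        assert(FakeRuntime::refCount > 0);
        --FakeRuntime::refCount;
    }
    RuntimeGuard(const RuntimeGuard&) = delete;
    RuntimeGuard& operator=(const RuntimeGuard&) = delete;
};

// A named guard lives until the closing brace, so the runtime is up
// at the point where parallel work would be issued.
bool runtimeAliveInsideScope() {
    RuntimeGuard init;
    return FakeRuntime::running();
}

// An unnamed temporary is destroyed at the semicolon, so the runtime
// is already gone on the next line.
bool runtimeAliveAfterTemporary() {
    RuntimeGuard();
    return FakeRuntime::running();
}

// Two overlapping guards cause only one startup.
int startupsForTwoGuards() {
    int before = FakeRuntime::startups;
    {
        RuntimeGuard first;
        RuntimeGuard second;
    }
    return FakeRuntime::startups - before;
}
```

Whether the variable is called init or automatic makes no difference in this model either; what matters is that a named object exists for as long as the work runs.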

Thanks a lot for this. I'm relieved to know that it's the task scheduler issue, and not something that means I have to re-write all my code.

I'm re-installing Linux at this very moment. When that's done I'll make the changes to my code that will make sure I have a task_scheduler_init object available at the right times.

I may also install the version of tbb from this site, since that will ensure I've got the same version on each platform.

This issue is going to cause potential problems with distributing my code, since it is intended to be used as a framework on which to build concurrent n-body models.

Possibly this means I will need to distribute tbb with my program and tell CMake where to find the one I provide, or simply make sure the initialisation code that will be ignored by later versions of tbb is still there anyway.

Much to do, but at least there is progress :)

Thanks everyone for the assistance, this forum really is excellent.

OK, it's working again now. The final answer was as follows.

Ubuntu 9.10 has an older version of TBB, so I still have to use the init object. I've placed the call to tbb::task_scheduler_init automatic; inside the method that operates my parallel integrator for n steps, which is stopping the problem.

I did try putting it in the class constructor, but that doesn't work. Shame, it would have been handy, but obviously the init object is lost once the constructor goes out of scope. I can see why they decided to do away with it.

I'll leave it as it is, with the init calls that will be ignored by 2.2 but used by earlier versions rather than suggest that people install their own tbb instance to use my software, or supply tbb 2.2 with my source. That seems the simplest solution.

Thanks all for the help.

Carey

I'm glad to hear you got it working. Regarding the idea of putting the task_scheduler_init object creation in your class constructor, I can imagine scenarios where that should work, but the principal thing is that the lifetime of that task_scheduler_init object must exceed that of any processing that uses the TBB task scheduler. I can imagine that either as a member object or pointer to such an object as part of a user class whose object has a sufficient lifetime.
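To make the lifetime point concrete, here is a small sketch of the two variants. SchedulerToken, LocalInConstructor and HeldAsMember are hypothetical names for illustration; the token merely counts live instances and stands in for a task_scheduler_init object. The sketch only shows scope and lifetime, and says nothing about which thread creates or destroys the token.

```cpp
#include <cassert>

// Hypothetical token whose lifetime stands in for a task_scheduler_init object.
struct SchedulerToken {
    static int alive;  // number of tokens currently in existence
    SchedulerToken() { ++alive; }
    ~SchedulerToken() {
        assert(alive > 0);
        --alive;
    }
    SchedulerToken(const SchedulerToken&) = delete;
    SchedulerToken& operator=(const SchedulerToken&) = delete;
};
int SchedulerToken::alive = 0;

// Token created as a local variable in the constructor body: it is destroyed
// when the constructor returns, before any method of the class can run.
class LocalInConstructor {
public:
    LocalInConstructor() { SchedulerToken token; }
    bool tokenAliveDuringWork() const { return SchedulerToken::alive > 0; }
};

// Token held as a data member: it lives exactly as long as the owning object.
class HeldAsMember {
    SchedulerToken token;
public:
    bool tokenAliveDuringWork() const { return SchedulerToken::alive > 0; }
};

bool localVariantKeepsToken() {
    LocalInConstructor c;
    return c.tokenAliveDuringWork();
}

bool memberVariantKeepsToken() {
    HeldAsMember c;
    return c.tokenAliveDuringWork();
}
```

In this model the first variant reports no live token by the time work would start, which matches the observation that creating the object inside the constructor did not help.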

One other clarification. The init calls are not ignored by 2.2--they will be used as in previous versions. However, with 2.2 there's an alternate means to initialize a local scheduler if the task_scheduler_init objects are missing. Either way, the code should work, as you've demonstrated.

Can anyone explain why init() wasn't on the stack when the segmentation fault happened?

Anyway, if task_scheduler_init is indeed the issue, perhaps it should be (possibly redundantly) clarified that this object must be created on the same thread that uses TBB and most probably also destroyed on the same thread where it was created, so having it as a member variable, direct or indirect, may obscure and sabotage that assumption, and making it an automatic variable instead is the way to go.

Also, please don't call them "init calls", despite the unfortunate class name: task_scheduler_init must be created as a non-temporary object (hence the need to give it a variable name), so that its lifetime is long enough to serve its purpose. There should be one long-lived instance, e.g., in main(), because creating the first one and destroying the last surviving one are significantly expensive events, and before 2.2 any (non-trivial) number of additional instances where it is not certain that the code is on a user thread that already has such an instance, but these then have negligible overhead.

I don't know what was on the stack because I never ran the code, but your advice about not calling them "init calls" is good--I was trying very carefully to describe the operations as object constructions to emphasize the notion of lifetime throughout my reply but I slipped up once as you noticed. I would expand the warning above to say that before TBB 2.2, it was a requirement that every thread that uses the TBB task scheduler must first register itself as a master thread by creating such an object.

"I don't know what was on the stack because I never ran the code"
I was referring to #6.

"I slipped up once as you noticed"
Actually, I didn't notice. :-)

"every thread"
Of course: my brain thought "non-zero" but my fingers typed "non-trivial" (and the rest wasn't very clear either).

I'm having an issue with this lifetime thing.

As I understand it, the lifetime of the task_scheduler_init object is within the scope in which it is created. I had thought this meant I could create one in the constructor of the class that uses TBB, but this does not work. I believe this is because the init object only lasts as long as the constructor call.

Nor does it appear that I can create a task_scheduler_init object in the scope of main() and have it alive for calls to the object methods that contain my TBB integrator.

For example the following code in main:

    tbb::task_scheduler_init automatic;
    for (int i = 0; i < ...) {
        for (int j = 0; j < ...) {
            experiment->iterateMidPoint_TBB(1); // the TBB calling method
        }
    }

does not work.

but when I create the task_scheduler_init object inside iterateMidPoint_TBB(int steps) within the scope of the TBB code it does work on Linux with its older TBB install, but on Windows, using 2.2 it fails with the following:

thread_monitor::launch: _beginThreadex failed

This happens after a while, when the integrator is only moving a small number of particles and is thus very fast indeed, creating and destroying thousands of task_scheduler_init objects a second.
This may also be the case for the Linux version. There I only tried a larger number of particles (60) on a more accurate but slower version of the integrator, and had no problems; I've only just started using a small number of particles (5) for an experiment I'm writing.

I can deal with this; I'm just going to have to require that experimenters use 2.2.

What I'm wondering is, when I create the task_scheduler_init object in the scope of main, can it not stay live for calls to object methods made within main? It seems unwise to be creating so many task_scheduler_init objects.

Removing the task_scheduler_init object code on Windows stops this problem, incidentally.

Certainly TBB is designed to let you write:

    int main( int argc, char* argv[] ) {
        tbb::task_scheduler_init init;
        ...do rest of program logic...
    }

Indeed that is normally the way I use it and we have many unit tests that use it in a way similar to this. If it segfaulted in this usage model on any of our many test platforms, we would have noticed.

I can conjecture a way that broken code with races might appear to have less chance of segfaulting if the task_scheduler_init is constructed at a more inner scope. Suppose the broken code has a latent race that, if exposed, causes a segfault. When TBB constructs a task_scheduler_init object, it does not wait for the worker threads to get started. If a task_scheduler_init object exists for a very short time, the parallel code may finish before the worker threads get a chance to help. Thus the race (and consequent segfault) will not happen.

While I'm not 100% sure there is no problem with the code (who ever can be?), I've tested the code with OpenMP and had no problems, and this problem does not appear on Windows with 2.2. It only happens if I use a previous version of TBB on Linux, and thus have to create the task_scheduler_init object. Of course, this might just be because not having to create the task_scheduler_init object means the error is being hidden.

I can feel a headache coming on..sigh..

I don't *think* it should generate a race condition: each particle class in my n-body model only handles its own movement, and there are no writes to other particles, so each particle can run in its own thread without the need for critical sections.

I also made sure that those members of a particle class that would be accessed by other particles during integration (to find its position during gravitational force calculation) are copies of those members, not the ones the particle class itself is using, so there is no reading of class members that are also being written to.

For example, a particle class holds its own x axis position in a member particle->x, and modifies it during integration, but before parallel integration happens, it copies this to particle->x_ext, and that is what other particles read.
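A minimal sketch of that copy-then-read scheme follows. Particle1D, publishPositions, updateRange and stepTwoHalves are hypothetical names made up for this illustration, the update rule is a toy (not gravity), and the two halves are run one after the other rather than on threads; the intent is only to show that once positions are published, the result does not depend on the order in which index ranges are processed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional particle; x_ext mirrors the naming in the post.
struct Particle1D {
    double x;      // working position, written only by this particle's own update
    double x_ext;  // published copy, read by every other particle
};

// Phase 1: publish each particle's current position before any update starts.
void publishPositions(std::vector<Particle1D>& ps) {
    for (auto& p : ps) p.x_ext = p.x;
}

// Phase 2: update particles in [begin, end). Each iteration writes only ps[i].x
// and reads only the x_ext snapshot, so disjoint index ranges do not conflict.
void updateRange(std::vector<Particle1D>& ps, std::size_t begin, std::size_t end, double rate) {
    assert(begin <= end && end <= ps.size());
    for (std::size_t i = begin; i != end; ++i) {
        double sum = 0.0;
        for (const auto& other : ps) sum += other.x_ext;
        double mean = sum / static_cast<double>(ps.size());
        ps[i].x += rate * (mean - ps[i].x_ext);  // toy rule: drift toward the mean
    }
}

// One step over two halves of the array, in either order.
std::vector<double> stepTwoHalves(std::vector<Particle1D> ps, bool secondHalfFirst) {
    publishPositions(ps);
    std::size_t mid = ps.size() / 2;
    if (secondHalfFirst) {
        updateRange(ps, mid, ps.size(), 0.5);
        updateRange(ps, 0, mid, 0.5);
    } else {
        updateRange(ps, 0, mid, 0.5);
        updateRange(ps, mid, ps.size(), 0.5);
    }
    std::vector<double> xs;
    for (const auto& p : ps) xs.push_back(p.x);
    return xs;
}
```

If the update read x directly instead of x_ext, the two orders could give different answers, which is the kind of dependence the published copy is meant to remove.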

There are simultaneous reads, but I'm not aware of that being an issue.

I've stopped any possibility of two particles trying to read and write to the same class members at the same time, I just can't see how a race condition can emerge.

There is one thing though. To get my array of particles into the tbb parallel_for I pass a pointer to the array. This array is an array of pointers to objects, not an STL vector or anything nice like that, because they are too slow.

Might it be that I'm using the array this way that I have the issue?

I can probably re-write so I don't have to do it this way. Not quite sure how just yet, I still can't use STL.

edit: I had my tbb code in a separate header that I linked in to the class that called it, and wrote wrappers to make class members that called the tbb code. Very convoluted and probably silly. I just moved the tbb code into the class that calls it, rather than that header, and did away with the wrappers.

As a result I'm no longer getting the thread_monitor::launch: _beginThreadex failed issue. At least not in ten runs of the program, where before it happened each time, about 3/4 of the way through the run.
That's promising. I'll boot into Linux and see if the segfault problem is also gone.

I do believe I now know what the problem was.

I had my tbb code badly organised (see the edit on my previous post). There weren't any race conditions, but it does seem my attempt to keep all my tbb code in a separate header was a bad plan. I can't see why, since it ran on the latest TBB, so obviously that can cope, but the older version in Ubuntu couldn't, so there must be something about that approach which causes issues.

A little re-organisation of code, getting rid of that header and putting the tbb code inside the class that was calling it, and the correct approach for tbb::task_scheduler_init

    int main( int argc, char* argv[] ) {
        tbb::task_scheduler_init init;
        ...do rest of program logic...
    }


Now works as it should. I've run the program 20+ times, and the immediate segfault isn't happening, it's just running as it should.

Time for a cup of tea and a moment of being relieved.
