Okay, I'm baffled. All of the literature I've been able to access says TBB is "better than threads", "18x faster than threads", etc., but I haven't found anything yet that explains the actual mechanism by which this is all implemented. Templates are nice, but it all has to become code (maybe even machine code) eventually.
How is this accomplished without calls to clone(2) under Linux? Or, if it does use clone(2), why would startup be so much faster?
Also, is it possible to use TBB in a .so file, where the loading program has no knowledge of it? It seems like if clone(2) is involved, the parent program might be surprised to have children, possibly receive signals, etc.
Is there a TR somewhere that describes the implementation?