TBB_NUM_THREADS to control number of threads in TBB

Does Threading Building Blocks (TBB) provide an environment variable to control the number of threads spawned? I presume a sensible name for such a variable would be TBB_NUM_THREADS, similar to OMP_NUM_THREADS, but I have not seen any reference to it in the documentation. I know that in general it is best to let TBB set the number of threads, and that the number can be set explicitly by an early call to task_scheduler_init, but I think that an optional environment variable that sets the number of threads when present would be useful to many users.

As an example, a software library that I use has chosen to use Threading Building Blocks behind the scenes. This has worked well overall, but it is leading to some problems on large SMP machines. On these machines users request a small number of CPUs from the scheduler, but unless the number of threads is actively controlled, TBB tries to initialize N_CPUS + 1 threads, which in one case is 156 threads. The resulting program is inefficient and ignores the resources allocated by the scheduler.

If TBB_NUM_THREADS is not available in TBB itself, I have to implement similar functionality in any code that I want to run on an SMP machine. A Google search on TBB_NUM_THREADS suggests that this is an approach others have taken already. Are there plans to provide an environment variable like this in future releases?

If I've understood correctly, recent TBB versions let you use a different number of threads from a user thread with its own task_scheduler_init, which is even better than a global override. Have a look at the documentation and some recent forum threads to see if that works for you, and then please summarise your findings here.

Thank you for your reply.

If you mean that I can control the number of threads by invoking task_scheduler_init (num_threads) sufficiently early in my application, then I am aware of this and it is the method I am currently using.

I am suggesting, as a new feature, that task_scheduler_init(), which is often invoked internally, should check for an environment variable TBB_NUM_THREADS before falling back to another method of determining how many threads to spawn. This would not be a global override, since the variable can be set differently in each environment. In fact, many queue managers, e.g. Torque and LoadLeveler, allow users to set up the environment variables for each job submitted.

Overall, providing an environment variable seems to be a good way to offer control over the number of threads. I presume it has been considered, and I would like to know whether something like this is planned or, if not, why it is considered a bad idea.

Thanks again.

The lack of an environment variable for controlling the number of threads is a deliberate design decision in TBB. Experience with OpenMP indicated that some customers did not want users to be able to fiddle with their programs. Instead, we provided a programmatic interface via task_scheduler_init. I believe that .NET went down a similar philosophical path in providing programmatic interfaces for controlling the virtual machine instead of environment variables.

Is there a standard way to determine at run time how many cores are dedicated by a job dispatcher to the given process? We could take it into account when deciding on the default number of threads.

Also, application developers can always provide their own environment variable and read its value before initializing the TBB thread pool. That would be a "pay as you go" solution that keeps control in the hands of the application programmer.

You can add your own getenv for TBB_NUM_THREADS and use the supplied value or default.
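A minimal sketch of that approach in standard C++ might look like the following. The variable name MYAPP_NUM_THREADS and both function names are hypothetical, chosen only for illustration; the fallback to the hardware thread count is also an assumption, not something TBB prescribes.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <thread>

// Parse a thread count from text. Returns `fallback` when the text is
// missing, empty, not a plain decimal integer, or smaller than 1.
int parse_thread_count(const char* text, int fallback) {
    if (text == nullptr || *text == '\0') return fallback;
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > INT_MAX) {
        return fallback;
    }
    return static_cast<int>(value);
}

// Read the hypothetical application-level variable MYAPP_NUM_THREADS.
// If it is unset or invalid, fall back to the hardware thread count
// (or 1 when that cannot be determined).
int requested_thread_count() {
    unsigned hw = std::thread::hardware_concurrency();
    int fallback = hw > 0 ? static_cast<int>(hw) : 1;
    return parse_thread_count(std::getenv("MYAPP_NUM_THREADS"), fallback);
}

// Intended use, early in the program and before any other TBB call:
//     tbb::task_scheduler_init init(requested_thread_count());
```

Rejecting malformed values rather than guessing keeps a typo in a job script from silently producing an unexpected thread count.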

Jim

Thanks for all those answers. It is good to know that it has been considered and deliberately avoided, along with some of the reasons. It means that I can suggest options to the developers of the finite element (FEM) library I am using for giving their users control of the number of threads, knowing that the situation won't change in the next release.

Providing a getenv method in the FEM library to read an environment variable and set the number of threads is an option I will put forward. I think if we go this way the name TBB_NUM_THREADS should be avoided because it would give the impression that it is provided at the TBB library level and generally cause further confusion.

Intel TBB 4.3 Update 5 introduced the global_control class for application-wide control of allowed parallelism and thread stack size.

https://www.threadingbuildingblocks.org/docs/help/reference/appendices/c...

--Vladimir

What is the intended usage, why does it take this form (with a "selection" rule), and why is there no way to query the current selection?

I would suggest a shorter name, though: "max_num_threads" (instead of the mouthful "max_allowed_parallelism"), to go with "default_num_threads". You could also rename task_scheduler_init's parameter (now "max_threads") for consistency (no API change).

I would also want to come back to the original suggestion of an environment variable. Maybe it's true that developers don't want users to be able to "fiddle" with their programs (why not, exactly?), but it's also true that sometimes users of a shared server don't want others to hog the machine whenever they run an off-the-shelf program using TBB... and those others currently have no other way to avoid causing resentment (aka. to be polite) than to not run those programs, which isn't much of a solution compared to being able to just set TBB_MAX_NUM_THREADS (also shorter than TBB_MAX_ALLOWED_PARALLELISM), which would logically participate in the selection rule and could also limit market capacity (the latter would just be an invisible implementation matter, but might be relevant to reliably avoid overshooting the mark and then having to park the excess threads).

BTW, if there are multiple master threads, does TBB try to compensate for that by parking one or more idle worker threads?

Raf, thank you for feedback.

The intended usage for global_control is at the top level of an application (e.g. main()) to limit the number of threads TBB can use, no matter what was specified in various program modules by task_scheduler_init or task_arena. One particular use case is to facilitate implementation of an application-specific environment variable to control the number of threads.

The selection rule is there to provide a limited form of composability in case more than one global_control object is activated at the same time. We think it's more composable than if a new setting always overrode the previous one.
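To illustrate the composability argument, here is a toy model in standard C++. It does not use TBB at all: the names select_min and select_last are made up, and it assumes, as the TBB reference describes for max_allowed_parallelism, that the smallest active value is the one selected.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model only, not TBB code. `active` holds the limits requested by
// the controls that are currently alive, in order of creation.

// Selection rule: the smallest active limit wins.
std::size_t select_min(const std::vector<std::size_t>& active,
                       std::size_t default_value) {
    if (active.empty()) return default_value;
    return *std::min_element(active.begin(), active.end());
}

// Alternative for comparison: the most recent setting overrides the rest.
std::size_t select_last(const std::vector<std::size_t>& active,
                        std::size_t default_value) {
    return active.empty() ? default_value : active.back();
}
```

If main() asks for a limit of 4 and a library later asks for 16, select_min still yields 4, whereas select_last would let the library raise the limit to 16 behind the application's back.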

To query the current selection, there is global_control::active_value() static method.

We discussed various names and decided that "max_allowed_parallelism" best describes the semantics of the setting, though of course it's subjective and the difference with e.g. max_num_threads is subtle. Thank you for bringing this up, we might reconsider the name later.

I still believe that providing the developers with a way to implement a max-threads environment variable (if they wish so) is better than doing it ourselves and leaving them no control. An environment variable recognized directly by TBB could cause undesirable effects for end users as well: being set for/by one application, it could inadvertently affect other TBB-based applications on a system (think of Windows where it's common to set an environment variable globally).

Resource distribution on shared servers is controlled by job managers, and those usually utilize taskset or similar utilities to run a program on a given subset of cores. TBB recognizes and respects the process affinity masks, thus job managers have a way to limit how much HW is available for a TBB based app.
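For reference, an application can inspect the same affinity mask itself. The sketch below is Linux-specific and the function name available_cpu_count is hypothetical; it uses sched_getaffinity and CPU_COUNT from glibc and falls back to std::thread::hardware_concurrency elsewhere. It only illustrates the idea and is not how TBB implements it.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for CPU_COUNT when not already defined
#endif
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Number of CPUs the calling process is allowed to run on.
// On Linux this reflects restrictions applied by taskset or a job manager.
int available_cpu_count() {
#if defined(__linux__)
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // pid 0 means "the calling thread". The call can fail on machines with
    // more CPUs than a fixed-size cpu_set_t holds; then we use the fallback.
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        int n = CPU_COUNT(&mask);
        if (n > 0) return n;
    }
#endif
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? static_cast<int>(hw) : 1;
}
```

Launched under something like taskset -c 0-3, this should report 4 on a Linux machine that has those cores, regardless of how many cores are installed.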

TBB still does not try to compensate for multiple master threads running at the same time. 

"To query the current selection, there is global_control::active_value() static method." Oops, I missed that...

"I still believe that providing the developers with a way to implement a max-threads environment variable (if they wish so) is better than doing it ourselves and leaving them no control." Call me old-fashioned (using first person to mean any user), but if I bought the computer, and I decided to run a particular program on it, shouldn't I also be in control?

"think of Windows where it's common to set an environment variable globally" And how exactly does that mean that nobody else can have nice things? What exactly would be the scenario where a program developer would be aware, in advance, of a need to restrain execution and could easily build in a program-specific setting (following the recommendation of TBB's newly adapted documentation), but decides to instead advise his users to use a global setting and ignore their inevitable complaints that this interferes with other programs? Besides, I probably even would want to (be able to) set this in my login profile so I would have to explicitly disable it in a shell or for a specific program if I wanted to use the whole machine (for a specific need, or during quiet time): I would want to use this as an "environment" variable. It's not unlikely that a server's administrator would want to have a say in the matter as well, by setting it up as a default for all users.

"Resource distribution on shared servers is controlled by job managers" This isn't (only) about scheduled jobs: maybe I have an account on a server that I'm sharing with others, and I'm already using "-j 8" instead of "-j" to restrict my use of parallelism to build programs because I don't want to keep having lunch by myself all the time. Using taskset(1) is awkward because it involves some kind of partitioning that has to be coordinated with others, which is annoying enough by itself if it only has to be done occasionally. And if a machine has 32 cores and 25 potential users, do I only get a single core, or how do I coordinate with others in real time so our respective subsets don't overlap, without also restricting who can use the machine at each particular time? Why couldn't I just let the O.S. figure out for itself how to allocate resources like it normally would? And on top of that I also have to remember to use this each and every time I start a program!

Parallel programs haven't made scheduling shared resources any easier, and there's no definitive solution in sight, but why withhold this as an obvious workaround?
