I've been very busy reading books from Intel Press and deciding on the components for my new TBB development system.
Something I feel is missing from TBB is an equivalent to limits.h: a header that provides the L1 and L2 cache sizes, the number of processors, and other hardware information for compile-time tuning. These values could be used in template programming or preprocessor directives to automagically optimize data structures. I can easily envision template libraries that fine-tune themselves at compile time based on processor capabilities.
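To make the idea a bit more concrete, here is a rough sketch of what such a header might look like. None of this exists in TBB: the macro names (HW_CACHE_LINE_BYTES, HW_L1D_BYTES), the hw_limits namespace, and the padded and tile_elements helpers are all made up for illustration, and the default values (64-byte lines, 32 KB of L1 data cache) are only assumptions that a build system would override for the real target.

```cpp
#include <cstddef>

// Hypothetical build-time knobs, NOT part of TBB. A build system could
// override them for the target machine, e.g. -DHW_L1D_BYTES=49152.
// The defaults below are assumptions, not measured values.
#ifndef HW_CACHE_LINE_BYTES
#define HW_CACHE_LINE_BYTES 64
#endif
#ifndef HW_L1D_BYTES
#define HW_L1D_BYTES (32 * 1024)
#endif

namespace hw_limits {  // hypothetical namespace
constexpr std::size_t cache_line_bytes = HW_CACHE_LINE_BYTES;
constexpr std::size_t l1d_bytes = HW_L1D_BYTES;
}  // namespace hw_limits

// Aligns (and therefore pads) T to a full cache line, so that two
// neighbouring elements in an array never share a line.
template <typename T>
struct alignas(hw_limits::cache_line_bytes) padded {
    T value;
};

// Number of T elements in a tile that fills half of the L1 data cache,
// with a floor of one element for very large types.
template <typename T>
constexpr std::size_t tile_elements() {
    return (hw_limits::l1d_bytes / 2) / sizeof(T) > 0
               ? (hw_limits::l1d_bytes / 2) / sizeof(T)
               : 1;
}
```

With something like this, a container could declare its per-thread counters as padded<long> and pick its blocking factor with tile_elements<double>(), and both would change automatically when the library is rebuilt for a different processor.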
Equally, something available at runtime could be very useful. I've read about the information obtainable via CPUID in the Software Optimization Cookbook. A structure populated from it could drive similar optimizations, performed at runtime instead. That way a single binary could be moved between different systems and still make the best use of the resources available to it.
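A minimal sketch of the runtime variant might look like the following. The hw_info struct and the query_hw_info and chunk_elements functions are hypothetical names of my own, not TBB APIs. The processor count comes from the standard std::thread::hardware_concurrency(); the cache line size is read from CPUID leaf 1 (the CLFLUSH line size in EBX bits 15:8, in units of 8 bytes) using the GCC/Clang <cpuid.h> helper, and falls back to an assumed 64 bytes on other compilers or architectures.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#define HW_HAVE_GNU_CPUID 1
#endif

// Hypothetical runtime description of the machine, NOT part of TBB.
struct hw_info {
    unsigned logical_cpus;
    std::size_t cache_line_bytes;
};

inline hw_info query_hw_info() {
    hw_info info{};
    // hardware_concurrency() may return 0 when the count is unknown.
    unsigned n = std::thread::hardware_concurrency();
    info.logical_cpus = n ? n : 1;
    info.cache_line_bytes = 64;  // assumed fallback, not a measured value
#ifdef HW_HAVE_GNU_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Leaf 1: EDX bit 19 (CLFSH) says the CLFLUSH line size field is valid;
    // EBX bits 15:8 hold that size in 8-byte units.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19))) {
        unsigned line = ((ebx >> 8) & 0xffu) * 8u;
        if (line != 0) info.cache_line_bytes = line;
    }
#endif
    return info;
}

// Splits `total` elements across the CPUs and rounds each chunk up to a
// whole number of cache lines, so adjacent chunks do not share a line.
inline std::size_t chunk_elements(const hw_info& info, std::size_t total,
                                  std::size_t elem_size) {
    assert(elem_size > 0);
    std::size_t per_cpu =
        (total + info.logical_cpus - 1) / info.logical_cpus;
    std::size_t per_line = info.cache_line_bytes / elem_size;
    if (per_line == 0) per_line = 1;
    return ((per_cpu + per_line - 1) / per_line) * per_line;
}
```

The same binary would then call query_hw_info() once at startup and size its work chunks from the result, rather than baking the numbers in at compile time. A fuller version would presumably also query the L1 and L2 cache sizes, which need additional CPUID leaves that differ between vendors, so I have left them out here.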