I went through the concurrent_queue class in TBB and I have a few questions:
1. The concurrent_queue_rep class makes sure that head_counter and tail_counter are placed on separate cache lines, so that the pushing threads and the popping threads don't contend while accessing them. Also, consecutive operations on the queue are directed to different micro_queues, so that these operations can proceed largely independently. The size of each micro_queue happens to be 20 bytes, and concurrent_queue_rep stores all 8 micro_queues consecutively in memory (starting at the beginning of a cache line).
So my question is: although the 8 concurrent operations are directed to 8 different micro_queues, all 8 threads are accessing memory somewhere within this 160-byte region (8 micro_queues * 20 bytes), and I see quite a bit of contention. So why not keep each of the micro_queues on a different cache line as well?
2. I do not understand why the field 'mask' in the page struct is used to decide the success of a popping operation.
I see that the operations SpinwaitUntilEq( head_counter, k ) and SpinwaitWhileEq( tail_counter, k ) in the micro_queue::pop() method already make sure that an item is present for the popping thread to pop, so I don't see why mask is needed on top of that to determine whether the item is present.
3. The following is the code in the internal_pop method of concurrent_queue_base:
do {
    k = r.head_counter++;
} while( !r.choose(k).pop(dst,k,*this) );
Why is head_counter incremented inside a do-while loop? How will a thread that does not find an item for pop operation N find an item for operation N+1?
4. If the threads that push into the concurrent queue and the threads that pop from it are different, then the popping threads are the ones calling deallocate_page(). This again forces a slow path in scalable_free, since the popping threads will never be the owners of the page memory. So why not keep something like the LIFOQueue (used in the scalable allocator code) per micro_queue, to place the free pages on and get them back when necessary? Would this not reduce contention in the scalable_allocator as well?
Please correct me if my understanding of any of the above is wrong.