I remember Arch saying something along these lines: implementing a data structure with atomic operations sometimes performs no better than simply wrapping a sequential data structure in a mutex.
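To make the comparison concrete, here is a minimal sketch of the two approaches, using a shared counter as a stand-in for a real data structure. The names `MutexCounter`, `AtomicCounter`, and `run` are hypothetical and not part of TBB, and the CAS loop is lock-free rather than wait-free.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example types; not part of TBB.

// Sequential state wrapped in a mutex.
struct MutexCounter {
    std::mutex m;
    long value = 0;
    void increment() {
        std::lock_guard<std::mutex> guard(m);
        ++value;
    }
    long get() {
        std::lock_guard<std::mutex> guard(m);
        return value;
    }
};

// Same state updated with a compare-and-swap retry loop.
// Under contention the CAS may fail and retry, so this is
// lock-free, not wait-free.
struct AtomicCounter {
    std::atomic<long> value{0};
    void increment() {
        long old = value.load(std::memory_order_relaxed);
        // On failure, compare_exchange_weak reloads `old` with the current value.
        while (!value.compare_exchange_weak(old, old + 1,
                                            std::memory_order_relaxed)) {
        }
    }
    long get() { return value.load(); }
};

// Runs `threads` threads, each calling increment() `iters` times,
// and returns the final count.
template <class Counter>
long run(int threads, int iters) {
    Counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&c, iters] {
            for (int i = 0; i < iters; ++i) c.increment();
        });
    }
    for (auto& th : pool) th.join();
    return c.get();
}
```

Timing `run<MutexCounter>` against `run<AtomicCounter>` (for example with `std::chrono::steady_clock`) at different thread counts would show whether the atomic version actually wins on a given machine. Both versions contend on the same cache line, which is one plausible reason the gap can be small.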
Before I repeat the TBB team's past work: I'm looking at wait-free data structures and the cost of the atomic operations used to implement them. Are there any general suggestions or observations on the performance of wait-free data structures?
In particular, I'm looking at implementing data structures from the book "The Art of Multiprocessor Programming". Before I spend a lot of time on this, any words of wisdom?