I am looking for cheap ways to maintain a consistent state. I have realized that mutexes and rw locks invalidate the cache line holding the test-and-set variable in every core other than the one currently performing the test-and-set. Afterwards, a thread on another core will take a cache miss on that variable and have to go to memory. I am wondering whether the same is true of __sync_bool_compare_and_swap, __sync_add_and_fetch, etc.? Will a memory barrier have the same effect?
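To make the question concrete, here is a minimal sketch of the kind of usage I have in mind, assuming GCC's __sync builtins. The helper names try_acquire, release, bump, and full_barrier are hypothetical and made up for illustration only; this is not taken from any real code base.

```c
/* Sketch only: hypothetical helpers wrapping GCC's __sync builtins. */

/* Try to flip *flag from 0 to 1 atomically.
 * Returns 1 if this caller made the change, 0 if the flag was already set. */
static int try_acquire(int *flag)
{
    return __sync_bool_compare_and_swap(flag, 0, 1);
}

/* Reset *flag to 0 with release semantics. */
static void release(int *flag)
{
    __sync_lock_release(flag);
}

/* Atomically increment *counter and return the new value. */
static long bump(long *counter)
{
    return __sync_add_and_fetch(counter, 1);
}

/* Issue a full memory barrier. */
static void full_barrier(void)
{
    __sync_synchronize();
}
```

Each of try_acquire and bump is an atomic read-modify-write on a single location, so what I am asking is whether the cache line holding *flag or *counter ends up invalidated in the other cores' caches the same way it would with a mutex, and whether full_barrier alone would do that too.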
Are there any good references out there on inexpensive primitives that scale well?