Starvation-free, bounded-time rw-mutex with wait-free fast-pathed readers

Starvation-free, bounded-time rw-mutex with wait-free fast-pathed readers

Here is a Windows implmentation that works on Win95/NT/XP/CE/Whatever:

http://webpages.charter.net/appcore/misc/rwmutex_win_hpp.html

Here is a Linux implmentation that should compile and work fine on any system that suppots GCC, Pthreads and semaphores (e.g., `sem_post()):

http://pastebin.com/f3d6140eb

My invention has bounded-time and does not starve writers or readers. It simply interleaves batched of reads with writes. Please study the algorithm. I think it would be a good candiate for incusion within TBB.

Any thoughts?

Thank you,

Chris M. Thomasson

18 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Quoting - CHRIS M THOMASSON
Here is a Windows implmentation that works on Win95/NT/XP/CE/Whatever:

...

My invention has bounded-time and does not starve writers or readers. It simply interleaves batched of reads with writes. Please study the algorithm. I think it would be a good candiate for incusion within TBB.

Any thoughts?

Chris M. Thomasson

Thanks, Chris. We're always looking for a few good ideas. Any reason why you didn't post this to the TBB forum directly? And are you willing to submit this through the TBB code contribution process?

Quoting - Robert Reed (Intel)
Thanks, Chris. We're always looking for a few good ideas. Any reason why you didn't post this to the TBB forum directly? And are you willing to submit this through the TBB code contribution process?

Hi Robert. First of all, I apologize for taking so long to get back to you. Also, I have no good reason why I did not post this over in the TBB forum in the first place; I have migrated the thread over there.

As far as going through the official TBB submission process, well I would be happy to only if you think its a worthy addition to the existing code-base. I already have verified its correctness using Relacy Race Detector:

http://relacy.pastebin.com/f3cab2025

BTW, if your going to test this in Relacy, please download the most recent version (e.g., version: 1.5.0):

http://groups.google.com/group/relacy/files

The main benefits of my algorithm are:

- 100% starvation-free for both readers and writers.

- Bounded-time for both readers and writers.

- Wait-free fast-pathed readers (when there is no writer contention).

- Interleaves batches of readers and writers on contention.

- Does not use CAS; only needs Fetch-and-Add.

- 100% loop-free algorithm.

- Has conditional reader-to-writer upgrade.

- Small, tight algorithm.

- Easy to read, no cluttered implementation.

- Easy to port to POSIX systems (only needs sem_t for slowpaths).

- Easy to post to other architectures.

- Seems to outperform native rw-mutex implementations; Vista native rw-mutexs do NOT guarantee starvation-free and bounded-time behavior.

- No restrictive license; its 100% public domain.

The downsides are:

- It currently has no writer-to-reader downgrade.

- Its not wait-free on architectures that do not have native atomic fetch-and-add. For instance, on SPARC, this instruction has to be emulated with CAS.

Have you studied the algorithm yet? I would welcome any comments, good or bad! Please take a look at it.

Thanks a lot.

Sincerely,

Chris M. Thomasson

Quoting - Chris M. Thomasson
Hi Robert. First of all, I apologize for taking so long to get back to you. Also, I have no good reason why I did not post this over in the TBB forum in the first place; I have migrated the thread over there.

Have you studied the algorithm yet? I would welcome any comments, good or bad! Please take a look at it.

Thanks a lot.

Sincerely,

Chris M. Thomasson

Thanks for replying. Migrating the thread over here might draw the attention of Alexey or one of the other members of the development team. As for me, I've been buried in other topics and battles and have not had a chance yet to take a look.

Quoting - Robert Reed (Intel)

Thanks for replying. Migrating the thread over here might draw the attention of Alexey or one of the other members of the development team. As for me, I've been buried in other topics and battles and have not had a chance yet to take a look.

Okay. Well, I sure hope they like it...

;^)

Quoting - Chris M. Thomasson

Okay. Well, I sure hope they like it...

;^)

Was the posted algorithm way too crappy, supferlous, inadequate, crappy, shi%ty, whatever?

Quoting - Chris M. Thomasson
Was the posted algorithm way too crappy, supferlous, inadequate, crappy, shi%ty, whatever?

Chris,
sorry for not answering you earlier. I have just returned from vacation :) I promise to look at the proposal and get back to you. Meanwhile, you are always free and encouraged to submit it as a contribution.

Overall, it seems a good general-purpose RW lock implementation. Some comments:
- it is starvation-free and bounded-time for writers only if the underlying mutexfor writers has these qualities. Otherwise, constantly coming writers can cause starvation for some of them.
- the object is rather big in size for a mutex.
Other than that, I agree the properties you announced for it are correct. Whether they are important, depend on a particular customer needs (e.g. why "no CAS" is important, provided that compare-and-swap is more available in HW than fetch-and-add).

I think that writer-to-reader downgrade can be implemented as following:

  bool trywrtord() throw() {
    assert(m_count < 1);
    if (m_wrrecurse>1) return false;
    --m_wrrecurse;
    LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX-1); // one less, because "this" becomes a reader
    if (count < 0) {
      if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
    LeaveCriticalSection(&m_wrlock);
    return true;
  }

Basically it is almost identical to wrunlock(), except that recursive locking prevent from downgrade,and also m_count is increased by LONG_MAX-1, to reflect the fact that the writer becomes a reader.

Currently, TBB does not have a generic passive-wait (i.e. non-spinning) RW mutex. I think this proposal can be considered as a designalternative, and I agree it has some advantages to a mere API wrapper, especially on Windows where RW locks weren't available before Vista. Of course that's only my opinion and can not serve as a guarantee of inclusion into future TBB versions.

Quoting - Alexey Kukanov (Intel)

Overall, it seems a good general-purpose RW lock implementation. Some comments:
- it is starvation-free and bounded-time for writers only if the underlying mutexfor writers has these qualities. Otherwise, constantly coming writers can cause starvation for some of them.
- the object is rather big in size for a mutex.
Other than that, I agree the properties you announced for it are correct. Whether they are important, depend on a particular customer needs (e.g. why "no CAS" is important, provided that compare-and-swap is more available in HW than fetch-and-add).

I think that writer-to-reader downgrade can be implemented as following:

  bool trywrtord() throw() {
    assert(m_count < 1);
    if (m_wrrecurse>1) return false;
    --m_wrrecurse;
    LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX-1); // one less, because "this" becomes a reader
    if (count < 0) {
      if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
    LeaveCriticalSection(&m_wrlock);
    return true;
  }

Basically it is almost identical to wrunlock(), except that recursive locking prevent from downgrade,and also m_count is increased by LONG_MAX-1, to reflect the fact that the writer becomes a reader.

Currently, TBB does not have a generic passive-wait (i.e. non-spinning) RW mutex. I think this proposal can be considered as a designalternative, and I agree it has some advantages to a mere API wrapper, especially on Windows where RW locks weren't available before Vista. Of course that's only my opinion and can not serve as a guarantee of inclusion into future TBB versions.

Yes; you are correct that writers inherent properties from underlying mutex. You are also correct that the actual object is rather large. However, I think I can cut that down somewhat by removing the need for an OS mutex and steal some space from the `m_count' member. Instead of using LONG_MAX, I could replace it with RDMAX which could be defined as (LONG_MAX - 128), or something, and use that extra space for hand-crafted mutex state. I could also get rid of the `m_wrrecurse' member. WRT your example `trywrtord()' procedure, well, at first glance it should work just fine. So, I will try and cut some of the fat and also try out the writer-to-reader downgrade; even though I am not sure how useful that is in practice. If you have any suggestions on how trim more of it, I would be very happy to read about them!

Anyway, thank you very much for taking the time to actually read the algorithm. I really do appreciate it!

:^)

Here is a new version that now has a writer-to-reader downgrade, and it also has a fair bounded time mutex with wait-free fast-path. Its somewhat smaller that the original because I no longer use the CRITICAL_SECTION:

#if ! defined(RWMUTEX_WIN_HPP)
  #define RWMUTEX_WIN_HPP()0x0001UL
  #undef WIN32_LEAN_AND_MEAN
  #undef _WIN32_WINNT
  #define _WIN32_WINNT 0x0400
  #define WIN32_LEAN_AND_MEAN
  #include 
  #include 
  #include 
  #include 
  #if ! defined(RWMUTEX_WIN_UNEXPECTED)
    #if defined(_MSC_VER)
      #define RWMUTEX_WIN_UNEXPECTED() 
        assert(false), unexpected()
    #else
      #define RWMUTEX_WIN_UNEXPECTED() 
        assert(false), std::unexpected()
    #endif
  #endif
  #if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
    #define RWMUTEX_WIN_CTOR_UNEXPECTED() 
      assert(false); throw std::exception()
  #endif
  #if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
    #define RWMUTEX_WIN_DTOR_UNEXPECTED()assert(false)
  #endif
/*************************************************************/




/* Simple Scoped Lock Helpers
_________________________________________________________*/
namespace lockguard {
  template
  class read {
    T* m_state;
    bool m_upgraded;
  public:
    read(T& state) throw() 
      : m_state(&state), 
        m_upgraded(false) {
      m_state->rdlock();
    }
    ~read() throw() {
      if (m_state) { 
        if (! m_upgraded) {
          m_state->rdunlock();
        } else {
          m_state->wrunlock();
        }
      }
    }
  public:
    void cancel() throw() { m_state = 0; }

    bool tryrdtowr() throw() {
      return (m_state && ! m_upgraded) ? 
        (m_upgraded = m_state->tryrdtowr()) : false;
    }
  };


  template
  class write {
    T* m_state;
  public:
    write(T& state) throw() : m_state(&state) {
      m_state->wrlock();
    }
    ~write() throw() {
      if (m_state) { m_state->wrunlock(); }
    }
  public:
    void cancel() throw() { m_state = 0; }
  };
}




/* Wait-Free Fast-Pathed Read/Write Mutex
_________________________________________________________*/
class rwmutex_win {
  rwmutex_win(rwmutex_win const&);
  rwmutex_win const& operator =(rwmutex_win const&);


public:
  typedef lockguard::read lock_read;
  typedef lockguard::write lock_write;


private:
  LONG m_count;
  LONG m_rdwake;
  LONG m_mtx;
  HANDLE m_mtxwset;
  HANDLE m_wrwset;
  HANDLE m_rdwset;


public:
  rwmutex_win()
    : m_count(LONG_MAX),
      m_rdwake(0),
      m_mtx(1),
      m_mtxwset(CreateSemaphore(0, 0, LONG_MAX, 0)),
      m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
      m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
    if (! m_rdwset || ! m_wrwset) {
      if (m_rdwset) { CloseHandle(m_rdwset); }
      if (m_wrwset) { CloseHandle(m_wrwset); }
      RWMUTEX_WIN_CTOR_UNEXPECTED();
    }
  }

  ~rwmutex_win() throw() {
    if (m_count != LONG_MAX || m_rdwake || m_mtx != 1) {
      RWMUTEX_WIN_DTOR_UNEXPECTED();
    }
    if (! CloseHandle(m_mtxwset) ||
        ! CloseHandle(m_rdwset) ||
        ! CloseHandle(m_wrwset)) {
      RWMUTEX_WIN_DTOR_UNEXPECTED();
    }
  }


public:
  void wrlock() throw() {
    if (InterlockedDecrement(&m_mtx) < 0) {
      if (WaitForSingleObject(m_mtxwset, 
            INFINITE) != WAIT_OBJECT_0) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
    assert(m_count > -1);
    LONG count = InterlockedExchangeAdd(&m_count, -LONG_MAX);
    if (count < LONG_MAX) {
      if (InterlockedExchangeAdd(&m_rdwake, 
            LONG_MAX - count) + LONG_MAX - count) {
        if (WaitForSingleObject(m_wrwset, 
              INFINITE) != WAIT_OBJECT_0) {
          RWMUTEX_WIN_UNEXPECTED();
        }
      }
    }
  }

  void wrunlock() throw() {
    assert(m_count < 1);
    LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX);
    if (count < 0) {
      if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
    if (InterlockedIncrement(&m_mtx) < 1) {
      if (! ReleaseSemaphore(m_mtxwset, 1, NULL)) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
  }

  void wrtord() throw() {
    assert(m_count < 1);
    LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX - 1);
    if (count < 0) {
      if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
    if (InterlockedIncrement(&m_mtx) < 1) {
      if (! ReleaseSemaphore(m_mtxwset, 1, NULL)) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
  }


public:
  void rdlock() throw() {
    assert(m_count <= LONG_MAX);
    LONG count = InterlockedDecrement(&m_count);
    if (count < 0) {
      if (WaitForSingleObject(m_rdwset, 
            INFINITE) != WAIT_OBJECT_0) {
        assert(false); RWMUTEX_WIN_UNEXPECTED();
      }
    }
  }

  void rdunlock() throw() {
    assert(m_count < LONG_MAX);
    LONG count = InterlockedIncrement(&m_count);
    if (count < 1) {
      if (! InterlockedDecrement(&m_rdwake)) {
        if (! SetEvent(m_wrwset)) {
          RWMUTEX_WIN_UNEXPECTED();
        }
      }
    } 
  }


public:
  bool tryrdtowr() throw() {
    assert(m_count < LONG_MAX);
    if (InterlockedCompareExchange(&m_mtx, 0, 1) == 1) {
      assert(m_count > -1);
      LONG count = 
        InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
      if (count < LONG_MAX) {
        if (InterlockedExchangeAdd(&m_rdwake, 
              LONG_MAX - count) + LONG_MAX - count) {
          if (WaitForSingleObject(m_wrwset, 
                INFINITE) != WAIT_OBJECT_0) {
            RWMUTEX_WIN_UNEXPECTED();
          }
        }
      }
      return true;
    }
    return false;
  }
};




/*************************************************************/
#endif

Okay... What do you think of this one?

;^)

Quoting - Chris M. Thomasson
Here is a new version that now has a writer-to-reader downgrade, and it also has a fair bounded time mutex with wait-free fast-path. Its somewhat smaller that the original because I no longer use the CRITICAL_SECTION:

[...]

Okay... What do you think of this one?

;^)

Humm... I still think I can merge some of the counter fields. I mean, do I really need to support LONG_MAX number of threads? lol. Any suggestions?

Thank you for your time.

Quoting - Chris M. Thomasson
Here is a new version that now has a writer-to-reader downgrade, and it also has a fair bounded time mutex with wait-free fast-path. Its somewhat smaller that the original because I no longer use the CRITICAL_SECTION:
[...]

Okay... What do you think of this one?

;^)

Oops; I forgot to add the downgrade ability to the write RAII class. Here it is:

  template
  class write {
    T* m_state;
    bool m_downgraded;
  public:
    write(T& state) throw() 
      : m_state(&state),
        m_downgraded(false) {
      m_state->wrlock();
    }
    ~write() throw() {
      if (m_state) { 
        if (! m_downgraded) {
          m_state->wrunlock();
        } else {
          m_state->rdunlock();
        }
      }
    }
  public:
    void cancel() throw() { m_state = 0; }

    void wrtord() throw() {
      if (m_state && ! m_downgraded) {
        m_state->wrtord();
        m_downgraded = true;
      }
    }
  };

Sorry about that.

Quoting - Chris M. Thomasson
template
class write {
T* m_state;
bool m_downgraded;
public:
void cancel() throw() { m_state = 0; }
...
};

What I do not understand is this cancel() method that simply nullifies the pointer. I do not see how it can be useful. Once constructed, the RAII object helds the lock until being destroyed, right? Calling cancel() leaves the lock in acquired state, but removes the reference to it from the RAII object, thus basically abandoning the lock (since the destructor will not release it). Why would one need such a dangerous method?

Quoting - Alexey Kukanov (Intel)

What I do not understand is this cancel() method that simply nullifies the pointer. I do not see how it can be useful. Once constructed, the RAII object helds the lock until being destroyed, right? Calling cancel() leaves the lock in acquired state, but removes the reference to it from the RAII object, thus basically abandoning the lock (since the destructor will not release it). Why would one need such a dangerous method?

I added it for flexibility. For instance if somebody wanted to maintain consistency across throwing and catching an exception. Something like this pseudo-code:

static rwmutex_win g_rwmutex;


void foobar() {
  rwmutex_win::lock_write write(g_rwmutex);
  if (something_went_wrong()) {
    write.cancel();
    throw something();
  }
}


void foo() {
  try {
    foobar();
  } catch(something const& e) {
    // respond to error while we still have
    // consistency because the lock is still held...
    g_rwmutex.wrunlock();
  }
}

Not sure how useful this actually is, but I wanted to give the user the ability to do it.

Quoting - Chris M. Thomasson

I added it for flexibility. For instance if somebody wanted to maintain consistency across throwing and catching an exception. Something like this pseudo-code:

static rwmutex_win g_rwmutex;


void foobar() {
  rwmutex_win::lock_write write(g_rwmutex);
  if (something_went_wrong()) {
    write.cancel();
    throw something();
  }
}


void foo() {
  try {
    foobar();
  } catch(something const& e) {
    // respond to error while we still have
    // consistency because the lock is still held...
    g_rwmutex.wrunlock();
  }
}

Not sure how useful this actually is, but I wanted to give the user the ability to do it.

Very clever idea! However I hope I NEVER have to debug a construct like that :)

Scoped locks are all that's needed, if I need consistency I'd place a scoped lock within foo, regardless of foobar's implementation (and locks). I think the design is in violation of the Demeter(?) principles.

Quoting - robert.jay.gould

Very clever idea! However I hope I NEVER have to debug a construct like that :)

Scoped locks are all that's needed, if I need consistency I'd place a scoped lock within foo, regardless of foobar's implementation (and locks). I think the design is in violation of the Demeter(?) principles.

;^)

Well, you might have to understand foobar's implmentation if you call it with a lock held. Perhaps foobar takes the same lock, and said lock is non-recursive. Or, you might get into locking order issues.

WRT cancelling RAII objects, well, its not unheard of. Locking aside for a moment, take ScopeGuard code into account:

http://www.ddj.com/cpp/184403758

It depends on the ability to cancel the RAII objects in order to "commit" their actions as the dtor will peform a rollback. If the action suceeds, you want to be able to cancel out the rollback. If an exception is thrown, then the rollback is allowed to do its thing.

Quoting - Chris M. Thomasson

;^)

Well, you might have to understand foobar's implmentation if you call it with a lock held. Perhaps foobar takes the same lock, and said lock is non-recursive. Or, you might get into locking order issues.

WRT cancelling RAII objects, well, its not unheard of. Locking aside for a moment, take ScopeGuard code into account:

http://www.ddj.com/cpp/184403758

It depends on the ability to cancel the RAII objects in order to "commit" their actions as the dtor will peform a rollback. If the action suceeds, you want to be able to cancel out the rollback. If an exception is thrown, then the rollback is allowed to do its thing.

Yeah I remember implementing a very simple STM prototype based on RAII transactions like that, where commit would stop the rollback in the destructor.

So yeah releaseRAII is a valid pattern.

Anyways I can imagine the cancel having some use, but without proper documentation, and perhaps a better name, I can already imagine dangling locks in user code :)

Quoting - robert.jay.gould

Yeah I remember implementing a very simple STM prototype based on RAII transactions like that, where commit would stop the rollback in the destructor.

So yeah releaseRAII is a valid pattern.

Anyways I can imagine the cancel having some use, but without proper documentation, and perhaps a better name, I can already imagine dangling locks in user code :)

Perhaps something like:

do_not_automatically_unlock__very_dangerous_and_deadlock_prone__experts_only()

;^=

Leave a Comment

Please sign in to add a comment. Not a member? Join today