STM Compiler: Quiescence for read-only transactions

STM Compiler: Quiescence for read-only transactions

Imagen de Dmitry Vyukov

From the OOPSLA paper:

To implement privatization safety in the presence of optimistic readers, ALL transactions must quiesce [46, 13, 33] on commit.

I'm not sure why ALL transactions must wait for quiesce on commit. In my opinion, read-only transactions don't have to wait for quiesce on commit. Just because they can't privatize anything. Because one have to make at least one write in order to privatize. If transaction is read-only, then other transactions are unable to distinguish whether that transaction are already completed, or not, or it's not started, or it won't be executed at all anyway. So waiting for quiesce looks here completely senseless.

What do you think?

I think if you will remove wait for quiesce for read-only transactions, it can have great impact for read-mostly workloads (for example search in hash-map).

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
publicaciones de 9 / 0 nuevos
Último envío
Para obtener más información sobre las optimizaciones del compilador, consulte el aviso sobre la optimización.
Imagen de Dmitry Vyukov

Quoting - Dmitriy V'jukov

From the OOPSLA paper:

To implement privatization safety in the presence of optimistic readers, ALL transactions must quiesce [46, 13, 33] on commit.

I'm not sure why ALL transactions must wait for quiesce on commit. In my opinion, read-only transactions don't have to wait for quiesce on commit. Just because they can't privatize anything. Because one have to make at least one write in order to privatize. If transaction is read-only, then other transactions are unable to distinguish whether that transaction are already completed, or not, or it's not started, or it won't be executed at all anyway. So waiting for quiesce looks here completely senseless.

What do you think?

I think if you will remove wait for quiesce for read-only transactions, it can have great impact for read-mostly workloads (for example search in hash-map).

Or put it this way. In order privatize some object, thread must make that object inaccessible to other threads (or at least mark it somehow). In order to make object inaccessible (or mark it) thread must make at least one write. Consequently if thread makes no writes, it don't privatize anything. So thread don't need to wait for quiesce.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Best Reply

Quoting - Dmitriy V'jukov

From the OOPSLA paper:

To implement privatization safety in the presence of optimistic readers, ALL transactions must quiesce [46, 13, 33] on commit.

I'm not sure why ALL transactions must wait for quiesce on commit. In my opinion, read-only transactions don't have to wait for quiesce on commit. Just because they can't privatize anything. Because one have to make at least one write in order to privatize. If transaction is read-only, then other transactions are unable to distinguish whether that transaction are already completed, or not, or it's not started, or it won't be executed at all anyway. So waiting for quiesce looks here completely senseless.

What do you think?

I think if you will remove wait for quiesce for read-only transactions, it can have great impact for read-mostly workloads (for example search in hash-map).

At first glance it may appear that read-only transactions don't need to quiesce, but consider the following example that illustrates why read-only transactions do need to quiesce:

thread 0

__tm_atomic {

if (!private) data=1; /* flag private indicates whether data is shared or private */

/* ... */

/* conflict later causes an abort */

}

thread 1

__tm_atomic {

private = true;

x = 1;

}

thread 2

__tm_atomic {

tmp = x;

}

if (tmp) {

t1 = data;

t2 = data;

assert(t1 == t2);

}

In this example, thread 1 sets the private flag to true and then signals thread 2 via x that thread 2 can now read data. In other words, thread 1 signals thread 2 that data is now private to thread 2. If thread 2 doesn't quiesce (i.e., wait until thread 0 has either validated successfully or finished aborting), then it will see a dirty value from thread 0 (in an in-place update STM).

Ali

Imagen de Dmitry Vyukov

Quoting - Ali-reza Adl-tabatabai (Intel)
In this example, thread 1 sets the private flag to true and then signals thread 2 via x that thread 2 can now read data. In other words, thread 1 signals thread 2 that data is now private to thread 2. If thread 2 doesn't quiesce (i.e., wait until thread 0 has either validated successfully or finished aborting), then it will see a dirty value from thread 0 (in an in-place update STM).

At first glance it may appear that read-only transactions don't need to quiesce, but consider the following example that illustrates why read-only transactions do need to quiesce:

Ali

Oh! I see! I completely miss that moment!

My line of reasoning was almost right, except that I don't count that thread can privatize on behalf of another thread.

Thank you. I need to think some more on this.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Imagen de Dmitry Vyukov

I see in the paper that quiescence optimizations speed up execution of some benchmarks up to 200%. So I think further optimizations of quiescence are worth doing. What do you think?

The moments that bother me is that (1) most part of read-mostly workloads actually don't need quiescence (i.e. they don't use 'transitive privatization' like the one you described above), (2) quiescence do incurs substantial overheads for such workloads and (3) if I understand all correctly, quiescence optimizations you described in the paper apply mainly to write-mostly workloads.

I have some other hypotheses how quiescence can be eliminated from read-mostly workloads. I will describe them in following posts.

Btw, Microsoft works on similar thing:

http://blogs.msdn.com/stmteam/

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Imagen de Dmitry Vyukov

Quoting - Dmitriy V'jukov

I see in the paper that quiescence optimizations speed up execution of some benchmarks up to 200%. So I think further optimizations of quiescence are worth doing. What do you think?

The moments that bother me is that (1) most part of read-mostly workloads actually don't need quiescence (i.e. they don't use 'transitive privatization' like the one you described above), (2) quiescence do incurs substantial overheads for such workloads and (3) if I understand all correctly, quiescence optimizations you described in the paper apply mainly to write-mostly workloads.

I have some other hypotheses how quiescence can be eliminated from read-mostly workloads. I will describe them in following posts.

First one.

Currently timestamp is associated with every transaction. And on commit thread waits until there will be no in-flight transactions with timestamp less than it's own timestamp. Right?

One can associate 2 timestamps with every transaction. First timestamp (let's call it just 'timestamp') is calculated just as current timestamp.
Second timestamp (let's call it 'timestamp2') is calculated as follows.
Initially timestamp2 = 0
On every optimistic read timestamp2 is updated as follows:
timestamp2 = max(timestamp2, read_location_write_timestamp)
where read_location_write_timestamp - timestamp when location in question was updated last time
On every write timestamp is updated as follows:
timestamp2 = current_timestamp

When transaction commits it waits until there will be no in-flight transactions with timestamp less than it's own *timestamp2*.
So the point is to make timestamp as great as possible, and timestamp2 as less as possible.

Provided high read-mostly workload it's likely that timestamp2 will be lesser than timestamps of most other in-flight transactions.

Well, I am not 100% sure that this scheme works, so I am looking forward to your response. Although, I believe that it's possible to make it work with little corrections, even if current scheme has some defects.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Imagen de Dmitry Vyukov

Quoting - Dmitriy V'jukov

I see in the paper that quiescence optimizations speed up execution of some benchmarks up to 200%. So I think further optimizations of quiescence are worth doing. What do you think?

The moments that bother me is that (1) most part of read-mostly workloads actually don't need quiescence (i.e. they don't use 'transitive privatization' like the one you described above), (2) quiescence do incurs substantial overheads for such workloads and (3) if I understand all correctly, quiescence optimizations you described in the paper apply mainly to write-mostly workloads.

I have some other hypotheses how quiescence can be eliminated from read-mostly workloads. I will describe them in following posts.

Second one.

Transactions can not wait for pure read-only transactions at all, if SEH/signal handler will be installed, and it will catch Access Violations and SIGSEGVs. The worst thing which can happen with pure read-only transaction is AV/SIGSEGV, and in most cases they will just detect inconsistency and retry.

Installation of SEH handler is relatively lightweight and thread-local operation. And signal handler can be installed only once. AV/SIGSEGV approach is used in some STM implementations, for example, Transactional Locking 2. I think it's much better than additional accesses to centralized data-structures.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Imagen de Dmitry Vyukov

Quoting - Dmitriy V'jukov

Second one.

I think that these two things can greately reduce quiescence overheads for read-mostly workloads.

Looking forward to your response.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Imagen de Dmitry Vyukov

Quoting - Dmitriy Vyukov

First one.

Currently timestamp is associated with every transaction. And on commit thread waits until there will be no in-flight transactions with timestamp less than it's own timestamp. Right?

One can associate 2 timestamps with every transaction. First timestamp (let's call it just 'timestamp') is calculated just as current timestamp.
Second timestamp (let's call it 'timestamp2') is calculated as follows.
Initially timestamp2 = 0
On every optimistic read timestamp2 is updated as follows:
timestamp2 = max(timestamp2, read_location_write_timestamp)
where read_location_write_timestamp - timestamp when location in question was updated last time
On every write timestamp is updated as follows:
timestamp2 = current_timestamp

When transaction commits it waits until there will be no in-flight transactions with timestamp less than it's own *timestamp2*.
So the point is to make timestamp as great as possible, and timestamp2 as less as possible.

Provided high read-mostly workload it's likely that timestamp2 will be lesser than timestamps of most other in-flight transactions.

Well, I am not 100% sure that this scheme works, so I am looking forward to your response. Although, I believe that it's possible to make it work with little corrections, even if current scheme has some defects.

Sun do similar thing, they implemented privatization in such a way so that read-only transactions do not wait for other transactions:
http://research.sun.com/scalable/pubs/TRANSACT2009-ScalableSTMAnatomy.pdf

Implementation is basically what I've described in initial post. And they solve transitive/proxy-privatization problem in the following simple way. Privatizing (write) transaction unlocks it's write-set AFTER wait for completion of concurrent transactions. So read-only transaction (which will use privatized memory) won't commit before that wait too.

Reported scalability is quite notable.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net

Inicie sesión para dejar un comentario.