New interesting application of TM for single-threaded programs

New interesting application of TM for single-threaded programs

Ritratto di Dmitry Vyukov

While dealing with Transactional Memory I realize new interesting application of Transactional Memory for SINGLE threaded applications. Atomicity guarantees provided by TM can be useful not only for multi-threaded environment, but also for single-threaded environment. Well, it's actually not astonishing, nevertheless I don't hear anything similar in all that hype around TM.

Assume we have complicated operation which involves non-trivial modifications of several objects/containers. Assume that exception can be thrown either by memory allocator, or by copy constructor of some object, or just by application logic. In order to provide strong exception safety in such situation we have to manually write code for cancellation of all those modifications. This can be non-trivial error-prone task.

TM already has all necessary machinery for cancellation of arbitrary operations. TM doesn't care about complexity of operations, number of involved objects/containers, it can just instantly cancel anything which happens inside atomic block. Why don't use it?

So the general recipe for strong exception safety with the help of TM: wrap arbitrary operation in atomic block, if the case of error just abort transaction. Hocus-pocus! Your operation obtains strong exception safety at once w/o single line of code!

This can be equally applied to exceptions and errors codes/return values, no matter.

Here is something which I was actually able to model with Intel C++ STM Compiler:

// object which "sometimes" throws exception in copy ctor
bool fail_bad_int = false;
int fail_bad_int_step = 8;

__declspec(tm_callable)
struct bad_int
{
    bad_int(int v = 0)
        : v(v)
    {}

    bad_int& operator = (bad_int r)
    {
        if (fail_bad_int && 0 == --fail_bad_int_step)
            throw 0;
        v = r.v;
        return *this;
    }

    operator int () const
    {
        return v;
    }

    bool operator < (bad_int r) const
    {
        return v < r.v;
    }

    int v;
};

// transactional sort function
template
__declspec(tm_callable)
void sort(T* begin, T* end)
{
    T temp;
    size_t n = end - begin;
    if (n < 2)
        return;
    bool swapped = false;
    do
    {
        swapped = false;
        n -= 1;
        for (size_t i = 0; i != n; ++i)
        {
            if (begin[i + 1] < begin[i])
            {
                temp = begin[i];
                begin[i] = begin[i + 1];
                begin[i + 1] = temp;
                swapped = true;
            }
        }
    }
    while (swapped);
}

int main()
{
    std::vector x;
    std::generate_n(std::back_
inserter(x), 10, rand);
    std::copy(x.begin(), x.end(), std::ostream_iterator(std::cout, " t"));
    std::cout << std::endl;

    fail_bad_int = true;
    bad_int* begin = &*x.begin();
    bad_int* end = begin + x.size();
    __tm_atomic
    {
        try
        {
            sort(begin, end);
        }
        catch (...)
        {
            __tm_abort;
        }
    }

    std::copy(x.begin(), x.end(), std::ostream_iterator(std::cout, " t"));
    std::cout << std::endl;
}

Output:

41 18467 6334 26500 19169 15724 11478 29358 26962 24464
41 18467 6334 26500 19169 15724 11478 29358 26962 24464

And if I comment __tm_abort statement out, then output is:

41 18467 6334 26500 19169 15724 11478 29358 26962 24464
41 6334 18467 19169 26500 26500 11478 29358 26962 24464

Notice that in latter case array is in some intermediate state. And in former case array is in initial state.

Although one has to understand that this method can incur substantial run-time (space and time) overheads even if exception is not thrown. On the other hand it's extremely simple and not error prone (i.e. you can not forget to cancel some things or cancel incorrectly). So this approach can be used on cold-paths or on initial development stage.

I see that Intel C++ STM Compiler has execution mode called 'serial atomic', which used for serialized transactions which have abort statements. Serial atomic mode can be used to hasten such single-threaded usage of TM. I.e. TM run-time only has to log writes, no need for synchronization, quiescence, verification etc.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
5 post / 0 nuovi
Ultimo contenuto
Per informazioni complete sulle ottimizzazioni del compilatore, consultare l'Avviso sull'ottimizzazione

Dmitriy,

In general your idea ispossible, but probably we need much less than entire TM to implement failure atomicity: we have special support for locals and in the case of single-threaded app all data locations should be treated as locals. However, currently it is hardly possible to abort transaction on exception: we somehow support this, but in fact abort in catch() handler may lead to incorrect state of C++ EH RTL because abort mekes something like longjump and does not return back to handler's caller. Thus C++ RTL will internally think that it is still handling exception while we're retried and already execute code outside of any EH constructs. Right now we're working on complete and correct support of EH in conjunction with TM and plan to include this support into the next release of our compiler.

By the way, I will make a talk at SEC conference in Moscow this week and if you by any chance planned to attend the conference, we may meet there and discuss your ideas and thoughts in more details. Otherwise, we still can meet in Moscow and discuss TM-related topics by the cup of coffee.

Ritratto di Dmitry Vyukov

Quoting - Serge Preis (Intel)

Dmitriy,

In general your idea ispossible, but probably we need much less than entire TM to implement failure atomicity: we have special support for locals and in the case of single-threaded app all data locations should be treated as locals. However, currently it is hardly possible to abort transaction on exception: we somehow support this, but in fact abort in catch() handler may lead to incorrect state of C++ EH RTL because abort mekes something like longjump and does not return back to handler's caller. Thus C++ RTL will internally think that it is still handling exception while we're retried and already execute code outside of any EH constructs. Right now we're working on complete and correct support of EH in conjunction with TM and plan to include this support into the next release of our compiler.

Ok, I understand the problem.

I think I can just change the code slightly:

---

	   bool failure = true;
     __tm_atomic  
     {  
         try  
         {  
             sort(begin, end);
             failure = false;
         }  
         catch (...)
         {
         }  
         if (failure)
             __tm_abort;  
     }
     if (failure)
         ; // operation if failed, all side-effects are automatically rolled back
     else
         ; // operation succeed

---

A bit ugly...

Btw, if STM automatically aborts transaction on exception then code will look just like:

     __tm_atomic  
     {  
             sort(begin, end);
     }

This is the approach taken by Microsoft STM team:

http://blogs.msdn.com/stmteam/archive/2008/10/08/welcome-to-the-transactional-memory-team-s-blog.aspx#9003050

BUT! By making this they are CHANGING SINGLE-THREADED SEMANTICS of the program!

Initially I underappreciate the fact that you are commiting transaction on exception. Now I understand that it's the only way.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Ritratto di Dmitry Vyukov

Quoting - Serge Preis (Intel)

Dmitriy,

In general your idea ispossible, but probably we need much less than entire TM to implement failure atomicity: we have special support for locals and in the case of single-threaded app all data locations should be treated as locals.

Well, I think that there are really very few purely single-threaded applications nowadays. By 'single-threaded' I meant that it's probably multi-threaded application, but thread executes operation on NON shared data. And if that data is allocated dynamically, compiler very likely will not be able to detect that it's not shared.

Yes, I understand that one doesn't need the whole STM machinery to achieve 'single-threaded atomicity', this is why I mention 'serial atomic' execution mode.

Taking into account above facts, the syntax can look like:

     __tm_atomic_serial[abort_on_exception=true]
     {  
             sort(begin, end);
     }

I am NOT saying that this is 'feature request', currently it is just of a theoretical interest.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Ritratto di Dmitry Vyukov

Quoting - Serge Preis (Intel)

Dmitriy,

By the way, I will make a talk at SEC conference in Moscow this week and if you by any chance planned to attend the conference, we may meet there and discuss your ideas and thoughts in more details. Otherwise, we still can meet in Moscow and discuss TM-related topics by the cup of coffee.

I've dropped you a private message (you can see it in your ISN Profile). Unfortunately, I failed to guess your email. If by any reason you will not receive my message, you can find my email on this page:

http://software.intel.com/en-us/forums/showthread.php?t=60050

Btw, is it Ok to write you in Russian?

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net

Accedere per lasciare un commento.