<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blogs &#187; Robert Geva (Intel)</title>
	<atom:link href="http://software.intel.com/en-us/blogs/author/robert-geva/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/en-us/blogs</link>
	<description></description>
	<lastBuildDate>Fri, 25 May 2012 22:49:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Serial Equivalence of Cilk Plus programs</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/07/serial-equivalence-of-cilk-plus-programs/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/07/serial-equivalence-of-cilk-plus-programs/#comments</comments>
		<pubDate>Sat, 07 Apr 2012 15:01:39 +0000</pubDate>
		<dc:creator>Robert Geva (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Cilk Plus]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/07/serial-equivalence-of-cilk-plus-programs/</guid>
		<description><![CDATA[The serial equivalence of a Cilk™ Plus parallel program There is a trend in the C++ community to grow capabilities thru more libraries and as much as possible, avoid adding language keywords. Consistent with these trends are Intel’s Threading Building Blocks and Microsoft’s Parallel Patterns Library. The question arises, then, why implement Intel’s Cilk™ Plus as [...]]]></description>
			<content:encoded><![CDATA[<p>The serial equivalence of a Cilk™ Plus parallel program</p>
<p>There is a trend in the C++ community to grow capabilities through more libraries and, as much as possible, to avoid adding language keywords.<br />
Consistent with this trend are Intel’s Threading Building Blocks and Microsoft’s Parallel Patterns Library.<br />
The question arises, then, why implement Intel’s Cilk™ Plus as language extensions rather than a library?<br />
One of the answers is that the language is implemented by compilers, and compilers can provide certain guarantees. One such guarantee is serial equivalence.<br />
Every Cilk Plus program that uses the three tasking keywords for parallelism has a well-defined serial elision.<br />
The serial elision is defined by replacing each cilk_spawn and each cilk_sync with whitespace, and each cilk_for with the for keyword.<br />
Obviously, the serial elision of a Cilk Plus program is a valid C/C++ program.</p>
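<p>As a rough sketch of the definition above, the serial elision can be mimicked by defining the three keywords away with the preprocessor. The macro definitions and the functions fib and square_all are illustrative assumptions, not part of the original discussion; a Cilk Plus compiler treats these names as keywords rather than macros.</p>

```cpp
#include <cassert>

// Assumed stand-ins for illustration: defining the keywords away
// produces the serial elision of the code below.
#define cilk_spawn
#define cilk_sync
#define cilk_for for

// Hypothetical recursive function; with the macros above it is plain C++.
long fib(int n)
{
    assert(n >= 0);
    if (n < 2) return n;
    long a = cilk_spawn fib(n - 1);   // elides to: long a = fib(n - 1);
    long b = fib(n - 2);
    cilk_sync;                        // elides to an empty statement
    return a + b;
}

// Hypothetical loop; each iteration writes a distinct element,
// so a parallel version would have no determinacy race.
void square_all(int *v, int n)
{
    cilk_for (int i = 0; i < n; ++i) {   // elides to an ordinary for loop
        v[i] = v[i] * v[i];
    }
}
```

<p>Compiled this way, fib and square_all are ordinary serial functions, which is the behavior a race-free parallel build is expected to reproduce.</p>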
<p>A program has a determinacy race if two logically parallel strands both access the same memory location and at least<br />
one of them modifies the memory location.<br />
If a Cilk Plus parallel program has no determinacy race, then it will produce the same results as its serial elision.<br />
What are the compiler’s contributions to the serial equivalence guarantees? Consider the following code illustration:</p>
<div><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">int<br />
</span></span><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="font-size: x-small;">foo()<br />
</span></span></span><span style="font-size: x-small;">{<br />
<span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">    int</span></span><span style="font-size: x-small;"> x1 = func1();<br />
</span></span></span><span style="font-size: x-small;"><span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">    int</span></span><span style="font-size: x-small;"> d1 = 0;<br />
</span></span></span><span style="font-size: x-small;"><span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">    int</span></span><span style="font-size: x-small;"> x2 = cilk_spawn child1(bar1(), bar2());  // spawn a function whose arguments are function calls<br />
</span><span style="font-size: x-small;"><span style="font-size: x-small;">    </span></span></span></span><span style="font-size: x-small;"><span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">int</span></span><span style="font-size: x-small;"> x3 = cilk_spawn child2(&amp;d1);                        // pass a stack address to a spawned function<br />
</span></span></span><span style="font-size: x-small;"><span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">    int</span></span><span style="font-size: x-small;"> x4 = func3();                                                              // func3 can execute in parallel with child1 and child2<br />
</span><span style="font-size: x-small;">   cilk_sync;                                                                          // wait for the child tasks to complete before their results can be used<br />
    </span></span></span><span style="font-size: x-small;"><span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">return</span></span><span style="font-size: x-small;"> x1 + x2 + x3 + x4 + d1;<br />
</span><span style="font-size: x-small;">}</span></span></span></div>
<p>The function foo spawns the functions child1() and child2() so that they can execute concurrently with the function func3().<br />
The statement cilk_sync causes execution to wait until child1 and child2 return, so that their return values can be used.<br />
The compiler makes three contributions that determine the execution of the code here:<br />
1. The functions bar1() and bar2(), which produce values that are arguments to child1(), are evaluated sequentially by the parent thread,<br />
i.e. the same thread on which the function foo executes. A different execution model would allow them to execute in parallel with each other,<br />
or on a thread that is different from the parent thread. However, only this execution model corresponds to the way in which the underlying<br />
C/C++ languages work.<br />
2. The compiler inserts a cilk_sync statement before the return statement in function foo().<br />
The compiler inserts such a statement before the return of every function that includes a cilk_spawn.<br />
This enforces structured fork-join parallelism, which makes the program’s behavior easier to understand.<br />
On the practical side, the “implicit” cilk_sync inserted by the compiler ensures that the stack frame of foo() is in place throughout the<br />
execution of the functions it spawns.<br />
In this illustration, since foo passes the address of d1 to child2(), this guarantees that when child2 writes into the location of d1,<br />
the memory location is still where it is expected to be, in the stack frame of foo().<br />
3. The thread that executes foo proceeds to execute child1(). The continuation of foo(), starting from the statement that follows the<br />
cilk_spawn statement, is what is enqueued for later execution and made available for stealing by other threads. This is called ‘parent stealing’.<br />
Library implementations of work stealing use ‘child stealing’. In this illustration, child stealing would mean that child1() would be<br />
enqueued for later evaluation and be made available for other workers to steal. However, only parent stealing is equivalent to the execution of<br />
the sequential program.</p>
<p>Here is an example with a small code fragment that actually does something. Assume you have a ternary tree in which each node points to left, middle and right child nodes and, in addition, has a color. A linked list of all red nodes can be constructed with a recursive traversal of the tree, where each red node gets pushed onto the list as it is formed. When parallelizing the recursion, care has to be taken not to create a data race when pushing nodes onto the global linked list. The recommended way to resolve the data race in Cilk Plus is to use a hyperobject for the linked list. The hyperobject provides a local view for each strand. A possible implementation of the parallelized recursion is here:</p>
<p><span style="font-size: x-small;">cilk::reducer_list_append&lt;terntreenode *&gt; root;<br />
</span><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">void<br />
</span></span><span style="font-size: x-small;">find_reds_par(terntreenode *p)<br />
{<br />
      <span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">if</span></span><span style="font-size: x-small;"> (p-&gt;color == red) {<br />
        </span></span><span style="font-size: x-small;">root.push_back(p);<br />
      }<br />
      <span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">if</span></span><span style="font-size: x-small;"> (p-&gt;left) cilk_spawn find_reds_par(p-&gt;left);<br />
      </span></span><span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">if</span></span><span style="font-size: x-small;"> (p-&gt;middle) cilk_spawn find_reds_par(p-&gt;middle);<br />
      </span></span><span style="font-size: x-small;"><span style="color: #0000ff; font-size: x-small;"><span style="color: #0000ff; font-size: x-small;">if</span></span><span style="font-size: x-small;"> (p-&gt;right) find_reds_par(p-&gt;right);<br />
</span></span><span style="font-size: x-small;">}</span>    </p>
<p>The Cilk Plus parallel traversal will produce the same linked list, with the nodes in the same order, as the serial elision of the program.</p>
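<p>For comparison, here is a minimal sketch of what the serial elision of this traversal might look like. It assumes a node layout and a color enumeration that the post does not show, and it substitutes a plain std::list, passed by reference, for the cilk::reducer_list_append hyperobject; the name find_reds_serial is hypothetical.</p>

```cpp
#include <cassert>
#include <list>

enum color_t { red, black };   // assumed; the post only mentions red

struct terntreenode {          // assumed layout, based on the fields the post uses
    color_t color;
    terntreenode *left;
    terntreenode *middle;
    terntreenode *right;
};

// Serial traversal: visit the node, then its left, middle and right subtrees.
// std::list stands in for the reducer; this is a substitution for illustration.
void find_reds_serial(terntreenode *p, std::list<terntreenode *> &reds)
{
    assert(p != nullptr);
    if (p->color == red) {
        reds.push_back(p);
    }
    if (p->left)   find_reds_serial(p->left, reds);     // was: cilk_spawn ...
    if (p->middle) find_reds_serial(p->middle, reds);   // was: cilk_spawn ...
    if (p->right)  find_reds_serial(p->right, reds);
}
```

<p>The order in which this version appends nodes (the node itself, then the left, middle and right subtrees) is the order that the parallel version is expected to reproduce.</p>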
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/07/serial-equivalence-of-cilk-plus-programs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>STM Compiler</title>
		<link>http://software.intel.com/en-us/blogs/2007/09/16/stm-compiler/</link>
		<comments>http://software.intel.com/en-us/blogs/2007/09/16/stm-compiler/#comments</comments>
		<pubDate>Mon, 17 Sep 2007 05:09:50 +0000</pubDate>
		<dc:creator>Robert Geva (Intel)</dc:creator>
				<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Intel® C++ STM Compiler]]></category>
		<category><![CDATA[What If]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2007/09/16/stm-compiler/</guid>
		<description><![CDATA[Hello, I'd like to welcome you to our STM portion of the whatif site. As Intel is moving towards multi core processors, we are committed to provide development tools to help programmers exploit the parallelism available in these processors. We view transactional memory as a part of the solution. Here we offer a prototype implementation [...]]]></description>
			<content:encoded><![CDATA[<p>Hello, I'd like to welcome you to our STM portion of the <em>whatif</em> site.</p>
<p>As Intel is moving towards multi-core processors, we are committed to providing development tools to help programmers exploit the parallelism available in these processors.</p>
<p>We view transactional memory as a part of the solution.</p>
<p>Here we offer a prototype implementation of our C/C++ compiler product with the addition of transactional programming constructs. (I will refer to it as a transactional compiler).</p>
<p>While a lot of work has been done in the area of transactional memory, we expect that yet more work is needed. Therefore, we are not providing a compiler with transactional programming as a product. Instead, we are making it available through our <em>whatif</em> site to encourage experimentation and to seek feedback on our choices.</p>
<p>The compiler available here is a result of collaboration between Intel's researchers and product teams. Practitioners in the area of transactional programming are well aware of the many contributions of Intel's researchers, and these are the researchers who participated in developing this compiler. Many professional programmers are also aware of Intel's development tools, the C/C++ compiler supporting OpenMP, Intel performance library and others. Members of this product team participated as well in the development of this compiler.</p>
<p>The current compiler provides the basic construct you would expect to express a transaction. Our syntax for a transaction is '<em>__tm_atomic<a href="#_ftn1"><strong>[1]</strong></a> statement</em>;'. Our implementation allows function calls within transactions. As we work in the IA32 ecosystem, we had two goals that are particular to this domain: maintain the programmer's ability to control the size of the application, and allow them to leverage existing investments in IA32 software packages. To that end, we provide facilities to express that a function will only be called within transactions, and alternatively to express that a function will be called both inside and outside of transactions. In the latter case, the compiler will generate two different translations of those functions: one that includes the transactional barriers for loads and stores, and one that does not. We also allow calling of functions that are not part of the current compilation and that were not compiled with transactional semantics. That allows a software developer to use existing packages where the source code is not available for recompilation. The addition of these constructs to the language made it harder to add some other constructs, mainly the "<em>retry</em>" and "<em>abort</em>" statements. While some would expect these to be a part of a transactional compiler, our initial view is that adding all of these constructs would result in a complex language. We therefore chose not to provide support for the <em>retry</em> and <em>abort</em> statements. We are seeking feedback on these, as well as some other choices.</p>
<p>Our goal in making this early version of our transactional compiler available is to encourage use of transactional programming and to gather feedback on the usability of our language constructs. The implementation of these constructs was shown to provide acceptable performance on some programs we used internally. However, we do not expect it to perform consistently, and we expect that some will see reduced performance when compared to other implementation alternatives. We chose to make the compiler available early rather than wait for a high-performance implementation, which we hope to provide at a later time.</p>
<p>Thanks for your interest in our prototype compiler.</p>
<p>Robert Geva.</p>
<p><a name="_ftn1" title="_ftn1"></a>[1] We expect that once transactions are integrated into the C/C++ languages, the keywords will be friendlier. For the prototype implementation, our concern was to minimize the risk of name clashing with program variables.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2007/09/16/stm-compiler/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

