<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blogs &#187; Server</title>
	<atom:link href="http://software.intel.com/en-us/blogs/category/server/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/en-us/blogs</link>
	<description></description>
	<lastBuildDate>Fri, 25 May 2012 22:49:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Intel Announces New Software and Services Investments in Brazil</title>
		<link>http://software.intel.com/en-us/blogs/2012/05/14/intel-announces-new-software-and-services-investments-in-brazil/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/05/14/intel-announces-new-software-and-services-investments-in-brazil/#comments</comments>
		<pubDate>Tue, 15 May 2012 04:40:00 +0000</pubDate>
		<dc:creator>Lauren Dankiewicz (Intel)</dc:creator>
				<category><![CDATA[Android]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Intel SW Partner Program]]></category>
		<category><![CDATA[Intel® AppUp Developer Program]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Brazil]]></category>
		<category><![CDATA[brazilian software]]></category>
		<category><![CDATA[business and marketing]]></category>
		<category><![CDATA[idf]]></category>
		<category><![CDATA[Intel Software Partner Program]]></category>
		<category><![CDATA[international expansion]]></category>
		<category><![CDATA[software investment]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/05/14/intel-announces-new-software-and-services-investments-in-brazil/</guid>
		<description><![CDATA[<a href="http://software.intel.com/en-us/blogs/2012/05/14/intel-announces-new-software-and-services-investments-in-brazil/"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/IDF-Brazil-Banner-300x37.gif" alt="IDF Brazil 2012 Software and Services" title="IDF Brazil Banner" width="450" height="55.5" class="alignleft size-medium wp-image-47651" /></a>
<br /><br />
<br /><br />
Intel is accelerating the growth of Brazil’s software industry by making strategic investments in independent software vendors, developers, universities, technology parks, and government IT agencies. 
Today at <a href="http://software.intel.com/en-us/blogs/2012/05/11/intel-developer-forum-is-coming-to-so-paulo-brazil/">Intel Developer Forum (IDF) in Brazil</a>, Intel announced that the Intel® Software Partner Program and four Intel® Software Network developer communities are <a href="http://software.intel.com/partner/home?locale=pt-BR">launching in Portuguese</a>. The <a href="http://software.intel.com/">Intel Software Network</a> provides hundreds of technical documents and guidance on how to maximize software performance on Intel® architecture. The <a href="http://software.intel.com/partner">Intel® Software Partner Program</a> helps companies develop and market commercial applications optimized for Intel® technologies.
<br /><br />
<strong><a href="http://software.intel.com/en-us/blogs/2012/05/14/intel-announces-new-software-and-services-investments-in-brazil/">Read more...</a></strong>]]></description>
			<content:encoded><![CDATA[<p><a href="http://software.intel.com/en-us/business-network/"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/BacktotheSoftwareBusinessNetwork.png" alt="Back to the Software Business Network" title="Back to the Software Business Network" /></a><br /><br /><br /><a href="https://www-ssl.intel.com/content/www/us/en/intel-developer-forum-idf/sao-paulo/idf-2012-sao-paulo.html?"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/IDF-2012-Sky-is-the-baseline-300x291.jpg" alt="IDF Brazil 2012 - The Sky is the Baseline" title="IDF Brazil 2012 - The Sky is the Baseline" width="300" height="291" class="alignright size-medium wp-image-47650" /></a><strong>Investing in Brazil's Software Ecosystem</strong><br />
Intel is accelerating the growth of Brazil’s software industry by making strategic investments in independent software vendors, developers, universities, technology parks, and government IT agencies. </p>
<p>Today at <a href="http://software.intel.com/en-us/blogs/2012/05/11/intel-developer-forum-is-coming-to-so-paulo-brazil/">Intel Developer Forum (IDF) in Brazil</a>, Intel announced that the Intel® Software Partner Program and four Intel® Software Network developer communities are <a href="http://software.intel.com/partner/home?locale=pt-BR">launching in Portuguese</a>. The <a href="http://software.intel.com/">Intel Software Network</a> provides hundreds of technical documents and guidance on how to maximize software performance on Intel® architecture. The <a href="http://software.intel.com/partner">Intel® Software Partner Program</a> helps companies develop and market commercial applications optimized for Intel® technologies.</p>
<p>Each community has a local community manager who will work with developers across the country to help build best-in-class solutions and end-user experiences. </p>
<ul>
<li>George Silva: Ultrabook™, Consumer Client, and Android* Developer Communities</li>
<li>Jomar Silva: Intel® vPro™ Developer Community</li>
<li>Luciano Palma: Server Community and Parallel Programming Developer Community</li>
</ul>
<p><strong>Opportunities for Brazilian Software Companies to Partner with Intel</strong><br />
The Intel Software Partner Program will provide local marketing and sales support to drive campaigns with Brazil’s 300,000+ independent software vendors and 73,000+ software and services companies. Launching the program in Portuguese is an important step toward connecting with Brazilian software companies for these new campaigns.</p>
<p><strong>Future Investments in Brazil</strong><br />
Intel’s software programs are focused on building strong businesses through best-in-class products. In 2012, Intel will grow its in-country footprint as well as its corporate support teams to drive programs with universities and technology parks.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/05/14/intel-announces-new-software-and-services-investments-in-brazil/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deterministic Reduction: a new Community Preview Feature in Intel® Threading Building Blocks</title>
		<link>http://software.intel.com/en-us/blogs/2012/05/11/deterministic-reduction-a-new-community-preview-feature-in-intel-threading-building-blocks/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/05/11/deterministic-reduction-a-new-community-preview-feature-in-intel-threading-building-blocks/#comments</comments>
		<pubDate>Fri, 11 May 2012 10:22:42 +0000</pubDate>
		<dc:creator>Alexei Katranov (Intel)</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Computer Arithmetic]]></category>
		<category><![CDATA[deterministic calculations]]></category>
		<category><![CDATA[floating point]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[parallel_deterministic_reduce]]></category>
		<category><![CDATA[parallel_reduce]]></category>
		<category><![CDATA[TBB]]></category>
		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/05/11/deterministic-reduction-a-new-community-preview-feature-in-intel-threading-building-blocks/</guid>
		<description><![CDATA[Computer Arithmetic has a lot of peculiarities [1]. One of these pitfalls is associativity failure in floating point arithmetic. For example, the two sums of fractions calculations below will not produce the same result when using floats: In a sequential program, it is not a big problem since the calculation order is exactly specified so [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Computer arithmetic has many peculiarities <a title="What every computer scientist should know about floating-point arithmetic, David Goldberg, Xerox Palo Alto Research Center, Palo Alto, CA, 1991." href="http://dx.doi.org/10.1145/103162.103163">[1]</a>. One of them is the failure of associativity in floating-point arithmetic. For example, the two calculations of a sum of fractions below will not produce the same result when using <code>float</code>s:</p>
<h4 style="text-align: center;"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/formula.png"><img class="size-large wp-image-47370 aligncenter" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/formula-1024x219.png" alt="The sum of fractions depend on the calculation order" width="461" height="99" align="middle" /></a></h4>
<p style="text-align: justify;">In a sequential program, it is not a big problem since the calculation order is exactly specified so the result is predictable and repeatable. The situation is not so clear in parallel programming.</p>
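<p style="text-align: justify;">As a minimal sketch of this effect, the two helpers below add the same <code>float</code>s in opposite orders. The function names and the sample values are illustrative assumptions and are not taken from the formula above:</p>

```cpp
#include <cstddef>
#include <vector>

// Adds the elements starting from the first one:
// ((v[0] + v[1]) + v[2]) + ...
float sum_left_to_right(const std::vector<float>& v) {
    float s = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        s += v[i];
    }
    return s;
}

// Adds the elements starting from the last one:
// ((v[n-1] + v[n-2]) + v[n-3]) + ...
float sum_right_to_left(const std::vector<float>& v) {
    float s = 0.0f;
    for (std::size_t i = v.size(); i > 0; --i) {
        s += v[i - 1];
    }
    return s;
}
```

<p style="text-align: justify;">For the input {1e20f, -1e20f, 1.0f}, the left-to-right sum is 1 while the right-to-left sum is 0, because 1.0f is absorbed when it is added to -1e20f first.</p>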
<p style="text-align: justify;">To make the example parallel, I used the parallel_reduce template function from Intel® Threading Building Blocks (Intel® TBB):</p>
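<pre name="code" class="cpp:nocontrols">// Needs &lt;tbb/parallel_reduce.h&gt;, &lt;tbb/blocked_range.h&gt;, &lt;vector&gt;, &lt;numeric&gt;, &lt;functional&gt;, &lt;iostream&gt;
typedef tbb::blocked_range&lt;std::vector&lt;float&gt;::iterator&gt; range_t;
std::vector&lt;float&gt; arr( N, 1.0f/(float)N );
float sum = tbb::parallel_reduce( range_t( arr.begin(), arr.end() ), 0.0f,
    // Sum one subrange, starting from the partial sum passed in
    []( const range_t&amp; r, float sum ) {
        return std::accumulate( r.begin(), r.end(), sum );
    },
    std::plus&lt;float&gt;() );  // Join two partial sums
std::cout &lt;&lt; sum &lt;&lt; std::endl;</pre>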
<p style="text-align: justify;">As in the examples above, the code calculates the sum of N fractions, but it uses multiple processor cores if available. Because floating-point addition is not associative, different orders of calculation can produce different results. If we run it 10 times with N=1000, we get something like this:</p>
<blockquote><p>0.999991<br />
1<br />
0.999999<br />
0.999996<br />
0.999998<br />
0.999998<br />
0.999998<br />
1<br />
0.999997<br />
0.999998</p></blockquote>
<p style="text-align: justify;">It’s worth mentioning that the result differs from run to run! Even though the developer specifies the calculations, the order in which they are performed is no longer under the developer's control once they run in parallel.</p>
<p style="text-align: justify;">On the other hand, it is not as bad as all that. Although the OS schedules threads and introduces non-determinism into the application, it is still possible to control the order of calculations. One of the new features of Intel TBB 4.0 is the parallel_deterministic_reduce template algorithm. The algorithm has the same interface as parallel_reduce except that it does not allow you to specify a partitioner. (For parallel_reduce it is possible to pass a partitioner as the last argument.) We will discuss why this restriction exists later. But for now, let’s replace parallel_reduce with parallel_deterministic_reduce and look at how the result changes:</p>
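<pre name="code" class="cpp:nocontrols">// Needs &lt;tbb/parallel_reduce.h&gt;, &lt;tbb/blocked_range.h&gt;, &lt;vector&gt;, &lt;numeric&gt;, &lt;functional&gt;, &lt;iostream&gt;
// As a Community Preview Feature, this likely also requires defining
// TBB_PREVIEW_DETERMINISTIC_REDUCE before including the TBB header (check your release notes).
typedef tbb::blocked_range&lt;std::vector&lt;float&gt;::iterator&gt; range_t;
std::vector&lt;float&gt; arr( N, 1.0f/(float)N );
float sum = tbb::parallel_deterministic_reduce( range_t( arr.begin(), arr.end() ), 0.0f,
    []( const range_t&amp; r, float sum ) {
        return std::accumulate( r.begin(), r.end(), sum );
    },
    std::plus&lt;float&gt;() );
std::cout &lt;&lt; sum &lt;&lt; std::endl;</pre>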
<p>Again, we run it 10 times:</p>
<blockquote><p>1<br />
1<br />
1<br />
1<br />
1<br />
1<br />
1<br />
1<br />
1<br />
1</p></blockquote>
<p style="text-align: justify;">The key point here is that the result is the same from run to run.</p>
<p style="text-align: justify;">The sources of non-determinism in parallel_reduce derive from partitioning and body splitting. Let’s consider each of these subjects:</p>
<ul style="text-align: justify;">
<li>Partitioning. The simple_partitioner determines exactly how many and which subranges are created. It splits the iteration range until each subrange is no larger than the given grain size. Thus the behavior depends only on the range size and the grain size specified by the developer. However, the other partitioners in Intel TBB are non-deterministic: to improve performance, their range splitting depends on run-time task stealing events, which we cannot predict.</li>
</ul>
<ul style="text-align: justify;">
<li>Body splitting. For performance reasons parallel_reduce minimizes body copies: it splits the body only when consecutive subranges are processed by different threads. Thus body splitting, like “advanced” partitioning, also depends on non-deterministic task stealing.</li>
</ul>
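<p style="text-align: justify;">The deterministic case can be sketched without Intel TBB. The following is a simplified model, assuming that a range is divisible while it holds more than grain-size elements and that it is always split in the middle. <code>split_range</code> and <code>subrange</code> are hypothetical names used for illustration only, not Intel TBB APIs:</p>

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A half-open index range [first, second).
typedef std::pair<std::size_t, std::size_t> subrange;

// Recursively halves [begin, end) while it holds more than `grain` elements
// and appends the resulting leaf subranges to `leaves` in left-to-right order.
// Assumes grain >= 1; the outcome depends only on begin, end and grain.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<subrange>& leaves) {
    if (end - begin > grain) {
        std::size_t middle = begin + (end - begin) / 2;
        split_range(begin, middle, grain, leaves);
        split_range(middle, end, grain, leaves);
    } else {
        leaves.push_back(subrange(begin, end));
    }
}
```

<p style="text-align: justify;">For the range [0, 10) and a grain size of 3, this sketch always yields the leaves [0, 2), [2, 5), [5, 7) and [7, 10), no matter how many threads would later process them.</p>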
<p style="text-align: justify;">The example shows that parallel_reduce is unsuitable for non-associative operations, such as floating-point addition, when a repeatable result is required. parallel_deterministic_reduce was developed to achieve a repeatable result from a reduction with non-associative operations. From the considerations of partitioning given above, it follows that only the simple_partitioner can be used for parallel_deterministic_reduce, so no alternative partitioner can be chosen. Consequently, parallel_deterministic_reduce always challenges us with choosing an appropriate grain size. Smart body splitting has also been disabled for the sake of deterministic behavior, so a new body is created for each subrange. This complicates the choice of grain size even more: on the one hand, a small grain size increases the number of body copies and the overall overhead; on the other hand, a big grain size may lead to load imbalance and underutilization. Fig. 1 shows the relative performance of parallel_deterministic_reduce (simple_partitioner with various grain sizes) in comparison with parallel_reduce (auto_partitioner with default grain size). An appropriate grain size gives parallel_deterministic_reduce the same performance as parallel_reduce, but an incorrectly chosen grain size may lead to significant performance degradation, as shown in Fig. 1 at the extremes of the grain size axis.</p>
<h4 style="text-align: center;"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/chart.png"><img class="aligncenter size-full wp-image-47423" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/chart.png" alt="Fig.1. Comparison of parallel_reduce (auto_partitioner) and parallel_deterministic_reduce (simple_partitioner) on Pi calculation example." width="640" height="383" /></a><br />
Fig.1. Comparison of parallel_reduce (auto_partitioner) and parallel_deterministic_reduce (simple_partitioner) on Pi calculation example.</h4>
<p style="text-align: justify;">To demonstrate the split-join order behavior of parallel_deterministic_reduce, a small example is given with range [0, 20) and grain size = 5, similar to examples for parallel_reduce in the Intel TBB Reference manual:</p>
<h4 style="text-align: center;"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/tree.png"><img class="aligncenter size-full wp-image-47427" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/05/tree.png" alt="A tree of subranges" width="410" height="141" /></a><br />
A tree of subranges</h4>
<p style="text-align: justify;">For each right node, a new body is created by the body split constructor. The slash marks (/) in the tree show where the body split is performed. Thus, for the current example, parallel_deterministic_reduce will always produce 4 subranges and 4 different bodies associated with them. Each of these subranges may be executed in parallel. When both children of a node finish, the corresponding bodies are merged: the right child body is “added” to the left child body (in our examples via the <code>std::plus&lt;float&gt;()</code> binary function).</p>
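<p style="text-align: justify;">This split-join order can be modeled serially. The sketch below is an illustration under stated assumptions, not Intel TBB's implementation: <code>tree_sum</code> is a hypothetical helper that splits a subrange in the middle while it holds more than grain-size elements, sums each leaf starting from 0.0f, and then adds the right child's result to the left child's result:</p>

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Serial model of a deterministic split-join reduction over v[begin, end).
// Assumes grain >= 1 and begin <= end <= v.size().
float tree_sum(const std::vector<float>& v, std::size_t begin, std::size_t end,
               std::size_t grain) {
    if (end - begin <= grain) {
        // Leaf: a fresh "body" accumulates its subrange from the identity.
        return std::accumulate(v.begin() + begin, v.begin() + end, 0.0f);
    }
    std::size_t middle = begin + (end - begin) / 2;
    float left = tree_sum(v, begin, middle, grain);
    float right = tree_sum(v, middle, end, grain);
    // Join: the right child's result is added to the left child's result.
    return left + right;
}
```

<p style="text-align: justify;">With the values {1e20f, 1.0f, -1e20f, 1.0f}, this model returns 1 for a grain size of 4 (a single leaf, equivalent to the serial sum) and 0 for a grain size of 2, yet each grain size gives the same answer on every call.</p>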
<p style="text-align: justify;">To conclude, parallel_deterministic_reduce provides a deterministic number and deterministic sizes of subranges, and it exactly defines which pairs of subranges are merged. It’s important to note that a repeatable result obtained with parallel_deterministic_reduce may still be different from that obtained via serial execution. Moreover, the results may be different for various grain sizes, since range splitting depends on the grain size. Also, the algorithm is not intended to improve the accuracy of calculations. The exact result of 1 in the above example of fraction sum calculation has been obtained by chance. For other examples the algorithm can cause a decrease in accuracy. Overall, parallel_deterministic_reduce is not a replacement for parallel_reduce but an alternative solution for those who need repeatability.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/05/11/deterministic-reduction-a-new-community-preview-feature-in-intel-threading-building-blocks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Intel(R) Cluster Studio XE toolkit for Server Developers</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/27/new-intelr-cluster-studio-xe-toolkit-for-server-developers/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/27/new-intelr-cluster-studio-xe-toolkit-for-server-developers/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 17:39:45 +0000</pubDate>
		<dc:creator>Mike Pearce (Intel)</dc:creator>
				<category><![CDATA[Server]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/27/new-intelr-cluster-studio-xe-toolkit-for-server-developers/</guid>
		<description><![CDATA[Late last year, Intel released Intel® Cluster Studio XE, a suite of tools designed expressly for server software developers. Please check out James Reinders' blog post on the subject, "Ready for 2X Moore's Law: Intel Cluster Studio XE," for more details. -Mike]]></description>
			<content:encoded><![CDATA[<p>Late last year, Intel released <a href="http://software.intel.com/en-us/articles/intel-cluster-studio-xe/">Intel® Cluster Studio XE</a>, a suite of tools designed expressly for server software developers. Please check out James Reinders' blog post on the subject, "<a href="http://software.intel.com/en-us/blogs/2011/11/08/ready-for-2x-moores-law-intel-cluster-studio-xe/">Ready for 2X Moore's Law: Intel Cluster Studio XE</a>," for more details.</p>
<p>-Mike</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/27/new-intelr-cluster-studio-xe-toolkit-for-server-developers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel Announces the New Intel® SDK for OpenCL* Applications 2012</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 11:38:33 +0000</pubDate>
		<dc:creator>Arnon Peleg (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA["Intel OpenCL SDK"]]></category>
		<category><![CDATA["Intel OpenCL"]]></category>
		<category><![CDATA[openCL]]></category>
		<category><![CDATA[vcsource_product_oclsdk]]></category>
		<category><![CDATA[vcsource_type_event]]></category>
		<category><![CDATA[vcsource_type_news]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/</guid>
		<description><![CDATA[In support of the recent announcement of the 3rd Generation Intel® Core™ Processors, Intel has released the Intel® SDK for OpenCL* Applications 2012. For the first time, OpenCL* developers using Intel® architecture can utilize compute resources across both Intel® Processors and Intel® HD Graphics 4000/2500.]]></description>
			<content:encoded><![CDATA[<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/OpenCL_Logo_RGB.jpg"><img class="size-thumbnail wp-image-47080 alignnone" title="OpenCL_Logo_RGB" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/OpenCL_Logo_RGB-150x150.jpg" alt="" width="64" height="64" /></a></p>
<p>In support of the recent announcement of the <a href="http://www.intel.com/content/www/us/en/processors/core/core-processor-family.html">3<sup>rd</sup> Generation Intel® Core™ Processors</a>, Intel has released the Intel® SDK for OpenCL* Applications 2012. For the first time, OpenCL* developers using Intel® architecture can utilize compute resources across both Intel® Processors and Intel® HD Graphics 4000/2500.</p>
<p>As someone who has closely followed the emergence of the OpenCL standard for the last couple of years, I found this announcement worth waiting for. Less than a year ago, on this blog, I posted the news that the <a title="Permanent Link to Intel® OpenCL SDK 1.1 gold released" href="http://software.intel.com/en-us/blogs/2011/06/29/intel-opencl-sdk-11-gold-released/">Intel® OpenCL SDK 1.1 gold was released</a>. This was the first production OpenCL implementation from Intel targeting Intel® processors on Windows* OS. The current announcement is special: the Intel SDK for OpenCL Applications 2012 now supports not only the CPU but also Intel HD Graphics 4000/2500 for Windows* 7 users. We’ve come a long way in a year.</p>
<p style="text-align: center;"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/product_overview.jpg"><img class="aligncenter size-medium wp-image-47079" title="product_overview" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/product_overview-300x300.jpg" alt="Introducing the Intel(R) SDK For OpenCL* Applications" width="170" height="170" /></a></p>
<p>OpenCL <a href="http://www.intel.com/content/www/us/en/processors/core/core-processor-family.html">on the 3<sup>rd</sup> Generation Intel® Core Processor Family</a> extends Intel’s line of tools and APIs on Intel platforms and adds interoperability with other graphics APIs like DirectX*, OpenGL* and Intel® Media SDK, directly on the Intel HD Graphics device.</p>
<p>So what else is new in this release?</p>
<ul>
<li>A single OpenCL* platform enables a shared context for OpenCL applications running on both the CPU and Intel HD Graphics 4000/2500. The OpenCL platform with both CPU and HD Graphics devices is available seamlessly with the <a href="http://www.intel.com/p/en_US/support/detect/graphics">Intel® HD Graphics Drivers</a>.</li>
<li>Interoperability with the <a href="http://www.intel.com/software/mediasdk">Intel Media SDK</a> with no memory copy overhead.</li>
<li>Improved performance for OpenCL applications running on Intel® Xeon® Processors and Intel® Core™ Processors. This CPU support is also available for Linux* OS developers.</li>
<li>The Intel® SDK for OpenCL* Applications development tools include an offline compiler and a step-by-step OpenCL kernel debugger (for CPU) integrated into the Microsoft Visual Studio* 2010 development environment.</li>
<li>Ten OpenCL code samples, three of them new, are now available for independent download.</li>
</ul>
<p>The list above is just a sample of what is available with this new SDK. I recommend you read <a href="http://software.intel.com/file/43384">the product brief</a> or watch the <a href="http://software.intel.com/en-us/videos/channel/visual-computing/new-intel%C2%AE-sdk-for-opencl-applications-2012/1571382381001">introduction video</a> to get started with this new SDK.</p>
<p><strong>Download the SDK for free at <a href="http://www.intel.com/software/opencl">www.intel.com/software/opencl</a> and begin optimizing your applications for the 3<sup>rd</sup> Generation Intel® Core™ Processors today.</strong></p>
<p>Don’t forget to follow us on Twitter at <a href="https://twitter.com/#!/IntelOpenCL">@IntelOpenCL</a></p>
<p>&nbsp;</p>
<p style="text-align: center;"><object codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,47,0" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" height="300" width="345" id="flashObj"><param value="http://c.brightcove.com/services/viewer/federated_f9?isVid=1" name="movie" /><param value="#FFFFFF" name="bgcolor" /><param value="videoId=1571382381001&amp;playerID=741496470001&amp;playerKey=AQ~~,AAAArH1stHk~,LuRqJUw7MaeYQkat5frTpWWPINh71g7p&amp;domain=embed&amp;dynamicStreaming=true" name="flashVars" /><param value="http://admin.brightcove.com" name="base" /><param value="false" name="seamlesstabbing" /><param value="true" name="allowFullScreen" /><param value="true" name="swLiveConnect" /><param value="always" name="allowScriptAccess" /><embed pluginspage="http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash" allowscriptaccess="always" swliveconnect="true" allowfullscreen="true" type="application/x-shockwave-flash" seamlesstabbing="false" height="300" width="345" name="flashObj" base="http://admin.brightcove.com" flashvars="videoId=1571382381001&amp;playerID=741496470001&amp;playerKey=AQ~~,AAAArH1stHk~,LuRqJUw7MaeYQkat5frTpWWPINh71g7p&amp;domain=embed&amp;dynamicStreaming=true" bgcolor="#FFFFFF" src="http://c.brightcove.com/services/viewer/federated_f9?isVid=1"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sweet 16?</title>
		<link>http://software.intel.com/en-us/blogs/2012/02/06/sweet-16/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/02/06/sweet-16/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 21:59:29 +0000</pubDate>
		<dc:creator>Clay Breshears (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[Server]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/02/06/sweet-16/</guid>
		<description><![CDATA[Have we already hit the maximum number of cores that can be put in our processors? Or have the needs of the user and developer communities been served at sixteen cores?]]></description>
			<content:encoded><![CDATA[<p>I just saw the article "<a title="AMD calls end to core growth on server chips" href="http://news.techworld.com/data-centre/3334884/amd-calls-end-ot-core-growth-on-server-chips/">AMD calls end to core growth on server chips</a>" at Techworld.com. The gist of the article is that AMD has decided to produce server chips with no more than 16 cores. There were some interesting future directions outlined and hinted at by the end of the article, too.</p>
<p>What seemed most disturbing to me was that the limit on the number of cores is self-inflicted. Surely we can't have reached the maximum number of cores that are possible to squeeze onto a chip? The whole "right turn" idea of adding cores rather than trying to cool processors reaching rocket engine temperatures is less than 10 years old. I'm not sure where the physics starts to overshadow Moore's Law, but I thought I'd heard that a few more generations of smaller wire sizes in processor dies were still possible. So why not push more and more cores into the same package?</p>
<p>It might be that the average server application (and, perhaps even more so, consumer applications) can't scale well beyond some fixed number of cores. How many cores does it take to type and post a tweet or update your Facebook status or to watch a streaming video? Would any of those tasks be faster or somehow enhanced if there were twice the number of cores available?</p>
<p>If we stop increasing the core counts in the next 5 years, how will new chips keep fulfilling the ever-growing hunger for more performance by consumers? Maybe it won't be about faster and faster application execution, but more about lower energy consumption while maintaining a level of performance. I guess at some point we'll stop being concerned about Gigahertz or core counts because all processors will be able to do many of the same tasks in about the same amount of time.</p>
<p>I do know that power consumption is going to be a major driving design force as HPC moves closer toward Exascale platforms. Thus, if the THX-1138 processor draws twice as much power as the CFM602 processor, I would be more likely to build my system with the latter.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/02/06/sweet-16/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MIC: Stepping-stone to Quantum Computing?</title>
		<link>http://software.intel.com/en-us/blogs/2011/12/14/mic-stepping-stone-to-quantum-computing/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/12/14/mic-stepping-stone-to-quantum-computing/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 17:59:41 +0000</pubDate>
		<dc:creator>Clay Breshears (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[MIC]]></category>
		<category><![CDATA[QRAM]]></category>
		<category><![CDATA[quantum computation]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/12/14/mic-stepping-stone-to-quantum-computing/</guid>
		<description><![CDATA[I was reading Quantum Computing for Computer Scientists by Noson S. Yanofsky and Mirco A. Mannucci while I was on the treadmill last night. I started out reading the description of Shor's algorithm (for factoring integers) and thought that implementing this on a classical computer (in parallel, of course) would make an interesting problem for the Intel [...]]]></description>
			<content:encoded><![CDATA[<p>I was reading <em><a href="http://www.cambridge.org/us/knowledge/isbn/item1174708/?site_locale=en_US">Quantum Computing for Computer Scientists</a> </em>by Noson S. Yanofsky and Mirco A. Mannucci while I was on the treadmill last night. I started out reading the description of <a href="http://en.wikipedia.org/wiki/Shor%27s_algorithm">Shor's algorithm</a> (for factoring integers) and thought that implementing this on a classical computer (in parallel, of course) would make an interesting problem for the Intel Threading Challenge contest.</p>
<p>But what really caught my imagination was the first section of Chapter 7, "Programming Languages," that briefly described the Quantum Random Access Machine (QRAM) model of quantum computation. In addition to the few paragraphs that were devoted to this model, there was a picture that showed the relationship of a classic computer to a quantum computing device. Each part was simply a box with data/instructions passed from the classic to the quantum and data (results) passed from quantum to the classic side.</p>
<p>This setup looked familiar and it came to me during my cool down: this is how a system equipped with MIC would work. That is, your Intel Core processor does some initial computation to set up data, the data is passed over to the MIC (along with the computation instructions to be executed), and the results from the MIC can be returned to the Core side for use.</p>
<p>I know that MIC processors (and other GPU-like devices) don't have the same computational power as a quantum processor could have. However, the data-parallel and SIMD execution modes are similar to how a quantum device could take a superposition of all potential input data and execute a single computation step to arrive at a measurable result. This similarity got me thinking that MIC devices could be the first steps taken by the industry to better understand, prepare for and program effective quantum computations.</p>
<p>I don't know if we will ever see commodity quantum computation devices. I doubt they'll be developed within my lifetime, at least. Even so, I am nothing short of astounded when I look back at how far computer technology has come since I wrote my first COBOL program on an IBM mainframe. </p>
<p>Knowing I should "never say never," how about on the day after I get my qPad(TM) quantum tablet device, I come back and comment on this blog post to say I was mistaken about how quickly quantum computation entered our lives? If it's anywhere in the cloud-o-sphere (and you know once these bits get pushed out, they never go away), I'll find it with the qSearch app, which will be based on the algorithm outlined in section 6.4 of Yanofsky and Mannucci's book.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/12/14/mic-stepping-stone-to-quantum-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FLASH: Why haven&#039;t we seen this sooner?</title>
		<link>http://software.intel.com/en-us/blogs/2011/12/12/flash-why-havent-we-seen-this-sooner/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/12/12/flash-why-havent-we-seen-this-sooner/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 18:32:32 +0000</pubDate>
		<dc:creator>Clay Breshears (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[Server]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/12/12/flash-why-havent-we-seen-this-sooner/</guid>
		<description><![CDATA[I saw an announcement of the Gordon supercomputer in an online Wired article. What made the new installation at the San Diego Supercomputer Center (SDSC) noteworthy wasn't the size of the machine or that the machine debuted at #48 on the TOP500 list. No, it was the fact that Gordon is the world's first supercomputer [...]]]></description>
			<content:encoded><![CDATA[<p>I saw an announcement of the <a href="http://www.wired.com/wiredenterprise/2011/12/gordon-supercomputer/?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+wired%2Findex+%28Wired%3A+Index+3+%28Top+Stories+2%29%29">Gordon supercomputer</a> in an online Wired article. What made the new installation at the San Diego Supercomputer Center (SDSC) noteworthy wasn't the size of the machine or that the machine debuted at #48 on the TOP500 list.</p>
<p>No, it was the fact that Gordon is the world's first supercomputer that uses flash memory instead of disk drives. This is the world’s largest thumb drive (300 Terabytes across 1024 Intel 710 series drives), as Allan Snavely, SDSC Associate Director, noted. There are quite a few advantages to using flash memory drives in place of spinning disks, including lower power consumption, lower latency to access data, and fewer moving parts that can fail mechanically. This just seems so logical that I'm surprised it hadn't happened sooner.</p>
<p>On a coincidental note, I saw a report earlier today that stated <a href="http://www.crn.com/news/components-peripherals/232300356/intel-cuts-q4-revenue-forecast-as-hard-drive-shortage-continues.htm;jsessionid=wuhovW-ybu7I-FqOBknOww**.ecappj02">Intel was lowering Q4 revenue projections</a> due to a drop in microprocessor demand from PC manufacturers. This is a direct result of not having enough disk drives available, which was caused by flooding in Thailand earlier this year.</p>
<p>I would think that the expected hard drive shortage will open the doors for wider adoption of SSDs in the PC market and provide a hefty revenue stream for companies that can supply those drives. Big-scale projects like SDSC's Gordon computer just reinforce the efficacy of SSDs in desktops and laptops and even smaller form factors.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/12/12/flash-why-havent-we-seen-this-sooner/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MIC architecture support by software tools - SC11 wrap-up</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 00:24:45 +0000</pubDate>
		<dc:creator>James Reinders (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Knights Corner]]></category>
		<category><![CDATA[Knights Ferry]]></category>
		<category><![CDATA[MIC]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/</guid>
		<description><![CDATA[This week we demonstrated the Knights Corner co-processor at SC11 and we had many developers demonstrating real results with the prototype systems. During the "SC11 season," a number of tool vendors announced they will be providing versions of their software tailored to supporting MIC architecture, starting with the Knights Corner co-processor. Here are the ones I know [...]]]></description>
			<content:encoded><![CDATA[<p>This week we demonstrated the <a href="http://software.intel.com/en-us/blogs/tag/knights-corner/">Knights Corner</a> co-processor at <a href="http://sc11.supercomputing.org/">SC11</a> and we had many developers demonstrating real results with the prototype systems.</p>
<p>During the "SC11 season," a number of tool vendors announced they will be providing versions of their software tailored to supporting MIC architecture, starting with the <a href="http://software.intel.com/en-us/blogs/tag/knights-corner/">Knights Corner</a> co-processor.</p>
<p>Here are the ones I know about and can share (there are more who will make their own announcement in the future):</p>
<ul>
<li><a href="http://www.roguewave.com/company/news-events/press-releases/2011/rw_mic_support.aspx">IMSL Library, Rogue Wave</a></li>
<li><a href="http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/">NAG Libraries, NAG Ltd.</a></li>
<li><a href="http://www.platform.com/press-releases/2011/PlatformAnnouncesSupportforIntelManyIntegratedCoreArchitectureBasedProducts%20/">Platform HPC, Platform LSF and Platform Cluster Manager, Platform Computing</a></li>
<li><a href="http://www.altair.com/newsdetail.aspx?news_id=10609&amp;news_country=en-US">PBS Works, Altair</a></li>
<li><a href="http://www.roguewave.com/company/news-events/press-releases/2011/rw_mic_support.aspx">Totalview debugger, Rogue Wave</a></li>
</ul>
<p><em>[editor's note... additional announcements (post-SC11) include:</em></p>
<ul>
<li><a href="http://www.caps-entreprise.com/fr/page/index.php?id=85">CAPS directive-based HMPP compiler, CAPS</a></li>
</ul>
<p><em>]</em></p>
<p>Of course... <a href="http://intel.com/software/products">Intel tools</a> as well, many of which were on display at the show in conjunction with Knights Ferry platforms.</p>
<p>There were also countless applications, many open source, that have been recompiled for MIC architecture and were being discussed around the show. Some I remember are NWChem, ENZO, ELK, MADNESS, MPI, GA, and Python... and I know I am forgetting quite a few. Of course, Linux has been ported (and is running on both Knights Ferry and Knights Corner).</p>
<p>For additional tools announcements, please let me know, or post a comment!</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>quick chat about MIC architecture with Mike Dewar, NAG</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 23:37:39 +0000</pubDate>
		<dc:creator>James Reinders (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[MIC]]></category>
		<category><![CDATA[NAG]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/</guid>
		<description><![CDATA[I ran into Mike Dewar at SC11 today as the exhibition draws to a close.  Mike is the CTO of NAG Ltd. - a company we've had the good fortune to work with for years. NAG is one of a handful of companies that have been providing feedback on our Knights Ferry (prototype MIC architecture). [...]]]></description>
			<content:encoded><![CDATA[<p>I ran into Mike Dewar at <a title="SC11" href="http://sc11.supercomputing.org">SC11</a> today as the exhibition draws to a close.  Mike is the CTO of <a href="http://www.nag.com/">NAG Ltd.</a> - a company we've had the good fortune to work with for years.</p>
<p>NAG is one of a handful of companies that have been providing feedback on our Knights Ferry (prototype MIC architecture).</p>
<p>Mike told me: "We found porting existing routines from the NAG Library to the Intel <a href="http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html">Many Integrated Core Architecture</a> (MIC) to be a relatively quick and painless process. The team was impressed at the way Intel has extended their existing software tools to support the MIC environment, allowing them to work in a familiar and productive environment."</p>
<p>I quizzed Mike on what it took to get the library running on Knights Ferry, and he shared the one type of tuning they had to explore: thread scaling. They use OpenMP, which until now has generally meant running with something like 10-20 threads rather than the 120 threads they use on Knights Ferry. I'll have to write more about that later - scaling and vectorization are key as multicore and many-core grow. The good news is that their use of OpenMP made this a straightforward challenge they understood; it was not a mystery to them. They also make good use of <a href="http://intel.com/go/MKL">MKL</a> in their library, and of course we support that for MIC architecture.</p>
<p>It is great to know that NAG users will have the opportunity to continue using NAG software with <a href="http://software.intel.com/en-us/blogs/tag/knights-corner/">Knights Corner</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

