<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blogs &#187; David Mackay (Intel)</title>
	<atom:link href="http://software.intel.com/en-us/blogs/author/david-mackay/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/en-us/blogs</link>
	<description></description>
	<lastBuildDate>Fri, 25 May 2012 22:49:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Software Tool Talk!</title>
		<link>http://software.intel.com/en-us/blogs/2012/02/10/software-tool-talk/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/02/10/software-tool-talk/#comments</comments>
		<pubDate>Sat, 11 Feb 2012 02:38:58 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/02/10/software-tool-talk/</guid>
		<description><![CDATA[Hello everyone, Where do you go to learn about the features of Intel Software Development Products?    Soon the answer to this question will be Tool Talk.   Let me tell you about a new program we are starting.   We are creating a new video series called Tool Talk.  I will host these episodes.   Each segment will be about 10 minutes long.  I [...]]]></description>
			<content:encoded><![CDATA[<p>Hello everyone,</p>
<p>Where do you go to learn about the features of Intel Software Development Products?    Soon the answer to this question will be Tool Talk.   Let me tell you about a new program we are starting.   We are creating a new video series called Tool Talk.  I will host these episodes.   Each segment will be about 10 minutes long.  I will introduce a feature of one of the Intel Software Development Products and explain how to use it and what it does.   Feel free to add your comments or come over here to my blog and ask a question.    A week or two after I talk about and demonstrate a feature of Intel Software Development Products I will host one of the Intel lead engineers or architects to talk about the technology or what goes on underneath the hood when you exercise that option.   </p>
<p>The first session is already live!   View it here:  <a title="Tool Talk episode 1 - CilkPlus" href="http://software.intel.com/en-us/videos/channel/parallel-programming/intel%C2%AE-software-development-products-tool-talk-1-parallelization-with-intel-cilk%E2%84%A2plus/1441436906001" target="_blank">Tool Talk Episode 1</a> </p>
<p>In this first session I talk about threading for multi-core platforms with Intel® Cilk™Plus.   Using scalable threading abstractions such as Intel® Cilk™Plus or Intel® Threading Building Blocks is a much easier way to express parallelism and write maintainable code than explicitly creating and managing threads on your own.    I cannot cover everything, so if you have a specific question, post it here for me or come to the <a title="forums" href="http://software.intel.com/en-us/forums/" target="_blank">Software Product Forums </a>and start a discussion.  Just select the Intel Software Products header and the appropriate section to begin a new thread. </p>
<p>Let me know what topics/features you would like me to cover in a future episode.</p>
<p>-David</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/02/10/software-tool-talk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XE - Time to start talking more about Intel® Parallel Studio XE 2011</title>
		<link>http://software.intel.com/en-us/blogs/2011/05/12/xe-time-to-start-talking-more-about-intel-parallel-studio-xe-2011/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/05/12/xe-time-to-start-talking-more-about-intel-parallel-studio-xe-2011/#comments</comments>
		<pubDate>Thu, 12 May 2011 22:48:15 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/05/12/xe-time-to-start-talking-more-about-intel-parallel-studio-xe-2011/</guid>
		<description><![CDATA[Spring is here and it is well past time for me to get back to the blog page and start sharing ideas and notes. I have been thinking about a new series of blogs and it is time to start writing them. This new series should be a lot of fun. I plan to write [...]]]></description>
			<content:encoded><![CDATA[<p>Spring is here and it is well past time for me to get back to the blog page and start sharing ideas and notes.   I have been thinking about a new series of blogs and it is time to start writing them.  This new series should be a lot of fun.  I plan to write about Intel® Parallel Studio XE.   I will be creating blogs based on discussions and communication with some of the Intel product architects, technical leads and consulting engineers.    The first blog will appear next week, written with Douglas Armstrong, the architect of Intel® VTune™ Amplifier XE.   We will write about using VTune™ Amplifier XE to examine the performance of code written with Intel® Threading Building Blocks.    So watch next week for the new posting.  Let me know what other Intel software products you are interested in hearing more details about.</p>
<p>-David</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/05/12/xe-time-to-start-talking-more-about-intel-parallel-studio-xe-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance Analysis of Threading Building Blocks</title>
		<link>http://software.intel.com/en-us/blogs/2011/05/10/performance-analysis-of-threading-building-blocks/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/05/10/performance-analysis-of-threading-building-blocks/#comments</comments>
		<pubDate>Tue, 10 May 2011 14:27:50 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/05/10/performance-analysis-of-threading-building-blocks/</guid>
		<description><![CDATA[Intel® Threading Building Blocks (TBB) is a popular abstraction for expressing parallelism in C++ software.  The Threading Building Blocks lead to good decomposition for threading.   But do you know how to check how well it is tuned, so you use Threading Building Blocks most effectively? Today Douglas Armstrong, Intel® VTune™ Amplifier XE architect, joins me [...]]]></description>
			<content:encoded><![CDATA[<p>Intel® Threading Building Blocks (TBB) is a popular abstraction for expressing parallelism in C++ software.  The Threading Building Blocks lead to good decomposition for threading.   But do you know how to check how well it is tuned, so you use Threading Building Blocks most effectively?</p>
<p>Today Douglas Armstrong, Intel® VTune™ Amplifier XE architect, joins me to share tips on using VTune Amplifier XE for tuning TBB software.   VTune Amplifier XE has built-in support for helping find and tune the granularity of domain decomposition in TBB.   Douglas feels this is an under-appreciated feature and has captured some screen shots to share with us.   Douglas created a sample TBB application and analyzed it with the concurrency option of VTune Amplifier XE.   Below we show a screenshot of the summary view.  It appears as though we were almost fully parallel the entire time.</p>
<div id="attachment_33873" class="wp-caption alignnone" style="width: 681px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture1-e1305029062211.jpg"><img class="size-full wp-image-33873" title="Summary Concurrency View" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture1-e1305029062211.jpg" alt="" width="671" height="508" /></a><p class="wp-caption-text">Summary concurrency view</p></div>
<p>You may think this means we have almost perfect threading, as there are no blocked threads or contention on synchronization. But as we scroll through the full summary display, you'll see an elapsed time of over 8 1/2 seconds. There is also a new metric here called "Overhead Time" that indicates over 1 1/2 seconds, over 10% of the total CPU time of the app, was spent on synchronization overhead. You can even see this CPU time being spent in the TBB internals in the list of the top hotspots of the app. Amplifier XE is even putting up a warning note here saying that we may have a problem with synchronization overhead.</p>
<div id="attachment_33871" class="wp-caption alignnone" style="width: 681px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture2-e1305029207571.jpg"><img class="size-full wp-image-33871" title="Overhead" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture2-e1305029207571.jpg" alt="" width="671" height="471" /></a><p class="wp-caption-text">Overhead Summary</p></div>
<p>Let’s take a look at the results in the “Bottom-up” tab to see what’s going on. If we just look at where the CPU Time is being spent you’ll see that it is all labeled in green, meaning that we were fully utilizing the system’s processors when it was running. About 14 seconds looks productive in do_work. But notice that some of the CPU Time is showing up in TBB functions; this time is also marked as Overhead Time.</p>
<div id="attachment_33874" class="wp-caption alignnone" style="width: 681px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture3.jpg"><img class="size-full wp-image-33874" title="Overhead Time" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture3-e1305036739622.jpg" alt="picture 3" width="671" height="290" /></a><p class="wp-caption-text">Overhead Time</p></div>
<p>If we arrange the data by module, instead of the default arrangement by function, we can see that there is three fourths of a second of overhead time in the TBB DLL, but there is also almost a second of overhead time in the user module. This is because the TBB header files include many templates, which cause inline functions to spend time working on TBB overhead within the user modules. Arranging the data by source file confirms this.</p>
<div id="attachment_33875" class="wp-caption alignnone" style="width: 681px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture4.jpg"><img class="size-full wp-image-33875" title="Module Base" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture4-e1305037001800.jpg" alt="" width="671" height="195" /></a><p class="wp-caption-text">Module Base view</p></div>
<div id="attachment_33876" class="wp-caption alignnone" style="width: 681px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture5.jpg"><img class="size-full wp-image-33876" title="Module Expanded" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture5-e1305037086104.jpg" alt="" width="671" height="405" /></a><p class="wp-caption-text">Arranged by Source File</p></div>
<p>So now that we know we have lots of overhead from using TBB, how do we fix it? Let’s start by figuring out where that overhead is coming from. This test application uses TBB for a couple of different algorithms, so we need to find out which one is causing the problem. By looking at the function with the most overhead time we see it is labeled “[TBB parallel_for on class inner_body]”. This is Amplifier XE’s way of telling us that the time was spent processing for a TBB parallel_for template based on the class “inner_body”. We could go check out how we are using that in our source code. By selecting that row, the call stack pane on the very right shows the call stacks associated with this selected overhead. Make sure that the stack type selector combo box is set to “CPU Time” instead of “Wait Time” when looking for overhead. The third row down here refers to the line of source code where we called parallel_for() in this inner_body class. We can double click on that and see the actual source code.</p>
<div id="attachment_33877" class="wp-caption alignnone" style="width: 743px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture6.jpg"><img class="size-full wp-image-33877" title="Concurrency Expanded" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture6-e1305037291533.jpg" alt="" width="733" height="209" /></a><p class="wp-caption-text">Concurrency Expanded</p></div>
<div id="attachment_33878" class="wp-caption alignnone" style="width: 753px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture7.jpg"><img class="size-full wp-image-33878" title="Source from concurrency drill down" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture7-e1305037356406.jpg" alt="" width="743" height="207" /></a><p class="wp-caption-text">Source View (Drill Down From Expanded Concurrency)</p></div>
<p>By looking further we find out that a grainsize of 1 was manually specified for this blocked_range with over 8,000 iterations. The appropriate grainsize is a balance: do enough cycles in the body so the ratio of overhead to work is low, while still allowing TBB the flexibility it needs to schedule work on different threads to keep them busy. This grain size is too small. A bad setting in the other direction, a grain size too large, could cause imbalance problems, with TBB being unable to break up the work and schedule pieces to another thread. This could show up in Amplifier XE as poor utilization of CPUs.</p>
<p>For now, let’s fix this grainsize to a larger number and see the result. From the summary, it now looks like we have removed this significant TBB overhead and the elapsed time has dropped from 8.7 seconds to 6.9 seconds. So we have used VTune™ Amplifier XE to help us identify poor grainsize selection and improve performance of this software. Let us know if you have other examples you would like to see illustrated with VTune™ Amplifier XE.</p>
<div id="attachment_33879" class="wp-caption alignnone" style="width: 681px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture8.jpg"><img class="size-full wp-image-33879" title="Concurrency Summary View After Tuning" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/05/Picture8-e1305037473303.jpg" alt="" width="671" height="345" /></a><p class="wp-caption-text">Concurrency Summary View after Tuning</p></div>
<p>*All data was collected on a laptop with an Intel® Core™ 2 Duo processor running Microsoft Windows* 7 using Microsoft Visual Studio 2008 SP1, and Intel® VTune™ Amplifier XE 2011.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/05/10/performance-analysis-of-threading-building-blocks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel Programming with Intel and Microsoft</title>
		<link>http://software.intel.com/en-us/blogs/2010/09/15/parallel-programming-with-intel-and-microsoft/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/09/15/parallel-programming-with-intel-and-microsoft/#comments</comments>
		<pubDate>Thu, 16 Sep 2010 00:26:19 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/09/15/parallel-programming-with-intel-and-microsoft/</guid>
		<description><![CDATA[Intel is combining with Microsoft to teach a short course (1 day) on parallel programming.   Click here to register for the course.  This is your chance to register and attend.  Last year I piloted a new course, and we visited four cities to teach the class in the first half of 2010.   The class [...]]]></description>
			<content:encoded><![CDATA[<p>Intel is combining with Microsoft to teach a short course (1 day) on parallel programming.   Click<a href="http://www.programmers.com/PPI_US/PartnerCenter/partners.aspx?name=Parallelism_Techday" target="_blank"> here </a>to register for the course.  This is your chance to register and attend.  Last year I piloted a new course, and we visited four cities to teach the class in the first half of 2010.   The class has been updated and improved; it is now taught jointly by Intel and Microsoft.  The course agenda covers the following topics:<br />
* Thinking in Parallel<br />
* Getting started with Parallelism<br />
* Implementing Parallelism<br />
* Debugging &amp; Correctness (introduction)<br />
* Tuning<br />
The demonstrations are all shown on the newly launched Intel(r) Parallel Studio 2011 and Visual Studio* 2010.  The examples are based on Intel(r) Threading Building Blocks and Microsoft PPL*.   The course is offered in Montreal, Chicago, San Francisco and Seattle.   So don't wait any longer: click <a href="http://www.programmers.com/PPI_US/PartnerCenter/partners.aspx?name=Parallelism_Techday" target="_blank">here </a>and register.</p>
<p>If you missed the news about Intel Parallel Studio 2011 then <a href="http://software.intel.com/en-us/intel-parallel-studio-home/" target="_blank">click here </a>to take a look.  Intel Parallel Advisor is a brand new feature to help identify and model opportunities for parallelism.  </p>
<p><a href="http://www.intel.com/sites/sitewide/en_US/tradmarx.htm?iid=ftr+trademark" target="_self">* trademarks</a></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/09/15/parallel-programming-with-intel-and-microsoft/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel Studio contest - All the fun without the roadtrip</title>
		<link>http://software.intel.com/en-us/blogs/2010/06/08/parallel-studio-contest-all-the-fun-without-the-roadtrip/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/06/08/parallel-studio-contest-all-the-fun-without-the-roadtrip/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 19:00:39 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/06/08/parallel-studio-contest-all-the-fun-without-the-roadtrip/</guid>
		<description><![CDATA[Every year my parents would load up the car and we would head out on vacation.  Invariably I or my siblings would ask "are we there yet?"  One game we played to pass the time and avoid the incessant "are we there yet?" question was 20 questions.   This summer we are going to bring you [...]]]></description>
			<content:encoded><![CDATA[<p>Every year my parents would load up the car and we would head out on vacation.  Invariably I or my siblings would ask "are we there yet?"  One game we played to pass the time and avoid the incessant "are we there yet?" question was 20 questions.   This summer we are going to bring you 20 questions without the car ride.  So you don't have to ask "are we there yet?"  Beginning next week Monday 14 June we will ask one question per day about Parallel Studio.  You get to answer the question.  After 20 business days we will award prizes to the winners.   Each question will ask about some feature or component of Parallel Studio.   So think of the 20 questions as a guided tour of Intel Parallel Studio and you don't even have to travel!   You will need Intel Parallel Studio - if you don't have it already, don't worry.  Just download an evaluation copy on Monday 14 June the same day the contest begins!  Your free evaluation copy will be good for 30 calendar days which will last through the 20 business days of our contest.   So here is the most important part of this blog -  a link to the contest web page.  Here it is -  <a href="http://software.intel.com/en-us/contests/rock-your-code/codecontest.php">Intel 20 Questions Contest </a>.  </p>
<p>Check out the contest web page for full information about contest details and prizes.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/06/08/parallel-studio-contest-all-the-fun-without-the-roadtrip/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tune Up Your Code For Performance</title>
		<link>http://software.intel.com/en-us/blogs/2010/01/28/tune-up-your-code-for-performance/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/01/28/tune-up-your-code-for-performance/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 00:13:58 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Intel SW Partner Program]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/01/28/tune-up-your-code-for-performance/</guid>
		<description><![CDATA[A couple of weeks ago I wrote a blog about a couple of web seminars about improving the quality of your software - removing memory errors and data races using Intel(r) Parallel Inspector.      Today I want to encourage you to attend an MSDN web seminar about software performance tuning.     Vasanth Tovinkere, one of the software engineers on [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago I wrote a <a href="http://software.intel.com/en-us/forums/showthread.php?t=71201">blog </a>about a couple of web seminars about improving the quality of your software - removing memory errors and data races using Intel(r) Parallel Inspector.      Today I want to encourage you to attend an MSDN web seminar about software performance tuning.    </p>
<p>Vasanth Tovinkere, one of the software engineers on my team who is great at parallelization and software tuning, will be conducting this web seminar.  This presentation will be mixed with live demonstrations that show how easy it is to leverage the power of Microsoft Visual Studio and Intel® Parallel Studio to find performance issues due to lock contention in threaded applications. This ensures that shipped applications can take better advantage of the available processors in end user systems. <em>Intel Parallel Studio is an add-in to Visual Studio to help create fast, reliable code that takes advantage of multicore processors.</em></p>
<p>The web seminar is Thursday 18 February 2010 at 9:00 AM PST.  <a href="http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032440361&amp;Culture=en-US">Go here to sign up and register!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/01/28/tune-up-your-code-for-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallelism and threading training</title>
		<link>http://software.intel.com/en-us/blogs/2010/01/19/parallelism-and-threading-training/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/01/19/parallelism-and-threading-training/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 18:52:38 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[intel threading]]></category>
		<category><![CDATA[Multicore Parallel Programming]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[TBB]]></category>
		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/01/19/parallelism-and-threading-training/</guid>
		<description><![CDATA[The courses are rolling out.  Windows C++ developers  register now!   Intel is offering a one day course on threading and parallelism!   Last year I taught a pilot class on threading and parallelism.    We had great reviews from those who attended this pilot course.    In the post pilot class survey 100% of attendees said they would recommend the course to their peers.   We [...]]]></description>
			<content:encoded><![CDATA[<p>The courses are rolling out.  Windows C++ developers  <a href="http://www.programmers.com/PPI_US/PartnerCenter/partners.aspx?name=Parallelism_Techday">register now!</a>   Intel is offering a one-day course on threading and parallelism!   Last year I taught a <a href="http://software.intel.com/en-us/blogs/2009/06/04/learn-parallelism-and-threading-opportunity-to-attend-pilot-class-for-free/">pilot class </a>on threading and parallelism.    We had great reviews from those who attended this pilot course.    In the post-pilot class survey, 100% of attendees said they would recommend the course to their peers.   We refined the content and are ready to go.   The course includes the following: <br />
• Introduction<br />
– Why go parallel?<br />
– Parallelization methodology<br />
• Analysis/Design<br />
– Finding opportunities for parallelism<br />
• Introduction of Threads<br />
– Threading environments<br />
– Threading process<br />
• Debug<br />
– Finding parallel bugs<br />
• Tune<br />
– Scalability issues<br />
– Data sharing &amp; locking techniques<br />
This class will introduce parallelism concepts and supplement learning with demonstrations and code samples. Students will also gain exposure to the new Intel® Parallel Studio product. We will have a number of Intel consulting engineers on hand to answer your questions during the breaks.   The classes will run from 9:00 AM to 3:00 PM.</p>
<p>We have more good news for you too!   The marketing team worked with the Intel Software reseller Programmers Paradise to help sponsor the event and so the course will be free to attendees.    There will be a variety of prizes to those who attend.    Events are scheduled for Houston TX, Austin TX, Iselin NJ, New York NY, and Waltham MA in February and March.   Follow the links, register and come!    Keep track as we may add more cities in the future.<a href="http://www.programmers.com/PPI_US/PartnerCenter/partners.aspx?name=Parallelism_Techday"><img class="alignnone size-full wp-image-13438" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2010/01/techdays.bmp" alt="" /></a> </p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/01/19/parallelism-and-threading-training/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Improving Software quality - checking for memory errors and thread data races</title>
		<link>http://software.intel.com/en-us/blogs/2010/01/13/improving-software-quality-checking-for-memory-errors-and-thread-data-races/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/01/13/improving-software-quality-checking-for-memory-errors-and-thread-data-races/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 00:01:06 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[DataParallel]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[TBB]]></category>
		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/01/13/improving-software-quality-checking-for-memory-errors-and-thread-data-races/</guid>
		<description><![CDATA[In December I had fun participating in a couple of web seminars about a product for finding common memory errors as well as thread deadlocks and data races:  Intel Parallel Inspector.   Many of you might have missed these so let me tell you about them.  The first web seminar was one I got to present [...]]]></description>
			<content:encoded><![CDATA[<p>In December I had fun participating in a couple of web seminars about a product for finding common memory errors as well as thread deadlocks and data races:  Intel Parallel Inspector.   Many of you might have missed these so let me tell you about them.  The first web seminar was one I got to present at MSDN.    At this MSDN web seminar ( <a href="//wmbmodigital.microsoft.com/a10125/o9/events/webcasts/1032433591_Str.wmv">click here for streaming link</a>, <a href="http://dlbmodigital.microsoft.com/audio/1032433591.wma">click here for wma link</a>) I provide an introduction to Intel Parallel Inspector and demonstrate how to use it on some sample code.   I cover both analysis of memory issues and thread data races.   I introduce the use of suppression filters, but ran out of time before I could cover the command line interface.     Watch and send me your comments about what you like about this web seminar or wish I had included.</p>
<p>The second web seminar I mention was also in December.   I hosted this Intel web seminar, and Matt Dunbar, the director of performance technology at Simulia, presented. This web seminar was part of the Intel Real World Parallelism series.    <a href="https://event.on24.com/event/36/88/3/rt/1/index.html?&amp;eventid=36883&amp;sessionid=1&amp;key=D76A2FD29D7444AEC06765011A2D4953&amp;sourcepage=register">Just follow the link here. </a>Matt Dunbar spoke about the Simulia product and how they use Intel Parallel Inspector in their development process.  A couple of points Matt made are listed below:</p>
<ul>
<li>Their code base has over 1 million lines of code.  They had Intel Parallel Inspector up and running from within Visual Studio and command line interface within one hour.</li>
<li>With threading error analysis in Parallel Studio, it is possible to write simpler component tests and have errors trapped.</li>
</ul>
<p>Memory analysis and thread data race detection are complex analysis operations.  Using multiple small, component-like tests matches our recommended usage model, and it was great to hear Matt recommend similar ideas.   Matt also talks about their usage in interactive mode and in batch mode.  Just select the <a href="https://event.on24.com/event/36/88/3/rt/1/index.html?&amp;eventid=36883&amp;sessionid=1&amp;key=D76A2FD29D7444AEC06765011A2D4953&amp;sourcepage=register">"Real World Parallelism"</a> tab on the web site.  Check the box " <span class="arrow">How to Use Intel<sup>®</sup> Parallel Studio to Streamline Code Development in a Multicore Environment - presented by SIMULIA." </span></p>
<p><span class="arrow">The Intel web seminar requires free registration.  I hope you find them interesting; if so download an evaluation copy of Intel(r) Parallel Amplifier and try it out.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/01/13/improving-software-quality-checking-for-memory-errors-and-thread-data-races/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://dlbmodigital.microsoft.com/audio/1032433591.wma" length="14740225" type="audio/x-ms-wma" />
		</item>
		<item>
		<title>Fun with Locks and Waits - Performance Tuning</title>
		<link>http://software.intel.com/en-us/blogs/2009/12/23/fun-with-locks-and-waits-performance-tuning/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/12/23/fun-with-locks-and-waits-performance-tuning/#comments</comments>
		<pubDate>Wed, 23 Dec 2009 22:44:51 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[TBB]]></category>
		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/12/23/fun-with-locks-and-waits-performance-tuning/</guid>
		<description><![CDATA[Intel Parallel Amplifier can help identify which synch objects impact performance the most.   The Locks and Waits analysis pinpoints synch objects that cause threads to wait and lists them in descending order.]]></description>
			<content:encoded><![CDATA[<p>At times threaded software requires some critical sections, mutexes or locks.   Do developers always know which of the objects in their code has the most impact?   If I want to examine my software to minimize the impact or restructure data to eliminate some of these synchronization objects and improve performance, how do I know where I should make changes to get the biggest performance improvement?    Intel Parallel Amplifier can help me determine this.    </p>
<p>Intel Parallel Amplifier provides three basic analysis types: hotspots (with call stack data), concurrency, and locks and waits.  The locks and waits analysis highlights which synchronization objects block threads the longest.    It is common for software to have too many or too few synchronization points.   Insufficient synchronization points lead to race conditions and indeterminate results (if you have this problem you need Intel Parallel Inspector, not Intel Parallel Amplifier.  See this MSDN web seminar for more on Parallel Inspector:  <a href="http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&amp;EventID=1032433592&amp;amp%3bEventCategory=5&amp;amp%3bEventCategory=5&amp;amp%3bculture=en-US&amp;amp%3bculture=en-US&amp;amp%3bCountryCode=US%22%3ehttp%3a%2f%2fmsevents.microsoft.com%2fCUI%2fWebCastEventDetails.aspx%3fEventID%3d1032433592&amp;amp%3bCountryCode=US">Got Memory Leaks?</a> ).  If you have too many synchronization objects, you want to know which ones, if removed, would improve performance the most.   Even if all the synchronization objects you have are necessary, you might want to know how much they impact performance. This may help you decide whether to spend time refactoring the software so that a synchronization object can be removed.     The Locks and Waits analysis in Parallel Amplifier tracks how much time threads spend waiting on each synch object and reports the times in an ordered table.  The synch objects that cause the most waiting are listed at the top, the rest in declining order.   </p>
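<p>To make the idea of "wait time per synch object, listed in descending order" concrete, here is a minimal sketch in portable C++.  It is an illustration under assumptions, not a description of how Parallel Amplifier collects its data: TimedMutex, hammer and report are hypothetical names made up for this example, and std::mutex stands in for the Win32 synchronization objects used elsewhere in this post.</p>

```cpp
#include <algorithm>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper: a mutex that adds up the time callers spend blocked in lock().
struct TimedMutex {
    std::string name;
    std::mutex m;
    std::chrono::nanoseconds waited{0};  // only modified while m is held

    explicit TimedMutex(std::string n) : name(std::move(n)) {}

    void lock() {
        const auto t0 = std::chrono::steady_clock::now();
        m.lock();  // blocks while another thread holds the mutex
        waited += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - t0);
    }
    void unlock() { m.unlock(); }
};

// Starts `threads` workers; each acquires `mu` `iters` times and bumps `shared`.
void hammer(TimedMutex& mu, int threads, int iters, long& shared) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<TimedMutex> guard(mu);  // needs only lock()/unlock()
                ++shared;
            }
        });
    }
    for (auto& th : pool) th.join();
}

// Orders the objects by accumulated wait time, largest first.
std::vector<const TimedMutex*> report(std::vector<const TimedMutex*> objs) {
    std::sort(objs.begin(), objs.end(),
              [](const TimedMutex* a, const TimedMutex* b) {
                  return a->waited > b->waited;
              });
    return objs;
}
```

<p>Calling hammer on one TimedMutex with several threads and on another with a single thread, then passing both to report, gives a small ordered table of the kind described above.  The measured times include the cost of an uncontended lock and vary from run to run.</p>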
<p>Let’s look at a simple example.  One of the commonly used computational exercises to teach parallel programming is a simple program to calculate pi.    The main process thread creates NTHREADS threads and sets them off to execute a routine called calcpi, then waits for them to complete.   The basic algorithm in calcpi is shown below: </p>
<p>47       int my_begin = myid * (ITER/NTHREADS) ;<br />
48       int my_end = (myid +1) * (ITER/NTHREADS) ;<br />
49       step = 1.0 / (double) ITER ;<br />
50       for (int i = my_begin ; i &lt; my_end ; ++i)<br />
51       {<br />
52           x = (i+0.5) * step ;<br />
53           EnterCriticalSection(&amp;protect);<br />
54           pi = pi + 4.0/(1.0 + x*x) * step ;<br />
55           LeaveCriticalSection(&amp;protect) ;<br />
56       }</p>
<p> Each thread takes a section of the for loop to operate on.  Pi is a global variable, so when each thread updates pi, a critical section protects the access from data races.   After creating NTHREADS threads and assigning them to begin executing function calcpi, the main thread waits for all the threads to complete.    When I run this through the Locks and Waits analysis of Intel Parallel Amplifier I see three synch objects appear in the analysis section. The screenshot is shown below.     I can expand each of these three synch objects and drill down to the source code.   The top synch object, where a thread waits the most time, is the main thread waiting for all the threads it created to complete.    All this thread does is create threads and wait.    There are no intermediate steps to tune.  We certainly don’t want to exit the program before calculations are done, so let’s keep this synchronization point.    Let’s look down the list for the next opportunity.         </p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/12/screen1.jpg"><img class="alignnone size-full wp-image-13069" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/12/screen1.jpg" alt="" width="960" height="600" /></a> The second item in the table, the second synch object that shows significant wait time, is the critical section in calcpi.    I can double click to drill down to the source and verify that the critical section causing so much delay is the one shown in the code segment above.  In the screenshot, you can also see this was line 53 in my code (I rearranged the display so the creation line column is next to the wait time; you may rearrange the column ordering as you like by dragging each column to the desired position). So let’s look at how I can minimize the number of entrances into the critical section.  The contention is around access to the global variable pi.   There is no reason that I must accumulate each contribution in the global variable.  I can create partial sums within each thread and combine these partial sums for the final result.   The first thing I do is create a local copy of pi for each thread.  I call this my_pi.  Then each thread calculates my_pi and, when the for loop is complete, adds its portion of the calculation into the global variable pi.   For the data collected here, I used a dual core system and 1,200,000 iterations with NTHREADS set to 2.   So for the first algorithm the threads entered the critical section 1,200,000 times in total.  For the algorithm below, they enter the critical section only twice, once per thread.  </p>
<p>50         double my_pi = 0.0 ;<br />
51         for (int i = my_begin ; i &lt; my_end ; ++i)<br />
52        {<br />
53             x = (i+0.5) * step ;<br />
54             my_pi = my_pi + 4.0/(1.0 + x*x) ;<br />
55         }<br />
56         my_pi *= step ;<br />
57         EnterCriticalSection(&amp;protect);<br />
58         pi += my_pi ;<br />
59         LeaveCriticalSection(&amp;protect) ;</p>
<p>  As a further optimization, I pulled the multiplication by step out of the for loop, so it is done once per thread rather than once per iteration.   When I run this version through Parallel Amplifier, I get the screenshot shown below.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/12/screen2.jpg"><img class="alignnone size-large wp-image-13071" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/12/screen2-1024x573.jpg" alt="" width="1024" height="573" /></a></p>
<p>Once again the top of the table is the thread in main waiting for all the worker threads to complete.      The code runs so much faster that the scale of the display changed from seconds to milliseconds.  The amount of time spent waiting is far shorter.       The second entry in this screenshot  (Stream)   maps back to a print statement – remember printf is an implicit barrier and its behavior is tracked just like the calls to critical sections or WaitForMultipleObjects.   </p>
<p>So what to do next?  Get an evaluation copy of Intel Parallel Amplifier and analyze your code.  Check out which synch objects have threads waiting the most.  Then see if you can find some things to optimize.  The difference in runtime between the two algorithms I showed here is about 50x.  This was a simple example, where the worst case performed worse than the sequential algorithm, so changes in performance of this magnitude are not common.  Improved performance by minimizing time spent waiting at barriers is common, though.    So see what you can get.   Now, if I had begun with Intel Threading Building Blocks, I would have let the Threading Building Blocks parallel_reduce algorithm handle the reduction intelligently and I would have had a very different experience, but that would be a topic for a different blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/12/23/fun-with-locks-and-waits-performance-tuning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software without Memory errors and data races</title>
		<link>http://software.intel.com/en-us/blogs/2009/12/07/software-without-memory-errors-and-data-races/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/12/07/software-without-memory-errors-and-data-races/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 21:50:47 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Intel SW Partner Program]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[TBB]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/12/07/software-without-memory-errors-and-data-races/</guid>
		<description><![CDATA[This year we launched Intel Parallel Studio.  Intel Parallel Studio is designed to improve productivity and help Visual C++ developers create good threaded software that scales.   It includes components that help express parallelism (Intel Threading Building Blocks), performance libraries, as well as tuning and correctness checking software. Intel Parallel Inspector checks for both [...]]]></description>
			<content:encoded><![CDATA[<p>This year we launched Intel Parallel Studio.  Intel Parallel Studio is designed to improve productivity and help Visual C++ developers create good threaded software that scales.   It includes components that help express parallelism (Intel Threading Building Blocks), performance libraries, as well as tuning and correctness checking software.</p>
<p>Intel Parallel Inspector checks for memory issues as well as thread data races and deadlocks.   This new product uses a new interface that we believe is easy to configure and use.   I will be conducting a web seminar on Tuesday 8 December at 9:00 AM Pacific time.    I will cover both the memory inspection and the thread inspection features of Intel Parallel Inspector.   The link to register to attend this event is listed <a href="http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032433591&amp;Culture=en-US">here</a>.   I hope you can join me at this MSDN web seminar.</p>
<p>In addition to presenting at the MSDN web seminar I get to host the Intel web seminars.   I hope you get to catch these in the spring and the fall.   This fall we did something new and brought in several ISVs to talk about their experiences threading for Intel multi-core platforms.    This series is called Real World Parallelism.    Last week I also got to host Bernard Laberge, Senior Principal Engineer in the video editors division at Avid.    There is one more web seminar in this series on 15 December.   The old seminars are available for playback - just check the Intel software products web pages <a href="https://event.on24.com/event/36/88/3/rt/1/index.html?&amp;eventid=36883&amp;sessionid=1&amp;key=D76A2FD29D7444AEC06765011A2D4953&amp;sourcepage=register">here</a>.  Come join me and Matt Dunbar on 15 December.</p>
<p>-David</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/12/07/software-without-memory-errors-and-data-races/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Event Based Sampling - are you ready?</title>
		<link>http://software.intel.com/en-us/blogs/2009/09/30/event-based-sampling-are-you-ready/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/09/30/event-based-sampling-are-you-ready/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 22:56:02 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Core i7 optimization performance analysis]]></category>
		<category><![CDATA[event based sampling]]></category>
		<category><![CDATA[Nehalem]]></category>
		<category><![CDATA[VTune]]></category>
		<category><![CDATA[What If]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/09/30/event-based-sampling-are-you-ready/</guid>
		<description><![CDATA[Event based sampling uses counters on the Intel processors to detect what your software is doing.   This is helpful for tuning and improving software performance.   Typical hotspot analysis shows you where your software spends most of the execution time, but event based sampling allows you to see not just what sections of your application take [...]]]></description>
			<content:encoded><![CDATA[<p>Event based sampling uses counters on the Intel processors to detect what your software is doing.   This is helpful for tuning and improving software performance.   Typical hotspot analysis shows you where your software spends most of the execution time, but event based sampling allows you to see not just what sections of your application take the most time but why and what it is doing.</p>
<p>The events are different for each platform so you need to know what platform you are executing your software on.   We just recently updated our guide to event based tuning on the popular Intel(R) Core(TM) i7 processor based platforms.  The guide is updated to include suggestions for both single socket and dual socket systems.   Check out: <span style="#1f497d;"><span style="small;"> </span><a href="http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/"><span style="small;">http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/</span></a></span></p>
<p><span style="#1f497d;">Many of you following our platforms know that on our new platforms memory bandwidth can no longer be measured with events on the processor core.    For those who want to measure bandwidth on the Intel(R) Core(TM) i7 processor based platforms we have enabled a method using uncore events.   Check out this link:  <a href="http://software.intel.com/en-us/articles/how-do-i-measure-memory-bandwidth-on-an-intel-core-i7-or-xeon-5500-series-platform-using-intel-vtune-performance-analyzer/">http://software.intel.com/en-us/articles/how-do-i-measure-memory-bandwidth-on-an-intel-core-i7-or-xeon-5500-series-platform-using-intel-vtune-performance-analyzer/</a>  You must have a VTune Performance Analyzer license and you will also need to download the Performance Tuning Utility from whatif.intel.com.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/09/30/event-based-sampling-are-you-ready/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learn parallelism and Threading – opportunity to attend Pilot class for FREE!</title>
		<link>http://software.intel.com/en-us/blogs/2009/06/04/learn-parallelism-and-threading-opportunity-to-attend-pilot-class-for-free/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/06/04/learn-parallelism-and-threading-opportunity-to-attend-pilot-class-for-free/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 15:28:17 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[multi-core]]></category>
		<category><![CDATA[TBB]]></category>
		<category><![CDATA[thread]]></category>
		<category><![CDATA[threading]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/06/04/learn-parallelism-and-threading-opportunity-to-attend-pilot-class-for-free/</guid>
		<description><![CDATA[We created a new one day course on parallelism and threading. This is a great opportunity to learn about threading software for multi-core platforms. This course is targeted for Windows* C++ developers using Microsoft Visual Studio* 2005 or 2008. If you are in that category keep reading! The performance benefits on modern computing platforms will [...]]]></description>
			<content:encoded><![CDATA[<p>We created a new one-day course on parallelism and threading. This is a great opportunity to learn about threading software for multi-core platforms. This course is aimed at Windows* C++ developers using Microsoft Visual Studio* 2005 or 2008. If you are in that category keep reading!<br />
The performance benefits on modern computing platforms will come from threading software to take advantage of the many cores these platforms provide. Learn how to develop software that utilizes many cores in this class! We are building on several years of experience creating training material, and we are taking a fresh look with this new class. We will conduct the pilot class on Friday 17 July at the Intel offices in Santa Clara. Familiarity with threads is helpful, but not required (the target is beginning to intermediate experience with threads; experts would not benefit as much from this course). The course outline will be as follows:</p>
<ul>
<li>Introduction
<ul>
<li>Why go parallel?</li>
<li>Terminology</li>
<li>Parallelization methodology</li>
</ul>
</li>
<li>Analysis/Design
<ul>
<li>Finding opportunities for parallelism</li>
</ul>
</li>
<li>Introduction of Threads
<ul>
<li>Threading environments</li>
<li>Threading process</li>
</ul>
</li>
<li>Debug
<ul>
<li>Finding parallel bugs</li>
</ul>
</li>
<li>Tune
<ul>
<li>Scalability issues</li>
<li>Data sharing &amp; locking techniques</li>
</ul>
</li>
</ul>
<p>This class will introduce parallelism concepts and supplement learning with demonstrations and code samples. Students will also gain exposure to the new Intel® Parallel Studio product. We will have a number of Intel consulting engineers on hand to answer your questions during the breaks. I will be down there and look forward to meeting you!<br />
To register for this class send email to itt.support@intel.com. Please include your name, company, email address and phone number. You will receive a confirmation in response providing you with final details. The course will begin at 9:00 AM in Santa Clara California and will run until 4:00 PM. Attendees must complete a survey providing feedback on the course and stay for the entire course. Lunch will be provided. See you there. Go Parallel!</p>
<p>*Other brands and names are the property of their respective owners</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/06/04/learn-parallelism-and-threading-opportunity-to-attend-pilot-class-for-free/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick Intel® Core™ i7 platform tuning</title>
		<link>http://software.intel.com/en-us/blogs/2009/02/23/quick-intel-core-i7-platform-tuning/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/02/23/quick-intel-core-i7-platform-tuning/#comments</comments>
		<pubDate>Mon, 23 Feb 2009 21:46:50 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[i7]]></category>
		<category><![CDATA[Nehalem]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[TBB]]></category>
		<category><![CDATA[VTune]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/02/23/quick-intel-core-i7-platform-tuning/</guid>
		<description><![CDATA[I hope you are all enjoying the new Intel® Core™i7 platforms.  Most people are very pleased with the performance of these new platforms.  I hope all of you software developers are regular VTune Analyzers users too (but I know not all of you are).     One of the great advantages of VTune Analyzer’s event based sampling [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal"><span>I hope you are all enjoying the new Intel® Core™ i7 platforms.  Most people are very pleased with the performance of these new platforms.  I hope all of you software developers are regular VTune Analyzer users too (but I know not all of you are).  One of the great advantages of VTune Analyzer’s event based sampling feature is that it doesn’t just show you where your code spends the most CPU time, it helps you understand why.  When you understand why, you can better make changes that will improve performance.  Dr. Levinthal wrote an excellent guide on tuning for our Core™ microarchitecture.  As one of my professors once said, “this new case is exactly the same as the old case, just different.”  I won’t say Core i7 processors are the same as Core 2 processors, but they share a lot in common.  The event names change and there are new instructions and events.  We have published a new tuning guide specifically for Core i7 processors.  Take a look at it:  <a href="http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-on-intel-core-i7-processors/">http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-on-intel-core-i7-processors/</a>.  For those of you who have been using VTune for a while, this should be a good quick reference for getting started on Core i7.  For those of you new to performance tuning using event based sampling, this is a great time to get started.  Read the tuning guide.  Download an <a href="http://www3.intel.com/cd/software/products/asmo-na/eng/download/eval/219690.htm">Intel® VTune™ Performance Analyzer</a> evaluation and try it out.  Let me know what you discover.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/02/23/quick-intel-core-i7-platform-tuning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Correctness and Threading</title>
		<link>http://software.intel.com/en-us/blogs/2008/11/17/correctness-and-threading/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/11/17/correctness-and-threading/#comments</comments>
		<pubDate>Mon, 17 Nov 2008 18:35:44 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[TBB]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/11/17/correctness-and-threading/</guid>
		<description><![CDATA[We have often stated the three main points of parallelism are: Correctness, Scalability, and Maintainability.    We are working to provide better tools to improve all three aspects of software development.  The other month I wrote about Intel Threading Building Blocks which helps provide an abstraction that helps with both maintainability and scalability.    I wonder how [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal"><span>We have often stated the three main points of parallelism are: Correctness, Scalability, and Maintainability.  We are working to provide better tools to improve all three aspects of software development.  The other month I wrote about Intel Threading Building Blocks, which provides an abstraction that helps with both maintainability and scalability.  I wonder how many of you are familiar with Intel® Thread Checker?  Intel Thread Checker analyzes software to check for common threading errors such as data races.  My team is looking at new formats to present much of the information we have.  In the spring I led six one-hour webinars on a variety of threading topics.  This fall we are exploring some shorter videos.  For those interested in a short audiovisual presentation on Intel Thread Checker for Linux, follow this link: <a href="http://software.intel.com/en-us/videos/introduction-to-intel-thread-checker-for-linux">http://software.intel.com/en-us/videos/introduction-to-intel-thread-checker-for-linux</a> and skip this blog.  For those still reading, Intel Thread Checker tracks every memory access and thread API call to analyze for data races.  If it identifies memory references by multiple threads where at least one thread alters or writes to memory, and there are no controls (e.g. a mutex or critical section) to protect the operations, it recognizes this as a race condition.  Thread Checker reports all of these race conditions as diagnostics.</span></p>
<p class="MsoNormal"><span>Sometimes I am asked, what if a race doesn’t happen?  If the data set exercises the code path, then the race happens.  Just because the race resolved in the order you expected doesn’t mean the race condition didn’t happen.  The race just didn’t have any ill effects.  Thread Checker doesn’t know which order of the operations you consider correct; it just points out that the operations are not protected correctly.  It is up to you, the developer, to select the best solution to eliminate the race.  As you would expect, Intel Thread Checker is memory and compute intensive.  It helps find many latent thread issues that are difficult to find any other way.  Check out the 12 minute video on Intel Thread Checker for Linux.  Let me know if you like these shorter video segments.  Let me know if you have questions about Intel Thread Checker too.</span></p>
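<p class="MsoNormal"><span>The rule described above (accesses from different threads to the same memory, at least one of them a write, and no common lock protecting them) can be written down as a small check over a recorded list of accesses.  This is a deliberately simplified sketch of that rule only, not Intel Thread Checker's implementation: Access, share_a_lock and find_races are hypothetical names, locations and locks are plain strings, and ordering established by thread creation or join is ignored.</span></p>

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One recorded memory access (hypothetical record layout for this sketch).
struct Access {
    int thread;                        // id of the accessing thread
    std::string location;              // stand-in for a memory address
    bool is_write;                     // true for a write, false for a read
    std::set<std::string> locks_held;  // locks held at the time of the access
};

// True if both accesses were made while holding at least one common lock.
bool share_a_lock(const Access& a, const Access& b) {
    for (const auto& lock : a.locks_held) {
        if (b.locks_held.count(lock) != 0) return true;
    }
    return false;
}

// Returns index pairs (i, j) with i < j that satisfy the race rule:
// different threads, same location, at least one write, no common lock.
std::vector<std::pair<std::size_t, std::size_t>>
find_races(const std::vector<Access>& trace) {
    std::vector<std::pair<std::size_t, std::size_t>> races;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        for (std::size_t j = i + 1; j < trace.size(); ++j) {
            const Access& a = trace[i];
            const Access& b = trace[j];
            const bool conflict = a.thread != b.thread &&
                                  a.location == b.location &&
                                  (a.is_write || b.is_write);
            if (conflict && !share_a_lock(a, b)) races.emplace_back(i, j);
        }
    }
    return races;
}
```

<p class="MsoNormal"><span>Note that the check flags a pair whenever the code path was exercised without protection, regardless of the order in which the two accesses happened to occur, which matches the point made above about races that seem harmless.</span></p>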
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/11/17/correctness-and-threading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TBB is part of the concurrency revolution</title>
		<link>http://software.intel.com/en-us/blogs/2008/09/19/tbb-is-part-of-the-concurrency-revolution/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/09/19/tbb-is-part-of-the-concurrency-revolution/#comments</comments>
		<pubDate>Fri, 19 Sep 2008 23:40:31 +0000</pubDate>
		<dc:creator>David Mackay (Intel)</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[TBB]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/09/19/tbb-is-part-of-the-concurrency-revolution/</guid>
		<description><![CDATA[In my first blog that I said I would blog mostly about threading and performance. This time I am writing about TBB or Intel® Threading Building Blocks. But let me begin by discussing the multi-core transition. Several years ago Intel began shifting to multi-core platforms. This was a shift for Intel (you may remember we [...]]]></description>
			<content:encoded><![CDATA[<p>In my first blog I said I would blog mostly about threading and performance. This time I am writing about TBB or Intel® Threading Building Blocks. But let me begin by discussing the multi-core transition. Several years ago Intel began shifting to multi-core platforms. This was a shift for Intel (you may remember we called it the right hand turn). We knew it would be a shift for developers too. We began improving and adding more to our software development products for threading. Others in the industry wrote about the change. Herb Sutter wrote several articles. I will reference two of them. One was entitled "The Free Lunch is Over." The point is that performance was no longer going to come from increasing processor clock speed, and developers have to actively develop differently to gain performance in the new world of multi-core platforms. Yes, the world is changing and in addition to doing everything we were doing before – now we all get to think about concurrency in order to get performance. Some of my colleagues at Intel liked the article quite well and liked to use it as an introduction to conversations about changes in software development. Yes, software developers relied on faster processor clock speeds for performance improvements, but software development is a competitive environment, and developers work very hard. They never felt like they were getting a free lunch or a free ride. Telling them there is more to do now didn't seem like a value proposition to me.</p>
<p>Does it mean there is more for software developers to do now? Maybe (undoubtedly there are more ways to make mistakes). Is it a lot more to do, or are there just different ways to do things? This is why I preferred to refer to another of Herb Sutter's articles, where he called the changes going on a concurrency revolution (see <a href="http://www.ddj.com/architect/191800187">http://www.ddj.com/architect/191800187</a>). I used to have a copy of the last couple of paragraphs from this article taped to the outside of my cubicle wall (when we moved around it went away and I did not repost it). I quote a couple of sentences from his article: "Yes, this is a wonderful time to be a software engineer. For the rest of this decade and into part of the next, we're going to do for concurrency what we did for objects and garbage collection: ..."</p>
<p>Rather than thinking of working for lunch now, I like to think of being on the forefront of a transition or revolution (Yes, I like parallel programming. That is what I was doing when I joined Intel in 1992 and when I had a chance to do it at Intel again I took it). There was a big shift years ago to object oriented programming. There is a shift now to concurrency. Did the shift to object oriented programming mean more work than procedural programming styles? I don't think so. But if you had a lot of legacy code the transition wasn't free. Likewise, the transition to concurrency allows for more complexity, and can be more difficult. If we don't design well and use the best tools, we can run into paralyzation instead of parallelization. If we design well and use the best techniques we can minimize the complexity.</p>
<p>One of the reasons Intel created the Threading Building Blocks is to provide an abstraction for threading that provides good performance and removes the developer from the tyranny of trying to explicitly create and manage all of the threads. The developer can focus on the parallel tasks and templates and allow the runtime library to manage scheduling and management of the threads. We also wanted something that was evolutionary – something that works in an environment that is familiar to developers without requiring them to learn a new language or use a special compiler. C++ templates fit that perfectly. We wanted this to help drive a change in the way developers think and develop – that is make thinking parallel a natural part of software development – so we released this as an opensource project (threadingbuildingblocks.org).</p>
<p>We have seen a great response from, and adoption by, many ISVs (e.g. Autodesk Maya; see <a href="http://softwarecommunity.intel.com/isn/downloads/softwareproducts/pdfs/Brief_TBB.pdf">http://softwarecommunity.intel.com/isn/downloads/softwareproducts/pdfs/Brief_TBB.pdf</a>). When my team of engineers visits software developers, we sometimes get the response: "That is exactly what I need (parallel pipeline, or concurrent hashmap, or...). Threading Building Blocks will be easier for me than implementing that by myself, so I will just adopt Threading Building Blocks for that." Yes, we also sometimes meet developers who say, "I have already implemented threading with native Windows threads or POSIX threads and don't want to change."</p>
<p>One way to see how the environment is changing is to look at how the development community has taken Threading Building Blocks beyond the platforms we developed it on. One of the more recent announcements was the adoption of Threading Building Blocks by Deep Shadows. Their port was not an academic exercise to see whether it would work well enough to run a benchmark. They wanted to use it in their product (see <a href="http://www.developmag.com/news/30160/Deep-Shadows-talks-multiprocessor-optimisation">http://www.developmag.com/news/30160/Deep-Shadows-talks-multiprocessor-optimisation</a>). One of the platforms they develop for is the Xbox game console. By completing a port of Intel Threading Building Blocks to Xbox, they can get the parallel performance they want and maintain a more common code base between the Xbox and PC versions of a game. They generously contributed that port back to the Threading Building Blocks community. Because of their contribution, any Xbox developer can download the Threading Building Blocks source and use it on Xbox. This is not the only port that the community has contributed. Others have contributed ports for PowerPC Linux.</p>
<p>This summer we launched Intel® Threading Building Blocks 2.1. It included several improvements to increase performance and to clean things up. Several new items, for example parallel_do, were added in response to feedback we received in the Threading Building Blocks forum. We appreciate the response and feedback we receive in the forum and in our face-to-face visits with software developers.</p>
<p>Herb Sutter wrote in the article I referenced above that "The world awaits." Let me ask: what are you waiting for? If you are not threading for performance now, what types of abstractions or development products are you looking for or waiting for? What else belongs in Intel Threading Building Blocks? What do you want in the developer's toolkit that is not available now?</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/09/19/tbb-is-part-of-the-concurrency-revolution/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

