<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blogs &#187; Robert Chesebrough (Intel)</title>
	<atom:link href="http://software.intel.com/en-us/blogs/author/robert-chesebrough/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/en-us/blogs</link>
	<description></description>
	<lastBuildDate>Fri, 25 May 2012 22:49:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>An Intel hardware based digital random number technology could mitigate recent RSA security flaw</title>
		<link>http://software.intel.com/en-us/blogs/2012/02/22/an-intel-hardware-based-digital-random-number-technology-could-mitigate-recent-rsa-security-flaw/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/02/22/an-intel-hardware-based-digital-random-number-technology-could-mitigate-recent-rsa-security-flaw/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 19:14:53 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Manageability & Security]]></category>
		<category><![CDATA[Bull Mountain]]></category>
		<category><![CDATA[Cryptography]]></category>
		<category><![CDATA[Digitial Random Number Generator]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Randomness]]></category>
		<category><![CDATA[RSA]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/02/22/an-intel-hardware-based-digital-random-number-technology-could-mitigate-recent-rsa-security-flaw/</guid>
		<description><![CDATA[Mathematicians from Europe and the United States are reporting a flaw in the RSA encryption method that apparently hinges on crypto keys being created with insufficient randomness. Enter Intel’s Bull Mountain technology. Bull Mountain is a hardware-based digital random number generator that will be released this year when the processor code-named “Ivy Bridge” is launched. Bull Mountain allows digital random numbers to be generated at near clock-cycle speeds and with a very high degree of randomness, or “entropy”.]]></description>
			<content:encoded><![CDATA[<p>Mathematicians from Europe and the United States are reporting a flaw in the RSA encryption method that apparently hinges on crypto keys being created with insufficient randomness. You can read more about this story in a NY Times article by John Markoff entitled “<a href="http://www.nytimes.com/2012/02/15/technology/researchers-find-flaw-in-an-online-encryption-method.html?pagewanted=1&amp;_r=2&amp;hp">Flaw Found in an Online Encryption Method</a>” and in an IEEE article by Sam Moore entitled “<a href="http://spectrum.ieee.org/tech-talk/computing/it/rsa-flaw-found/?utm_source=techalert&amp;utm_medium=email&amp;utm_campaign=021612">RSA Flaw Found</a>”. The researchers submitted their <a href="http://eprint.iacr.org/2012/064.pdf">work for publication</a> at a cryptography conference to be held this coming August, but decided to make their research known last Tuesday because they think the issue is an immediate concern to the crypto community and web server operators. About 27,000 flawed crypto keys were discovered among the seven million or so keys tested.</p>
<p>The central issue in the flaw is that the secret prime numbers used to create the crypto keys must be generated randomly. The findings indicate that in some cases the prime numbers were not generated in a random enough way, which led to crypto keys having prime factors in common.</p>
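<p>To see why a shared prime factor is so damaging, here is a minimal sketch using toy numbers. The function names <em>gcd_u64</em> and <em>shared_prime</em> are made up for this illustration, and the moduli are tiny compared with real RSA keys. If two moduli happen to share one prime, Euclid's algorithm recovers that prime almost instantly, and dividing each modulus by it yields the other factor.</p>

```c
#include <stdint.h>

/* Euclid's algorithm: greatest common divisor of two unsigned integers. */
uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Returns the prime shared by two toy RSA moduli, or 0 if they share none.
 * Assumes n1 and n2 are distinct and each is a product of two primes.
 */
uint64_t shared_prime(uint64_t n1, uint64_t n2)
{
    uint64_t g = gcd_u64(n1, n2);
    return (g > 1 && g < n1 && g < n2) ? g : 0;
}
```

<p>For example, with the toy moduli 10403 (101 x 103) and 10807 (101 x 107), <em>shared_prime</em> returns 101, which factors both keys at once. Two moduli built from four different primes give 0, and nothing is learned.</p>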
<p>According to Intel’s Greg Taylor and George Cox (see <a href="http://spectrum.ieee.org/computing/hardware/behind-intels-new-randomnumber-generator/0">Behind Intel's New Random-Number Generator</a>), researchers have managed to devise pseudorandom-number generators that are considered cryptographically secure. But you must still start them off using a special seed value; otherwise, they'll always generate the same list of numbers. And for that seed, you really want something that's impossible to predict.</p>
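<p>The point about seeds can be sketched with a toy generator. The code below uses a simple xorshift generator, which is NOT cryptographically secure and is not what Bull Mountain or any real crypto library uses; the names <em>toy_prng</em>, <em>toy_seed</em>, <em>toy_next</em> and <em>same_stream</em> are invented for this illustration. It only shows that a deterministic generator replays exactly the same numbers whenever it starts from the same seed.</p>

```c
#include <stdint.h>

/* Toy xorshift32 generator; NOT cryptographically secure. */
typedef struct {
    uint32_t state;
} toy_prng;

void toy_seed(toy_prng *g, uint32_t seed)
{
    g->state = seed ? seed : 1u;   /* xorshift state must be nonzero */
}

uint32_t toy_next(toy_prng *g)
{
    uint32_t x = g->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    g->state = x;
    return x;
}

/* Returns 1 if generators seeded with seed_a and seed_b agree on their first n outputs. */
int same_stream(uint32_t seed_a, uint32_t seed_b, int n)
{
    toy_prng a, b;
    toy_seed(&a, seed_a);
    toy_seed(&b, seed_b);
    for (int i = 0; i < n; i++) {
        if (toy_next(&a) != toy_next(&b))
            return 0;
    }
    return 1;
}
```

<p>Two generators seeded with the same value agree on every output, so anyone who can guess the seed can reproduce the whole stream. That is why the unpredictability of the seed matters so much.</p>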
<p>Enter Intel’s digital random number technology, code-named Bull Mountain. Bull Mountain is a hardware-based digital random number generator that will be released this year when the processor code-named “Ivy Bridge” is launched. Bull Mountain allows digital random numbers to be generated at near clock-cycle speeds and with a very high degree of randomness, or “entropy” as the crypto folks call it. Using such highly random seeds in cryptographically secure pseudorandom-number generators could help allay the concerns raised by this new research into the RSA flaws.</p>
<p>For more information, you can download the Intel® Bull Mountain Software Implementation Guide and code samples <a href="http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/02/22/an-intel-hardware-based-digital-random-number-technology-could-mitigate-recent-rsa-security-flaw/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel Tool Helps SW Developers Develop More Secure Applications</title>
		<link>http://software.intel.com/en-us/blogs/2012/02/07/intel-tool-helps-sw-developers-develop-more-secure-applications/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/02/07/intel-tool-helps-sw-developers-develop-more-secure-applications/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 23:27:40 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Manageability & Security]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Buffer Overflow]]></category>
		<category><![CDATA[Buffer Overrun]]></category>
		<category><![CDATA[Build Security In]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Common Weakness Evaluation]]></category>
		<category><![CDATA[Intel Compiler]]></category>
		<category><![CDATA[Intel® vPro™]]></category>
		<category><![CDATA[Mitigate Secure Bugs]]></category>
		<category><![CDATA[OS Command Injection]]></category>
		<category><![CDATA[owasp top 10]]></category>
		<category><![CDATA[scanf]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[security layer]]></category>
		<category><![CDATA[sprintf]]></category>
		<category><![CDATA[static security analysis]]></category>
		<category><![CDATA[Ultrabook]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/02/07/intel-tool-helps-sw-developers-develop-more-secure-applications/</guid>
		<description><![CDATA[Developers are urged to find these kinds of bugs using tools such as Intel Static Security Analysis, and then make it a practice to validate all inputs and replace unsafe functions (strcpy, strncpy, strcat, and gets, among others) with safe counterparts. To learn more about steps you can take as a developer to reduce your exposure to security attacks, go to the Department of Homeland Security's Build Security In website or visit the Common Weakness Enumeration site.
]]></description>
			<content:encoded><![CDATA[<p>Security breaches at prestigious companies have occurred steadily over the last weeks, months and years.  These breaches are becoming far too frequent and, as the folks at Amazon and Zappos know, expensive.</p>
<p>There are many ways to address these kinds of security challenges, and Intel offers technologies to assist in the battle.  For probably the most sane and scalable way of addressing security issues, at least for enterprise applications, I recommend that you jump over to the blog posts by Blake Dournaee (Intel): "<a href="http://software.intel.com/en-us/blogs/2010/11/09/using-a-service-gateway-to-protect-against-the-owasp-top-10/">Using a Service Gateway to Protect against the OWASP Top 10</a>" and "<a href="http://software.intel.com/en-us/blogs/2011/02/10/how-about-a-security-layer/">How about a Security Layer?</a>".  A Security Layer on a Service Gateway is, in my view, the most comprehensive way to tackle these kinds of security issues.</p>
<p>Nevertheless, some enterprise shops may be unwilling to re-architect their legacy systems using the security layer approach, and some developers are targeting client applications.  What tools and techniques can these developers use to mitigate security bugs?  For those developers I offer the following:</p>
<p>I had a chat with Julian Horn, who is Intel's architect on the compiler team for the Static Security Analysis (SSA) tool.  SSA comes as part of the Intel compiler (C/C++/Fortran) and is available for Linux* and Windows*.  SSA identifies various coding errors such as memory and resource leaks, pointer and array errors, incorrect use of OpenMP* directives, and incorrect use of Cilk Plus language features.  SSA also identifies security errors such as buffer overflows and boundary violations, use of uninitialized variables and objects, incorrect usage of pointers and dynamically allocated memory, dangerous use of unchecked input, arithmetic overflow and divide by zero, and misuse of string, memory, and formatting library routines.</p>
<p>I was curious what kinds of security flaws SSA could find. Specifically, I wanted to know if it could help developers mitigate any of the most dangerous software errors as identified by the <a href="http://cwe.mitre.org/top25/index.html#CWE-798">Common Weakness Enumeration</a> (CWE) community sponsored by Mitre.</p>
<p>After an email exchange with Julian, and after poring over the descriptions of the top twenty-five security bugs as reported by CWE, I determined that Intel SSA could help to mitigate at least two of the top five errors listed.  Error number two is "OS Command Injection", and number three is "Classic Buffer Overflow". How can SSA mitigate these errors?</p>
<p><strong>Identification and Mitigation of CWE Top Error #2 (OS Command Injection)</strong><br />
OS Command Injection is an error type that should be checked for in both server-side and client-side applications.  The essence of this error, or potential attack, is that your application sometimes acts as a bridge linking an outsider to the internals of your operating system. If your application simply pastes untrusted input into a command string that it passes to a system call, it can inadvertently wreak all kinds of havoc on the system. <em>The recommended mitigation step is to validate all inputs to your application</em>.</p>
<p>A simple-minded BAD example in C might issue a system call to delete a file whose name the user types in.</p>
<p><em>          // user inputs a filename to be deleted</em><br />
<em>          scanf("%s", str);               // buffer overflow</em><br />
<em>          sprintf(cmd, "del %s", str);    // another buffer overflow</em><br />
<em>          system(cmd);                    // OS command injection, due to not validating the input</em></p>
<p>What happens if the user types *.* rather than a normal filename?  Since the input has not been validated and is passed straight to the OS, deletions the developer never intended would clearly occur.</p>
<p>Analyzing your code for unvalidated input is known as <em>taint analysis – tainted input means unvalidated input</em>.  CWE recommends doing a taint analysis to identify where your code fails to validate input, and then taking steps to remove the taint.</p>
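<p>One way to validate such input is an allow-list check before the name is used for anything. The sketch below is only an illustration: the function name <em>is_safe_filename</em> and the particular set of accepted characters are assumptions made for this example, not a rule taken from CWE or from SSA.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical allow-list check: accept only names made of letters, digits,
 * '.', '_' and '-', that do not start with '.' or '-', and that are no longer
 * than max_len characters. Returns 1 if the name is acceptable, 0 otherwise.
 */
int is_safe_filename(const char *name, size_t max_len)
{
    size_t len;

    if (name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0 || len > max_len)
        return 0;
    if (name[0] == '.' || name[0] == '-')
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '.' && c != '_' && c != '-')
            return 0;
    }
    return 1;
}
```

<p>With this check, a name such as report.txt passes, while *.* or a name containing spaces, ampersands or path separators is rejected. Once a name has passed, calling the standard C <em>remove()</em> function instead of building a command string for <em>system()</em> avoids handing the input to a command interpreter at all.</p>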
<p>Intel's Static Security Analysis tool uses a taint analysis algorithm to detect whether or not an unknown input has been compared against another value.  There are various rules under which taintedness is propagated from one variable to another.  One rule is that when a value is <em>compared </em>against another value this removes the taint.  If an untested value is used in a “dangerous” context, then you get an error reported by SSA.</p>
<p>The logic here is that a <em>comparison </em>is considered sufficient to sanitize the value.  The example below demonstrates the idea of a tainted variable, x.  When x is used blindly, with no comparisons done on it to check its validity, SSA flags this value as tainted:</p>
<p><em>          x = input;</em><br />
<em>          a[x] = 0;   // SSA identifies use of tainted value x</em></p>
<p>The example below uses a comparison operator to check the input value x, so it is considered untainted now by SSA:</p>
<p><em>          x = input;</em><br />
<em>          ok = (x &lt; 10);     // comparison un-taints the value x</em><br />
<em>          if (ok) a[x] = 0;</em></p>
<p>This "good" example might still have some issues, because the checking is not extensive (if x is a signed type, for instance, a negative value would still pass the test), but at least the developer went to some effort to validate the input.</p>
<p>The key takeaway here is to use tools to find unvalidated inputs and then add the necessary validation around each of these inputs.</p>
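<p>A somewhat fuller check tests both ends of the valid range before the value is used as an index. The sketch below is an assumption-laden illustration: the helper name <em>clear_entry</em> and its signature are invented here, and it says nothing about which checks SSA itself requires.</p>

```c
#include <stddef.h>

/*
 * Sets a[x] to 0 only if x is a valid index into an array of n elements.
 * Returns 0 on success, -1 if the array pointer is NULL or x is out of range.
 */
int clear_entry(int *a, size_t n, long x)
{
    /* Reject negative values first, then compare against the upper bound. */
    if (a == NULL || x < 0 || (unsigned long)x >= n)
        return -1;
    a[x] = 0;
    return 0;
}
```

<p>Called with a ten-element array, <em>clear_entry</em> accepts indexes 0 through 9 and refuses -1 and 10 without touching memory.</p>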
<p><strong>Identification and Mitigation of CWE Top Error #3 (Classic Buffer Overflow)</strong><br />
Michael Howard &amp; David LeBlanc, in their book <em>Writing Secure Code</em>, 2nd edition, identify the buffer overflow (AKA buffer overrun) as public enemy number one.  The Common Weakness Enumeration list is kinder, ranking it as the third most dangerous error.  It is well known that certain <a href="http://tldp.org/HOWTO/Secure-Programs-HOWTO/dangers-c.html">C functions are unsafe</a> because they are vulnerable to buffer overflow attacks: <em>strcpy</em>, <em>strncpy</em>, <em>strcat</em>, and <em>gets</em>, among others. These functions should be replaced with <a href="http://msdn.microsoft.com/en-us/library/bb288454.aspx">safe counterparts</a>.</p>
<p>The buffer overflow terminology comes from the idea that if you continue to pour water into a finite-sized container, the container will eventually overflow. In computer terms, copying too much text into a finite-sized array causes the extra text to spill over into areas of memory that the developer did not intend to write.  These areas of memory get corrupted with the excess text, and malicious coders exploit this to potentially run malicious code within your application's process. I found the following buffer overflow example insightful, though I didn't want to copy it here in its entirety and will simply link to it instead.  It demonstrates how an overflow attack can occur and is found on an <a href="http://blogs.msdn.com/b/roberthorvick/archive/2004/01/16/59460.aspx">MSDN blog by Robert Horvick</a>.</p>
<p>Other strains of buffer overflow can occur with some types of formatted input.  The biggest issue is the "%s" input format used without a field width; in that form the specifier is generally regarded as unsafe. In the <a href="http://software.intel.com/sites/products/evaluation-guides/docs/intelparallelstudio-evaluationguide-ssa.pdf">Intel Parallel Studio XE evaluation guide on Static Security Analysis (SSA)</a>, there is a nice example of SSA detecting a buffer overflow in an <em>fscanf</em> call. In this case, SSA indicates that it found an "unsafe format specifier," which is essentially a condition that can lead to buffer overflow.  The code snippet from this guide is as follows:</p>
<p><em>          // example that would allow buffer overflow</em><br />
<em>          char data[255];</em><br />
<em>          fscanf(dfile, "%s", data);</em><br />
<em>          if (strcmp(data, string) != 0) {</em><br />
<em>                fprintf(stderr, "parse: Expected %s, got %s \n", string, data);</em><br />
<em>          }</em></p>
<p>The call to <em>fscanf</em> uses a format string with a "%s" specifier and no field width. This reads input characters up to the next whitespace and stores them in the array "data". There is no guarantee that the number of characters read will fit within the bounds of the array, so this statement could corrupt memory.  SSA reported this as an error, and the developer should follow up by adding a field width to the format specifier. Because <em>fscanf</em> appends a terminating null character after the characters it stores, the width must be one less than the array size: "%254s" for a 255-byte array.  The corrected code should be something like this:</p>
<p><em>          // example that corrects the undesired buffer overflow condition</em><br />
<em>          char data[255];</em><br />
<em>          fscanf(dfile, "%254s", data);   // at most 254 characters plus the terminating null</em><br />
<em>          if (strcmp(data, string) != 0) {</em><br />
<em>                fprintf(stderr, "parse: Expected %s, got %s \n", string, data);</em><br />
<em>          }</em></p>
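<p>A common idiom keeps the field width and the buffer size tied together so they cannot drift apart. The sketch below is an illustration, not code from the evaluation guide: the names <em>TOKEN_MAX</em>, <em>STR</em> and <em>read_token</em> are assumptions, and it uses <em>sscanf</em> on an in-memory string so that it is easy to try out. The same width rule applies to <em>fscanf</em>: the width counts the characters stored, and the buffer needs one extra byte for the terminating null.</p>

```c
#include <stdio.h>

#define TOKEN_MAX 254            /* longest token we are willing to store */
#define STR2(x) #x
#define STR(x) STR2(x)           /* turns TOKEN_MAX into the text "254" */

/*
 * Reads one whitespace-delimited token from input into data.
 * data must have room for TOKEN_MAX characters plus the terminating '\0'.
 * Returns 1 if a token was stored, 0 otherwise.
 */
int read_token(const char *input, char data[TOKEN_MAX + 1])
{
    /* The format string expands to "%254s": at most 254 characters are stored. */
    return sscanf(input, "%" STR(TOKEN_MAX) "s", data) == 1;
}
```

<p>Given a 300-character run of letters, <em>read_token</em> stores only the first 254 and terminates the buffer, rather than writing past its end.</p>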
<p>For similar tips on how to protect your code through defensive programming, read this article by McGraw &amp; Viega, <em><a href="http://www.ibm.com/developerworks/library/s-buffer-defend.html">Make your software behave: Preventing buffer overflows</a>.</em></p>
<p>The key takeaway here, in addition to validating all inputs, is to find unsafe C functions and format specifiers and replace them with safe alternatives.</p>
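<p>As one possible illustration of such a replacement, the sketch below swaps an unchecked <em>strcpy</em> for the standard C99 <em>snprintf</em>, which never writes more than the stated buffer size. The wrapper name <em>copy_string</em> is made up for this example; the safe counterparts linked above are a different, Microsoft-specific set of functions.</p>

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copies src into dst, which has room for dst_size bytes, using snprintf.
 * Returns 0 if the whole string fit, -1 if it was truncated or an argument
 * was invalid. On truncation dst still holds a null-terminated prefix.
 */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    int n;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    n = snprintf(dst, dst_size, "%s", src);
    if (n < 0 || (size_t)n >= dst_size)
        return -1;
    return 0;
}
```

<p>The return value lets the caller notice truncation and decide what to do about it, instead of silently overrunning the destination as <em>strcpy</em> would.</p>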
<p><strong>What should a developer do?</strong><br />
The security bugs discussed above are two of the most dangerous and prevalent according to CWE.  These bugs affect client applications that run on laptops, desktops, and ultrabooks, as well as enterprise applications on web servers, application servers, database servers and more.  Developers are urged to find these kinds of bugs using tools such as <a href="http://software.intel.com/sites/products/evaluation-guides/docs/intelparallelstudio-evaluationguide-ssa.pdf">Intel Static Security Analysis</a>, and then make it a practice to validate all inputs and to replace unsafe functions (<em>strcpy</em>, <em>strncpy</em>, <em>strcat</em>, and <em>gets</em>, among others) with <a href="http://msdn.microsoft.com/en-us/library/bb288454.aspx">safe counterparts</a>.  To learn more about steps you can take as a developer to reduce your exposure to security attacks, go to the Department of Homeland Security's <a href="https://buildsecurityin.us-cert.gov/bsi-rules/home/g1/816-BSI.html">Build Security In</a> website or visit the <a href="http://cwe.mitre.org/top25/index.html#CWE-798">Common Weakness Enumeration</a> site.</p>
<p>Oh, and did I mention that enterprise developers should jump over to Blake Dournaee's (Intel) blog posts "<a href="http://software.intel.com/en-us/blogs/2010/11/09/using-a-service-gateway-to-protect-against-the-owasp-top-10/">Using a Service Gateway to Protect against the OWASP Top 10</a>" and "<a href="http://software.intel.com/en-us/blogs/2011/02/10/how-about-a-security-layer/">How about a Security Layer?</a>" to learn an even better way to secure your systems?</p>
<p>For more complete information about compiler optimizations, see our <a href="http://software.intel.com/en-us/articles/optimization-notice/">Optimization Notice</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/02/07/intel-tool-helps-sw-developers-develop-more-secure-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AES-NI in Laymen&#039;s Terms</title>
		<link>http://software.intel.com/en-us/blogs/2012/01/11/aes-ni-in-laymens-terms/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/01/11/aes-ni-in-laymens-terms/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 03:18:37 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Manageability & Security]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AES]]></category>
		<category><![CDATA[AES-NI]]></category>
		<category><![CDATA[Cryptography]]></category>
		<category><![CDATA[decryption]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[Rijndael]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/01/11/aes-ni-in-laymens-terms/</guid>
		<description><![CDATA[To understand how the AES Rijndael algorithm works I highly recommend that you look at <a href="http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html">Jeff Moser's "A Stick Figure Guide to the Advanced Encryption Standard (AES) - A play in 4 acts"</a>.  This creative stick figure cartoon approach to describing the AES algorithm is one of the best ways I have seen for communicating how AES works.]]></description>
			<content:encoded><![CDATA[<p><strong>What is AES-NI - first answer</strong><br />
AES-NI is a set of six new instructions that Intel introduced with the 2010 Intel® Core™ processor family, code-named Westmere. AES-NI stands for Advanced Encryption Standard - New Instructions. These instructions implement hardware-accelerated versions of certain compute-intensive steps used in the AES (Rijndael) algorithm.</p>
<p>Okay - so <strong>what is the <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Advanced Encryption Standard</a> (AES)?</strong></p>
<p>AES is a standard that defines how to encrypt plain text using an encryption key. It is implemented with the Rijndael (pronounced Rhine Dahl) algorithm.  One cool thing about AES is that even though the algorithm is completely open for examination, a plain text message encrypted with it is very, very difficult to break.  This is possible because the algorithm takes the plain text message you want to encrypt and merges it in a certain way with a secret key.  As long as the key is kept private, the encrypted message has proven to be safe from being broken, at least to this point in time.  So the algorithm is completely known, but as long as the key is protected, messages encoded with it are virtually safe from eavesdropping.</p>
<p><strong>So who cares?</strong><br />
So what kinds of software developers might use AES, and who might benefit from the new AES-NI? There may be more than you think at first: developers who write code that uses Secure Sockets Layer (SSL), database engines, whole-disk encryption applications, file compression applications, VoIP, instant messaging, email, virtualization software, electronic payment systems, virtual private networks, and the list goes on. To learn more about who might use AES see this <a href="http://en.wikipedia.org/wiki/AES_instruction_set">wiki article on AES instruction set</a> or this article on <a href="http://www.tomshardware.com/reviews/clarkdale-aes-ni-encryption,2538.html">AES-NI analysis on Tom's Hardware</a>.</p>
<p><strong>So how does AES (Rijndael) work?</strong><br />
To understand how the AES (Rijndael) algorithm works, I highly recommend that you look at <a href="http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html">Jeff Moser's "A Stick Figure Guide to the Advanced Encryption Standard (AES) - A play in 4 acts"</a>.  This creative stick-figure cartoon approach is the best method I have seen for communicating how AES works - five stars, Mr. Moser!</p>
<p>My stick figure image below is an icon tribute to the excellent efforts of Mr. Moser in laying bare the essence of AES.</p>
<p><a title="AES Stick Figure" href="http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html"><img class="alignnone size-medium wp-image-44135" title="AES_Stick" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/01/AES_Stick-186x300.jpg" alt="AES Stick Figure" width="186" height="300" /></a></p>
<p>Thanks Mr. Moser!</p>
<p><strong>What is AES-NI - second answer</strong><br />
Now consider that the six AES-NI instructions from Intel comprise two instructions to accelerate encrypting a round, two instructions for decrypting a round, and two more instructions to accelerate the generation of round keys.  In summary, the six new instructions provide a faster way to crunch through the Rijndael algorithm (AES).  Curious to know more? Read more about it in my friend Jeff Rott's blog. Jeff wrote an excellent blog on <a href="http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/">Intel® Advanced Encryption Standard Instructions (AES-NI)</a>, in which he introduces the six instructions, describes the benefits, and introduces ways to actually implement these in your code (plus references).</p>
<p><strong>So how can you implement AES-NI in your code?</strong><br />
As long as you are using one of the following compilers (or a later version), you can get direct access to the instructions:<br />
AES-NI is supported by version 11 of the Intel C/C++ compiler, by Microsoft* Visual Studio* 2008 Service Pack 1, and by gcc version 4.4.<br />
You can implement it the hard way using MASM or inline assembly, or you can make it easier on yourself and use compiler intrinsics (just be sure to include wmmintrin.h or intrin.h). See <a href="http://software.intel.com/en-us/articles/how-to-compile-for-the-intel-core-i5-processor-with-aes-ni/">Martyn Corden's post on compiling with AES-NI</a>. Another approach is to use a library such as OpenSSL or Intel's IPP that implements AES-NI - Jeff has references ;-)</p>
<p>If you really want to dig in and see the reference and code snippets, read the in-depth whitepaper by Intel's Shay Gueron called "Intel® Advanced Encryption Standard (AES) Instructions Set". See <a href="http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-aes-instructions-set/">Shay's abstract and whitepaper link here</a>.</p>
<p>Finally - if you want a complete understanding of AES, much more than you will find in a Wiki article or blog, then check out the following book. "<a href="http://www.amazon.com/gp/product/3540425802?ie=UTF8&amp;tag=moserware-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=3540425802">The Design of Rijndael</a>" is the definitive book on the subject, written by the Rijndael creators.</p>
<p><strong>Optimization Notice</strong><br />
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.</p>
<p>Notice revision #20110804</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/01/11/aes-ni-in-laymens-terms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Taking a look at Intel Anti Theft &amp; Identity Protection Technologies</title>
		<link>http://software.intel.com/en-us/blogs/2011/12/14/taking-a-look-at-intel-anti-theft-identity-protection-technologies/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/12/14/taking-a-look-at-intel-anti-theft-identity-protection-technologies/#comments</comments>
		<pubDate>Thu, 15 Dec 2011 04:53:16 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Manageability & Security]]></category>
		<category><![CDATA[Mobility]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/12/14/taking-a-look-at-intel-anti-theft-identity-protection-technologies/</guid>
		<description><![CDATA[Intel is showcasing Anti-theft &#038; Identity Protection Technologies]]></description>
			<content:encoded><![CDATA[<p>I wanted to start a new blog introducing myself in a new role at Intel.  As part of my new role I will be explaining Intel’s Security, Manageability, and Virtualization features to a broad base of ISVs through our scale enabling team and associated platform communities.</p>
<p>In my new role, I have been learning about many of Intel’s security technologies and am excited about bringing these technologies to light in my blogs and on Intel Software Network TV.</p>
<p>To see why I am excited, and a little daunted by my new tasks, take a look at a couple of clips of Mooly Eden at the recent Intel Developer Forum.  These technologies are amazing, but they span so much ground! The <a href="http://intelstudios.edgesuite.net/idf/2011/sf/keynote/110914_me/110914_me_fl/index.htm">first clip showcases Intel’s Anti Theft technologies</a> (starting at the 31:10 mark and ending at 34:57).  Here Mooly invited McAfee co-president Todd Gebhart to the stage to discuss McAfee’s Anti Theft, which allows users to lock their laptop or even wipe their data by issuing a poison pill in the event that the laptop is stolen.  Then Mooly introduced a new technology called Intel Identity Protection Technology (IPT). To showcase this technology, Mooly had a hacker, garbed in a ninja costume, use a key logger and frame-grabbing software in an attempt to hack a bank transaction made by the demo presenter, Mark.  In this amazing clip, the hacker successfully grabs the username and password to Mark’s bank account using a nefarious keylogger. BUT the hacker cannot capture or generate a third authentication token which has been set up between Mark &#038; his bank. IPT thwarts the hacker's mischief. Using IPT, a random-layout PIN pad is generated and displayed to Mark, which allows Mark to send an additional credential to the bank in order to authenticate the transaction.  Mark’s bank account is safe! <a href="http://intelstudios.edgesuite.net/idf/2011/sf/keynote/110914_me/110914_me_fl/index.htm"> See this part of the clip at 35:08 to 38:44</a>.
If you want to learn even more, see <a href="http://www.google.com/url?sa=t&#038;rct=j&#038;q=kerberos%20intel%20smith&#038;source=web&#038;cd=1&#038;ved=0CCEQFjAA&#038;url=http%3A%2F%2Fwww.kerberos.org%2Fevents%2F2011conf-interop%2F2011slides%2F2011kerberos_ned_smith.pdf&#038;ei=GSTpTtP6IuKdiALhhYwl&#038;usg=AFQjCNHlnZWnogG5nEMUUKtHLlnPK4">Intel’s Ned Smith’s IPT foils</a> from the 2011 Kerberos conference.</p>
<p>I plan to interview experts from various corners of Intel to help describe these technologies in more detail. We will also be working to bring APIs to light with Software Developer Guides, tech briefs, whitepapers, videos and more.</p>
<p>I also hope to keep one eye on new developments in the security space in the rest of industry to help articulate security, virtualization and manageability trends that I see developing.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/12/14/taking-a-look-at-intel-anti-theft-identity-protection-technologies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Last 25 Years of Parallel Computing</title>
		<link>http://software.intel.com/en-us/blogs/2011/06/09/the-last-25-years-of-parallel-computing/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/06/09/the-last-25-years-of-parallel-computing/#comments</comments>
		<pubDate>Thu, 09 Jun 2011 22:55:05 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[parallelism]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/06/09/the-last-25-years-of-parallel-computing/</guid>
		<description><![CDATA[I am back from a very interesting 25th anniversary of the IPDPS conference in Anchorage, Alaska. I was able to interact with a number of professors and bounce ideas around on parallel education. To learn more about what occurred at IPDPS take a look at Lauren Dankiewicz' blog where she lays out the conference [...]]]></description>
			<content:encoded><![CDATA[<p>I am back from a very interesting 25th anniversary of the IPDPS conference in Anchorage, Alaska.  I was able to interact with a number of professors and bounce ideas around on parallel education.  </p>
<p>To learn more about what occurred at IPDPS take a look at <a href="http://software.intel.com/en-us/blogs/2011/05/31/tech-talks-from-ieee-international-parallel-and-distributed-processing-symposium-ipdps-2011/">Lauren Dankiewicz' blog</a> where she lays out the conference proceedings with links to video coverage of various keynotes and panels.</p>
<p>I listened attentively to the panel which looked back on 25 years in parallel and distributed computing.  I cannot adequately summarize each panelist's view but I have included a link to the broadcast so you can view &#038; hear these opinions first hand.  The panel consisted of:<br />
Moderator: Yves Robert, Ecole Normale Supérieure de Lyon, France<br />
Panelists:<br />
William (Bill) Dally, Stanford &#038; NVIDIA<br />
Jack Dongarra, University of Tennessee &#038; Oak Ridge National Laboratory<br />
Satoshi Matsuoka, Tokyo Institute of Technology, Japan<br />
Rob Schreiber, HP Labs, Palo Alto, CA<br />
Arnold Rosenberg, University of Massachusetts, Amherst<br />
Uzi Vishkin, University of Maryland</p>
<p>Speakers were asked to address what went right, what went wrong, what were the striking events and the big surprises which have arisen in the past 25 years. My thoughts about this panel session are included below and I encourage readers to watch the <a href="http://techtalks.tv/talks/54106/">video for this panel discussion </a>to formulate their own take-aways. </p>
<p>Some of the positive vibes from this panel included how the last 25 years have seen impressive gains in performance as a result of parallelism. LINPACK numbers have improved from 70-80 flops on a single 1980’s vintage 6800 based processor 30 years ago to 1.2 PetaFlops today - a 150 trillion fold increase.  Advances during these same years have demonstrated the value of parallel computing to an audience much wider than the typical IPDPS audience, and the need for parallel education is being noted by key panelists at the conference.</p>
<p>Rob Schreiber, a mathematician, gave a list of necessary ideas that historically had to be tested but which he contends were bad ideas from the past. Among these ideas were <a href="http://en.wikipedia.org/wiki/Amdahl's_law">Amdahl's law</a>, <a href="http://home.wlu.edu/~whaleyt/classes/parallel/topics/Gustafson.html">"weak scaling" AKA the Gustafson-Barsis law</a>, automatic parallelization via compilers, High Performance Fortran, RISC &#038; VLIW architectures, and external accelerators (GPGPU). </p>
<p>I disagree with Rob's contention that “weak scaling”, as supported by the Gustafson-Barsis law, was a bad idea.  I think history has proven Gustafson-Barsis right - the clear trend in computing has been to make simulations more real and more detailed, and to solve larger, more complicated problems. Yes, we all strive hard to do this in real time as well.  Bill Dally from Stanford also took exception to "weak scaling". Dally said that people don't want a bigger version of "Angry Birds", the popular smartphone app. Most of our personal compute devices have many cores nowadays (he cited an example of a phone that has 70-80 cores, depending on how you define a core), so he contends that we need to tackle “strong scaling”, making existing applications faster, to make parallelism useful to users. His contention is that strong scaling is the big challenge going forward, and he argues that we cannot keep relying on weak scaling to come to our rescue.  Again, I disagree with him on this point. I think the ability to do new things, to solve harder problems, and to make apps behave in ever more real ways with greater resolution has been a clear historical trend and ultimately is what end users care about.  My counter to Bill's "Angry Birds" argument is that I believe people don't care if MS Word runs any faster, but they do care about access to detailed medical imaging that might prevent invasive surgical procedures.  </p>
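<p>To make the strong versus weak scaling argument concrete, here is a minimal sketch of the two textbook formulas: Amdahl's law for a fixed problem size and the Gustafson-Barsis law for a problem that grows with the core count. The function names and the printed core counts are my own choices for illustration, not anything the panelists showed.</p>

```cpp
#include <cassert>
#include <cstdio>

// Strong scaling (Amdahl's law): the problem size stays fixed.
// p = fraction of the serial run time that can be parallelized, n = cores.
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}

// Weak scaling (Gustafson-Barsis law): the problem grows with the machine.
// p = parallel fraction of the run time as measured on the n-core system.
double gustafson_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return (1.0 - p) + p * n;
}

// Print both predictions side by side for a given parallel fraction.
void print_scaling_table(double p) {
    for (int n = 1; n <= 64; n *= 4) {
        std::printf("cores=%2d  strong=%6.2fx  weak=%6.2fx\n",
                    n, amdahl_speedup(p, n), gustafson_speedup(p, n));
    }
}
```

<p>Calling print_scaling_table with a parallel fraction below 1 shows the strong scaling column flattening out while the weak scaling column keeps growing with the core count, which is the heart of the disagreement described above.</p>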
<p>Bill did make what I thought was an excellent point: most parallel systems today are really serial or nearly serial processors bolted together through their IO channels, and as a result communication has microsecond-level latency that programmers have to contend with.  This makes programming parallel systems much harder because programmers now have four parallel programming challenges to deal with rather than simply three.  The "easy" three are parallelism, locality, and load balance.  The tough challenge is programming around differing latency issues that are man-made artifacts.  He pointed to examples of his own work at MIT years ago that demonstrated that short latency communication is possible at an architectural level (see his J Machine work).</p>
<p>I also agree with Dally’s premise that academia, even at his own institution, Stanford, has not focused appropriately on parallel computing and that the state of affairs has been "appalling". He argued that a single course on parallel computing was insufficient and cited an example of a typical algorithms course which still taught complexity theory as if floating-point operations were important! He says FLOPs are NOT important! What matters is data movement. This implies there is much to be done to revamp today's algorithms courses to make them useful for real applications on real architectures.</p>
<p>Rob's inclusion of external accelerators (think GPGPU) as a bad idea from the past may need some explanation. His argument rests on how difficult they have been to program in the past; his term for them is PITB. He said they have historically always been faster than the CPUs of the day but have also been a PITB to use. He did say, however, that when these accelerators were integrated into the main CPU they became easier to program – case in point, floating point accelerators.</p>
<p>The past 25 or so years in computing have been an exciting time, and now new and lofty challenges are cropping up, such as the ever increasing impact of power consumption and how to minimize data movement, improve communication speed and lower latency, among others.  Stay tuned for my review of the panelists' views of the next 25 years.</p>
<p>Bob C </p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/06/09/the-last-25-years-of-parallel-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using OpenMP to Parallelize a Game</title>
		<link>http://software.intel.com/en-us/blogs/2010/06/02/using-openmp-to-parallelize-a-game/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/06/02/using-openmp-to-parallelize-a-game/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 18:53:16 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Event Handlers]]></category>
		<category><![CDATA[Game Loop]]></category>
		<category><![CDATA[games]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[Parallelize]]></category>
		<category><![CDATA[Tasks]]></category>
		<category><![CDATA[threading]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/06/02/using-openmp-to-parallelize-a-game/</guid>
		<description><![CDATA[Turns out it was relatively easy to parallelize this game demo using OpenMP tasks with some signal handling.  ]]></description>
			<content:encoded><![CDATA[<p>Several weeks ago I did an unofficial Google survey to see if there was much information on the topic of using OpenMP for games.  There were not that many posts on the subject and most were rather negative on using OpenMP for games. The general consensus was that OpenMP is fine for data decomposition of for loops only, and since this does not apply to the main game loop, OpenMP was relegated to a minor role in developing parallel games.</p>
<p>I had not found any examples in web search of anyone using OpenMP to parallelize a game.  It turns out it was relatively easy to parallelize this game demo using OpenMP tasks with some signal handling.  The advantage of using OpenMP, and my reason for wanting to give it a try, is the ability to incrementally parallelize a portion of the code: if you run into problems at run time you can comment out a single line of code or compile without the /OpenMP switch, which gets the developer back to the serial baseline whenever this is needed.   This lets a developer test parallel strategies in code relatively quickly.  For example, I was able to test the effect of adding the Physics-&gt;update, AI-&gt;Update etc to the list of parallel tasks. This led to race conditions, so I commented out the two OpenMP pragmas and went about my subsequent investigations. But this is getting ahead of my story…</p>
<p>Undaunted by the lack of examples, I decided to try my hand at using OpenMP tasks to parallelize the DestroyTheCastle game demo that Intel engineers put together for GDC several years ago.  At first I stumbled around trying to figure out just how to parallelize it. I decided to take an approach similar to the way the Intel engineers had previously used Windows QueueUserWorkItem in the original game demo to queue up tasks, using event handlers to coordinate when stages of work were completed and thus ready for rendering.</p>
<p>This approach essentially makes tasks out of the Physics, AI, and Particle calculations.  In the serial baseline of the code, these “tasks” are done one after the other in sequential order.  By using OpenMP tasks I was able to do these three tasks concurrently and achieve a nice speedup.</p>
<p>I was not all that sure OpenMP was even a sensible choice for parallelizing a game because games are inherently event driven, with event handlers all over the place, and so they feel rather unstructured.  OpenMP requires a structured block of code to parallelize.  A parallel region must have only one entrance and one exit to be valid, and the compiler will complain otherwise.  So how do you make such an event based application “structured” for the purposes of OpenMP?  In this case I created the parallel region inside the WinMain routine, which is found in ParallelDemo.cpp. The OpenMP parallel region starts near the top of WinMain and ends near the bottom of WinMain.  I then immediately placed a single region within the parallel region so that only a single thread really operated WinMain.   Well, you ask, how does that help?  Remember that the parallel region sets up a pool of threads, and the single construct is a worksharing construct that allows only one thread to execute the block it encloses.  So what I have done by using this trick is really just create a pool of threads for me to use at other locations in the code, while leaving the behavior of the WinMain sequence of instructions unaffected.</p>
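<p>The parallel/single trick described above can be sketched roughly as follows. This is not the demo's actual WinMain: run_game_loop and win_main_sketch are hypothetical stand-ins I made up so the fragment is self-contained. If it is compiled without the OpenMP switch the pragmas are simply ignored and the code runs serially.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for the body of WinMain (message loop, rendering, ...).
int run_game_loop(int frames) {
    assert(frames >= 0);
    int rendered = 0;
    for (int f = 0; f < frames; ++f) ++rendered;
    return rendered;
}

// Sketch of the trick: open one parallel region around the whole routine,
// then let a single thread execute the original serial code inside it.
int win_main_sketch(int frames) {
    int rendered = 0;
    #pragma omp parallel            // sets up the pool of threads
    {
        #pragma omp single          // only one thread runs the enclosed block
        {
            rendered = run_game_loop(frames);
        }
        // The other threads wait at the barrier that ends the single block,
        // where they are available to pick up tasks created later on.
    }
    return rendered;
}
```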
<p>After building the above mods with OpenMP and running the code to verify that the app still behaved similarly to the serial baseline, I went on to place OpenMP tasks at strategic points in the code.  The version of ParallelDemo.cpp that contains the parallel/single region trick is called ParallelDemoOpenMPSolutionActivity1.cpp. </p>
<p>After adding OMP Task directives around calls to the Physics, AI, and Particle tick methods I realized that I needed a way to synchronize when the tasks were complete and thus ready for rendering.  I decided to use OMP Taskwait directives at strategic locations in the code to ensure that all tasks were complete prior to rendering.  It turns out that, due to the event based nature of the game, I had to sprinkle OMP Taskwait directives at all points where the code could exit.  So I had to include taskwaits in the various event handlers such as MsgProc, KeyboardProc, etc.  Failing to do this caused the app to crash when doing such things as bringing up menus, switching in and out of multi-threaded mode, etc.</p>
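<p>Here is a minimal sketch of the task plus taskwait pattern just described. The Subsystem and Game types and the frame and run_frames functions are hypothetical placeholders, not the demo's real classes; the only point is to show where the task and taskwait directives sit relative to rendering.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for the demo's Physics, AI and Particle objects.
struct Subsystem {
    int ticks = 0;
    void tick() { ++ticks; }
};

struct Game {
    Subsystem physics, ai, particles;
    int frames_rendered = 0;
};

// One frame: each tick becomes a task, then we wait before rendering.
void frame(Game* game) {
    #pragma omp task firstprivate(game)
    game->physics.tick();
    #pragma omp task firstprivate(game)
    game->ai.tick();
    #pragma omp task firstprivate(game)
    game->particles.tick();

    #pragma omp taskwait            // all three ticks must finish first
    ++game->frames_rendered;        // stands in for the render call
}

// Drive the frames from one thread inside a parallel region so the
// other threads in the pool can execute the tasks.
int run_frames(Game* game, int n) {
    assert(game != nullptr);
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < n; ++i) frame(game);
    return game->frames_rendered;
}
```

<p>Each task touches a different subsystem object, so the three ticks do not race with each other in this sketch; adding work that shares state between tasks would need extra care.</p>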
<p>I saved this level of mods to ParallelDemoOpenMPSolutionActivity2.cpp.  It represents the fully OpenMP parallelized version of the code.  Unfortunately, I did not see any speedup at this point.  I had to go back and do a little tuning, mainly ridding myself of one call to taskwait in OnFrameMove and replacing it with event signaling built with ResetEvent, SetEvent and WaitForSingleObject.  A little playing around with Intel® Parallel Studio Amplifier helped me identify that I was over-synchronized, and I decided to try the synchronization scheme used with the Windows threading QueueUserWorkItem approach, where events were used to indicate when tasks were completed. </p>
<p>In the new approach, saved as ParallelDemoOpenMPSolutionActivity3.cpp, I created an array of event handles named s_hTickDoneEvent.  There is an array element for each event I care about: Physics, AI, and Particles.  In this new approach, the events are used to keep track of the state of each task.  The event for Physics is reset prior to executing the Physics-&gt;tick task and the event is set when the task is complete.  AI-&gt;tick and Particles-&gt;tick are handled in a similar fashion, each having its event reset prior to the tick, then set immediately afterwards.</p>
<p>It turns out that a big performance gain is achieved by removing only one omp taskwait, the one at the top of FrameMove. Replacing it with calls to WaitForSingleObject, which waits for the associated event to become set, gave a nice speedup. Due diligence is then required to create and destroy the events and to reset and set them around each of my three tasks.  When these steps are complete the code shows a decent speedup.</p>
<p>To evaluate my performance, I ran the demo loop by pressing the “B” key in the demo.  All frame rates were measured when the graphics were set to the maximum dimensions on my 22 inch screen. I recorded the minimum and maximum frame rates, but from a user experience perspective, minimum frame rates were what really made the demo feel fast or slow, so I will restrict my metrics here to just minimum frame rates. My rough performance results on my Intel® Core™2 Quad CPU Q6600 @2.40GHz running Windows XP Professional &amp; using DirectX 9.0c are as follows:</p>
<p>11 FPS:    Serial baseline minimum frame rate<br />
12 FPS:    OpenMP Tasks &amp; Taskwaits<br />
60 FPS:    OMP Task w Event signals<br />
99 FPS:    Windows QueueUserWorkItem with Signal Handling</p>
<p>So while the original QueueUserWorkItem with signal handling approach was the fastest threading methodology, the OMP Tasks &amp; Taskwaits modified with a bit of signal handling was not far behind.  The advantage of the OMP method, however, is that I could easily get back to my serial base version by just commenting out the #pragma omp statements or else just not compiling with the OpenMP switch.  The other advantage was that I was able to try other parallel strategies quickly.  I was able to see the effect of adding Physics-&gt;Update &amp; AI-&gt;Update  to the task list.  It will also afford me the ability to use data decomposition using OMP For at some point in future investigations.</p>
<p>A short PowerPoint gives an overview of the Destroy The Castle demo and the OpenMP parallelism approach.  This presentation as well as all the code and solution &amp; project files for this example is accessible at <a title="http://software.intel.com/en-us/courseware/course/category.php?id=37" href="http://software.intel.com/en-us/courseware/course/category.php?id=37" target="_blank">http://software.intel.com/en-us/courseware/course/category.php?id=37</a>.  Download it and see if there are other enhancements you can make to this code, and if you use this in your class please post to my blog to let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/06/02/using-openmp-to-parallelize-a-game/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Five role playing exercises to introduce parallelism concepts</title>
		<link>http://software.intel.com/en-us/blogs/2009/11/05/five-role-playing-exercises-to-introduce-parallelism-concepts/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/11/05/five-role-playing-exercises-to-introduce-parallelism-concepts/#comments</comments>
		<pubDate>Thu, 05 Nov 2009 19:12:40 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[Critical Section]]></category>
		<category><![CDATA[Domain Decomposition]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Race Condition]]></category>
		<category><![CDATA[Role Playing]]></category>
		<category><![CDATA[Task Decomposition]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/11/05/five-role-playing-exercises-to-introduce-parallelism-concepts/</guid>
		<description><![CDATA[Since the kickoff of the High School Parallelism bootcamp this summer, I've received several requests for a write up of the five role playing activities we used. The activities put students in the place of processor cores and had them perform tasks in parallel. These activities proved to be popular among many of the students [...]]]></description>
			<content:encoded><![CDATA[<p>Since the kickoff of the High School Parallelism bootcamp this summer, I've received several requests for a write up of the five role playing activities we used.  </p>
<p>The activities put students in the place of processor cores and had them perform tasks in parallel. These activities proved to be popular among many of the students at the camp. However, some of the more advanced students did say that they felt the exercises could seem childish.</p>
<p>My personal observation is that these exercises laid an excellent foundation that was built upon later with actual computer lab activities using OpenMP and Threading Building Blocks.</p>
<p>The activities are best done in groups of 4 or 5 individuals, but even a group of two can work.</p>
<p>Without further ado - here is my promised write up:</p>
<h1>Thinking Parallel</h1>
<p>These role playing activities are designed to get you to start thinking in parallel. They will also expose you to some terminology &amp; concepts that we will use throughout the rest of the boot-camp. Many of these exercises were inspired by Chapter two of James Reinders’ book “Intel Threading Building Blocks”.</p>
<p><strong>Objectives</strong><br />
The objectives for the following five exercises are to:<br />
1) Explore domain &amp; task decomposition using a mailer example.<br />
2) Learn about race conditions and two ways to mitigate them.<br />
3) Learn what a critical section does.<br />
4) Learn how a reduction can eliminate a race condition.</p>
<p>
<h1>Activity 1 – Explore Domain Decomposition</h1>
<p>
In this activity, each member of your team or group will play the role of a processor core, or more accurately, the role of a thread executing code on a core. Your team will explore how to use domain decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping &amp; mailing multiple envelopes.</p>
<p><strong>Time Required</strong><br />
fifteen minutes</p>
<p><strong>Objective</strong><br />
Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism through domain decomposition.</p>
<p><strong>Materials</strong><br />
1) 16 envelopes per table of 4 to 5 “processors”<br />
2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)<br />
3) Pens or pencils</p>
<p><strong>Setup</strong><br />
1) Divvy up the 16 envelopes &amp; colored post-its so that each processor gets a roughly equal share<br />
2) Each “processor” (think student) must read over and become familiar with the “code” or instructions she is to execute so that the “code” is committed to memory.<br />
Here is your “code” of instructions. For domain decomposition, each processor does the same task but uses different data (represented here by different addresses):<br />
a) Fold scrap paper<br />
b) Stuff paper into envelope<br />
c) Pretend to seal envelope (so we can re-use them later)<br />
d) Address an “address label” and fix it to the middle of the envelope<br />
You may use these fictitious addresses if you cannot come up with 16 of your own:</p>
<table class="MsoTableGrid" border="1" cellspacing="0" cellpadding="0">
<tr>
<td width="148" valign="top">
<p class="MsoNormal">1600 Daily Planet Ave</p>
<p class="MsoNormal">Metropolis, NY, 12345</p>
</td>
<td width="148" valign="top">
<p class="MsoNormal">Gotham City College</p>
<p class="MsoNormal">Gotham, IL, 60506</p>
</td>
<td width="156" valign="top">
<p class="MsoNormal">Kwikspell University</p>
<p class="MsoNormal">Briton, UK</p>
</td>
<td width="139" valign="top">
<p class="MsoNormal">Know-It-All University</p>
<p class="MsoNormal">Bullwinkle, AK</p>
</td>
</tr>
<tr>
<td width="148" valign="top">
<p class="MsoNormal">Waco University</p>
<p class="MsoNormal">Waco, Texas, 12987</p>
</td>
<td width="148" valign="top">
<p class="MsoNormal">Acme Looniversity</p>
<p class="MsoNormal">Toonville, CA, 10023</p>
</td>
<td width="156" valign="top">
<p class="MsoNormal">Bronto Crane Academy</p>
<p class="MsoNormal">Rock Vegas, NV, 010101</p>
</td>
<td width="139" valign="top">
<p class="MsoNormal">Ferris Bueller School</p>
<p class="MsoNormal">Burbank, CA</p>
</td>
</tr>
<tr>
<td width="148" valign="top">
<p class="MsoNormal">CXVI Caesar Av.</p>
<p class="MsoNormal">Rome, Italy, XLIV BC</p>
</td>
<td width="148" valign="top">
<p class="MsoNormal">Central High School</p>
<p class="MsoNormal">Central, IN, 17766</p>
</td>
<td width="156" valign="top">
<p class="MsoNormal">Bullworth Academy</p>
<p class="MsoNormal">Missions Blvd, Bermuda</p>
</td>
<td width="139" valign="top">
<p class="MsoNormal">Boston Bay College</p>
<p class="MsoNormal">Dawsons Creek, IL, 10982</p>
</td>
</tr>
<tr>
<td width="148" valign="top">
<p class="MsoNormal">1600+1/2 Pennsylvania Avenue NW</p>
<p class="MsoNormal">Washington, DC 20500</p>
</td>
<td width="148" valign="top">
<p class="MsoNormal">Bedrock High School</p>
<p class="MsoNormal">Little Rock, AR, 56005</p>
</td>
<td width="156" valign="top">
<p class="MsoNormal">Watsamata University</p>
<p class="MsoNormal">Rocky Road, AK</p>
</td>
<td width="139" valign="top">
<p class="MsoNormal">Hill Valley Schoolhouse</p>
<p class="MsoNormal">Hill Valley, CA, 33773</p>
</td>
</tr>
</table>
<p>e) Fix another post-it (representing a stamp) to the stamp area of the envelope<br />
f) Place envelope in table area designated “the mail box”</p>
<p>Monitor Time for completion of mailer exercise<br />
1) Have someone on the team write down the time just prior to saying go<br />
2) Each “processor” should complete his assigned set of envelopes as quickly as possible.<br />
3) Record the time to complete the assigned mailer job. Time: _____________________<br />
4) Record any observations about the nature of the tasks you do: which ones take longest, which ones are fast? Were any processors idle for periods of time?</p>
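<p>For readers who want to see what Activity 1 looks like in code, here is a minimal sketch of domain decomposition using an OpenMP parallel for loop. The Envelope struct and mail_all_domain function are names I made up for illustration. Every thread runs the same steps, just on its own share of the envelopes.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one envelope in the mailer exercise.
struct Envelope {
    std::string address;
    bool stuffed = false;
    bool sealed = false;
    bool addressed = false;
    bool stamped = false;
    bool mailed = false;
};

// Domain decomposition: same "code" for every thread, different data.
// Returns the number of envelopes that ended up in the mail box.
int mail_all_domain(std::vector<Envelope>& envelopes) {
    const int n = static_cast<int>(envelopes.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        Envelope& e = envelopes[i];
        assert(!e.mailed);      // each envelope is handled exactly once
        e.stuffed = true;       // a) and b) fold the paper and stuff it
        e.sealed = true;        // c) pretend to seal
        e.addressed = true;     // d) fix the address label
        e.stamped = true;       // e) fix the stamp
        e.mailed = true;        // f) place in the mail box
    }
    int mailed = 0;
    for (const Envelope& e : envelopes) mailed += e.mailed ? 1 : 0;
    return mailed;
}
```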
<p>
<h1>Activity 2 – Explore Task Decomposition</h1>
<p>
In this activity, your team will explore how to use a task decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping &amp; mailing multiple envelopes.</p>
<p><strong>Time Required </strong><br />
fifteen minutes</p>
<p><strong>Objective</strong><br />
Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism through task decomposition.</p>
<p><strong>Materials</strong><br />
1) 16 envelopes per table of 4 to 5 “processors”<br />
2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)<br />
3) Pens or pencils</p>
<p><strong>Setup</strong><br />
1) Divvy up the 16 envelopes &amp; colored post-its so that each processor gets a roughly equal share<br />
a) Each “processor” will agree with the team ahead of time which task or tasks he/she is committed to accomplishing. Perhaps one processor is assigned the tasks of folding, stuffing, and pretend sealing all the envelopes.<br />
b) Another processor, or perhaps even two, is assigned the task of addressing the envelopes. Use the same addresses as before.<br />
c) Another processor is assigned the role of stamping and mailing the envelopes</p>
<p>Monitor Time for completion of mailer exercise<br />
1) Have someone on the team write down the time just prior to saying go<br />
2) Each “processor” should complete his assigned tasks as quickly as possible.<br />
3) Record the time to complete the assigned mailer job. Time: _____________________<br />
4) Record any observations about the nature of the tasks you do: which ones take longest, which ones are fast? Were any processors idle for periods of time?</p>
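<p>Activity 2 can be sketched in code with OpenMP sections, where each section plays the role of one “processor” that owns one kind of task. The MailPiece struct and mail_all_by_task function are hypothetical names. Note one simplification: in this sketch the three stages do not depend on each other, whereas at the table the stamper usually has to wait for envelopes to arrive from the earlier stages.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one envelope, tracked per stage of the job.
struct MailPiece {
    bool stuffed = false;
    bool addressed = false;
    bool stamped = false;
};

// Task decomposition: each section does a different job on all envelopes.
// Returns the number of envelopes that completed every stage.
int mail_all_by_task(std::vector<MailPiece>& pieces) {
    #pragma omp parallel sections
    {
        #pragma omp section     // processor A: fold, stuff, pretend to seal
        for (MailPiece& p : pieces) p.stuffed = true;

        #pragma omp section     // processor B: address
        for (MailPiece& p : pieces) p.addressed = true;

        #pragma omp section     // processor C: stamp
        for (MailPiece& p : pieces) p.stamped = true;
    }
    // Mailing happens only after all sections have finished.
    int mailed = 0;
    for (const MailPiece& p : pieces)
        if (p.stuffed && p.addressed && p.stamped) ++mailed;
    assert(mailed == static_cast<int>(pieces.size()));
    return mailed;
}
```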
<p>
<h1>Activity 3 –Vector Addition exposes race conditions</h1>
<p>
In this activity, your team will explore how to use a domain decomposition to accomplish the job of (mis)adding a set of numbers which we will call a vector. The activity exposes the problem that occurs when writes to a shared memory variable are not protected by a synchronization construct.</p>
<p><strong>Time Required </strong><br />
fifteen minutes</p>
<p><strong>Objective</strong><br />
Use some index cards, and pencils to explore the concept of a race condition.</p>
<p><strong>Materials</strong><br />
1)	16 numbered index cards.<br />
2)	One index card labeled “Shared Sum”<br />
3)	16 extra index cards are labeled “local memory”<br />
4)	Pencils</p>
<p><strong>Setup</strong><br />
1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”<br />
2) One extra index card is labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors<br />
3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add numbers on.</p>
<p><strong>Execution</strong><br />
1) This is a domain decomposition exercise, where each processor “reads” the value of the “shared sum” and writes a new value to this shared sum, ignoring the values all the other processors have previously written. This is called a race condition.<br />
Each processor should do these steps as quickly as possible:<br />
   a) “Read” the value of the “shared sum” and write that number on your own scratch pad.<br />
   b) Add the value of one of your “vector” cards to the sum on your scratch pad.<br />
   c) Immediately cross off the current value on the “shared sum” card and write your own value on the card (probably stomping over someone else’s value).<br />
   d) Repeat steps a – c until you are out of index cards<br />
2) Compare the final value written on the “shared sum” with the known total (46)<br />
3) Did your team compute the correct grand total for the vector sum?</p>
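<p>The lost update at the heart of this activity can also be replayed in code. The following is a minimal Python sketch, not part of the original exercise: the card values in VECTOR are made up (chosen only so that they total 46), and the function names and the fixed turn order are assumptions for illustration. It replays the worst case deterministically: in every round, each processor reads the “shared sum” before anyone writes.</p>

```python
# Hypothetical card values; the original exercise only states that they total 46.
VECTOR = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]


def deal_cards(vector, processors):
    """Deal the cards round-robin, like handing them out around the table."""
    return [vector[p::processors] for p in range(processors)]


def racy_sum(vector, processors=4):
    """Replay Activity 3 with the worst possible timing in every round."""
    hands = deal_cards(vector, processors)
    shared_sum = 0
    rounds = max(len(hand) for hand in hands)
    for r in range(rounds):
        # Step a: every processor that still has a card copies the SAME
        # shared value onto its private scratch pad.
        scratch = {p: shared_sum for p in range(processors) if r < len(hands[p])}
        # Step b: each processor adds one of its own cards on the scratch pad.
        for p in scratch:
            scratch[p] += hands[p][r]
        # Step c: each processor overwrites the shared card; the last write wins
        # and every earlier write in this round is lost.
        for p in scratch:
            shared_sum = scratch[p]
    return shared_sum


if __name__ == "__main__":
    print("correct total:", sum(VECTOR))
    print("racy total:   ", racy_sum(VECTOR))
```

<p>With these made-up cards, only the last writer’s cards survive each round, so the racy total comes out as 10 instead of 46. Real threads interleave unpredictably, so a real race gives a different wrong answer from run to run.</p>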
<p>
<h1>Activity 4 – Vector Addition fixed with critical section</h1>
<p>
In this activity, your team will explore how a critical section can be used to guarantee that access to a shared memory region is protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also expose the performance penalty that can be incurred as a result of critical sections.</p>
<p><strong>Time Required </strong><br />
Fifteen minutes</p>
<p><strong>Objective </strong><br />
Use some index cards, a magic marker &amp; pencils to explore the concept of a critical section</p>
<p><strong>Materials</strong><br />
1)	16 numbered index cards (we call it our vector)<br />
2)	1 index card labeled “Shared Sum”<br />
3)	16 extra index cards labeled “local memory”<br />
4)	Magic marker<br />
5)	Pencils</p>
<p><strong>Setup</strong><br />
1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”<br />
2) One extra index card is labeled “Shared Sum”, has the value 0 written on it and is placed in the middle of the table accessible to all processors.<br />
3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add numbers on.<br />
4) A magic marker, known as “critical section”, is placed on the table next to the “shared sum”</p>
<p><strong>Execution</strong><br />
1) We are now trying to synchronize access to shared memory by implementing a critical section. Our goal is to get rid of the race condition we encountered earlier.<br />
2) New rule: Each processor can only use the magic marker, named “critical section”, to write values to the “shared sum” index card. And no processor can do any computations without first acquiring the magic marker. As soon as a processor writes a new value to the “shared sum”, the processor should expeditiously cap the marker and return it to the middle of the table.<br />
3) Each processor should do these steps as quickly as possible:<br />
   a) Acquire the critical section! If you failed to acquire the critical section you must wait for the marker to be placed back in the middle of the table.<br />
   b) Immediately cross off the current value on the “shared sum” card<br />
   c) Add the value of one of your index cards to the “shared sum”; use a scratch pad if needed<br />
   d) Write the new value for sum on the globally shared “shared sum” card<br />
   e) Return the cap to the marker<br />
   f) Return the marker to the middle of the table<br />
   g) Repeat steps a – f until you are out of index cards<br />
4) Compare the final value written on the “shared sum” with the known total (46)<br />
5) Did your team compute the correct grand total for the vector sum?<br />
6) What did you observe about how much time you spent idling versus time spent writing or calculating?</p>
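<p>The magic marker maps naturally onto a lock. Below is a small Python sketch under stated assumptions: the card values are made up (chosen to total 46), the names marker and locked_sum are hypothetical, and Python’s threading.Lock stands in for whatever critical section construct your own language or library offers.</p>

```python
import threading

# Hypothetical card values; the original exercise only states that they total 46.
VECTOR = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]


def locked_sum(vector, processors=4):
    """Activity 4: every update of the shared sum happens while holding the marker."""
    shared = {"sum": 0}
    marker = threading.Lock()  # the "magic marker" named critical section

    def processor(cards):
        for card in cards:
            # Acquire the marker, or wait until another processor returns it.
            with marker:
                # The read, the add and the write all happen while we hold the
                # marker, so no other processor can stomp on the value in between.
                shared["sum"] = shared["sum"] + card
            # Leaving the "with" block caps the marker and puts it back.

    threads = [
        threading.Thread(target=processor, args=(vector[p::processors],))
        for p in range(processors)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["sum"]


if __name__ == "__main__":
    print("locked total:", locked_sum(VECTOR))
```

<p>The answer is now correct regardless of timing, but only one processor can make progress at a time while the others wait for the marker. That waiting is the idle time the last question asks about.</p>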
<p>
<h1>Activity 5 – Vector Addition fixed with reduction</h1>
<p>
In this activity, your team will explore how a reduction (or a partial sums approach) can be used to guarantee that access to a shared memory region is protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also demonstrate the benefit of replacing a critical section with a reduction wherever possible.</p>
<p><strong>Time Required </strong><br />
Fifteen minutes</p>
<p><strong>Objective </strong><br />
Use some index cards &amp; pencils to explore the concept of a reduction</p>
<p><strong>Materials</strong><br />
1)	16 numbered index cards (we call it our vector)<br />
2)	1 index card labeled “Shared Sum”<br />
3)	4 index cards labeled “Partial Sum”<br />
4)	16 extra index cards labeled “local memory”<br />
5)	Pencils</p>
<p><strong>Setup</strong><br />
1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”<br />
2) One extra index card is labeled “Shared Sum”, has the value 0 written on it and is placed in the middle of the table accessible to all processors.<br />
3) The remaining “Partial Sum” cards are divvied up among the processors<br />
4) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add numbers on.</p>
<p><strong>Execution</strong><br />
We are now trying to synchronize access to shared memory by implementing a reduction – which amounts to a collection of partial sums computed by each processor, followed by a grand total computed by a master thread. Our goal is to get rid of the race condition we encountered earlier, and do the parallel tasks in a more efficient manner.</p>
<p>Every processor will add up his/her own partial sum that represents the total of all the vector<br />
elements assigned to him/her.</p>
<p>One processor will also be named to execute the “master thread”, which adds up all the partial sum cards to create a grand total and writes this grand total to the “shared sum” card.</p>
<p>1) Each processor should:<br />
   a) Add the values from all their assigned vector cards<br />
   b) Write the value to a “Partial Sum” card<br />
   c) Give the “Partial Sum” card to the Processor who will execute the master thread<br />
2) Only the master thread should:<br />
   a) Compute his own partial sum<br />
   b) Wait for all other partial sums to arrive<br />
   c) Compute the grand total for all partial sums<br />
   d) Write the grand total on the “Shared Sum” card</p>
<p>3) Did your team compute the correct grand total for the vector sum?<br />
What did you observe about how much time you spent idling versus time spent writing or calculating?</p>
<p>4) How effective was the reduction strategy at computing the sum in parallel versus the other methods tried?</p>
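<p>For comparison, here is the same job written as a reduction. This is a sketch only: the card values are made up (chosen to total 46), the function names are hypothetical, and Python’s concurrent.futures thread pool plays the role of the processors. Each worker adds up its own cards without touching any shared state, and the caller, acting as the master thread, combines the partial sums.</p>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical card values; the original exercise only states that they total 46.
VECTOR = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]


def partial_sums(vector, processors=4):
    """Each processor privately totals its own cards onto a 'Partial Sum' card."""
    hands = [vector[p::processors] for p in range(processors)]
    with ThreadPoolExecutor(max_workers=processors) as pool:
        # pool.map returns results in the same order as the hands were dealt.
        return list(pool.map(sum, hands))


def reduced_sum(vector, processors=4):
    """The master thread waits for every partial sum, then writes the grand total."""
    return sum(partial_sums(vector, processors))


if __name__ == "__main__":
    print("partial sums:", partial_sums(VECTOR))
    print("grand total: ", reduced_sum(VECTOR))
```

<p>No lock is needed because nothing is shared until the final step, which is why a reduction usually leaves processors far less idle than a critical section does.</p>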
<p><strong>Review Questions</strong></p>
<p>Question 1: Describe a race condition<br />
Question 2: What does a critical section do?</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/11/05/five-role-playing-exercises-to-introduce-parallelism-concepts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Talented kids &amp; multi-core: Adding fuel to the mix</title>
		<link>http://software.intel.com/en-us/blogs/2009/07/17/talented-kids-amp-multi-core-adding-fuel-to-the-mix/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/07/17/talented-kids-amp-multi-core-adding-fuel-to-the-mix/#comments</comments>
		<pubDate>Sat, 18 Jul 2009 05:58:56 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[high school programming]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[Think Parallel]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/07/17/talented-kids-amp-multi-core-adding-fuel-to-the-mix/</guid>
		<description><![CDATA[Now I can add orchestra leader to my list of job roles at Intel. I’ve been conducting an ensemble of talented players from across industry, education and within Intel to orchestrate the first High School Parallelism Boot-camp. I’ve been crafting the flow of topics &#38; lab activities, developing some new ways to convey parallelism topics [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="auto;"><span style="small;"><span style="Times New Roman;">Now I can add orchestra leader to my list of job roles at Intel.<span style="yes;"> </span>I’ve been conducting an ensemble of talented players from across industry, education and within Intel to orchestrate the first High School Parallelism Boot-camp.<span style="yes;"> </span>I’ve been crafting the flow of topics &amp; lab activities, developing some new ways to convey parallelism topics using role play, coordinating with luminaries &amp; Intel engineers.<span style="yes;"> </span>I’ve been rounding up &amp; testing content &amp; systems for months now. Now all that work is about to pay off next week in the first ever (as far as I know) High School Parallelism boot-camp hosted at <a href="http://bths.edu/">Brooklyn Technical High School </a>and sponsored by Intel, Bank of America, Blade Network &amp; IBM.<span style="yes;"><br />
</span></span></span></p>
<p class="MsoNormal" style="auto;"><span style="Times New Roman;">Warning - new metaphor coming…<br />
</span></p>
<p class="MsoNormal" style="auto;"><span style="small;"><span style="Times New Roman;">We are about to embark on a journey.<span style="yes;"> </span>A journey to the future.<span style="yes;"> </span>A future defined by the many-core era.</span></span></p>
<p class="MsoNormal" style="auto;"><span style="small;"><span style="Times New Roman;">To take us on the journey – we need fuel.<span style="yes;"> </span>The fuel mix for this journey consists of four ingredients: multi-core hardware, parallelism training, great instructors and the fertile minds of bright high school students. We are about to light the mix off in a controlled burn next week (July 21-23) at the campus of <a href="http://bths.edu/">Brooklyn Tech HS</a>.<span style="yes;"> </span>A couple dozen HS students and faculty will participate in this hands-on parallel programming training event.<span style="yes;"> </span>We will be laying out the challenge to begin thinking in parallel and arming the students with three patterns that can be used to think about parallel problems. Then we will arm them with a couple of methods to implement parallel software on many-core HW. We will discuss some of the challenges unique to parallel programming and ways to address these challenges using SW tools &amp; new ways of thinking. Then we will show them just a few of the possibilities – a few compelling SW demos that show how “real” a virtual world powered by multi-core systems can be.<span style="yes;"> </span>Along the way we hope to arm them with knowledge of how to leverage the many-core systems that are headed our way. Hopefully we will refine our mixture as we learn from these students &amp; faculty how to better equip young minds.<span style="yes;"><br />
</span></span></span></p>
<p class="MsoNormal" style="auto;"><span style="Times New Roman;">I guess I want to extend James Reinders’ <a href="http://software.intel.com/en-us/blogs/2009/07/17/parallel-programming-is-fundamental-high-school-here-we-come/">point </a>and say that today high schools &amp; colleges are either teaching “the history of programming” or they are teaching parallel programming.<span style="yes;"> </span>I am excited to work with Brooklyn Technical High School and their principal, Randy Asher, who are doing something about teaching to the FUTURE.  I am also proud to work with folks like Jeff Birnbaum (Bank of America) who was the <span style="AR-SA;">inspiration </span>&amp; prime motivation for making this project happen.<br />
</span></p>
<p class="MsoNormal" style="auto;"><span style="Times New Roman;">Stand back, put your safety goggles on – we’re about to light the mixture.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/07/17/talented-kids-amp-multi-core-adding-fuel-to-the-mix/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Less Focus on Threads More Focus on Tasks</title>
		<link>http://software.intel.com/en-us/blogs/2009/04/24/less-focus-on-threads-more-focus-on-tasks/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/04/24/less-focus-on-threads-more-focus-on-tasks/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 20:52:53 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Academia]]></category>
		<category><![CDATA[clubhouse]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[Tasks]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[threads]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/04/24/less-focus-on-threads-more-focus-on-tasks/</guid>
		<description><![CDATA[Several years ago, when I looked for training courses on the subject of parallel programming for shared memory systems I found few courses being offered.  Some friends of mine and I did find a very nice course from a 3rd-party vendor on threaded programming.  The course mainly focused on "C" and using POSIX threads to [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="none;"><span style="Arial;">Several years ago, when I looked for training courses on the subject of parallel programming for shared memory systems I found few courses being offered.<span style="yes;">  </span>Some friends of mine and I did find a very nice course from a 3<sup>rd</sup>-party vendor on threaded programming.<span style="yes;">  </span>The course mainly focused on "C" and using POSIX threads to explicitly manage thread creation.<span style="yes;">  </span>The course did touch on higher level concepts such as implementing a producer-consumer using semaphores - but on balance - my recollection of the course was how I had to manage threads as a developer. Even thread pools were largely self-created &amp; self-managed. The training reflected the level of maturity of shared memory parallel programming in the late 90's and early 2000's. The key topic in this training was: Threads!</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">As I scan the horizon lately, I see an interesting pattern has emerged.<span style="yes;">  </span>Tools such as Threading Building Blocks, the new OpenMP 3.0 spec and the Executor interface from the java.util.concurrent package, to name a few, are providing ways for developers to specify tasks to a library or runtime and allowing the library to manage the assignment of execution agents (threads) to these tasks.<span style="yes;">  </span>The central concept now: Tasks!</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">What are tasks?<span style="yes;">  </span>Tasks are logical units of work.<span style="yes;">  </span>A task may be a function call, it may be an iteration of a loop, or it may be a block of code encased in curly braces.<span style="yes;">  </span>Tasks are the job assignments, in code, the developer wants to accomplish.</span></p>
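<p class="MsoNormal" style="none;"><span style="Arial;">To make the idea concrete, here is a minimal sketch in Python rather than in any of the tools named above. It uses the standard concurrent.futures executor, which plays a role similar to the Java Executor interface; the word_count function and the sample strings are made up for illustration. The code only describes the tasks (one function call per document) and leaves thread creation and assignment to the library.</span></p>

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """One task: a self-contained unit of work, here a single function call."""
    return len(text.split())


def run_tasks(documents, max_workers=3):
    """Hand the tasks to the executor; it decides which thread runs which task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(word_count, doc) for doc in documents]
        # Collect results in submission order, whatever order the tasks ran in.
        return [future.result() for future in futures]


if __name__ == "__main__":
    docs = ["think in tasks", "not threads", ""]
    print(run_tasks(docs))
```

<p class="MsoNormal" style="none;"><span style="Arial;">Nothing in run_tasks names a thread; changing max_workers changes how much runs at once without changing what the tasks are.</span></p>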
<p class="MsoNormal" style="none;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">The trick to parallel programming is finding as many independent tasks within an application as possible and then in finding as many dependent tasks as possible that can either be ordered or provided with some synchronization constructs that effectively make even these tasks conditionally independent.</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">So training for parallel programming going forward, I would argue, should focus more and more on tasks and less on threads; more on teaching developers how to identify independent tasks, and less on specific explicit threading implementations. I believe we are reaching the point where we can grapple with the higher level abstraction of tasks and have some confidence that we can rely on a library or package to handle the thread assignment and thread management activities automatically.</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">Does this mean I can forget about Threads?<span style="yes;">  </span>No.<span style="yes;">  </span>At this point, knowledge of threads is still required to know how to deal with tasks that are not completely independent.<span style="yes;">  </span>A basic knowledge of threads, in so far as knowing that threads can run in any arbitrary order, is at least required to understand the possible side effects of threading such as race conditions or deadlock conditions.<span style="yes;">  </span>Proper synchronization of threads to data is key to eliminating these traps, and synchronization on a shared memory system requires a knowledge of some threading API to create mutexes or critical sections, or what-have-you.</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">At least, that’s my take.<span style="yes;">  </span>What is yours?</span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/04/24/less-focus-on-threads-more-focus-on-tasks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If you had 4 days to teach parallel programming… to high school students …</title>
		<link>http://software.intel.com/en-us/blogs/2009/03/16/if-you-had-4-days-to-teach-parallel-programming-to-high-school-students-2/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/03/16/if-you-had-4-days-to-teach-parallel-programming-to-high-school-students-2/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 19:57:06 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Academia]]></category>
		<category><![CDATA[clubhouse]]></category>
		<category><![CDATA[high school programming]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[parallel programming in high school]]></category>
		<category><![CDATA[threading]]></category>
		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/03/16/if-you-had-4-days-to-teach-parallel-programming-to-high-school-students-2/</guid>
		<description><![CDATA[If you had 4 days to teach parallel programming… to high school students …  Setup Let’s say you were given an invitation to lead a summer camp for high school students – a 4 day long day camp with students coming in from various schools and the only thing they have in common is that [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">If you had 4 days to teach parallel programming… to high school students …</span><span style="'Times New Roman';"> <br />
</span><strong><span style="'Times New Roman';"><span style="small;">Setup<br />
</span></span></strong><span style="'Times New Roman';">Let’s say you were given an invitation to lead a summer camp for high school students – a 4 day long day camp with students coming in from various schools and the only thing they have in common is that they are top notch, technically savvy kids who are avid programmers.<span style="yes;">  </span>I want this camp to attract the best and brightest and I want it to be an elite kind of opportunity for a select group of advanced students. You know that these high school students may lose interest if you sit them down for 4 days of solid lectures; to mitigate any lack of attention we’ll want a lot of hands on activities. You want these students to come out of the camp with some practical parallel programming skills – demonstrable skills that will be proven by the students actually writing and testing a successful parallel program.<span style="yes;">  </span>How do you spend the 4 days? What is your recommended scope and sequence of topics?<span style="yes;">  </span>Should I call this a bootcamp?</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';"><span style="small;">My Challenge<br />
</span></span></strong><span style="'Times New Roman';">These are the questions I am grappling with as I lay out a plan for a pilot project for parallelism at the high school level with a small group of very savvy High school programmers.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">First of all, while it’s fine to start off the camp in a purely conceptual way – talking about ideas &amp; concepts for parallelism and even walking through examples of parallelism we see every day in real life – waiting in a line at a store and wishing for more checkers, enlisting the help of friends to paint a room – at some point we have to get around to programming.<span style="yes;">  </span>So what language will I choose?<span style="yes;">  </span>Some high school students have programmed in Java, some in C/C++, and some in Java-like languages like Alice from Carnegie Mellon University – so what do I choose and why?</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">I have chosen to teach the camp using C/C++.<span style="yes;">  </span>The reason for this choice is that there are some excellent tools to help diagnose threading issues for C/C++. I believe strongly in having solid analysis tools available for students.<span style="yes;">  </span>Students should know the rudiments of debugging and have access to a good debugger when they first learn to program because it gives them a feel for the dynamics of a program.<span style="yes;">  </span>Similarly, when learning to program using parallel techniques, it is essential that a good set of parallel diagnostic tools is available.<span style="yes;">  </span>Students, especially, need tools to help them spot and avoid race conditions and deadlock conditions, and “printf” statements are just NOT the answer. </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">Also – for students familiar only with C# or Java, the syntax differences are few and the C/C++ code examples are readily understood by students whose only background is C# or Java. </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">What is my approach?<span style="yes;">  </span>First let me say that it is not written in stone yet.<span style="yes;">  </span>I have based the first two days of the bootcamp around some excellent material provided to us by Professor Michael Quinn which I regard as an introduction to parallel programming. The third day is based on materials provided by Intel engineers and covers OpenMP, Intel® Threading Building Blocks, Intel® Thread Checker &amp; Intel® Thread Profiler.<span style="yes;">  </span>These materials will provide students a couple of tangible ways to implement parallelism in a C/C++ application. The last day is tentatively earmarked to teach students how to approach parallelizing a demo game written in C/C++.</span><span style="'Times New Roman';"> </span><span style="'Times New Roman';">Below is how my planning is shaping up so far.</span><strong><span style="'Times New Roman';"><span style="small;">Bootcamp<span style="yes;">  </span>- Day 1:<br />
</span></span></strong><strong><span style="'Times New Roman';">Parallel Puzzle de Jour<br />
</span></strong><span style="'Times New Roman';">Use puzzles as an introduction to the students to give them some real-world background in some of the problems that CE majors typically are called on to solve.</span></p>
<p> </p>
<p> </p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Introducing Parallel Programming<br />
</span></strong><span style="'Times New Roman';">Here I will define parallel computing, explain why parallel computing is becoming mainstream and explain why explicit parallel programming is necessary.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Recognizing Potential Parallelism<br />
</span></strong><span style="'Times New Roman';">This is where students recall opportunities for parallelism in the real world</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">-<span style="1;">        </span>working at a restaurant, </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">-<span style="1;">        </span>adding more checkers to alleviate long lines while shopping</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">-<span style="1;">        </span>ways of tackling class assignments in parallel sub-teams</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">Do a “students as threads” activity to carry out a parallel computation (possibilities: compute a parallel sum, perform a sorting operation, find a maximum value of an array of data)</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">Identify opportunities for parallelism in code segments and applications</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Shared Memory and Threads<br />
</span></strong><span style="'Times New Roman';">Describe the shared-memory model of parallel programming. Explore the differences between the fork/join and the general threads models. Demonstrate how to implement domain and functional decompositions using threads. Investigate whether a variable in a multithreaded program should be shared or private</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Short Introduction to OpenMP AKA Implementing Domain Decompositions<br />
</span></strong><span style="'Times New Roman';">Identify “for loops” that can be executed in parallel. Identify blocks of code suitable for parallel execution. Add OpenMP pragmas to programs that have suitable blocks of code or “for loops”.<span style="yes;">  </span>Explore the new OpenMP Task directive. <span style="1;">      </span>Demonstrate the proper use of the “single” and “nowait” directives </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';"><span style="small;">Bootcamp<span style="yes;">  </span>- Day 2:<br />
</span></span></strong><strong><span style="'Times New Roman';">Parallel Puzzle de Jour<br />
</span></strong><span style="'Times New Roman';">Review any solutions or ideas for tackling yesterday’s puzzle and introduce a new puzzle to be solved by the end of the week.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">Confronting Race Conditions</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">Give practical examples of ways that threads may contend for shared resources. Write an OpenMP program that contains a reduction. Describe what race conditions are and explain how to eliminate them. Define deadlock and explain ways to prevent it.</span></p>
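<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">As one possible illustration for the deadlock topic, here is a short Python sketch; it is not taken from the course materials, and the Account class and transfer function are hypothetical. Two threads move money in opposite directions between the same two accounts. If each thread locked its source account first, each could end up holding one lock while waiting forever for the other. Acquiring the locks in one fixed global order (here, by account name) is a common way to prevent that.</span></p>

```python
import threading


class Account:
    """Hypothetical shared resource guarded by its own lock."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Always take the two locks in the same global order (sorted by name),
    # no matter which direction the money flows. This removes the circular
    # wait that a "lock the source first" rule could create.
    first, second = sorted((src, dst), key=lambda account: account.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_transfers(rounds=1000):
    a = Account("a", 100)
    b = Account("b", 100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_transfers())
```

<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">Because both threads make the same number of equal transfers, both balances end where they started, and the run always finishes because neither thread can hold one lock while waiting on the other in the opposite order.</span></p>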
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Implementing Task Decompositions<br />
</span></strong><span style="'Times New Roman';">Describe how threads can be used to implement parallel programs using task decomposition. Implement a task decomposition based on work pools. <span style="1;">  </span>Implement a task decomposition in which different threads execute different functions. Examine two case studies: 1) The N Queens Problem, 2) Fancy Web Browser</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Improving Parallel Performance<br />
</span></strong><span style="'Times New Roman';">Give reasons why one sequential algorithm may be more suitable than another for parallelization. Use loop fusion, loop fission, and loop inversion to create or improve opportunities for parallel execution. Explain the pros and cons of static versus dynamic loop scheduling. Explain Load Balancing. Explain Locality. Explain why it can be difficult both to optimize load balancing and maximize locality</span></p>
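<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">The static versus dynamic trade-off can be shown with a toy model before any real threading code is written. The Python sketch below is a simulation only, with made-up iteration costs and hypothetical function names; it does not call OpenMP. It compares the finishing time when loop iterations are split into equal contiguous blocks up front (static) against handing the next iteration to whichever worker is free first (dynamic, chunk size 1).</span></p>

```python
import heapq


def static_makespan(costs, workers):
    """Split iterations into equal contiguous blocks, one block per worker."""
    block = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def dynamic_makespan(costs, workers):
    """Whichever worker becomes idle first takes the next iteration."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # the next worker to go idle
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


if __name__ == "__main__":
    # Made-up costs: the expensive iterations are bunched at the front.
    costs = [8, 7, 6, 5, 1, 1, 1, 1]
    print("static: ", static_makespan(costs, workers=2))
    print("dynamic:", dynamic_makespan(costs, workers=2))
```

<p class="MsoNormal" style="0in 0in 0pt;"><span style="'Times New Roman';">With these costs the static split finishes at time 26 and the dynamic one at time 15. The model ignores what dynamic scheduling costs in practice: extra bookkeeping per chunk, and iterations that touch neighboring data landing on different workers, which is the locality side of the trade-off.</span></p>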
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';"><span style="small;">Bootcamp<span style="yes;">  </span>- Day 3:<br />
</span></span></strong><strong><span style="'Times New Roman';">Parallel Puzzle de Jour<br />
</span></strong><span style="'Times New Roman';">Review any solutions or ideas for tackling yesterday’s puzzle and introduce a new puzzle to be solved by the end of the week.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Introduction to Threading Building Blocks<br />
</span></strong><span style="'Times New Roman';">Give an overview of Threading Building Blocks. Describe Generic Parallel Algorithms such as parallel_for, parallel_reduce, parallel_sort. Explain the Task Scheduler. Discuss Generic Highly Concurrent Containers. Explore Scalable Memory Allocation. Discuss Low-level Synchronization Primitives</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Correcting Threading Errors </span></strong><strong><span style="'Times New Roman';">with Intel® Thread Checker<br />
</span></strong><span style="'Times New Roman';">Discuss the Intel® Thread Checker. Examine race conditions. Explore the Intel® Thread Checker.<span style="yes;">  </span>Discuss some types of threading errors. Examine library thread-safety</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Tuning Threading Code </span></strong><strong><span style="'Times New Roman';">with Intel® Thread Profiler<br />
</span></strong><span style="'Times New Roman';">Discuss Intel® Thread Profiler features. Define Critical Path Analysis. Examine Thread Profiler “data views”. Review common performance issues of multithreaded applications. Examine Load imbalance. Examine Synchronization contention. Describe general optimizations to gain better performance</span><span style="'Times New Roman';"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';"><span style="small;">Bootcamp<span style="yes;">  </span>- Day 4:<br />
</span></span></strong><strong><span style="'Times New Roman';">Parallel Puzzle du Jour<br />
</span></strong><span style="'Times New Roman';">Review any solutions or ideas for tackling yesterday’s puzzle and introduce a new puzzle to be solved by the end of the week.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Threading Games for Performance</span></strong><span style="'Times New Roman';"><span style="yes;"> <br />
</span></span><span style="'Times New Roman';">Case Studies - Destroy the Castle Demo game. Give an overview of Multi-Threading in Games. Introduce the Destroy the Castle Demo. Examine Functional Decomposition for this game. Examine Data Decomposition.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';">Parallel Puzzle Review of Solutions<br />
</span></strong><span style="'Times New Roman';">Student-led discussion of the approaches they took towards the parallel puzzle du jour.<span style="yes;">  </span></span><span style="'Times New Roman';"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><strong><span style="'Times New Roman';"><span style="small;">My Challenge to You<br />
</span></span></strong><span style="'Times New Roman';">So, you’ve seen my proposal.<span style="yes;">  </span>You know my target audience. You may not agree with the selection of topics, the selection of language, or perhaps the scope or the sequence. So, tell me, how would you design a 4-day bootcamp for technically savvy high school students?</span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/03/16/if-you-had-4-days-to-teach-parallel-programming-to-high-school-students-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Let's change the way we deliver ISC content</title>
		<link>http://software.intel.com/en-us/blogs/2008/09/10/let-change-the-way-we-deliver-isc-content/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/09/10/let-change-the-way-we-deliver-isc-content/#comments</comments>
		<pubDate>Thu, 11 Sep 2008 05:50:20 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Computer Science Lessons plans]]></category>
		<category><![CDATA[Faculty training]]></category>
		<category><![CDATA[online content]]></category>
		<category><![CDATA[teaching parallelism]]></category>
		<category><![CDATA[teaching programming]]></category>
		<category><![CDATA[wikiversity]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/09/10/let-change-the-way-we-deliver-isc-content/</guid>
		<description><![CDATA[The Intel Software College Course Architect Team is about to head into our planning phase for 2009.  One of the tasks we are considering is how to make our Intel posted content easier to access, easier to update, easier for faculty to incorporate lessons into their curriculum.  As you probably are aware – currently you [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">The Intel Software College Course Architect Team is about to head into our planning phase for 2009.<span style="yes;">  </span>One of the tasks we are considering is how to make our Intel posted content easier to access, easier to update, easier for faculty to incorporate lessons into their curriculum.<span style="yes;">  </span>As you probably are aware – currently you can download our standalone modules from the Intel Academic Community webpage on ISN - <a href="http://software.intel.com/en-us/articles/multi-core-courseware-content-from-intel-1/">http://software.intel.com/en-us/articles/multi-core-courseware-content-from-intel-1/</a>.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">While our current content deployment mechanism works, I fear that accessing our content may be more difficult than it needs to be. In most cases, our current model has a listing of modules with a course description and a link to download a ZIP of our content, which includes PowerPoint slides, student lab manuals, and ZIP files of the various labs themselves. In the spirit of trying to find a more suitable delivery mode, I have explored a number of online learning sites and I’d like to get your take on the pros and cons of these sites in comparison to what we currently offer.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">The two sites I am looking into currently as role models are …drumroll…</span></p>
<p class="MsoNormal" style="list .5in;"><span style="Arial;">1)</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">Wikiversity (<a href="http://en.wikiversity.org/wiki/Wikiversity:Main_Page">http://en.wikiversity.org/wiki/Wikiversity:Main_Page</a>) and </span></p>
<p class="MsoNormal" style="list .5in;"><span style="Arial;">2)</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">MIT OpenCourseWare (<a href="http://ocw.mit.edu/OcwWeb/web/home/home/index.htm">http://ocw.mit.edu/OcwWeb/web/home/home/index.htm</a>)</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">I don’t have enough time in tonight’s blog to analyze both sites – so I will cover one tonight, Wikiversity, and I promise to do the other one next blog post.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">The first site I’d like to consider as a potential model for future ISC delivery is… </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"><a href="http://en.wikiversity.org/wiki/Portal:Computer_Science">http://en.wikiversity.org/wiki/Portal:Computer_Science</a>. As a newbie member of this portal – look me up – I’m zmadscientist – I thought this approach to presenting lessons was pretty intuitive.<span style="yes;">  </span>The lessons tend to be short, byte-sized chunks.<span style="yes;">  </span>They tend to be more verbose and text-based, and not as visually oriented as my home turf, where we use a lot of PowerPoint with punchy bulleted concepts.<span style="yes;">  </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">I looked up several programming-related topics to get a feel for the layout.<span style="yes;">  </span>The topics I looked up were Parallel Programming (doesn’t exist), <span style="yes;"> </span>Threading (doesn’t exist), Parallel Computing, C++ programming, and the<span style="yes;">  </span>Alice programming language.<span style="yes;">  </span>My intent was to get a gut-level feel for the efficacy of the lessons and delivery mode.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">So to start - what I like about wikiversity is the simple layout and instant interaction with the lessons.<span style="yes;">  </span>This is a wiki – so the layout is not consistent across all topics – but at least the C++ topic looked well organized.<span style="yes;">  </span>And in one click I could get to a specific lesson – nice!</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">The layout for C++ is something like this:</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">1)</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">tabs across the top – resource tab laying out bulk of the content, discussion tab to get community feedback, then an edit and history tab that I am not as much into</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">2)</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">On the resource tab we see:</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">a.</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">Course title</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">b.</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">Course description, background discussion </span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">c.</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">Pre-requisites</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">d.</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">Pros &amp; Cons</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">e.</span><span style="Arial;"><span style="Times New Roman;">       </span></span><span style="Arial;">Lessons (hyperlinks to a wikiversity page specific to that one lesson)</span></p>
<p class="MsoNormal" style="none;"><span style="Arial;">f.</span><span style="Arial;"><span style="Times New Roman;">         </span></span><span style="Arial;">Enrolled – showing people registered who are interested in the topic</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">So kudos for ease of use and getting to the lessons!</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="black;"><span style="Times New Roman;"> </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">But what about dynamic content? Content like showing how a data race can develop with two threads trying to access a shared variable.  Can that be done effectively in a wiki format?<span style="yes;">  </span>I’m not convinced.</span></p>
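<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">For what it is worth, one static way to show the two-thread case is to replay the interleaving step by step. The sketch below is an illustration only, written in Python rather than taken from any ISC lab; the run helper and the two schedules are hypothetical names made up here. It replays a fixed schedule instead of starting real threads, so the lost update shows up every time.</span></p>

```python
# Deterministic replay of a lost update: two "threads" (A and B) each try
# to add 1 to a shared counter. An increment is really three steps:
# read the shared value, add 1 to a private copy, write the copy back.
# The schedules are hypothetical interleavings chosen for illustration.

def run(schedule):
    """Replay a list of (thread, step) pairs and return the final counter."""
    shared = {"counter": 0}          # the shared variable
    local = {"A": None, "B": None}   # each thread's private copy
    for thread, step in schedule:
        if step == "read":
            local[thread] = shared["counter"]
        elif step == "add":
            local[thread] = local[thread] + 1
        elif step == "write":
            shared["counter"] = local[thread]
        print(f"{thread} {step:5}  local={local[thread]}  shared={shared['counter']}")
    return shared["counter"]

# Thread A finishes before thread B starts: both updates survive.
SERIAL = [("A", "read"), ("A", "add"), ("A", "write"),
          ("B", "read"), ("B", "add"), ("B", "write")]

# Both threads read 0 before either one writes: one update is lost.
RACY = [("A", "read"), ("B", "read"), ("A", "add"),
        ("B", "add"), ("A", "write"), ("B", "write")]

serial_result = run(SERIAL)
print("---")
racy_result = run(RACY)
print(f"serial={serial_result} racy={racy_result}")
```

<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">The serial schedule ends with the counter at 2; the racy schedule ends at 1, because both threads read 0 before either one wrote its result back.</span></p>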
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">And what about the size of each lesson?  The C++ example, hello world, takes roughly two typed pages to explain. <span style="yes;"> </span>This felt very toyish and rudimentary to me, but maybe that’s appropriate for lesson 1.<span style="yes;">  </span>It still feels very spoon-fed to me. Is this really suitable for topics like Threading Building Blocks, Win32 threads, OpenMP 3.0, or software tools with a load of graphic content to them like VTune Analyzer or Thread Checker?<span style="yes;">  </span>We currently have over 4 hours of materials, ~80 foils’ worth of compressed bulleted content, on VTune Analyzer alone. I can see that ballooning into a <span style="yes;"> </span>hundred pages if we took the same approach – and that’s just VTune Analyzer. <span style="yes;"> </span>I suppose this could be cut into a few tens of smaller lessons, each maybe 10 pages long - but is that the right direction to take?</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">I suppose that screen shots can be embedded (we do that in PowerPoint now) but I can’t find a simple way to animate graphics here. Well, I suppose we could turn everything into Flash tutorials, but then it’s not really well suited to a wiki format, is it? And Flash seems too synthetic and constrained for the kinds of materials we currently offer. <span style="yes;"> </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">More advanced topics like parallel computing are described only at the highest level and no lessons are currently available – see <a href="http://en.wikiversity.org/wiki/Topic:Parallel_computing">http://en.wikiversity.org/wiki/Topic:Parallel_computing</a> – so it’s hard to completely envision how ISC content could play into a Wikiversity-like model.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="black;"><span style="Times New Roman;"> </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">Right now my take is that wikiversity offers an easier way to access content than the current ISC model – but it seems better suited to introductory materials like Programming with Alice and beginning C++. </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="black;"><span style="Times New Roman;"> </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">So, what’s your take? Should ISC morph its online content area to be more in the image of wikiversity?<span style="yes;">  </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">Give me some feedback – how can we better serve you? Is a different delivery modality even the answer? It would be a huge departure from what we are doing now - and would take a bit of work to retool.  Is it worth it?</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Arial;">Next time – I want to take a harder look at how MIT offers content through their OpenCourseWare site. <span style="yes;"> </span>Some of their material is phenomenal! – Look up <a href="http://ocw.mit.edu/OcwWeb/Physics/8-01Physics-IFall1999/CourseHome/index.htm">Classical Mechanics</a> (course 8-01) by Walter Lewin or <a href="http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/CourseHome/index.htm">Linear Algebra</a> (course 18-06) by Gilbert Strang! I’ve downloaded each of these to my iPod and have been watching them repeatedly.<span style="yes;">  </span><span style="yes;"> </span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/09/10/let-change-the-way-we-deliver-isc-content/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>PGAS versus MPI and what should we teach undergraduates??</title>
		<link>http://software.intel.com/en-us/blogs/2008/04/22/pgas-versus-mpi-and-what-should-we-teach-undergraduates/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/04/22/pgas-versus-mpi-and-what-should-we-teach-undergraduates/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 16:34:38 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[IPDPS]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[PGAS]]></category>
		<category><![CDATA[Scheme]]></category>
		<category><![CDATA[Titanium]]></category>
		<category><![CDATA[Undergarduate parallelism]]></category>
		<category><![CDATA[Unified Parallel C]]></category>
		<category><![CDATA[UPC]]></category>
		<category><![CDATA[Yelick]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/04/22/pgas-versus-mpi-and-what-should-we-teach-undergraduates/</guid>
		<description><![CDATA[One of the highlights of this 22nd annual IPDPS conference was the Wednesday night panel discussion. The discussion probed the general topic of what the current parallel programming experts (esp. IPDPS faculty &#38; researchers) can teach to a new generation who will just now be cutting their teeth on MC processors and growing up in [...]]]></description>
			<content:encoded><![CDATA[<p><font face="Times New Roman">One of the highlights of this 22nd annual IPDPS conference was the Wednesday night panel discussion. The discussion probed the general topic of what the current parallel programming experts (esp. IPDPS faculty &amp; researchers) can teach to a new generation who will just now be cutting their teeth on multi-core (MC) processors and growing up in a non-sequential programming landscape. The discussion was recorded for IEEE TV, but I have not seen it posted yet on either the IPDPS site or the IEEE TV site. So, for lack of being able to review the panelists' discussion again in detail, I have relied on my cryptic notes, scribbled furiously during the discussion. The panel consisted of several of the "earth movers" - those prominent professors or researchers in parallel computing - who have participated in massively parallel computing or, more appropriately, shaped the massively parallel computing landscape. I want to cover each of the panelists' opinions about the future, but must start somewhere - so let me begin with Katherine Yelick's presentation.</font></p>
<p><font face="Times New Roman">Kathy Yelick is NERSC Division Director, Lawrence Berkeley National Lab and EECS Department, University of California at Berkeley. She received her Bachelor's (1985), Master's (1985), and PhD (1991) degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Her research interests include parallel computing, memory hierarchy optimizations, programming languages and compilers.</font></p>
<p><font face="Times New Roman">In this panel discussion, Kathy laid out a path whereby CS students would likely be split into two types of specialties:<br />
1) an efficiency layer - performance-oriented parallel programmers who understand cache locality and lower-level parallelism mechanics and who write the abstraction layers &amp; libraries upon which the rest of the developer community stands<br />
2) a productivity layer - developers who use the lower-level parallel abstractions and libraries and who focus on the application domain<br />
In Katherine's model, she would hope that only ~10% of developers have to be deeply involved in the efficiency layer, while the majority (90%) of developers would be trained as application domain folks.</font></p>
<p><font face="Times New Roman">Katherine stated that it would be a disaster if the split between efficiency programmers and application programmers turns out to be more like 50/50 rather than the 10/90 split proposed. She believes universities should begin prepping CS students along these two disciplines, and hopefully the numeric proportion will fall to 10/90.</font></p>
<p><font face="Times New Roman">She laid out a case for tackling large many-core applications going forward with Partitioned Global Address Space (PGAS) languages - like Unified Parallel C (UPC), Titanium, CAF, etc. - as opposed to continued efforts in MPI. The case she laid out was similar or better performance with PGAS with substantially fewer lines of source code. It turns out UPC was discussed frequently in the sessions I attended, and I am considering whether Intel Software College should add a module on UPC or Titanium etc. for our University program - Thoughts?</font></p>
<p><font face="Times New Roman">She said there are gaps in current training at the undergraduate level that need correction. Essentially these gaps are:<br />
1) a lack of deep understanding of performance &amp; algorithmic complexity<br />
2) a lack of understanding of algorithms<br />
3) a lack of sophisticated understanding of concurrency, synchronization, non-determinism, load balance, etc.<br />
Presumably these gaps are less of an issue for the application domain folks but are of critical importance to the efficiency programmers of the future. This is an area where Intel Software College already has some material in the pipeline, and it was good to get implicit validation that our plans align with the Berkeley vision.</font></p>
<p><font face="Times New Roman">Katherine roughly laid out the Berkeley approach, where they have begun teaching parallelism to more entry-level programming students. They have incorporated cluster computing fundamentals into the introductory computer science curriculum at UC Berkeley. In that course, they have developed coursework and programming problems centered around Google's MapReduce. They use a language called Scheme to write and run MapReduce programs and can tackle parallel problems on a cluster using a special interface. Students can begin addressing data-parallel problems about two thirds of the way into this course.</font></p>
<p><font face="Times New Roman">They are targeting more concurrency in their systems course and now have a capstone course for seniors who will be moving off into efficiency-programming types of endeavors. At the higher level, they will be focusing on the 13 Berkeley "motifs", or what have been called the 13 Berkeley dwarfs. These are 13 categories of applications that have similar target kernels underlying their operation. Efficiently parallelize the kernels and you solve a whole class of applications that share that kernel or motif.</font></p>
<p><font face="Times New Roman">This Berkeley approach is one that Intel Software College is pursuing to help seed our faculty training. We are working with a couple of the Berkeley faculty teaching the Berkeley motifs exploration course together with design patterns. We plan to incorporate findings, lablets, and instructor materials from this course into our wiki in the June time frame.</font></p>
<p><font face="Times New Roman">As part of those postings we will also be posting materials we have created or materials we have received on other parallel techniques: Design Patterns, MPI, functional language examples (Erlang), Software Transactional Memory, etc.</font></p>
<p><font face="Times New Roman">So next time I'll recap a few more of the panelist discussions and how they may impact our curriculum postings.</font></p>
<p><font face="Times New Roman">If you have examples of what YOU are teaching in YOUR undergraduate courses to teach parallelism - I'd love to hear from you - please respond to my blog or, better yet, post your materials to our wiki for use by many universities!</font></p>
<p><font face="Times New Roman">bye for now</font></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/04/22/pgas-versus-mpi-and-what-should-we-teach-undergraduates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Functional Languages versus threading</title>
		<link>http://software.intel.com/en-us/blogs/2008/04/22/functional-languages-versus-threading/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/04/22/functional-languages-versus-threading/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 16:26:56 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Functional Language]]></category>
		<category><![CDATA[HASKEL]]></category>
		<category><![CDATA[LISP]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/04/22/functional-languages-versus-threading/</guid>
		<description><![CDATA[I had responded to some questions in my other post (view from 22nd annual IPDPS - http://software.intel.com/en-us/blogs/2008/04/15/some-views-from-the-22nd-international-parallel-amp-distributed-processing-symposium/) about functional languages.  It was suggested by Clay B that I should make a separate post along the functional language topic - so here goes.  I asked Dr Dennis (Professor Emeritus of CS at MIT) his thoughts on what [...]]]></description>
			<content:encoded><![CDATA[<p>I had responded to some questions in my other post (view from 22nd annual IPDPS - <a href="http://software.intel.com/en-us/blogs/2008/04/15/some-views-from-the-22nd-international-parallel-amp-distributed-processing-symposium/">http://software.intel.com/en-us/blogs/2008/04/15/some-views-from-the-22nd-international-parallel-amp-distributed-processing-symposium/</a>) about functional languages.  It was suggested by Clay B that I should make a separate post along the functional language topic - so here goes. </p>
<p>I asked Dr Dennis (Professor Emeritus of CS at MIT) his thoughts on what should be taught to undergrads to prepare them for a many-core future. This was a lunch-table conversation and not a true curriculum planning session - but nonetheless - Dr Dennis did not hesitate to mention functional languages as one of the first things that should be taught. I don't think he was implying that a functional language is the magic bullet or one-size-fits-all solution to bringing new developers to the discipline.</p>
<p>The reasons he articulated for teaching a functional language are that:<br />
1) it is a model for parallel programming that avoids the synchronization hazards associated with threading - data races etc.<br />
2) versions of these languages are already being taught in many schools, so there is a low barrier to entry<br />
3) examples of functional languages can already be found in industry</p>
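<p>To make the first reason concrete, here is a minimal sketch of the functional style, assuming Python as a stand-in (it is not a functional language, and nothing here comes from Dr Dennis). The names square and sum_of_squares are made up for illustration. Each call to the pure function touches only its own argument, so the parallel phase has no shared variable to race on, and the results are combined in one place afterwards.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(x):
    """Pure function: the result depends only on the argument."""
    return x * x


def sum_of_squares(values, workers=4):
    # Map phase: every call is independent, so the calls may run in any
    # order on any worker without locks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        squares = list(pool.map(square, values))
    # Reduce phase: results are combined after the parallel phase,
    # in a single thread, so no synchronization is needed here either.
    return reduce(lambda a, b: a + b, squares, 0)


total = sum_of_squares(range(1, 11))
print(total)
```

<p>The point of the sketch is the structure, not the speed: in CPython, threads will not speed up CPU-bound work like this, but the same map-then-reduce shape carries over to languages and runtimes that do run the map phase on many cores.</p>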
<p>I would argue that there will still be a need to teach students about threading (but not with as high a priority). If you teach them about threading, you have to teach them about the associated hazards. This does not contradict the idea that we should also teach them a functional language.</p>
<p>On other fronts here at IPDPS I talked to Keshav Pingali from the University of Texas, and Vivek Sarkar from Rice University, who were panelists here Wednesday night. These panelists discussed how the current parallel computation community could pass their learnings on to a new generation of developers just starting out in multicore. After the panel discussion, Michael Wrinn &amp; I were able to get their take on essentially the same question I asked of Dr Dennis. They indicated that there still is some value in teaching functional programming to the new crop of developers. Keshav in particular indicated that his early exposure to functional languages has influenced his coding style in positive ways - years later. They acknowledged that some implementations of functional languages have been poor performers on sequential machines in the past and so they have not really caught on in industry (Erlang is the exception to this statement), and they also cited some difficulties with cache locality &amp; dealing with multidimensional arrays from a developer's perspective. However, they also indicated that several groups have been adding extensions to these languages, I think Haskell was mentioned, to allow for improved handling of arrays and to address locality issues.</p>
<p>So here are my musings after having these discussions. Warning: controversial road ahead - use caution when entering. Here's a controversial perspective on the appropriateness of functional languages as we move into many-core land. Maybe the locality issues that caused functional languages to be discounted won't be so significant in a time when we are utilizing hundreds of cores. Why? In sequential programming days and in the early days of multicore, 2 cores could possibly give you a 2X speedup. But the tax we pay for poor cache locality might hypothetically be ~10X - the number thrown around here in our conversations with these distinguished professors from Texas. So any language that could gain you 2X by parallelism but lose you 10X through poor locality probably would die a natural death. BUT - improve cache locality for these languages and apply them to many more cores. Suppose in the future we have 100 cores, and that compilers for the functional languages get a second look and get the needed optimizations, so that we reduce cache locality issues to a 50% penalty rather than 10X (a 50% penalty means we run 1.5X slower). Now assume that we approach 100X due to parallelism but pay a 1.5X tax. That's a net ~66X speedup. Compiler writers might be persuaded to optimize a functional language if there was a decent payoff. 66X might do it. If the industry had frozen at a past instant of time at only 2 cores, and if compiler writers at that time thought that they might be able to reduce the locality penalty to 50%, would it have been a worthwhile endeavor? 2X gain / 1.5X loss = ~1.3X net - that's a 30% gain in performance - so it is not so clear that compiler writers would have been enthusiastic to do all the work required, because the potential payoff would not be that large. But for a 66X gain they may consider it.</p>
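<p>The back-of-envelope numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic in this post: net_speedup is a made-up helper name, and it assumes perfect scaling (speedup equal to the core count) divided by a locality slowdown factor.</p>

```python
def net_speedup(cores, locality_penalty):
    """Ideal parallel speedup divided by a slowdown factor from poor locality.

    Assumes perfect scaling (speedup == cores), which is the simplifying
    assumption used in the text, not a measured result.
    """
    return cores / locality_penalty


scenarios = [
    ("2 cores, 10X penalty", 2, 10.0),
    ("2 cores, 1.5X penalty", 2, 1.5),
    ("100 cores, 1.5X penalty", 100, 1.5),
]

results = {label: net_speedup(cores, penalty) for label, cores, penalty in scenarios}
for label, value in results.items():
    print(f"{label}: {value:.2f}X")
```

<p>With these inputs the three cases come out to 0.20X (a net slowdown), 1.33X, and 66.67X.</p>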
<p>Caveat - I know I am assuming no overhead here and we likely won't scale perfectly or anywhere near it at 100 cores etc. But if there are no synchronization hazards - and we reduce the locality penalties - maybe these things will scale better than we imagine - does anyone have a quantitative way to address this? Quick - can someone whip out an equation from Hennessy &amp; Patterson?</p>
<p>My point is that perhaps we should keep an open mind to the possibility of functional languages and hope that the "economy of scale" and clever compiler writers might be able to bring a solution. The payoff ultimately would be fundamentally easier approaches to parallelism.</p>
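<p>One candidate for that equation is Amdahl's law, which bounds the speedup on n cores when only a fraction p of the work can run in parallel: speedup = 1 / ((1 - p) + p / n). The sketch below just evaluates it; the function name and the parallel fractions are illustrative assumptions, not measurements of any functional language.</p>

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# The parallel fractions below are illustrative assumptions only.
for p in (0.50, 0.90, 0.99, 1.00):
    print(f"p={p:.2f}: 2 cores {amdahl_speedup(p, 2):.2f}X, "
          f"100 cores {amdahl_speedup(p, 100):.2f}X")
```

<p>Only the p = 1.00 row reaches the full 100X on 100 cores; at p = 0.90 the same 100 cores give roughly 9X, which is why the perfect-scaling assumption matters so much.</p>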
<p>Who knows?</p>
<p>bobc</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/04/22/functional-languages-versus-threading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some Views from the 22nd International Parallel &amp; Distributed Processing Symposium</title>
		<link>http://software.intel.com/en-us/blogs/2008/04/15/some-views-from-the-22nd-international-parallel-amp-distributed-processing-symposium/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/04/15/some-views-from-the-22nd-international-parallel-amp-distributed-processing-symposium/#comments</comments>
		<pubDate>Wed, 16 Apr 2008 02:30:22 +0000</pubDate>
		<dc:creator>Robert Chesebrough (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Curriculum]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Functional Programming Language]]></category>
		<category><![CDATA[MPI]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/04/15/some-views-from-the-22nd-international-parallel-amp-distributed-processing-symposium/</guid>
		<description><![CDATA[Greetings from the 22nd International Parallel &#038; Distributed Processing Symposium in Miami! I had the privilege of discussing the future of parallel programming with a number of distinguished luminaries in parallel computing! The topics discussed in side hall discussions and informal lunch table chats were varied and dynamic. Given that this is the first day [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings from the 22nd International Parallel &#038; Distributed Processing Symposium in Miami!</p>
<p>I had the privilege of discussing the future of parallel programming with a number of distinguished luminaries in parallel computing! The topics discussed in side hall discussions and informal lunch table chats were varied and dynamic. Given that this is the first day of the conference for the broader set of technical sessions (the NSF track actually started on Sunday) I have no doubt that there will be more interesting discussions throughout the rest of the week. Let me start my recap of the conference with one lunchtime discussion I had on day one.</p>
<p>Today I had the privilege of talking to Jack Dennis, Professor Emeritus of Computer Science and Engineering at MIT. Over cold cut sandwiches and iced-tea, we had two very interesting discussions. The first regards Professor Dennis’ Fresh Breeze Project at MIT and the second regards his view of how to prepare undergraduate students for the many core world we are now entering.</p>
<p>Fresh Breeze is multi-core chip design which supports composable parallel programming. Composable means that parallel programs should be able to be used as components in still larger parallel programs without a lot of fuss and without a detailed knowledge of the implementation details of the smaller parallel program. Composable in this context implies that resources, such as processor allocation &#038; memory, should be flexible and able to be reassigned according to the current needs of computations. This project adheres to six important principles for supporting modular software construction with a view towards composability of parallel programs:</p>
<ol>
<li>Information Hiding Principle: The user of a module must not need to know anything about the internal mechanism of the module to make effective use of it.</li>
<li>Invariant Behavior Principle: The functional behavior of a module must be independent of the site or context from which it is invoked.</li>
<li>Data Generality Principle: The interface to a module must be capable of passing any data object an application may require.</li>
<li>Secure Arguments Principle: The interface to a module must not allow side-effects on arguments supplied to the interface.</li>
<li>Recursive Construction Principle: A program constructed from modules must be usable as a component in building larger programs or modules.</li>
<li>System Resource Management Principle: Resource management for program modules must be performed by the computer system and not by individual program modules.</li>
</ol>
<p>Professor Dennis holds that his new multi-core design would be key to building modular parallel programs. For more details, see Professor Dennis’ website: <a href="http://csg.csail.mit.edu/Users/dennis/"><u><font color="#0000ff">http://csg.csail.mit.edu/Users/dennis/</font></u></a>.</p>
<p>The second topic we bantered about at the table was how academia could prepare undergraduate CS students for the burgeoning parallel computational world that has been thrust upon them (and us). I asked Professor Dennis this question: "If there were a handful of changes that academia could make to prepare undergraduates for this many-core transition, what would they be?" This was his response:</p>
<ol>
<li>Teach undergraduates a functional language (yes, functional as in LISP), which avoids synchronization hazards. MIT’s been teaching LISP for years. Another choice might be Erlang, as it sidesteps synchronization hazards, is a simple language to learn, and has been designed for high-performance use in the communications industry.</li>
<li>Teach undergraduates MPI, as in Message Passing Interface. This paradigm has already been used successfully in large-scale parallel programs.</li>
<li>Give them a blended HW/SW computer architecture course, including multi-core architecture, and expose them to cache architecture so they understand performance.</li>
<li>Expose them to synchronization hazards and data races so they can avoid them when they implement parallel designs.</li>
<li>Revisit object-oriented programming methodology, which is great for sequential programs but is not well suited for structured parallel programming. It is easy to write a program that follows OOP yet still exhibits a data race.</li>
<li>Teach students the concepts of atomicity, determinacy, and the difference between them. Atomicity means that a transaction completes entirely or not at all. Determinacy means that the program achieves the same correct result regardless of the order in which threads execute. Atomicity does not imply determinacy.</li>
</ol>
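<p>The last point, that atomicity does not imply determinacy, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not code from Professor Dennis: the function <code>run</code> and the two updates <code>add_one</code> and <code>double</code> are hypothetical names chosen for this example. Each update holds a lock, so it is atomic, yet the final value depends on which update runs first.</p>

```python
import threading


def run(order):
    """Apply two atomic updates to a shared value in the given order.

    Hypothetical helper for illustration only.
    """
    state = {"x": 1}
    lock = threading.Lock()

    def add_one():
        # Atomic: the lock prevents any interleaving inside this update.
        with lock:
            state["x"] = state["x"] + 1

    def double():
        # Atomic for the same reason.
        with lock:
            state["x"] = state["x"] * 2

    ops = {"add_one": add_one, "double": double}
    for name in order:
        t = threading.Thread(target=ops[name])
        t.start()
        t.join()  # join immediately to force this particular schedule
    return state["x"]


add_then_double = run(["add_one", "double"])  # (1 + 1) * 2
double_then_add = run(["double", "add_one"])  # (1 * 2) + 1
print(add_then_double, double_then_add)  # prints: 4 3
```

<p>The sketch joins each thread immediately to pin down one schedule at a time. With a free-running scheduler either order could occur, so the program would be atomic but not determinate.</p>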
<p>Professor Dennis – thanks for lending me your sage advice on how to steer our curriculum efforts!</p>
<p>So now I ask YOU!</p>
<p>If there were three (or five) changes that academia could make to prepare undergraduates for this many-core transition, what would they be? What is YOUR response?</p>
<p>Bob C</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/04/15/some-views-from-the-22nd-international-parallel-amp-distributed-processing-symposium/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

