<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blogs &#187; Performance and Optimization</title>
	<atom:link href="http://software.intel.com/en-us/blogs/category/performance-optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/en-us/blogs</link>
	<description></description>
	<lastBuildDate>Fri, 25 May 2012 22:49:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>What is Intel(r) Secure Key Technology?</title>
		<link>http://software.intel.com/en-us/blogs/2012/05/14/what-is-intelr-secure-key-technology/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/05/14/what-is-intelr-secure-key-technology/#comments</comments>
		<pubDate>Tue, 15 May 2012 00:15:00 +0000</pubDate>
		<dc:creator>Gael Holmes Hofemeier (Intel)</dc:creator>
				<category><![CDATA[Intel SW Partner Program]]></category>
		<category><![CDATA[Manageability & Security]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Ultrabook]]></category>
		<category><![CDATA[Bull Mountain]]></category>
		<category><![CDATA[Digital Random Number Generator]]></category>
		<category><![CDATA[DRNG]]></category>
		<category><![CDATA[Intel Secure Key]]></category>
		<category><![CDATA[rdrand]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/05/14/what-is-intelr-secure-key-technology/</guid>
		<description><![CDATA[In a nutshell: Intel® Secure Key was previously code-named Bull Mountain Technology. It is the Intel name for the Intel® 64 and IA-32 Architectures instruction RDRAND and its underlying Digital Random Number Generator (DRNG) hardware implementation. Among other things, the DRNG, accessed through the RDRAND instruction, is useful for generating high-quality keys for cryptographic protocols. Because [...]]]></description>
			<content:encoded><![CDATA[<p><strong>In a nutshell:</strong></p>
<p>Intel® Secure Key was previously code-named Bull Mountain Technology. It is the Intel name for the Intel® 64 and IA-32 Architectures instruction RDRAND and its underlying Digital Random Number Generator (DRNG) hardware implementation. Among other things, the DRNG, accessed through the RDRAND instruction, is useful for generating high-quality keys for cryptographic protocols.</p>
<p>Because this technology recently launched (May 2012) with the Intel® 3rd Generation Core processors (code-named Ivy Bridge), the <em><strong>Bull Mountain Software Implementation Guide</strong></em> has been renamed the <em><strong>Intel® Digital Random Number Generator Software Implementation Guide</strong></em>.</p>
<p>This technology is documented in the <em><strong>Intel® Digital Random Number Generator Software Implementation Guide</strong></em>. The guide is intended to provide a complete source of technical information on RDRAND instruction usage, including code examples. Here is a recap of what it covers:</p>
<ul>
<li>Random Number Generator (RNG) Basics and Introduction to the DRNG. This guide describes the nature of an RNG and its pseudo- (PRNG) and true- (TRNG) implementation variants, including modern cascade construction RNGs. We then present the DRNG's position within this broader taxonomy.</li>
<li>A technical overview of the DRNG, including its component architecture, robustness features, manner of access, performance, and power requirements.</li>
<li>RDRAND Instruction usage, providing reference information on the RDRAND instruction and code examples showing its use. This includes RDRAND platform support verification and suggestions on DRNG-based libraries.</li>
</ul>
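<p>As a rough illustration of the last bullet (platform support verification plus RDRAND usage), here is a minimal Python sketch. It is not taken from the guide. It assumes the caller has already obtained the ECX value from CPUID leaf 1 (RDRAND support is reported in bit 30), and it models a single RDRAND execution with a hypothetical <code>rdrand_step</code> callable that returns a success flag (mirroring the carry flag) and a value. The default retry budget of 10 is an assumption; check the guide for the actual recommendation.</p>

```python
RDRAND_BIT = 1 << 30  # CPUID.01H:ECX bit 30 reports RDRAND support


def supports_rdrand(cpuid_1_ecx):
    """Return True when the ECX value from CPUID leaf 1 advertises RDRAND."""
    return bool(cpuid_1_ecx & RDRAND_BIT)


def rdrand_with_retry(rdrand_step, retries=10):
    """Call `rdrand_step` until it reports success or the budget runs out.

    `rdrand_step` is a hypothetical stand-in for one RDRAND execution.
    It returns (ok, value), where `ok` mirrors the carry flag: RDRAND sets
    CF=1 when a random value was delivered and CF=0 when none was available.
    The default of 10 attempts is an assumption made for this sketch.
    """
    for _ in range(retries):
        ok, value = rdrand_step()
        if ok:
            return value
    raise RuntimeError("RDRAND did not return a value within the retry budget")
```

<p>In real code the callable would wrap the instruction itself (for example through a compiler intrinsic in C). Keeping it as a parameter here only serves to make the retry logic visible and testable without the hardware.</p>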
<p>This Software Implementation Guide is designed to serve a variety of readers. Software developers who already understand the nature of RNGs may skip the first 3 sections and refer directly to the RDRAND instruction reference and code examples.</p>
<p>Here is the link for the  <a href="http://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide/">Intel® Digital Random Number Generator (DRNG) Software Implementation Guide</a>.</p>
<p>Questions?  Please visit our <a href="http://software.intel.com/en-us/forums/intel-vpro-software-development/">forum</a> and start a discussion.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/05/14/what-is-intelr-secure-key-technology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TACC symposium and programming two SMP-on-a-chip devices</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/26/tacc-symposium-and-programming-two-smp-on-a-chip-devices/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/26/tacc-symposium-and-programming-two-smp-on-a-chip-devices/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 04:28:46 +0000</pubDate>
		<dc:creator>James Reinders (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Cilk Plus]]></category>
		<category><![CDATA[Intel MIC]]></category>
		<category><![CDATA[Knights Corner]]></category>
		<category><![CDATA[Knights Ferry]]></category>
		<category><![CDATA[many-core]]></category>
		<category><![CDATA[multi-core]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[SCC]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/26/tacc-symposium-and-programming-two-smp-on-a-chip-devices/</guid>
		<description><![CDATA[one presenter exclaimed “Time spent optimizing for MIC is time well spent because it optimizes your code for non-MIC processors at the same time.”]]></description>
			<content:encoded><![CDATA[<p>Real results for many-core processors illustrate the power of a familiar configuration (SMP) even when reduced to a single chip. An SMP on-a-chip can run the same applications, use the same tools, offer the same flexibility, and pose familiar challenges that are solved by familiar techniques and skills.</p>
<p><a href="http://www.tacc.utexas.edu/ti-hpcs12/program"><img class="size-full wp-image-47144 aligncenter" title="Screen shot 2012-04-26 at 8.05.50 PM" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/Screen-shot-2012-04-26-at-8.05.50-PM.png" alt="" width="500" /></a><a href="http://www.tacc.utexas.edu/ti-hpcs12/program"><img class="aligncenter size-full wp-image-47145" title="tacc" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/tacc.jpg" alt="" width="500" /></a></p>
<p>I recently attended a symposium, co-sponsored by TACC and Intel, at the Texas Advanced Computing Center (TACC) in Austin, where the programming of two many-core devices was discussed. One was a research chip designed to push some limits and allow interesting research on a device that lacks many things a product would require. The research chip is known as Intel’s <a href="http://techresearch.intel.com/ProjectDetails.aspx?Id=1">Single-Chip Cloud Computer (SCC)</a>. The other many-core device was a prototype of our new Intel Many Integrated Core (MIC) Architecture, the Knights Ferry co-processor. The deadline for papers precluded inclusion of results from pre-production Knights Corner co-processors, which will be the first Intel MIC co-processor products. There was a lot of whispering in the hallways about the excitement of starting work with Knights Corner co-processors.</p>
<p>The papers, and the half-day tutorial, at the “<a href="http://www.tacc.utexas.edu/ti-hpcs12/program">TACC-Intel Highly Parallel Computing Symposium</a>” all had strong elements relating to familiar parallel programming challenges: scaling and vectorization. This is because both devices are built on Intel Pentium processor cores hooked together with their design for a connection fabric on the same piece of silicon.</p>
<p>Simply put, they are both SMP (symmetric multiprocessor) on-a-chip devices, with somewhat different design goals.</p>
<p>At Intel, we have been convinced that putting a familiar, generally programmable SMP on a chip is a good idea. Its familiar programmability proves to have many benefits. SCC was built for research into many facets of highly parallel devices. Knights Corner is designed for production usage and is optimized for power and highly parallel workloads. Knights Corner is well suited for HPC applications that already run on SMP systems. Presenter after presenter who talked about using the prototype Knights Ferry mentioned how applications “just worked.”</p>
<p>I like to say, “Programming is hard, and so is parallel programming.” It follows that making an SMP or an SMP on-a-chip get maximum performance may not quite be rocket science, but it is no walk in the park. So, there was plenty of room for the papers to discuss the challenges of tuning for any SMP system.</p>
<p>What was really striking was how optimizations for Knights Ferry co-processors were applicable to SMP systems in general. Several authors commented on how their work to get better scaling or better vectorization for Knights Ferry also improved the performance of the same code compiled to run on an Intel Xeon processor based SMP system.  This performance-reuse is very significant, and one presenter exclaimed “Time spent optimizing for MIC is time <em>well spent</em> because it optimizes your code for non-MIC processors at the same time.”</p>
<p>All the papers and presentations (including my keynote) are available on-line now at <a href="http://www.tacc.utexas.edu/ti-hpcs12/program">http://www.tacc.utexas.edu/ti-hpcs12/program</a></p>
<p>Here are some notes from a few of the talks:</p>
<p>Dr. Robert Harkness gave an engaging talk entitled “Experiences with ENZO on the Intel Many Integrated Core Architecture.” I enjoyed his comment that “we [are] always programming for the future” because they “never have enough compute power.” He looked at multiple programming models, but had the best results using the “dusty” MPI-based program that he had running on an SMP before Knights Ferry. He did his work on MPICH 1.2.7p1 because Intel did not supply an MPI with the Knights Ferry systems. He said it was obsolete but very easy to build and use. He reported that one person (not a dedicated programmer) was able to build everything (a quarter million lines of code) in a single week without any application source code modifications at all. The week, it seems, was spent hunting down libraries and recompiling them, including MPICH. His results scaled very well.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide01.png"><img class="aligncenter size-full wp-image-47130" title="slide0" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide01.png" alt="" width="500" /></a></p>
<p>His conclusions (from slide 30 of his presentation) were: “Intel MIC is the best way forward for large-scale codes which cannot use the existing GPGPU model (even with directives).”</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide1.png"><img class="aligncenter size-full wp-image-47131" title="slide1" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide1.png" alt="" width="500" /></a></p>
<p>A talk by Theron Voran, with the National Center for Atmospheric Research, looked at using Knights Ferry for climate science. He started by saying "We have large bodies of code laying around. We don't want to rewrite in new languages for restrictive architectures." He had several good introduction slides, including a comparison of accelerators vs. multicore and many-core devices.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/Slide21.png"><img class="aligncenter size-full wp-image-47133" title="Slide2" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/Slide21.png" alt="" width="500" /></a></p>
<p>Here the challenges of vectorization offered opportunities for future work. Compiler hints, loop restructuring, and related activities should enhance performance on Xeon-based and MIC-based SMP systems, as should work on improving scalability across more and more cores. Even with these challenges, the authors noted “Relative ease in porting codes” (recompiling) and expressed the belief that the computational capabilities of MIC will be worthwhile.</p>
<p>Ryan Hulguin, with the University of Tennessee, looked at CFD solvers on Knights Ferry. He looked at two methods, one based on Euler equations (for inviscid fluid flows) and another based on the BGK model Boltzmann equation (for rarefied gas flows). Performance results showed OpenMP to be effective on Knights Ferry, and that the SMP programming challenges of vectorization and having good concurrency held true on Knights Ferry as well.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide4.png"><img class="aligncenter size-full wp-image-47134" title="slide4" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide4.png" alt="" width="500" /></a></p>
<p>A talk on Dense Linear Algebra Factorization, from David Hudak at the Ohio Supercomputing Center, talked about Heterogeneous Programming Challenges. David is a Wolverine working in a Buckeye world. My heart goes out to him. I really enjoyed his separation of short-term issues that distract us from the real long-term challenges that will stay with us.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide5.png"><img class="aligncenter size-full wp-image-47135" title="slide5" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide5.png" alt="" width="500" /></a></p>
<p>The talk compared a QR factorization implemented in OpenMP with a Cilk Plus implementation. Both performed well. The authors emphasized that the guidance to vectorize and use lots of tasks proved to work.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide31.png"><img class="aligncenter size-full wp-image-47138" title="slide3" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/slide31.png" alt="" width="500" /></a></p>
<p>I’ve written more than I set out to write, so I’ll stop here. The SCC related papers were very interesting as well, ranging from Tim Mattson’s overview of the program to papers showing research results from investigations using SCC. The other MIC related papers are all worthy as well, including an excellent paper on early experiences with MVAPICH2 doing Intra-MIC MPI communication. Amazing things you can do on an SMP on-a-chip… it runs a real Linux after all!</p>
<p>It is very common for demos to start with an ‘ssh’ (shell) to one of the Knights Ferry processors… and then run the application natively from the command line. SMP on-a-chip, indeed.  Too bad I can’t convince Intel to name it that.  Even if I did, it would probably be chipSMP™ model 8650plus XS. Never mind, Knights Corner is fine by me.</p>
<p>The papers and talks can be found at <a href="http://www.tacc.utexas.edu/ti-hpcs12/program">http://www.tacc.utexas.edu/ti-hpcs12/program</a></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/26/tacc-symposium-and-programming-two-smp-on-a-chip-devices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel Announces the New Intel® SDK for OpenCL* Applications 2012</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 11:38:33 +0000</pubDate>
		<dc:creator>Arnon Peleg (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA["Intel OpenCL SDK"]]></category>
		<category><![CDATA["Intel OpenCL"]]></category>
		<category><![CDATA[openCL]]></category>
		<category><![CDATA[vcsource_product_oclsdk]]></category>
		<category><![CDATA[vcsource_type_event]]></category>
		<category><![CDATA[vcsource_type_news]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/</guid>
		<description><![CDATA[In support of the recent announcement of the 3rd Generation Intel® Core™ Processors, Intel has released the Intel® SDK for OpenCL* Applications 2012. For the first time, OpenCL* developers using Intel® architecture can utilize compute resources across both Intel® Processors and Intel® HD Graphics 4000/2500.]]></description>
			<content:encoded><![CDATA[<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/OpenCL_Logo_RGB.jpg"><img class="size-thumbnail wp-image-47080 alignnone" title="OpenCL_Logo_RGB" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/OpenCL_Logo_RGB-150x150.jpg" alt="" width="64" height="64" /></a></p>
<p>In support of the recent announcement of the <a href="http://www.intel.com/content/www/us/en/processors/core/core-processor-family.html">3<sup>rd</sup> Generation Intel® Core™ Processors</a>, Intel has released the Intel® SDK for OpenCL* Applications 2012. For the first time, OpenCL* developers using Intel® architecture can utilize compute resources across both Intel® Processors and Intel® HD Graphics 4000/2500.</p>
<p>As someone who has closely followed the emergence of the OpenCL standard for the last couple of years, I found this announcement worth waiting for.  Less than a year ago, on this blog, I posted the news that the <a title="Permanent Link to Intel® OpenCL SDK 1.1 gold released" href="http://software.intel.com/en-us/blogs/2011/06/29/intel-opencl-sdk-11-gold-released/">Intel® OpenCL SDK 1.1 gold was released</a>. This was the first production OpenCL implementation from Intel targeting Intel® processors on Windows* OS. The current announcement is special: the Intel SDK for OpenCL Applications 2012 now supports not only the CPU but also the Intel HD Graphics 4000/2500 for Windows* 7 users.  We’ve come a long way in a year.</p>
<p style="text-align: center;"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/product_overview.jpg"><img class="aligncenter size-medium wp-image-47079" title="product_overview" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/product_overview-300x300.jpg" alt="Introducing the Intel(R) SDK For OpenCL* Applications" width="170" height="170" /></a></p>
<p>OpenCL <a href="http://www.intel.com/content/www/us/en/processors/core/core-processor-family.html">on the 3<sup>rd</sup> Generation Intel® Core Processor Family</a> extends Intel’s line of tools and APIs on Intel platforms and adds interoperability with other graphics APIs like DirectX*, OpenGL* and Intel® Media SDK, directly on the Intel HD Graphics device.</p>
<p>So what else is new in this release?</p>
<ul>
<li>A single OpenCL* platform enables a shared context for OpenCL applications running on both the CPU and Intel HD Graphics 4000/2500. The OpenCL platform with both CPU and HD Graphics devices is available seamlessly with the <a href="http://www.intel.com/p/en_US/support/detect/graphics">Intel® HD Graphics Drivers</a>.</li>
<li>Interoperability with the <a href="http://www.intel.com/software/mediasdk">Intel Media SDK</a> with no memory copy overhead</li>
<li>Improved performance for OpenCL applications running on Intel® Xeon® Processors and Intel® Core™ Processors. This CPU support is also available for Linux* OS developers.</li>
<li>The Intel® SDK for OpenCL* Applications development tools include an offline compiler and a step-by-step OpenCL kernel debugger (for CPU) integrated in the Microsoft Visual Studio* 2010 integrated development environment.</li>
<li>Ten OpenCL code samples, three of them new, are now available for independent download.</li>
</ul>
<p>The list above is just a sample of what is available with this new SDK. I recommend you read <a href="http://software.intel.com/file/43384">the product brief</a> or watch the <a href="http://software.intel.com/en-us/videos/channel/visual-computing/new-intel%C2%AE-sdk-for-opencl-applications-2012/1571382381001">introduction video</a> to get started with this new SDK.</p>
<p><strong>Download the SDK for free at <a href="http://www.intel.com/software/opencl">www.intel.com/software/opencl</a> and begin optimizing your applications for the 3<sup>rd</sup> Generation Intel® Core™ Processors today.</strong></p>
<p>Don’t forget to follow us on Twitter at <a href="https://twitter.com/#!/IntelOpenCL">@IntelOpenCL</a></p>
<p>&nbsp;</p>
<p style="text-align: center;"><object codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,47,0" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" height="300" width="345" id="flashObj"><param value="http://c.brightcove.com/services/viewer/federated_f9?isVid=1" name="movie" /><param value="#FFFFFF" name="bgcolor" /><param value="videoId=1571382381001&amp;playerID=741496470001&amp;playerKey=AQ~~,AAAArH1stHk~,LuRqJUw7MaeYQkat5frTpWWPINh71g7p&amp;domain=embed&amp;dynamicStreaming=true" name="flashVars" /><param value="http://admin.brightcove.com" name="base" /><param value="false" name="seamlesstabbing" /><param value="true" name="allowFullScreen" /><param value="true" name="swLiveConnect" /><param value="always" name="allowScriptAccess" /><embed pluginspage="http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash" allowscriptaccess="always" swliveconnect="true" allowfullscreen="true" type="application/x-shockwave-flash" seamlesstabbing="false" height="300" width="345" name="flashObj" base="http://admin.brightcove.com" flashvars="videoId=1571382381001&amp;playerID=741496470001&amp;playerKey=AQ~~,AAAArH1stHk~,LuRqJUw7MaeYQkat5frTpWWPINh71g7p&amp;domain=embed&amp;dynamicStreaming=true" bgcolor="#FFFFFF" src="http://c.brightcove.com/services/viewer/federated_f9?isVid=1"></embed></object></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/25/intel-announces-the-new-intel-sdk-for-opencl-applications-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIMD tuning with ASM pt. 1 - Stars &amp; Constellations</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/24/simd-tuning-with-asm-pt-1-stars-constellations/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/24/simd-tuning-with-asm-pt-1-stars-constellations/#comments</comments>
		<pubDate>Tue, 24 Apr 2012 23:19:49 +0000</pubDate>
		<dc:creator>Matt Walsh (Intel)</dc:creator>
				<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[Intel(R) Performance Tuning Utility]]></category>
		<category><![CDATA[Intel® Streaming SIMD Extensions 4]]></category>
		<category><![CDATA[SSE]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/24/simd-tuning-with-asm-pt-1-stars-constellations/</guid>
		<description><![CDATA[ASM? You mean assembly language? I haven't looked at that since my senior project! How arcane! And compilers are so smart these days, why should I care? I used to feel the same way...albeit with a latent desire to learn it, much as I wish I knew Latin. Then one day I found myself out of options [...]]]></description>
			<content:encoded><![CDATA[<p>ASM?  You mean assembly language?  I haven't looked at that since my senior project!  How arcane!  And compilers are so smart these days, why should I care?</p>
<p>I used to feel the same way...albeit with a latent desire to learn it, much as I wish I knew Latin.  Then one day I found myself out of options on my SIMD code generation project.  The compilers were great, but making progress was like building a ship in a bottle.  I was playing a game I know you've played too: "Let's Guess What the Compiler Will Do"!</p>
<p>I got tired of that game and bit the bullet.  I did ASM dumps and tried to understand them.  At first it appeared to me as a chaotic mess...like stars to someone who's never learned the constellations.  As time went on though I found Orion!  And Ursa Major too!  Sideribus apparuit!  That is, the patterns jumped out and became easy.  Before I knew it, diving into ASM became part of my routine.  </p>
<p>I want to share my know-how with you.  Each post I'll give you a program and take apart the ASM that we care about using the Intel® C++ Compiler for Linux*.  I guess I'll expect you to have a basic understanding of ASM, registers and the like...though I won't expect much.  Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/24/simd-tuning-with-asm-pt-1-stars-constellations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>History of … one CPU instructions: Part 1. LDDQU/movdqu explained</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/16/history-of-one-cpu-instructions-part-1-lddqumovdqu-explained/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/16/history-of-one-cpu-instructions-part-1-lddqumovdqu-explained/#comments</comments>
		<pubDate>Mon, 16 Apr 2012 09:48:21 +0000</pubDate>
		<dc:creator>Maxym Dmytrychenko (Intel)</dc:creator>
				<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Ultrabook]]></category>
		<category><![CDATA[assembler]]></category>
		<category><![CDATA[CPU]]></category>
		<category><![CDATA[instruction]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/16/history-of-one-cpu-instructions-part-1-lddqumovdqu-explained/</guid>
		<description><![CDATA[Once upon a time, back in 2000, Intel brought the NetBurst microarchitecture (http://en.wikipedia.org/wiki/NetBurst_%28microarchitecture%29) to market with the Pentium 4 CPUs. In 2004, with the Prescott revision/core and as part of the SSE3 instruction set, we got the LDDQU instruction, whose main focus area was video encoding: The most compute-intensive part of [...]]]></description>
			<content:encoded><![CDATA[<p>Once upon a time, back in 2000, Intel brought the NetBurst microarchitecture (<a href="http://en.wikipedia.org/wiki/NetBurst_%28microarchitecture%29">http://en.wikipedia.org/wiki/NetBurst_%28microarchitecture%29</a>) to market with the Pentium 4 CPUs.<br />
In 2004, with the Prescott revision/core and as part of the SSE3 instruction set, we got the LDDQU instruction.</p>
<p>Its main focus area was <strong>video encoding:</strong></p>
<p>The most compute-intensive part of a video encoder is usually Motion Estimation (ME) where blocks from the current frame are checked against blocks from the previous frame to find the best match. Many metrics can be used to define the best match. The most common is the L1 metric: the sum of absolute differences. Due to the nature of ME, loads of the blocks from the previous frame are unaligned whereas loads of the blocks from the current frame are aligned. Unaligned loads suffer two penalties:</p>
<ul>
<li>cost of handling the unaligned access</li>
<li>impact of the cache line splits</li>
</ul>
<p>The NetBurst microarchitecture does not support a uop to load 128-bit unaligned data. For that reason, 128-bit unaligned load instructions, such as movups and movdqu, are emulated with microcode, using two 64-bit loads whose results are merged to form the 128-bit result. In addition to the cost of the emulation, unaligned loads are penalized by the cost of handling cache line splits if the access crosses a 64-byte boundary.</p>
<p>SSE3 adds lddqu to solve the cache line split problem on 128-bit unaligned loads. The instruction works by loading a 32-byte block aligned on a 16-byte boundary, extracting the 16 bytes corresponding to the unaligned access. Because the instruction loads more bytes than requested, some usage restrictions apply. Lddqu should be avoided on Uncached (UC) and Write-Combining (USWC) memory regions. Also, by its implementation, lddqu should be avoided in situations where store-load forwarding is expected. In load-only situations, and with memory regions that are not UC or USWC, lddqu can advantageously replace movdqu/movups/movupd.</p>
<p>The code below shows an example of using the new instruction. Both code sequences are similar except that the load unaligned (movdqu) is replaced by the new unaligned load (lddqu). With the assumption that 25% of the unaligned loads are across a cache line, the new instruction can improve the performance of ME by up to 30%. MPEG-4 encoders have demonstrated speedups greater than 10%.</p>
<p>Now, some code snippets:</p>
<p><strong>Motion Estimator without SSE3:</strong><br />
movdqa xmm0, &lt;current&gt;<br />
movdqu xmm1, &lt;previous&gt;<br />
psadbw xmm0, xmm1<br />
paddw xmm2, xmm0</p>
<p><strong>Motion Estimator with SSE3:</strong><br />
movdqa xmm0, &lt;current&gt;<br />
lddqu xmm1, &lt;previous&gt;<br />
psadbw xmm0, xmm1<br />
paddw xmm2, xmm0</p>
<p>from <a href="http://download.intel.com/technology/itj/2004/volume08issue01/art01_microarchitecture/vol8iss1_art01.pdf">http://download.intel.com/technology/itj/2004/volume08issue01/art01_microarchitecture/vol8iss1_art01.pdf</a></p>
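<p>To make the L1 metric from the excerpt above a bit more concrete, here is a small Python sketch of block matching with the sum of absolute differences (SAD). It is only an illustration of the metric that psadbw accumulates in the assembly above, not how a real encoder is written. Frames are assumed to be plain lists of rows of integer pixel values, and the function names and the exhaustive search over a small <code>radius</code> are choices made for this example.</p>

```python
def block(frame, top, left, size):
    """Return the size x size block of `frame` with top-left pixel (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """L1 metric: sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_match(current, previous, top, left, size, radius):
    """Search `previous` around (top, left) for the block closest to the
    block of `current` at (top, left).

    Returns (cost, dy, dx) for the lowest SAD found, or None when no
    candidate fits inside the previous frame. Ties keep the first hit.
    """
    target = block(current, top, left, size)
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the previous frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, block(previous, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

<p>Note how the candidate position (y, x) in the previous frame moves one pixel at a time. That is why those loads are generally unaligned, while the block from the current frame can sit on an aligned boundary.</p>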
<p>A bit later there were some follow-ups, the most notable being:<br />
<a href="http://software.intel.com/en-us/forums/showthread.php?t=56271">http://software.intel.com/en-us/forums/showthread.php?t=56271</a><br />
and<br />
<a href="http://x264dev.multimedia.cx/archives/8">http://x264dev.multimedia.cx/archives/8</a></p>
<p>So, in summary: starting with the Intel Core 2 brand (Core microarchitecture, from mid-2006, Merom CPU and later) and going forward, lddqu does the same thing as movdqu.</p>
<p>In other words:<br />
if the CPU supports Supplemental Streaming SIMD Extensions 3 (SSSE3) -&gt; lddqu does the same thing as movdqu,<br />
if the CPU doesn’t support SSSE3 but supports SSE3 -&gt; go for lddqu<br />
(and note the earlier caveat about memory types)</p>
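<p>The rule of thumb above can be written down as a tiny decision function. This is only a sketch in Python: it assumes you already have the ECX value returned by CPUID leaf 1 (SSE3 is reported in bit 0 and SSSE3 in bit 9), and the function name and the returned mnemonic strings are made up for this example. The fallback for CPUs without SSE3 assumes that SSE2, and therefore movdqu, is available.</p>

```python
SSE3_BIT = 1 << 0   # CPUID.01H:ECX bit 0
SSSE3_BIT = 1 << 9  # CPUID.01H:ECX bit 9


def preferred_unaligned_load(cpuid_1_ecx):
    """Pick an unaligned 128-bit load following the rule of thumb above.

    `cpuid_1_ecx` is the ECX value returned by CPUID leaf 1.
    """
    has_sse3 = bool(cpuid_1_ecx & SSE3_BIT)
    has_ssse3 = bool(cpuid_1_ecx & SSSE3_BIT)
    if has_ssse3:
        # Core microarchitecture or later: lddqu behaves like movdqu,
        # so the plain instruction is as good as any.
        return "movdqu"
    if has_sse3:
        # SSE3 without SSSE3 (e.g. Prescott): lddqu helps with cache line
        # splits, as long as the memory type caveats are respected.
        return "lddqu"
    # No SSE3 at all: lddqu does not exist; this assumes SSE2 is present.
    return "movdqu"
```

<p>For example, <code>preferred_unaligned_load(0b1)</code> picks lddqu, while any value with bit 9 set picks movdqu.</p>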
<p>One last point, from a patent point of view: be aware of US patent number 6721866<br />
<a href="http://www.google.com/patents/US6721866">http://www.google.com/patents/US6721866</a><br />
as the approach used is actually protected.</p>
<p>Ultrabooks, on the other hand, being cutting-edge devices, tend to use an even more advanced technology feature called Quick Sync Video (QSV), which allows all video decode and encode activities to be offloaded from the main CPU to the integrated graphics, making them faster or more power-efficient.</p>
<p>For development in this area, just note one key link for now: <a href="http://software.intel.com/en-us/articles/vcsource-tools-media-sdk/">http://software.intel.com/en-us/articles/vcsource-tools-media-sdk/</a> </p>
<p>PS: FYI, one good place for a view of “all Intel’s microarchitectures”: <a href="http://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures">http://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures</a></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/16/history-of-one-cpu-instructions-part-1-lddqumovdqu-explained/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My wife bought an Ultrabook – and LOVES it!</title>
		<link>http://software.intel.com/en-us/blogs/2012/04/02/my-wife-bought-an-ultrabook-and-loves-it/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/04/02/my-wife-bought-an-ultrabook-and-loves-it/#comments</comments>
		<pubDate>Tue, 03 Apr 2012 05:17:31 +0000</pubDate>
		<dc:creator>Matt Ployhar (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Intel SW Partner Program]]></category>
		<category><![CDATA[Intel® AppUp Developer Program]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[Ultrabook]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[PC Gaming Ultrabook]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/04/02/my-wife-bought-an-ultrabook-and-loves-it/</guid>
		<description><![CDATA[Right now we have 4 PC laptops in our house; 5 if you count the iPad 2 being a ‘personal computing’ device. There’s my work HP Pavilion dv6, my personal Alienware M11x, her former Dell XPS M1530, which just got replaced by the Asus Zen book UX 31. In my sixteen years of being in [...]]]></description>
			<content:encoded><![CDATA[<p>Right now we have 4 PC laptops in our house; 5 if you count the iPad 2 as a ‘personal computing’ device. There’s my work HP Pavilion dv6, my personal Alienware M11x, and her former Dell XPS M1530, which just got replaced by the Asus Zenbook UX31. In my sixteen years in the tech industry, and thirteen with my wife, I’ve never seen her get so excited about, and delighted with, technology or a PC. The only other time that came close was when I bought her an iPhone. Sure, we love our iPad 2, but we tend to use it more for what’s termed ‘snacking’: casually surfing the internet, looking something up, watching the occasional YouTube video, etc. So this got me thinking: if something like an Ultrabook can have that sort of impact on my wife, and reach a broader demographic than myself, then it warrants a closer look.</p>
<p>So what things does she like most about it? The list below is in her words.</p>
<p>1) She loves the design, how sleek it is, and the brushed metal appearance.<br />
2) Loves the small form factor – fits in most of her handbags.<br />
3) Loves the Keyboard. Likes the spacing between the keys &#038; the way they feel.<br />
4) Setup was seamless, found all her ‘piles of different devices’.  “Right out of the box everything worked”.<br />
5) Liked the fact she didn’t have to download a bunch of updates.  Was up and running quickly.<br />
6) The Solid State drive.  (I asked her how she knew about that) – ‘because she read up on it’.<br />
7) Boots up super-fast.<br />
8) Likes the attention to detail.<br />
9) Out of box experience was great.  Wasn’t like unpacking something from just a bunch of cardboard.<br />
10) Likes the Case it came with, it’s like an envelope case.<br />
11) Loves the battery life.</p>
<p>OK, so I realize this is a sample of one, but I’m struck by how quickly she rattled off all the above features without even thinking about it. So about ten minutes later I asked her, ‘So what do you like about the iPad 2?’ (Note: It took her about three times as long to list the following things.)</p>
<p>1) Touch screen.<br />
2) Size of the Form Factor.<br />
3) Convenience that it offers in being able to multi-task.<br />
4) Can play games on it.<br />
5) Good for reading stuff.<br />
6) Quickly checking email.</p>
<p>That’s where it ended… and then about three minutes later she says … ‘well, now with my Ultrabook, the iPad has now pretty much been relegated to being a kitchen gadget’.<br />
Interesting….<br />
So then I flipped the bit and asked her – ‘Is there anything you don’t like about your Ultrabook?’ – Answer: “not yet”.   IMO that's pretty cool.</p>
<p>Ok - so now onto some gratuitous pics of most of these devices.   (Note:  I didn’t include my Alienware M11x this time around).   In the foreground – bottom to top:  iPad 2, Asus Ultrabook, HP dv6, and then the Dell M1530<br />
<a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1251.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1251-224x300.jpg" alt="4 PC devices" title="IMG_1251" width="224" height="300" class="aligncenter size-medium wp-image-46409" /></a></p>
<p>In this next pic.. I’m comparing the thickness of the Dell to the Asus Ultrabook.<br />
<a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1254.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1254-300x224.jpg" alt="Ultrabook on top of Dell 1530" title="IMG_1254" width="300" height="224" class="aligncenter size-medium wp-image-46410" /></a></p>
<p>In this following pic I’m comparing the thickness of the Ultrabook (on the bottom) as compared to the iPad 2<br />
<a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1256.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1256-300x224.jpg" alt="iPad 2 on top of UB" title="IMG_1256" width="300" height="224" class="aligncenter size-medium wp-image-46411" /></a></p>
<p>In this final pic I’m comparing the iPad 2 (I had to put the case back on it in order to prop it up), the Ultrabook, and then the HP dv6<br />
<a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1261.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/04/IMG_1261-300x224.jpg" alt="iPad 2 - UB - HP" title="IMG_1261" width="300" height="224" class="aligncenter size-medium wp-image-46412" /></a> </p>
<p>For those concerned about weight: the Dell XPS, for example, weighs 5.9lbs (2.6 kg), while the Ultrabook comes in at 2.9lbs (1.3 kg). This weight factor alone is one of the biggest selling points for me. The best part is that I’m seeing little to no tradeoff so far with regard to overall performance. These devices are packing a pretty serious punch.</p>
<p>So, in a nutshell, I’m having some serious PC laptop envy right now. I might wait a few more months though. For those who have been following the Ultrabook category: we should soon start seeing Ultrabooks that integrate ‘touch’ and convert into either a laptop or a tablet when you want it. At any rate, I’m very sold on the concept, and yes, I’m keeping a very close eye on ensuring that all the games we PC gamers love to play will play well on these! Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/04/02/my-wife-bought-an-ultrabook-and-loves-it/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dualbooting Windows 7 and Windows 8</title>
		<link>http://software.intel.com/en-us/blogs/2012/03/20/dualbooting-windows-7-and-windows-8/</link>
		<comments>http://software.intel.com/en-us/blogs/2012/03/20/dualbooting-windows-7-and-windows-8/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 22:08:52 +0000</pubDate>
		<dc:creator>Rami Radi (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Intel SW Partner Program]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[Site News & Announcements]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dual boot]]></category>
		<category><![CDATA[dual booting]]></category>
		<category><![CDATA[dualboot]]></category>
		<category><![CDATA[Windows 8]]></category>
		<category><![CDATA[Windows 8 Consumer Preview]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2012/03/20/dualbooting-windows-7-and-windows-8/</guid>
		<description><![CDATA[The Windows 8 Consumer Preview ISO image became public a few days ago, which is available here, so I am sure a lot of people are interested in trying it out on their development systems without replacing their current Windows 7 installation. If you've ever dual booted a system before, the procedure for doing it [...]]]></description>
			<content:encoded><![CDATA[<p>The Windows 8 Consumer Preview ISO image became public a few days ago and is available <a href="http://windows.microsoft.com/en-US/windows-8/iso">here</a>, so I am sure a lot of people are interested in trying it out on their development systems without replacing their current Windows 7 installation.</p>
<p>If you've ever dual booted a system before, the procedure for doing it for Windows 8 is not all that different. In summary, all you need to do is create a new partition for Windows 8, install it on that partition, and then edit your new boot menu if you want to keep Windows 7 as the default OS.</p>
<p><strong>Step One: Download and burn the Windows 8 Consumer Preview</strong></p>
<p>• Assuming that you downloaded the Consumer Preview ISO image from the link above, you can use the <a href="http://www.microsoftstore.com/store/msstore/html/pbPage.Help_Win7_usbdvd_dwnTool">Microsoft Windows 7 USB/DVD Download Tool</a> to either burn the ISO image to a DVD or copy it to a USB drive. The tool is free and very small, and the installation instructions on the site itself are very simple. Of course, if you prefer to use other burning software like ImgBurn, you can do that too.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot4.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot4-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45518" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot1.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot1-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45519" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot5.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot5-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45520" /></a></p>
<p><strong>Step Two: Create a New Partition</strong></p>
<p>• Before you start, make sure to back up your data and files. We will be creating new partitions and installing a new OS, so anything could go wrong, and you don't want to lose everything. Being paranoid, I like taking "bare metal" backups of my systems with a wonderful free, open source tool called <a href="http://redobackup.org/">Redo Backup</a>. A bare metal backup takes a complete image of your hard drive, with all of its partitions. That way, I am able to restore my entire system exactly the way it was if needed. Going into more detail about backups, however, is another topic.</p>
<p>• When you're ready, from within Windows 7, we will create some space for Windows 8 by using Windows' Disk Management. Click on the Start Menu and right click on "Computer", then click "Manage", and in the window that appears, click on "Disk Management" in the left sidebar.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot91.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot91-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45522" /></a></p>
<p>• Find your system hard disk in the graphical list that appears in the bottom pane. Right-click on it and then click "Shrink Volume". 20 GB is a reasonable size for the new Windows 8 partition, neither too small nor too big, so shrink the volume so that you have at least 20 GB of free space at the end of the drive, and click OK. Of course, if you think you need more than 20 GB (if you are going to do intensive development and/or testing), or less than 20 GB (if you don’t have enough space on your Windows 7 partition), feel free to choose a different size.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot10.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot10-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45523" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot11.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot11-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45524" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot12.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot12-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45525" /></a></p>
<p>• Then, right-click on the "Unallocated" block that appears on that drive and click "New Simple Volume". Click Next on the next few windows until you reach the "Format Partition" window. Here, give it a volume label you'll recognize (like "Windows 8") and click Next. It should format the partition for you. Now you're all set to install Windows 8.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot13.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot13-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45527" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot14.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot14-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45528" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot151.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot151-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45530" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot16.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot16-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45531" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot17.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot17-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45532" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot18.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot18-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45533" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot19.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/screenshot19-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45534" /></a></p>
<p><strong>Step Three: Install Windows 8</strong></p>
<p>• Now reboot your system and go into your BIOS settings (on most systems, you need to press F2 or DEL). Make sure your computer is set to boot from CD or USB as the first priority (depending on which medium you decided to use earlier); the exact steps may differ from system to system. Then reboot.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/IMG_0013.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/IMG_0013-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45535" /></a></p>
<p>• Now you should boot into the Windows 8 installer. It looks very similar to the Windows 7 installer, so it should be familiar. Pick your language and hit "Install Now".</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/1.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/1-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45536" /></a></p>
<p>• Enter the Product Key available on the Windows 8 Consumer Preview download page.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/2.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/2-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45537" /></a></p>
<p>• Now choose "Custom" when asked what type of install you'd like to perform. Then find the new partition you created on the list of drives shown. Make sure it's the right one, because remember, you are about to write over whatever is on it.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/3.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/3-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45538" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/4.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/4-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45545" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/5.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/5-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45546" /></a></p>
<p>• Hit "Next" and let the installer do its thing. When you're done, your computer should reboot into Windows 8. It'll probably reboot one more time after it does, then you will see the Windows 8 Start screen.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/6.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/6-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45541" /></a></p>
<p><strong>Step Four: Make Windows 7 the Default OS Again</strong></p>
<p>• When you first boot up into Windows 8, you'll notice the new graphical boot menu that lets you choose between Windows 7 and Windows 8. Windows 8 will be the default, meaning that if you don't manually choose Windows 7 from the menu, your computer will boot into Windows 8 after 3 seconds. If this is not what you want, follow the steps below to make Windows 7 the default OS again.</p>
<p>• On the boot menu, click on the button at the bottom that says "Change Defaults or Choose Other Options", and hit "Choose the Default Operating System". From there, you can pick Windows 7 from the menu. From now on, your computer will boot into Windows 7 by default.</p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/78.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/78-300x167.jpg" alt="" width="300" height="167" class="alignnone size-medium wp-image-45543" /></a></p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/1.png"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2012/03/1-300x168.png" alt="" width="300" height="168" class="alignnone size-medium wp-image-45548" /></a></p>
<p>That's it. Enjoy using the Windows 8 Consumer Preview on your dual-boot system.</p>
<p>Rami</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2012/03/20/dualbooting-windows-7-and-windows-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ultrabooks are here and so is our new community!</title>
		<link>http://software.intel.com/en-us/blogs/2011/12/31/ultrabooks-are-here-and-so-is-our-new-community/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/12/31/ultrabooks-are-here-and-so-is-our-new-community/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 16:33:42 +0000</pubDate>
		<dc:creator>Jeffrey Rott (Intel)</dc:creator>
				<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[Ultrabook]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/12/31/ultrabooks-are-here-and-so-is-our-new-community/</guid>
		<description><![CDATA[Without a doubt, one of the most exciting developments in the tech world for 2011 was the introduction of the Ultrabook.  We put a reference design out there and OEMs took it and ran with it.  Most of the models that arrived were slim, sleek, powerful and yet power-efficient.  I brought one of the current [...]]]></description>
			<content:encoded><![CDATA[<p>Without a doubt, one of the most exciting developments in the tech world for 2011 was the introduction of the Ultrabook.  We put a reference design out there and OEMs took it and ran with it.  Most of the models that arrived were slim, sleek, powerful and yet power-efficient.  I brought one of the current ultrabooks with me on holiday this year, knowing that it would be played with by my tech-savvy extended family, and it quickly found its way onto everyone’s gift list.  My guess is that their response will be replicated worldwide.  Once you see it, touch it and use it – you just have to have one.</p>
<p>With all the excitement going on, I couldn’t be more energized about being the community manager for this new form factor.  The community is structured around the main opportunities for ISVs to focus their development efforts on.  At the moment, those main areas are Power-efficiency, Performance and Graphics.</p>
<p>In the Power-efficiency section you’ll learn about and get help with the things you can do to make sure that your software is playing its role in keeping the ultrabook’s battery life as long as possible.  The only true way to get the most out of the batteries is if the hardware and the software are both playing “green” together.</p>
<p>On the Performance side, the current ultrabooks are powered by 2nd Generation Intel® Core™ processors.  These are some powerful processors!  And just like with desktops and standard laptops, your software needs to take advantage of the performance offered by these multicore chips.  So this section will be dedicated to helping you thread your applications to run on multiple processing cores.</p>
<p>And in the Graphics area, you’ll get the information and insight to help you look your best on the new ultrabooks.  With these new form factors, there’s not much room for discrete cards.  In fact, I’m not even sure how they got the processors in there.  So you’ll likely be depending on the performance of the integrated graphics processing to deliver your visuals to your customers.  This area will be the spot to help you put your best foot forward…visually.</p>
<p>This, of course, is just how to engage with the current model releases.  2012 and 2013 promise to bring about some exciting developments for ultrabooks, with new features, designs and operating systems introduced.  And with these new developments, the opportunities for ISVs will continue to grow.  And this community will be the place for you to go to figure out what your next steps should be.  But for now, go ahead and get your applications optimized for the power-efficient performance and graphics capabilities of ultrabooks, as these enabling vectors will surely be the foundation on which future features will be built.</p>
<p>&nbsp;</p>
<p><a href="http://software.intel.com/en-us/ultrabook">Visit the Ultrabook Community here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/12/31/ultrabooks-are-here-and-so-is-our-new-community/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Register for Intel(R) Technical Presentation &quot;Analysis of hybrid applications with the Intel(R) Cluster Studio XE 2012&quot;</title>
		<link>http://software.intel.com/en-us/blogs/2011/12/02/register-for-intelr-technical-presentation-analysis-of-hybrid-applications-with-the-intelr-cluster-studio-xe-2012/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/12/02/register-for-intelr-technical-presentation-analysis-of-hybrid-applications-with-the-intelr-cluster-studio-xe-2012/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 18:31:53 +0000</pubDate>
		<dc:creator>RAVI (Intel)</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Embedded Computing]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/12/02/register-for-intelr-technical-presentation-analysis-of-hybrid-applications-with-the-intelr-cluster-studio-xe-2012/</guid>
		<description><![CDATA[<a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/12/csxe_sm.png"><img class="alignnone" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/12/csxe_sm.png" alt=""  /></a>
Gergana Slavova, Technical Consulting Engineer, will be presenting "Analysis of hybrid applications with the Intel(R) Cluster Studio XE 2012" on Dec 7th at 9am PDT. Please register!]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignnone" style="width: 121px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/12/csxe_sm.png"><img title="Intel(R) Cluster Studio XE" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/12/csxe_sm.png" alt="Intel(R) Cluster Studio XE" width="111" height="143" /></a><p class="wp-caption-text">Intel(R) Cluster Studio XE</p></div>
<p>Gergana Slavova, Technical Consulting Engineer, will be presenting on Dec 7th at 9am PDT on the following topic:</p>
<p style="padding-left: 30px;"><strong>Analysis of hybrid applications with the Intel(R) Cluster Studio XE 2012</strong></p>
<p>Please register for this presentation using the following link:</p>
<p style="padding-left: 30px;"><a title="https://www1.gotomeeting.com/register/369788936" href="https://www1.gotomeeting.com/register/369788936" target="_blank">https://www1.gotomeeting.com/register/369788936</a></p>
<p>Here is a short abstract of the presentation:</p>
<p style="padding-left: 30px;">With the launch of Intel® Cluster Studio XE 2012, Intel enhanced the premium software development tools package for clusters with the inclusion of MPI support in Intel® Parallel Studio XE, and added new features for better scalability and improved performance. This session will introduce you to all MPI components of the new Intel® Cluster Studio XE 2012. You’ll learn how to use the new and more scalable startup mechanism to run MPI applications well up to 90000 cores, you’ll take a dive into benchmark data, and the improvements and details of the mpitune tool, and you’ll see, in an interactive demo, key elements and new scalability features of Intel® Trace Analyzer and Collector.  Finally, you’ll be shown how to enable the new MPI support in the Intel® VTune™ Amplifier XE and Intel® Inspector XE tools.</p>
<p>Here is a short bio of the presenter: </p>
<p style="padding-left: 30px;">Gergana Slavova received her bachelor’s degree in computer science at the University of Illinois at Urbana-Champaign in 2005. Following graduation, she joined the Intel Software and Services Group as a Technical Consulting Engineer, a position she has held for the past five years.  She works in the high performance computing area where she provides technical support, training, and consulting expertise for a suite of MPI and cluster development tools.</p>
<p>Please register for the presentation now and attend it on Dec 7th at 9am PDT. You can ask Gergana questions during the second half of the presentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/12/02/register-for-intelr-technical-presentation-analysis-of-hybrid-applications-with-the-intelr-cluster-studio-xe-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paving the Road to OpenMP 4</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/21/paving-the-road-to-openmp-4/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/21/paving-the-road-to-openmp-4/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 15:16:28 +0000</pubDate>
		<dc:creator>Michael Klemm (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Software Tools]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/21/paving-the-road-to-openmp-4/</guid>
		<description><![CDATA[The dust of SC’11 starts to settle and several announcements around OpenMP have been made in Seattle. There has been a change in the OpenMP Architecture Review Board and Language Committee. Several new members have joined the committee and started to actively participate in the development of future OpenMP versions. Also, Michael Wong (IBM) has [...]]]></description>
			<content:encoded><![CDATA[<p>The dust of SC’11 starts to settle and several announcements around OpenMP have been made in Seattle. </p>
<p>There has been a change in the OpenMP Architecture Review Board and Language Committee. Several new members have joined the committee and started to actively participate in the development of future OpenMP versions. Also, Michael Wong (IBM) has been elected as the new CEO of the OpenMP ARB. Michael will lead the ARB towards the release of the next version of the OpenMP API Specification. He brings a lot of expertise as one of IBM’s representatives on the OpenMP committee as well as on other standards bodies such as the C++ committee. At SC’11 he introduced himself and gave a great inaugural speech at the OpenMP BOF session. You can find the slide deck <a href="http://openmp.org/wp/presos/SCBOF11.pdf">here</a>.</p>
<p>In June 2011, version 3.1 of the OpenMP API Specification was released, and of course the current version of Intel® Composer XE 2011 already supports it. The GNU Compiler Collection is expected to support it when GCC 4.7 is released. </p>
<p>In the meantime my friends in the OpenMP Language Committee continue working on version 4.0 of the OpenMP API Specification. Version 4.0 is planned for Nov 2012; SC’12 would be a great opportunity to announce the release of a new specification, wouldn’t it? </p>
<p>There are several big additions that are under heavy investigation by the OpenMP Language Committee members. Accelerators and coprocessors, standardized affinity and new features around OpenMP tasks are just a few features that we are working on. </p>
<p>My sub-group works on specifying an error model for the OpenMP API, so that programmers finally are able to handle runtime errors and C++ exceptions accordingly. This will make writing safe code less cumbersome. Programmers will be able to detect both runtime and user-defined error conditions (e.g. C++ exceptions) and react accordingly. During the last year, we have made some very good progress on investigating the requirements for the new features and carefully investigated existing approaches for error handling in parallel programming languages.</p>
<p>In summary, the future of OpenMP looks bright from an OpenMP Language Committee perspective. Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/21/paving-the-road-to-openmp-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MIC architecture support by software tools - SC11 wrap-up</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 00:24:45 +0000</pubDate>
		<dc:creator>James Reinders (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Knights Corner]]></category>
		<category><![CDATA[Knights Ferry]]></category>
		<category><![CDATA[MIC]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/</guid>
		<description><![CDATA[This week we demonstrated the Knights Corner co-processor at SC11 and we had many developers demonstrating real results with the prototype systems. During the "SC11 season," a number of tool vendors announced they will be providing versions of their software tailored to supporting MIC architecture, starting with the Knights Corner co-processor. Here are the ones I know [...]]]></description>
			<content:encoded><![CDATA[<p>This week we demonstrated the <a href="http://software.intel.com/en-us/blogs/tag/knights-corner/">Knights Corner</a> co-processor at <a href="http://sc11.supercomputing.org/">SC11</a> and we had many developers demonstrating real results with the prototype systems.</p>
<p>During the "SC11 season," a number of tool vendors announced they will be providing versions of their software tailored to supporting MIC architecture, starting with the <a href="http://software.intel.com/en-us/blogs/tag/knights-corner/">Knights Corner</a> co-processor.</p>
<p>Here are the ones I know about and can share (there are more who will make their own announcements in the future):</p>
<ul>
<li><a href="http://www.roguewave.com/company/news-events/press-releases/2011/rw_mic_support.aspx">IMSL Library, Rogue Wave</a></li>
<li><a href="http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/">NAG Libraries, NAG Ltd.</a></li>
<li><a href="http://www.platform.com/press-releases/2011/PlatformAnnouncesSupportforIntelManyIntegratedCoreArchitectureBasedProducts%20/">Platform HPC, Platform LSF and Platform Cluster Manager, Platform Computing</a></li>
<li><a href="http://www.altair.com/newsdetail.aspx?news_id=10609&amp;news_country=en-US">PBS Works, Altair</a></li>
<li><a href="http://www.roguewave.com/company/news-events/press-releases/2011/rw_mic_support.aspx">Totalview debugger, Rogue Wave</a></li>
</ul>
<p><em>[editor's note... additional announcements (post-SC11) include:</em></p>
<ul>
<li><a href="http://www.caps-entreprise.com/fr/page/index.php?id=85">CAPS directive-based HMPP compiler, CAPS</a></li>
</ul>
<p><em>]</em></p>
<p>Of course... <a href="http://intel.com/software/products">Intel tools</a> as well, many of which were on display at the show in conjunction with Knights Ferry platforms.</p>
<p>There were also countless applications, many open source, that have been recompiled for MIC architecture and were being discussed around the show. Some I remember are NWChem, ENZO, ELK, MADNESS, MPI, GA, and Python... and I know I am forgetting quite a few. Of course, Linux has been ported and is running on both Knights Ferry and Knights Corner.</p>
<p>For additional tools announcements, please let me know, or post a comment!</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/17/mic-architecture-support-by-software-tools-sc11-wrap-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>quick chat about MIC architecture with Mike Dewar, NAG</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 23:37:39 +0000</pubDate>
		<dc:creator>James Reinders (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[MIC]]></category>
		<category><![CDATA[NAG]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/</guid>
		<description><![CDATA[I ran into Mike Dewar at SC11 today as the exhibition draws to a close.  Mike is the CTO of NAG Ltd. - a company we've had the good fortune to work with for years. NAG is one of a handful of companies that have been providing feedback on our Knights Ferry (prototype MIC architecture). [...]]]></description>
			<content:encoded><![CDATA[<p>I ran into Mike Dewar at <a title="SC11" href="http://sc11.supercomputing.org">SC11</a> today as the exhibition draws to a close.  Mike is the CTO of <a href="http://www.nag.com/">NAG Ltd.</a> - a company we've had the good fortune to work with for years.</p>
<p>NAG is one of a handful of companies that have been providing feedback on our Knights Ferry (prototype MIC architecture).</p>
<p>Mike told me: "We found porting existing routines from the NAG Library to the Intel <a href="http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html">Many Integrated Core Architecture</a> (MIC) to be a relatively quick and painless process. The team was impressed at the way Intel has extended their existing software tools to support the MIC environment, allowing them to work in a familiar and productive environment."</p>
<p>I quizzed Mike on what it took to get it running on Knights Ferry, and he did share the one type of tuning they have had to explore. They use OpenMP, which has generally meant that the number of threads is more like 10-20, instead of the 120 threads they use on Knights Ferry.  I'll have to write more about that later - scaling and vectorization are keys as multicore and many-core grow. No mystery there. The good news is that their use of OpenMP made this a straightforward challenge they understood. It was not a mystery to them. They also make good use of <a href="http://intel.com/go/MKL">MKL</a> in their library, and of course we support that for MIC architecture.</p>
<p>It is great to know that NAG users will have the opportunity to continue using NAG software with <a href="http://software.intel.com/en-us/blogs/tag/knights-corner/">Knights Corner</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/17/quick-chat-about-mic-architecture-with-mike-dewar-nag/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Seeing One TeraFlop/sec, the software side, and feeling a bit emotional</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/17/seeing-one-teraflopsec-the-software-side-and-feeling-a-bit-emotional/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/17/seeing-one-teraflopsec-the-software-side-and-feeling-a-bit-emotional/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 17:27:54 +0000</pubDate>
		<dc:creator>James Reinders (Intel)</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[Knights]]></category>
		<category><![CDATA[Knights Corner]]></category>
		<category><![CDATA[MIC]]></category>
		<category><![CDATA[MKL]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/17/seeing-one-teraflopsec-the-software-side-and-feeling-a-bit-emotional/</guid>
		<description><![CDATA[I've known this day was coming - but when I saw Knights Corner clearly sustaining a TeraFlop (DGEMM, wide range of block sizes) per second - I was surprised by my emotional reaction inside. Hard to describe; it was a good feeling. Tuesday November 15, 2011, we showed a Knights Corner co-processor for the first time [...]]]></description>
			<content:encoded><![CDATA[<p>I've known this day was coming - but when I saw Knights Corner clearly sustaining a TeraFlop (DGEMM, wide range of block sizes) per second - I was surprised by my emotional reaction inside. Hard to describe; it was a good feeling.</p>
<p>Tuesday November 15, 2011, we showed a <a href="http://newsroom.intel.com/community/intel_newsroom/blog/2011/11/15/intel-reveals-details-of-next-generation-high-performance-computing-platforms">Knights Corner co-processor</a> for the first time outside Intel. It is fresh silicon - first silicon - which is always exciting (if it works at all). Not only does it work, we were able to boot Linux on it and demonstrate it doing a very real sustained TeraFLOP/s. We ran DGEMM with many block sizes (with a lot of consistency, which is something that not all hardware and software can do). Our Math Kernel Library product will include DGEMM for Knights Corner when it comes out as a product, so this will be reproducible by all.</p>
<p>To our knowledge - we demonstrated the world's fastest DGEMM, and the first to go above one TeraFLOP/s on DGEMM.  And it is a conservative measure: real, sustained TeraFLOP/s (not "raw" or other theoretical measures). And it is doing it now, not just on "paper." That part really hit home as I looked at it.</p>
<p>I knew what I would see; Knights Corner was not a surprise to me. But when I saw it, and could reach out and touch it - I was struck by the power.  I was part of the <a title="ASCI Red" href="http://en.wikipedia.org/wiki/ASCI_Red">ASCI Red</a> project between Intel and Sandia National Labs that built the world's first TeraFLOP/s computer. We got to the same point (one TeraFLOP/s), before we finished building the machine, in December 1996. Now, we've done it again... this time with a single processor. Both used x86 processors from Intel - ASCI Red used over nine thousand Pentium Pro processors (later upgraded to Pentium II Xeon processors to be the first past 2 TeraFLOP/s), and now a single pre-production Knights Corner does the same.</p>
<p>Obviously, both projects involve a lot of people both inside and outside Intel. We have a great team inside Intel, and great partners, that can all feel good about both accomplishments. I'm happy to be one of a handful of people involved in both "firsts" to a TeraFLOP/s.</p>
<p>And, the trend continues. By the end of this decade, we should see a TeraFLOP/s from a 20 W part (simple math: an ExaFLOP/s at 20 MW means a TeraFLOP/s at 20 W).  A DGEMM sustained TeraFLOP/s in a notebook... it's coming. For now, we have Knights Corner which is plenty amazing.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/17/seeing-one-teraflopsec-the-software-side-and-feeling-a-bit-emotional/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Open Parallel: Optimizing Web Performance with TBB</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/16/open-parallel-optimizing-web-performance-with-tbb/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/16/open-parallel-optimizing-web-performance-with-tbb/#comments</comments>
		<pubDate>Wed, 16 Nov 2011 22:39:36 +0000</pubDate>
		<dc:creator>Nicolas Erdody</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[Software Tools]]></category>
		<category><![CDATA[HipHop]]></category>
		<category><![CDATA[Intel Software Partner Program]]></category>
		<category><![CDATA[James Reinders]]></category>
		<category><![CDATA[multi-core]]></category>
		<category><![CDATA[parallel programming]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[TBB]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/16/open-parallel-optimizing-web-performance-with-tbb/</guid>
		<description><![CDATA[Open Parallel is a research and development company that focuses on parallel programming and multicore development. We are a bunch of highly skilled geeks from various backgrounds that work together on problems in parallel programming and software development for multicore and manycore platforms. At LinuxConf (LCA2010) James Reinders gave a talk about the Threading Building [...]]]></description>
			<content:encoded><![CDATA[<p><strong><a href="http://www.openparallel.com">Open Parallel</a></strong> is a research and development company that focuses on parallel programming and multicore development. We are a bunch of highly skilled geeks from various backgrounds who work together on problems in parallel programming and software development for multicore and manycore platforms.</p>
<p>At LinuxConf (LCA2010) <strong>James Reinders</strong> gave a talk about the Threading Building Blocks (<a href="http://threadingbuildingblocks.org/">TBB</a>) library, a C++ threading library that sets out to make multicore programming more accessible to the average programmer. We took this idea on board and explored the possibilities of opening up this approach to an even wider audience, namely the audience of web application developers working in script languages.</p>
<p>Many websites require a non-trivial amount of per-request processing in the application layer, perhaps to retrieve, consolidate or otherwise manipulate data. Achieving better performance at this level improves response times and the overall user experience. Even when processing time at application level is not critical, parallelizing access to database and web service back-end layers can yield substantial improvements in perceived performance.</p>
<p>This drove our goal of adding TBB support into <strong>PHP</strong> and <strong>Perl</strong>, starting with <strong><a href="http://en.wikipedia.org/wiki/HipHop_%28software%29">HipHop</a></strong> as the PHP implementation of choice and later on adding Perl support to the game.</p>
<p>HipHop is a PHP to C++ cross compiler developed by Facebook to cut down on resource needs and speed up the execution times of their gigantic web infrastructure, which started on a classic PHP/MySQL stack and now has to scale to hundreds of millions of users. The HipHop project is a PHP implementation that is thread safe and already uses TBB for some memory management. We started by extending the existing support, at first adding only the new <em>parallel_for</em> function. Later, we added concurrent data structures and re-implemented our first approach.</p>
<p>What we have now is a robust implementation of <em>parallel_for</em> and <em>parallel_reduce</em> with the data structures needed to support them. What we learned on the way was both very enlightening and, at times, quite frustrating. We reached our aim of making TBB more widely accessible by getting the language extension into HipHop, but we also tried to get it into Zend PHP. This turned out to work only with a language compatibility module that does not provide the full glory we can offer on the HipHop platform. The reason for this is the architecture of the PHP interpreter.</p>
<p>Implementing threading in language interpreters turns out to be very hard. There are two dormant/failed approaches in Perl, and every attempt in PHP has failed so far. The core developers on both sides very much doubt whether it is a path worth going down at all. The problems are global locking and the copying/sharing of data structures that are thread local. Our Perl implementation is a starting point that could influence not only the Perl community but other interpreter designers and developers as well.</p>
<p>In the Perl community we are lobbying for a const keyword that would lock a data structure and remove the need to copy it into every thread. The ability to make something immutable is missing in Perl and PHP, and this makes the startup cost of any worker thread very high. For the Perl library we wrote a lazy clone module that clones a data structure only if the worker thread really accesses it. That way we only penalize the worker thread for accessing data, and we may avoid cloning structures altogether if they are not accessed within the task.</p>
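<p>The lazy clone idea can be sketched in a few lines. The following is only an illustration in Python, not the Perl module itself; the class name LazyClone, its get() method and the cloned flag are hypothetical names chosen for this sketch.</p>

```python
import copy


class LazyClone:
    """Defer the deep copy of a shared structure until first access.

    Hypothetical illustration only; this is not the API of the Perl module.
    """

    def __init__(self, shared):
        self._shared = shared
        self._private = None
        self.cloned = False

    def get(self):
        # Clone on first access only; later calls reuse the private copy.
        if not self.cloned:
            self._private = copy.deepcopy(self._shared)
            self.cloned = True
        return self._private


config = {"sizes": [1, 2, 3]}
idle_worker = LazyClone(config)        # never calls get(): no copy is made
busy_worker = LazyClone(config)
busy_worker.get()["sizes"].append(4)   # pays for the copy, then changes its own
```

<p>A worker that never touches the data never pays for the copy, and a worker that does touch it changes only its private copy, so the shared original stays intact.</p>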
<p>In our work with the PHP HipHop compiler we also wrote a patch set for WordPress that uses our new <em>parallel_for</em> language extension. This trial brought us instant success in the form of reduced page load times. The patch set only replaced some key <em>foreach</em> loops with <em>parallel_for</em>, and it was our first real success with the TBB library in PHP. Based on that success we set out to re-implement our initial approach and tidy up our patch set for HipHop to make it more accessible to others.</p>
<p>The Perl project worked towards a Perl module that gives direct access to TBB functions. Here too we started by implementing the core memory structures and then built the <em>parallel_for</em> functionality on top of them. The module we have now is stable enough to demonstrate the gains we can get by using TBB in Perl.</p>
<p>To round the project off we implemented two little tools as a real-world demo and as working code to look at. The demo is based around the HTML5 geo tag, which is present in modern browsers and can be read with a JavaScript API. In the HipHop version we use it to read the current Lat/Lon from the accessing browser and then parse the Twitter firehose to find tweets with embedded image URLs.</p>
<p>In the Perl demo we query Flickr and fetch a grid of 4x4 images, cache them locally and then render one big image out of scaled versions of the single images. The demos are running on <strong><a href="http://geopic.me">geopic.me</a></strong>.</p>
<p>To sum up our experience with TBB and script languages: we now know that threading interpreters comes with its very own set of challenges, but by using TBB we were able to get further than others did on the same mission. The libraries we have produced so far, which are open source and can be found on <strong>our <a href="https://github.com/openparallel/">github</a> account</strong>, will be further developed and maintained.</p>
<p>We will continue working on both platforms to expose the power of multicore CPUs to developers in an approachable way. Along the way we also produced a number of more detailed white papers covering various aspects of the project:</p>
<ul>
<li><strong><a href="http://openparallel.com/2011/05/11/threading-perl-using-tbb-the-cpan-module-and-white-paper/">threads::tbb</a></strong></li>
<li><strong><a href="http://openparallel.files.wordpress.com/2010/09/tbb-in-wordpress-oct-10.pdf">TBB in WordPress</a></strong></li>
<li><strong><a href="http://openparallel.files.wordpress.com/2010/09/wordpress-on-hiphop-nov-10.pdf">WordPress on HipHop</a></strong></li>
</ul>
<p>Get in touch if you are interested in these projects or have questions about the work we did. There is further information on our website, <strong><a href="http://www.openparallel.com">OpenParallel.com</a></strong>.</p>
<p>Contact: <strong><a href="http://openparallel.com/contact-us/">Nicolas Erdody</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/16/open-parallel-optimizing-web-performance-with-tbb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AES Counter Mode details (Intel AES-NI implementation)</title>
		<link>http://software.intel.com/en-us/blogs/2011/11/11/aes-counter-mode-details-intel-aes-ni-implementation/</link>
		<comments>http://software.intel.com/en-us/blogs/2011/11/11/aes-counter-mode-details-intel-aes-ni-implementation/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 16:29:24 +0000</pubDate>
		<dc:creator>Nicolae Popovici (Intel)</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Performance and Optimization]]></category>
		<category><![CDATA[AES]]></category>
		<category><![CDATA[AES Counter Mode]]></category>
		<category><![CDATA[AES-NI]]></category>
		<category><![CDATA[Intel AES-NI library]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2011/11/11/aes-counter-mode-details-intel-aes-ni-implementation/</guid>
		<description><![CDATA[In this article we’ll take a closer look at the AES counter (CTR) mode implementation in the Intel® AES-NI sample library (it can be downloaded from http://software.intel.com/en-us/articles/download-the-intel-aesni-sample-library/). AES stands for Advanced Encryption Standard and it is a symmetric encryption standard. More detailed information about AES is available at http://de.wikipedia.org/wiki/Advanced_Encryption_Standard. AES-NI refers to Intel® Advanced Encryption Standard (AES) Instructions Set which [...]]]></description>
			<content:encoded><![CDATA[<p>In this article we’ll take a closer look at the AES counter (CTR) mode implementation in the Intel® AES-NI sample library (it can be downloaded from http://software.intel.com/en-us/articles/download-the-intel-aesni-sample-library/). </p>
<p>AES stands for Advanced Encryption Standard and it is a symmetric encryption standard. More detailed information about AES is available at http://de.wikipedia.org/wiki/Advanced_Encryption_Standard.</p>
<p>AES-NI refers to the Intel® Advanced Encryption Standard (AES) instruction set, which comprises 7 new instructions targeting different phases of AES encryption and decryption. More details can be found <a href="http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/">here</a>.</p>
<p>Block ciphers can be used to encrypt/decrypt a stream of data in a number of ways, which are called modes of operation. The details of the counter mode can be looked up in a number of places (e.g. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation), but at a high level it encrypts successive values of a “counter”, and then XORs the input data with the encrypted counter values.</p>
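<p>As a rough illustration of that structure, here is a minimal Python sketch. It deliberately does not use AES: standin_block_encrypt is a hypothetical, insecure stand-in built from SHA-256 so that the example runs with the standard library alone, and the names ctr_xor, BLOCK_SIZE and the plain "+1" increment are assumptions made for this sketch.</p>

```python
import hashlib

BLOCK_SIZE = 16  # bytes; matches the 128-bit AES block


def standin_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher: NOT AES and not secure.

    It only supplies a keyed 16-byte-in, 16-byte-out function so that the
    counter-mode structure can be shown without external libraries.
    """
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]


def ctr_xor(key: bytes, initial_counter: int, data: bytes) -> bytes:
    """Encrypt successive counter values and XOR them with the data."""
    out = bytearray()
    counter = initial_counter
    for offset in range(0, len(data), BLOCK_SIZE):
        counter_block = counter.to_bytes(BLOCK_SIZE, "big")
        keystream = standin_block_encrypt(key, counter_block)
        chunk = data[offset:offset + BLOCK_SIZE]
        # zip() stops at the shorter input, so a final partial block works.
        out.extend(c ^ k for c, k in zip(chunk, keystream))
        counter = (counter + 1) % 2 ** (8 * BLOCK_SIZE)  # simple "+1" increment
    return bytes(out)


message = b"counter mode turns a block cipher into a stream cipher"
ciphertext = ctr_xor(b"demo key", 7, message)
recovered = ctr_xor(b"demo key", 7, ciphertext)
```

<p>Because the data is only ever XORed with the encrypted counter values, the same routine serves for both encryption and decryption, and the block function is never run in its decrypt direction.</p>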
<p>The size of the counter is the same as the block size of the cipher, which in the case of AES is 128 bits. The definition of counter mode does not specify how the counter is “incremented”, as long as values do not repeat for a sufficiently long number of increments. Something such as a linear feedback shift register (LFSR) could be used for a counter, but in practice the increment function is almost always some form of addition by 1.</p>
<p>The exact definition of the increment function is therefore set at a higher level, and there is in general a different definition for each application. Typically, the counter consists of a fixed portion (usually the IV or Initialization Vector) and a variable portion. Only the variable portion changes during the increment operation, so if the variable portion is n bits wide, then the counter will repeat after 2^n increments.</p>
<p>The implementation of counter mode in the Intel AES-NI sample library implements a Big-Endian 32-bit increment. That is, the most significant 32 bits of the counter are incremented by 1 (when viewed as a big-endian integer), and the remaining 96 bits are unchanged. This is the definition required by GCM (Galois Counter Mode).</p>
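<p>A big-endian increment of one field inside the 128-bit counter block can be sketched in Python as follows. The function name increment_field and its parameters are hypothetical and are not part of the sample library. The position of the field is left as a parameter: NIST SP 800-38D defines the GCM increment on the rightmost 32 bits of the block, which corresponds to offset 12 here, so check the offset against the layout your application actually uses.</p>

```python
def increment_field(block: bytes, offset: int, width: int = 4) -> bytes:
    """Add 1 to a big-endian field of `width` bytes starting at `offset`.

    The field wraps around to zero on overflow; every byte outside the
    field is returned unchanged. Hypothetical helper for illustration.
    """
    if len(block) != 16:
        raise ValueError("counter block must be 16 bytes (128 bits)")
    if offset not in range(0, 16 - width + 1):
        raise ValueError("field does not fit inside the block")
    field = int.from_bytes(block[offset:offset + width], "big")
    field = (field + 1) % 2 ** (8 * width)
    return block[:offset] + field.to_bytes(width, "big") + block[offset + width:]


fixed = bytes(range(12))                     # 96 bits that never change
block = fixed + bytes.fromhex("fffffffe")    # 32-bit field in the last 4 bytes
once = increment_field(block, offset=12)     # field becomes ffffffff
twice = increment_field(once, offset=12)     # field wraps around to 00000000
```

<p>The wrap-around in the last step is the repetition after 2^n increments described above, here with n = 32. Passing width=8 gives a 64-bit big-endian increment instead.</p>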
<p>In the following excerpt from the iEnc192_CTR function in the file intel_aes_lib/asm/x64/iaesx64.s, the paddd SIMD instruction is used to implement the 32-bit Big-Endian increment function.<br />
 lp192encctrsingle:<br />
        movdqa xmm0,xmm5<br />
        pshufb  xmm0, xmm6 ; byte swap counter back<br />
        movdqa  xmm4,[rcx+0*16]<br />
        paddd   xmm5,[counter_add_one wrt rip]<br />
        add rdx, 16<br />
        pxor xmm0, xmm4<br />
        aesenc1_u [rcx+1*16]<br />
        ….</p>
<p>If some other increment function is desired (e.g. a Big-Endian 64-bit increment, or a Little-Endian increment), then there are two options:</p>
<ul>
<li>Modify the existing code to implement the different increment function (which will in general give the best performance), or</li>
<li>Write a new function that implements the desired increment function and use the AES-NI library functions iEnc128(), iEnc192(), or iEnc256() to encrypt the counter values.</li>
</ul>
<p>The first option could be achieved, for example, by replacing paddd with paddq in the above code excerpt, which changes the 32-bit Big-Endian increment function into a 64-bit one. To get the correct behavior for any input stream length, paddd must be replaced in the load_and_inc4 macro as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2011/11/11/aes-counter-mode-details-intel-aes-ni-implementation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

