<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Intel® Software Network (FR) &#187; ISN France</title>
	<atom:link href="http://software.intel.com/fr-fr/blogs/category/isn-france/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/fr-fr/blogs</link>
	<description></description>
	<lastBuildDate>Mon, 14 May 2012 06:49:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Create a Ubuntu 11.04 LiveUSB to use Intel® Parallel Studio XE</title>
		<link>http://software.intel.com/fr-fr/blogs/2012/05/14/create-a-ubuntu-1104-liveusb-to-use-intel-parallel-studio-xe/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2012/05/14/create-a-ubuntu-1104-liveusb-to-use-intel-parallel-studio-xe/#comments</comments>
		<pubDate>Mon, 14 May 2012 06:49:52 +0000</pubDate>
		<dc:creator>Xavier Hallade (Intel)</dc:creator>
				<category><![CDATA[Acceler8]]></category>
		<category><![CDATA[ISN France]]></category>
		<category><![CDATA[programmation parallèle]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2012/05/14/create-a-ubuntu-1104-liveusb-to-use-intel-parallel-studio-xe/</guid>
		<description><![CDATA[You need a license for Intel® Parallel Studio XE for Linux and a USB key of at least 4 GB. Get an ISO image of Ubuntu 11.04. Create a new Ubuntu 11.04 LiveUSB, with persistence mode enabled (you can specify a size of 1 MB for the persistence file, you will overwrite it with a ~3 GB file [...]]]></description>
			<content:encoded><![CDATA[<p>You need a <a href="https://registrationcenter.intel.com/RegCenter/AutoGen.aspx?ProductID=1538&amp;AccountID=&amp;EmailID=&amp;ProgramID=&amp;RequestDt=&amp;rm=EVAL&amp;lang=">license</a> for Intel® Parallel Studio XE for Linux and a USB key of at least 4 GB.</p>
<p>Get an <a href="http://releases.ubuntu.com/natty/ubuntu-11.04-desktop-amd64.iso">ISO image</a> of Ubuntu 11.04.</p>
<p>Create a new Ubuntu 11.04 LiveUSB with persistence mode enabled (you can specify a size of 1 MB for the persistence file, since you will overwrite it with a ~3 GB file in the next step).</p>
<p>To do that from Windows, you can use <a href="http://www.linuxliveusb.com/">LiLi</a>:<br />
<a href="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/screenshot-lili.png"><img src="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/screenshot-lili-180x300.png" alt="" width="180" height="300" class="aligncenter size-medium wp-image-675" /></a><br />
but any other tool such as UNetbootin or Universal USB Installer is fine.</p>
<p>Download <a href="http://intel-software-academic-program.com/download/ubuntu/casper-rw.zip">casper-rw.zip</a> and unzip casper-rw to the root of your freshly built Ubuntu LiveUSB:<br />
<a href="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/screenshot-casper.png"><img src="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/screenshot-casper-300x155.png" alt="" width="300" height="155" class="aligncenter size-medium wp-image-674" /></a><br />
It is a persistence file that contains an installation of Parallel Studio XE 2011 SP1 Update1.</p>
<p>Create a new folder at the root of your key, named "intel-licenses", and put your .lic file inside it.</p>
<p>Now you are ready to boot from this LiveUSB and directly use Intel® tools to accelerate your code <img src='http://software.intel.com/fr-fr/blogs/wordpress/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2012/05/14/create-a-ubuntu-1104-liveusb-to-use-intel-parallel-studio-xe/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Maximum Subarray Problem - Algorithmic Optimizations</title>
		<link>http://software.intel.com/fr-fr/blogs/2011/11/28/the-maximum-subarray-problem-algorithmic-optimizations/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2011/11/28/the-maximum-subarray-problem-algorithmic-optimizations/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 08:55:10 +0000</pubDate>
		<dc:creator>candreolli</dc:creator>
				<category><![CDATA[Acceler8]]></category>
		<category><![CDATA[ISN France]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2011/11/28/the-maximum-subarray-problem-algorithmic-optimizations/</guid>
		<description><![CDATA[Acceler8 contest Andreolli Cédric - Garcia Pascal - Templé Arthur Date: October 15th 2011 - November 15th 2011 Abstract: This report explains the approach we used to solve the "Maximum Subarray Problem" during the Intel Acceler8 contest. We are two fourth-year students and a teacher at INSA Rennes. The idea [...]]]></description>
			<content:encoded><![CDATA[
<p><P><br />
<H1 ALIGN="CENTER">Acceler8 contest</H1><br />
<P ALIGN="CENTER"><STRONG><SPAN CLASS="textbf">Andreolli</SPAN> Cédric - <SPAN CLASS="textbf">Garcia</SPAN> Pascal - <SPAN CLASS="textbf">Templé</SPAN> Arthur</STRONG><br />
</P><br />
<BR><P ALIGN="CENTER"><B>Date:</B> October 15th 2011 - November 15th 2011</P></p>
<p><HR></p>
<p><H3>Abstract:</H3><br />
<DIV CLASS="ABSTRACT"><br />
This report explains the approach we used to solve the "Maximum Subarray Problem" during the <SPAN CLASS="textit">Intel Acceler8</SPAN> contest. We are two fourth-year students and a teacher at <SPAN CLASS="textit">INSA</SPAN> Rennes.<br />
The idea of the contest was to build an algorithm able to scale on computers with a large number of cores. Here are the different steps of the development process. We hope you will enjoy reading it as much as we enjoyed the contest.<br />
</DIV><br />
<P></p>
<p><BR></p>
<p><H1><A NAME="SECTION00020000000000000000"><br />
Introduction</A><br />
</H1></p>
<p><H2><A NAME="SECTION00021000000000000000"><br />
The maximum subarray problem</A><br />
</H2><br />
The first step was, of course, to understand the problem. The <SPAN CLASS="textit">maximum subarray problem</SPAN> is a well-known algorithmic problem. It consists of finding, in a matrix of integers, the rectangle (a submatrix) whose elements have the maximum sum.</p>
<p><P><br />
A lot of documentation about this problem can be found on the <SPAN CLASS="textit">Internet</SPAN>. We started by studying and testing some of the algorithms we found, such as the <SPAN CLASS="textit">Kadane</SPAN> algorithm. </p>
<p><P><br />
We chose to use <SPAN CLASS="textit">C++</SPAN> to solve the problem. <SPAN CLASS="textit">C++</SPAN> has the advantage of being quite low level when you need it, while still giving access to higher-level objects such as vectors and lists. Besides, we are currently learning this language at <SPAN CLASS="textit">INSA</SPAN>, so it was a good opportunity to practice.</p>
<p><P></p>
<p><H2><A NAME="SECTION00022000000000000000"><br />
OpenMP</A><br />
</H2><br />
For the parallelization part, we chose to use <SPAN CLASS="textit">OpenMP</SPAN>. We made this choice for several reasons. </p>
<p><P><br />
First, the video tutorial was really easy to understand and covered a lot of features that were really helpful for what we planned to do. Furthermore, the <SPAN CLASS="textit">MTL</SPAN> did not require a lot of specific settings to work with <SPAN CLASS="textit">OpenMP</SPAN>, and we thought it was a good idea not to lose time on compilation problems. </p>
<p><P><br />
The second reason is that none of us had ever used it, and it is always interesting to discover new libraries. One of the main benefits of <SPAN CLASS="textit">OpenMP</SPAN> is that it lets you parallelize your code incrementally: with very few changes, you can improve the speed of a sequential program.</p>
<p><P><br />
Finally, <SPAN CLASS="textit">OpenMP</SPAN> requires very few additional lines of code. For example, you do not have to use multiple semaphores or mutexes to protect your critical data.</p>
<p><P></p>
<p><H2><A NAME="SECTION00023000000000000000"><br />
Different tasks</A><br />
</H2><br />
Once we had finished exploring the possibilities of <SPAN CLASS="textit">OpenMP</SPAN>, we decided to split the project into independent tasks.<br />
The next sections explain those different tasks. You will also find the program documentation in the <code>doc</code> directory.</p>
<p><P></p>
<p><H1><A NAME="SECTION00030000000000000000"><br />
Reading the files</A><br />
</H1></p>
<p><H2><A NAME="SECTION00031000000000000000"><br />
Problems encountered</A><br />
</H2><br />
As we had decided to work with <SPAN CLASS="textit">C++</SPAN>, we started by writing a really simple file reader. At first we used the basic <SPAN CLASS="textit">STL</SPAN> operations and proceeded as follows:</p>
<p><BR><br />
<IMG WIDTH="567" HEIGHT="153" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/img2.png" ALT="\begin{lstlisting}<br />
std::ifstream file(fileName, std::ios::in);<br />
std::string line;...<br />
...ringstream::in);<br />
while(tmp&gt;&gt;num){<br />
//Parsing code here<br />
}<br />
}<br />
}<br />
\end{lstlisting}"><br />
<BR><br />
But actually, the line:<br />
<BR><br />
<IMG WIDTH="566" HEIGHT="25" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/img3.png" ALT="\begin{lstlisting}<br />
while(tmp&gt;&gt;num){<br />
\end{lstlisting}"><br />
<BR><br />
performed really badly. We therefore decided to write our own integer parser, and a few tests showed it to be much faster. We then started to think about parallelizing this step.</p>
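<p>The original parser is not reproduced in this article, so the following is only a minimal sketch of what a hand-written integer parser over a character buffer could look like. The function name <code>parseInts</code> and its signature are assumptions made for illustration, and the input is assumed to be well formed (decimal integers separated by spaces or line breaks).</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the contest code): parses whitespace-separated
// decimal integers from buf[0..len). Assumes well-formed input and values that
// fit in an int; parsing stops at the first unexpected character.
std::vector<int> parseInts(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::vector<int> out;
    std::size_t i = 0;
    while (i < len) {
        // Skip separators between numbers.
        while (i < len && (buf[i] == ' ' || buf[i] == '\t' ||
                           buf[i] == '\n' || buf[i] == '\r')) {
            ++i;
        }
        if (i >= len) break;

        // Optional sign.
        bool negative = false;
        if (buf[i] == '-') { negative = true; ++i; }
        else if (buf[i] == '+') { ++i; }

        // Accumulate digits without any stream or locale machinery.
        int value = 0;
        bool sawDigit = false;
        while (i < len && buf[i] >= '0' && buf[i] <= '9') {
            value = value * 10 + (buf[i] - '0');
            sawDigit = true;
            ++i;
        }
        if (!sawDigit) break;  // malformed input: stop parsing
        out.push_back(negative ? -value : value);
    }
    return out;
}
```

<p>Working directly on characters avoids the per-token overhead of <code>std::istringstream</code>, which is the likely reason such a parser is faster.</p>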
<p><P><br />
We first took some time to see what was going on when we read a file and parsed it into a vector. We could see that the cores were absolutely not busy during this operation.<br />
As there were a lot of hard drive accesses, the cores were spending most of their time waiting for data to arrive. This was not optimal, but we could not parallelize the file reading itself because there is only one bus between the hard drive and the main memory.</p>
<p><P></p>
<p><H2><A NAME="SECTION00032000000000000000"><br />
The way we resolved it</A><br />
</H2><br />
Finally, after more tries, we realized that loading the whole file into main memory was quite fast. We call this array of characters <code>buffer</code>.<br />
We then devised a trick to use parallelization to speed up the parsing of the files.<br />
<BR><br />
Once the file is in main memory, it is really fast to run through it, so we decided to go through the whole file twice:<br />
the first time to get the matrix dimensions, the second time to parse the file.</p>
<p><P></p>
<p><H3><A NAME="SECTION00032100000000000000"><br />
Getting the matrix dimensions</A><br />
</H3><br />
We wanted to spend as little time as possible on this step, but as the <code>buffer</code> is an array of characters, it is really difficult to parallelize the whole process.<br />
So we decided to read the first line sequentially (until a '\n' is found) and count the number of columns of the matrix using the white spaces.<br />
As the input file must respect some specifications, we are sure that the number of columns of the matrix is equal to the number of spaces on a line plus one.<br />
Once this step is over, we just need to rush through the rest of the file to count the '\n' characters. This last step can easily be parallelized with<br />
<SPAN CLASS="textit">OpenMP</SPAN> and a <code>#pragma omp for</code> directive.</p>
<p><P><br />
During this process, we record the addresses of the new lines in a vector named <code>addressTab</code>.</p>
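<p>As a rough illustration of this step, here is a sequential sketch that counts the columns on the first line and records where each line starts. The <code>Layout</code> structure and the <code>scanLayout</code> function are hypothetical names, offsets are stored instead of raw addresses, and the input is assumed to use single spaces and '\n' line endings.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: number of columns plus the offset in the buffer
// at which every line begins (the role played by addressTab in the text).
struct Layout {
    std::size_t columns;
    std::vector<std::size_t> lineStarts;
};

Layout scanLayout(const char* buffer, std::size_t len) {
    assert(buffer != nullptr && len > 0);
    Layout layout;
    layout.columns = 1;               // columns = spaces on the first line + 1
    layout.lineStarts.push_back(0);   // the first line starts at offset 0

    // Step 1 (sequential): walk the first line and count the spaces.
    std::size_t i = 0;
    for (; i < len && buffer[i] != '\n'; ++i) {
        if (buffer[i] == ' ') ++layout.columns;
    }

    // Step 2: scan the rest of the buffer for '\n'. This loop is the part
    // that could be divided among threads, each one collecting its own
    // offsets before they are merged in order.
    for (; i < len; ++i) {
        if (buffer[i] == '\n' && i + 1 < len) {
            layout.lineStarts.push_back(i + 1);
        }
    }
    return layout;
}
```

<p>The number of rows is then simply <code>lineStarts.size()</code>, and each entry tells a thread where its share of lines begins.</p>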
<p><P></p>
<p><H3><A NAME="SECTION00032200000000000000"><br />
Parsing the file</A><br />
</H3><br />
Once we have the matrix dimensions, we also have a <code>vector</code> (<code>addressTab</code>) which contains the addresses of all new lines in <code>buffer</code>.<br />
We can now parallelize the file parsing. Depending on the number of cores we have, we split <code>buffer</code> into different parts based on the line addresses (see Figure 1).</p>
<p><P></p>
<p><DIV ALIGN="CENTER"><A NAME="fig:parsing"></A><A NAME="79"></A><br />
<TABLE><br />
<CAPTION ALIGN="BOTTOM"><STRONG>Figure 1:</STRONG><br />
The file parsing parallelization</CAPTION><br />
<TR><TD><br />
<DIV ALIGN="CENTER"><br />
<IMG WIDTH="436" HEIGHT="276" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/readfile1.png" ALT="Image readfile1"><br />
</DIV></TD></TR><br />
</TABLE><br />
</DIV></p>
<p><A NAME="file"></A>The goal of the process is to fill a two-dimensional <code>vector</code>, which we will call <code>matrix</code>. The elements of <code>addressTab</code> correspond to the addresses of the beginning of each line in <code>buffer</code>, and as a core is always responsible for at least one entire line, it can put the parsed numbers at the correct position in <code>matrix</code>.<br />
<BR><br />
<P><br />
Our algorithm for solving the maximum subarray problem is faster if the matrix of <em>n</em> rows and <em>m</em> columns is such that <em>n</em> &le; <em>m</em>. So we have two different functions for generating an optimal matrix (<code>readLinesOrdered</code> and <code>readLinesReversed</code>).<br />
We did not factor this part of the code into a single function because of optimization concerns. Indeed, this would have added a test for every number we had to put in <code>matrix</code>. On big files, this could have been a significant waste of time.</p>
<p><P></p>
<p><H1><A NAME="SECTION00040000000000000000"><br />
The one-dimensional algorithm</A><br />
</H1><br />
We actually do not have a lot to tell here.<br />
We started to work on this algorithm and wrote some parallelized functions for it, but as it is already a very fast sequential algorithm (<br />
<em>O(n)</em> complexity), the improvements were not significant. In fact, the time needed to solve the problem was negligible compared to the time needed to read and parse the file.</p>
<p><P><br />
The two-dimensional problem was hard enough, so we decided not to spend more time on this particular case.</p>
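<p>For reference, the sequential one-dimensional Kadane algorithm mentioned above can be sketched as follows. This is the textbook version, not the contest code; the function name <code>maxSubarraySum</code> is an assumption, and the subarray is required to be non-empty.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook one-dimensional Kadane: best sum of a non-empty contiguous run.
long maxSubarraySum(const std::vector<int>& a) {
    assert(!a.empty());
    long endingHere = a[0];  // best sum of a run ending at the current index
    long best = a[0];        // best sum seen so far
    for (std::size_t i = 1; i < a.size(); ++i) {
        // Either extend the previous run or start a new one at a[i].
        endingHere = std::max<long>(a[i], endingHere + a[i]);
        best = std::max(best, endingHere);
    }
    return best;
}
```

<p>Each element is visited once, which is why the running time is linear and leaves little room for a parallel speed-up.</p>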
<p><P></p>
<p><H1><A NAME="SECTION00050000000000000000"><br />
The two-dimensional algorithm</A><br />
</H1></p>
<p><H2><A NAME="SECTION00051000000000000000"><br />
The Kadane algorithm</A><br />
</H2><br />
As said before, the two-dimensional maximum subarray problem is a well-known problem and documentation about it is easy to find on the <SPAN CLASS="textit">Internet</SPAN>.<br />
We decided to work on the <SPAN CLASS="textit">Kadane</SPAN> algorithm, which is quite simple to understand.<br />
We started by parallelizing this algorithm. The two-dimensional <SPAN CLASS="textit">Kadane</SPAN> algorithm is a generalization of the one-dimensional case.<br />
It is based on three nested <code>for</code> loops.<br />
<BR><br />
<P><br />
As we chose to use <SPAN CLASS="textit">OpenMP</SPAN>, we had two main approaches. One was to use the <code>#pragma omp for</code> directive, the other was to use the <code>#pragma omp task</code> directive.<br />
We started with the easy solution (the <code>#pragma omp for</code>). After running some tests, we could see that the cores were not busy throughout the process. That is why we devised a solution with the <code>#pragma omp task</code> directive.<br />
<BR><br />
<P><br />
The idea was to compute the number of tasks we wanted to create (let's call it <code>numberOfTasks</code>) and launch the tasks. Each task is responsible for computing the maximal sum (and the associated coordinates) for a part of the matrix.<br />
The first <code>for</code> loop in the <SPAN CLASS="textit">Kadane</SPAN> algorithm iterates over the rows of the matrix.<br />
The parallelization is done by dividing this loop into <code>numberOfTasks</code> tasks. Each thread assigned to a task runs a <code>for</code> loop, but instead of incrementing by one, it increments by <code>numberOfTasks</code>.<br />
Here is the corresponding part of the code (where <em>n</em> is the number of rows of the matrix):<br />
<BR><br />
<IMG WIDTH="566" HEIGHT="191" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/img9.png" ALT="\begin{lstlisting}<br />
...<br />
for (unsigned int rowStart = taskNumber; rowStart &lt; n; ro...<br />
...<br />
...<br />
for (...) {<br />
...<br />
for(...) {<br />
...<br />
}<br />
...<br />
}<br />
...<br />
}<br />
\par<br />
\end{lstlisting}"><br />
<BR></p>
<p><P><br />
It is important to create more tasks than the number of cores. Indeed, the tasks do not all take the same time, so with exactly one task per core we would have to wait for the longest tasks at the end of the function. Increasing the number of tasks reduces this waiting time because each task is shorter.</p>
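<p>Since the listing above only survives in truncated form, here is a minimal sketch of the strided decomposition it describes, written around the standard two-dimensional Kadane algorithm. The names <code>solveTask</code> and <code>maxSubmatrixSum</code> are assumptions, only the sum is tracked (not the coordinates), and the tasks are run one after another here; in an OpenMP version each call to <code>solveTask</code> would be what gets wrapped in a task.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// One task: handles rowStart = taskNumber, taskNumber + numberOfTasks, ...
// Assumes a non-empty rectangular matrix.
long solveTask(const Matrix& matrix, std::size_t taskNumber,
               std::size_t numberOfTasks) {
    assert(!matrix.empty() && !matrix[0].empty() && numberOfTasks > 0);
    const std::size_t n = matrix.size();
    const std::size_t m = matrix[0].size();
    long best = matrix[0][0];        // any single cell is a valid rectangle
    std::vector<long> colSums(m);
    for (std::size_t rowStart = taskNumber; rowStart < n;
         rowStart += numberOfTasks) {
        std::fill(colSums.begin(), colSums.end(), 0L);
        for (std::size_t rowEnd = rowStart; rowEnd < n; ++rowEnd) {
            // colSums[k] becomes the sum of column k over rowStart..rowEnd;
            // a 1D Kadane pass over colSums runs in the same loop.
            long endingHere = 0;
            for (std::size_t k = 0; k < m; ++k) {
                colSums[k] += matrix[rowEnd][k];
                endingHere = std::max(colSums[k], endingHere + colSums[k]);
                best = std::max(best, endingHere);
            }
        }
    }
    return best;
}

// Runs every task and keeps the best result. The iterations are independent
// of each other, which is what makes them suitable for parallel execution.
long maxSubmatrixSum(const Matrix& matrix, std::size_t numberOfTasks) {
    long best = solveTask(matrix, 0, numberOfTasks);
    for (std::size_t t = 1; t < numberOfTasks; ++t) {
        best = std::max(best, solveTask(matrix, t, numberOfTasks));
    }
    return best;
}
```

<p>Because rows near the top of the matrix cost more than rows near the bottom, the stride gives every task a mix of expensive and cheap starting rows.</p>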
<p><P></p>
<p><H2><A NAME="SECTION00052000000000000000"><br />
Our algorithm</A><br />
</H2></p>
<p><P><br />
After having parallelized the two-dimensional <SPAN CLASS="textit">Kadane</SPAN> algorithm, we started to work on the algorithm itself.<br />
First we had the idea of creating a new one-dimensional array (called <code>maxSumStartingAtRow</code>) which contains, for each index <code>i</code>, an upper bound on the maximal sum you can obtain for a sub-matrix starting at row <code>i</code> and extending towards the end of the original matrix.</p>
<p><P><br />
We used the <code>maxSumStartingAtRow</code> array to break out of the second <code>for</code> loop when the maximal sum already found in the current task was greater than the upper bound on the sums still to be explored (lines 9 to 12 in figure <A HREF="kadane">2</A>).</p>
<p><P></p>
<p><DIV ALIGN="CENTER"><A NAME="fig:kadane"></A><A NAME="102"></A><br />
<TABLE><br />
<CAPTION ALIGN="BOTTOM"><STRONG>Figure 2:</STRONG><br />
Part of our Kadane algorithm.</CAPTION><br />
<TR><TD><IMG WIDTH="583" HEIGHT="293" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/img12.png" ALT="\begin{figure}\begin{lstlisting}[numbers=left, numberstyle=\footnotesize , stepn...<br />
...<br />
break;<br />
}<br />
...<br />
for(...){<br />
...<br />
}<br />
...<br />
}<br />
...<br />
}<br />
\end{lstlisting}<br />
\end{figure}"></TD></TR><br />
</TABLE><br />
</DIV></p>
<p><P><br />
For the generation of <code>maxSumStartingAtRow</code>, we first did a preprocessing operation, which was parallelized. This operation started from the bottom of the <code>matrix</code>, ran one-dimensional <SPAN CLASS="textit">Kadane</SPAN> passes, and accumulated the values from the bottom to the top of <code>maxSumStartingAtRow</code>, as illustrated in figure <A HREF="fill">3</A>. </p>
<p><P></p>
<p><DIV ALIGN="CENTER"><A NAME="fig:fill"></A><A NAME="111"></A><br />
<TABLE><br />
<CAPTION ALIGN="BOTTOM"><STRONG>Figure 3:</STRONG><br />
The maxSumStartingAtRow generation</CAPTION><br />
<TR><TD><br />
<DIV ALIGN="CENTER"><br />
<IMG WIDTH="382" HEIGHT="237" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/ssum.png" ALT="Image ssum"><br />
</DIV></TD></TR><br />
</TABLE><br />
</DIV></p>
<p><P><br />
With this solution, the problem was that <code>maxSumStartingAtRow</code> was a very loose estimate of the real maximum sum starting at a row. This is quite easy to understand with the example in figure <A HREF="bad">4</A>. </p>
<p><DIV ALIGN="CENTER"><A NAME="fig:bad"></A><A NAME="119"></A><br />
<TABLE><br />
<CAPTION ALIGN="BOTTOM"><STRONG>Figure 4:</STRONG><br />
The problem with maxSumStartingAtRow</CAPTION><br />
<TR><TD><br />
<DIV ALIGN="CENTER"><br />
<IMG WIDTH="422" HEIGHT="237" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/ssum2.png" ALT="Image ssum2"><br />
</DIV></TD></TR><br />
</TABLE><br />
</DIV></p>
<p><P><br />
A solution we found was to run the two-dimensional <SPAN CLASS="textit">Kadane</SPAN> algorithm at regular intervals. Indeed, on huge arrays, this algorithm is far more accurate. The two-dimensional algorithm was used to reduce the difference between the real and the computed values in <code>maxSumStartingAtRow</code>.</p>
<p><P><br />
The last problem we had was the time needed for this preprocessing operation. With the preprocessing, the solving time of the two-dimensional algorithm was reduced, but on big arrays (<em>10000 x 10000</em>) the total time only decreased by a few seconds because the preprocessing itself took a long time.<br />
<BR><br />
<P><br />
Finally, our last two-dimensional algorithm does not use preprocessing. The trick is that the classical two-dimensional <SPAN CLASS="textit">Kadane</SPAN> algorithm spends its time computing sums from one row to another. This allows us to use the solving part of our algorithm to fill <code>maxSumStartingAtRow</code> (the following piece of code is placed on line 19 in figure <A HREF="kadane">2</A>):</p>
<p><P><br />
<BR><br />
<IMG WIDTH="568" HEIGHT="102" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/img15.png" ALT="\begin{lstlisting}<br />
...<br />
if (!pruningOccured) {<br />
..."><br />
<BR></p>
<p><P><br />
The <code>maxSumStartingAtRow</code> <code>vector</code> is initialized with the following line:<br />
<BR><br />
<IMG WIDTH="566" HEIGHT="25" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/img16.png" ALT="\begin{lstlisting}<br />
for (int i = 0; i &lt; n; ++i) maxSumStartingAtRow[i] = LONG_MAX&gt;&gt;2;<br />
\end{lstlisting}"><br />
<BR><br />
As we use <code>maxSumStartingAtRow</code> in an addition, we want to avoid overflow. This is why we divide the <code>LONG_MAX</code> value by 4 (a right shift by 2).<br />
<BR><br />
<P><br />
Finally, we add a variable <code>bestSoFar</code>, shared by all threads, which holds the best value found so far by all completed tasks. This value is used to initialize the <code>sum</code> variable (line 2 in figure <A HREF="kadane">2</A>: we replace <code>sum = 0</code> by <code>sum = bestSoFar</code>), so that parts of the matrix can be pruned based on the best values found by other tasks.<br />
<BR><br />
<P><br />
Note that having more tasks than available cores is important for our pruning method too, because threads can then communicate partial results to the others more often, and in doing so they help each other prune parts of the computation.<br />
<BR><br />
<P><br />
To conclude, the complexity of our <SPAN CLASS="textit">Kadane</SPAN> algorithm is still <em>O(n² x m)</em>, but thanks to the cut we use in the second <code>for</code> loop, most of the time we can speed up the resolution.</p>
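<p>The pruning code itself is only shown as images, so the sketch below is one possible reading of the idea rather than the contest implementation. It is sequential, processes the starting rows from bottom to top (an assumption made so that the bounds are available early), and uses the names <code>maxSumStartingAtRow</code> and <code>bestSoFar</code> from the text; every other name is hypothetical. The bound relies on the fact that a rectangle spanning rows <code>rowStart</code> to some row at or beyond <code>rowEnd</code> is the sum of a rectangle over rows <code>rowStart..rowEnd-1</code> and a rectangle whose top row is <code>rowEnd</code>.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Assumes a non-empty rectangular matrix whose sums stay far below LONG_MAX/2.
long maxSubmatrixSumPruned(const Matrix& matrix) {
    assert(!matrix.empty() && !matrix[0].empty());
    const std::size_t n = matrix.size();
    const std::size_t m = matrix[0].size();

    // maxSumStartingAtRow[i] is always >= the best sum of any rectangle whose
    // top row is i. LONG_MAX >> 2 means "unknown" and is safe to add to.
    std::vector<long> maxSumStartingAtRow(n, LONG_MAX >> 2);
    long bestSoFar = matrix[0][0];
    std::vector<long> colSums(m);

    for (std::size_t rowStart = n; rowStart-- > 0;) {
        std::fill(colSums.begin(), colSums.end(), 0L);
        long bestFromRow = LONG_MIN;   // best rectangle with top row rowStart
        long bestForRows = 0;          // 1D Kadane result for rowStart..rowEnd-1
        bool pruningOccurred = false;

        for (std::size_t rowEnd = rowStart; rowEnd < n; ++rowEnd) {
            // No rectangle reaching rowEnd or below can beat bestSoFar: cut.
            if (rowEnd > rowStart &&
                bestForRows + maxSumStartingAtRow[rowEnd] <= bestSoFar) {
                pruningOccurred = true;
                break;
            }
            long endingHere = 0;
            bestForRows = LONG_MIN;
            for (std::size_t k = 0; k < m; ++k) {
                colSums[k] += matrix[rowEnd][k];
                endingHere = std::max(colSums[k], endingHere + colSums[k]);
                bestForRows = std::max(bestForRows, endingHere);
            }
            bestFromRow = std::max(bestFromRow, bestForRows);
            bestSoFar = std::max(bestSoFar, bestForRows);
        }
        // Only an exhaustive scan yields an exact value for this row;
        // after a cut the conservative initial bound is kept.
        if (!pruningOccurred) maxSumStartingAtRow[rowStart] = bestFromRow;
    }
    return bestSoFar;
}
```

<p>The worst case is unchanged, since the cut may never trigger, but whenever a good <code>bestSoFar</code> is found early the second loop stops long before reaching the last row.</p>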
<p><P></p>
<p><H1><A NAME="SECTION00060000000000000000"><br />
The final algorithm</A><br />
</H1></p>
<p><P><br />
The final part of the algorithm is really simple; it is the <code>static void computeMaxSubArray(char* fileName)</code> method of the <code>MaxSubArrayPb</code> class.<br />
It only uses the different functions we wrote. Here are the different steps executed by the algorithm:<br />
<DL><br />
<DT><STRONG>Load the file: </STRONG></DT><br />
<DD>This just loads the file into main memory.</p>
<p></DD><br />
<DT><STRONG>Get the matrix size: </STRONG></DT><br />
<DD>Here, the only goal is to get the dimensions of the input matrix and to fill a vector with the addresses of all the lines. </p>
<p></DD><br />
<DT><STRONG>Find the right orientation: </STRONG></DT><br />
<DD>As explained before, our algorithm is far more efficient with some particular arrangements of the initial matrix. </p>
<p></DD><br />
<DT><STRONG>Generate the vector: </STRONG></DT><br />
<DD>This operation turns the input file into a <SPAN CLASS="textit">C++</SPAN> two-dimensional vector.</p>
<p></DD><br />
<DT><STRONG>Launch the right algorithm: </STRONG></DT><br />
<DD>Depending on the number of rows of the input vector, we launch either the one-dimensional <SPAN CLASS="textit">Kadane</SPAN><br />
algorithm or our two-dimensional algorithm.</p>
<p></DD><br />
<DT><STRONG>Reverse the result if necessary: </STRONG></DT><br />
<DD>If the third step reversed the matrix, we need to rotate the result to get the correct output coordinates.</p>
<p></DD><br />
<DT><STRONG>Print the result: </STRONG></DT><br />
<DD>Probably no explanation needed here <img src='http://software.intel.com/fr-fr/blogs/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .<br />
</DD><br />
</DL><br />
The main function, in the <code>main.cpp</code> file, only calls the <code>computeMaxSubArray</code> method on each file passed as a parameter.<br />
It also defines the number of threads the algorithms have to use.<br />
As it is really short, here is the code:<br />
<BR><br />
<IMG WIDTH="567" HEIGHT="153" ALIGN="BOTTOM" BORDER="0" SRC="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/img20.png" ALT="\begin{lstlisting}<br />
int main(int argc, char* argv[]){<br />
if(argc &lt; 3){<br />
cout&lt;&lt;&quot;Par...<br />
...){<br />
MaxSubArrayPb::computeMaxSubArray(argv[i]);<br />
}<br />
return 0;<br />
}<br />
\end{lstlisting}"><br />
<BR><br />
<H1><A NAME="SECTION00070000000000000000"><br />
Conclusion</A><br />
</H1><br />
This contest was really interesting in a lot of different aspects. First of all, it involved teamwork between teacher and students, which was really rewarding.<br />
We all learned a lot about a topic we did not know well.<br />
As computers have more and more cores, this kind of computation is probably going to become a very important issue in future application development.<br />
This contest was an opportunity to discover existing technologies. It was also an opportunity to practice on a 40-core computer, which is not possible every day.<br />
The topic of the contest, the maximum subarray problem, was an interesting problem to try to parallelize.<br />
It was quite simple to understand and it allowed us to use multiple ways to parallelize our program.</p>
<p><P><br />
The resources put at our disposal by <SPAN CLASS="textit">Intel</SPAN> were well suited to beginners in parallel computing. We enjoyed learning by watching the video tutorial.</p>
<p><P><br />
To conclude, it was a really rich experience and we want to thank <SPAN CLASS="textit">Intel</SPAN> for organizing this contest.</p>
<p><P><br />
Finally, you can download our full packages. </p>
<p>The first one, MTL_package.zip, contains the files we sent for the contest. The makefile is adapted for the <em>MTL</em>.<br />
The second one, Normal_package.zip, should run on your personal computer. You just need to have g++ 4.5.1 or later installed on your PC.</p>
<p><a href='http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/MTL_package.zip'>MTL_package.zip</a><br />
<a href='http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/Normal_package.zip'>Normal_package.zip</a></p>
<p>In both zip files, you will find the same explanations in the detailled_explannations.pdf file. You will also have access to the doxygen documentation.</p>
<p>We hope you enjoyed reading this article.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2011/11/28/the-maximum-subarray-problem-algorithmic-optimizations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Subarray Problem - A static NUMA-Aware approach</title>
		<link>http://software.intel.com/fr-fr/blogs/2011/11/24/subarray-problem-a-static-numa-aware-approach/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2011/11/24/subarray-problem-a-static-numa-aware-approach/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 12:19:44 +0000</pubDate>
		<dc:creator>krahnack</dc:creator>
				<category><![CDATA[Acceler8]]></category>
		<category><![CDATA[ISN France]]></category>
		<category><![CDATA[programmation parallèle]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2011/11/24/subarray-problem-a-static-numa-aware-approach/</guid>
		<description><![CDATA[The subarray problem on a n*m matrix is sequentially solved using an algorithm known as the Kadane 2D algorithm. This algorithm has a O(n²m) complexity. The sequential algorithm is written using 3 loops : for i in (0..n) // &#60;- We parallelize that for j in (i..n) for k in (0..m) //do work with matrix[j][k] [...]]]></description>
			<content:encoded><![CDATA[
<div>The subarray problem on an n*m matrix is solved sequentially using an algorithm known as the Kadane 2D algorithm. This algorithm has O(n²m) complexity. The sequential algorithm is written using three loops:
      </div>
<pre>
         for i in (0..n)   // &lt;- We parallelize that
		 for j in (i..n)
			 for k in (0..m)
			    //do work with matrix[j][k]
      </pre>
<div>Our solution does not try to optimize the work performed inside the inner loop, so we skip the details of what is actually done inside. We chose to parallelize only the outer loop (index <b>i</b>).</div>
<div>In order to parallelize the outer loop on K cores, we chose to split it into K tasks of equal duration. This approach has several advantages:</p>
<ul>
<li>The algorithm is very simple : there is no need to steal work or do complex load balancing between the K cores.</li>
<li>Each thread works on big contiguous portions of the matrix, which maximizes cache usage.</li>
<li>We know in advance what the threads are going to do and which data are going to be accessed so we can do smart NUMA optimizations.</li>
</ul></div>
<div>In this article, we explain how we split the work into K equal tasks and how we optimized the processing of these tasks.</div>
<h2>1-Creating K tasks of equal duration</h2>
<table>
<tr>
<td>
			 <img src="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/splitting.png" style="padding:15px"></img><br />
			 <label><b>Fig. 1</b> - <i>K=4 equal areas in a triangle</i></label>
		</td>
<td style="padding-left:30px">
<div>In order to split a <tt>for i (0..n)</tt> loop into K tasks, one often creates K tasks <tt>[i=0..n/K],[i=n/K..2*n/K]...[i=(K-1)*n/K..n]</tt>. However, this simple solution does not work well in our case because the second loop (index <b>j</b>) starts at index <b>i</b>. This means that when <tt>i==0</tt>, <tt>n</tt> iterations are done in the second loop and when <tt>i==n-1</tt> only <tt>1</tt> iteration is done in the second loop! The amount of work depending on <b>i</b> is represented in Figure 1. This figure represents an example of the work to be done on a 250*m matrix. When <tt>i==0</tt>, 250 iterations are done; when <tt>i==249</tt>, only 1 iteration is done. The total quantity of work to be done is equal to the area of the triangle.</div>
<div>Splitting the work into K equal tasks is equivalent to creating K equal areas inside the above mentioned triangle.</div>
<div>For example, in Figure 1, representing the work to be done on a 250*m matrix, a close-to-optimal partitioning is the following:</p>
<ul>
<li>Thread 0 doing i (0-34) = 7939 <b>j</b> iterations (area A1)</li>
<li>Thread 1 doing i (34-74) = 7860 <b>j</b> iterations (area A2)</li>
<li>Thread 2 doing i (74-125) = 7701 <b>j</b> iterations (area A3)</li>
<li>Thread 3 doing i (125-250) = 7875 <b>j</b> iterations (area A4)</li>
</ul>
<p>			With this partitioning, there is at most a 3% difference in the number of iterations performed by each thread.
		     </p></div>
</td>
</tr>
</table>
<div>
	In order to find the last index that a thread <b>idx</b> should process (e.g., 34 for thread 0 in the above example), we use the following loop:</p>
<pre>
    /* Grow last_index until the work covered by rows 0..last_index
       reaches the cumulative share of threads 0..idx. */
    int last_index = 0;
    do {
        last_index++;
    } while (last_index * n - (last_index + 1) * last_index / 2 &lt; (idx + 1) * n * (n - 1) / 2 / K);
	 </pre>
<p>Where <tt>n</tt> is the number of rows of the matrix, <tt>idx</tt> is the thread number and <tt>K</tt> is the number of threads.
      </div>
<div>
	This loop increments <tt>last_index</tt> until the amount of work done between <tt>i=0</tt> and <tt>i=last_index</tt> reaches <tt>(idx+1)*(total-work-to-be-done/number-of-workers)</tt>, i.e. the cumulative share of threads 0 to <tt>idx</tt>. The calculation of "the amount of work done" is the calculation of the area of a trapezoid. (E.g., in Figure 1, the area A1, the work done by thread 0, is a trapezoid.)
    </div>
<div>Actually, this could also be calculated with the following closed-form formula:</p>
<pre>
    last_index = 2*n - (&radic;<span style="text-decoration:overline">(4*n*n-4*n+1)*K*K+((-4*<b>idx</b>-4)*n*n+(4*<b>idx</b>+4)*n)*K</span>+(2*n-1)*K)/(2*K);
	</pre>
<p>	... but it is actually slower than doing the loop! (We think that the compiler is doing really smart things and that the loop is actually optimized and transformed into a much more efficient formula.)</p>
<h2>2-NUMA optimizations</h2>
<div>As mentioned earlier, we also do NUMA optimizations. <img src='http://software.intel.com/fr-fr/blogs/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  In order to improve the locality of the memory accessed by the threads, we did the following:</p>
<ul>
<li>We created one thread pool per NUMA node in the system. Each thread pool is totally independent from the others and is controlled by a master thread scheduled on the same NUMA node as the pool it controls.</li>
<li>The K tasks are created in parallel by the master threads (each master thread actually creates K/4 tasks, since there are 4 NUMA nodes on the MTL).</li>
<li>Before giving the tasks to its workers, each master thread <b>duplicates the matrix on the local NUMA node</b>. This ensures that, when the matrix does not fit in cache, the worker threads fetch data from their local memory. This optimization actually gives a <b>+25%</b> performance boost at 40 cores. Lesson learned: pay attention to data locality. <img src='http://software.intel.com/fr-fr/blogs/wordpress/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </li>
<li>(Note for those who might think that this is an incredible waste of memory: a 10K*10K matrix occupies 380MB in RAM, and the MTL machine has 64GB. So one copy per node = a "waste" of 1.5GB = 2.3% of the memory of the machine = really negligible compared to the gain.)</li>
</ul></div>
<h2>3-Other performance optimizations</h2>
<div>
<ul>
<li>Our approach falls back on the sequential algorithm when the parallel algorithm is considered too costly (e.g., when the cost of duplicating the matrix and managing the thread pool cannot be amortized).</li>
<li>Since the subarray algorithm is of O(n²m) complexity, it is sometimes worth transposing the matrix before any computation, in order to have n&lt;m. Experiments showed that transposing becomes worthwhile as soon as the difference in complexity is above 5K operations.</li>
<li>Both reading and transposing the matrix are done in parallel using our thread pool. The input file is mapped in memory, and each reader thread is responsible for parsing 800KB of the input file and creating a partial matrix corresponding to what it has read. All submatrices are then merged using a simple memcpy operation.</li>
</ul></div>
<h2>4-Figure for nerds</h2>
<div>Time to present some results!</div>
<div>
	      <img src="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/speedup.png" style="padding-bottom:15px"></img><br /><b>Fig 2</b> - <i>Speedup of our algorithm on a 10K*10K matrix</i><br />
              The algorithm has a near-optimal speedup between 10 and 40 cores (x3.94) and between 1 and 40 cores (x36.8). This means that, according to Amdahl's law, more than 99.77% of our code is parallel. For those interested, it takes 5.9s at 40 cores to parse a 10K*10K matrix.<br />
	      We think that the speedup seen by the Intel team might have been a little lower due to our static partitioning of data: during the final test 2 cores were fully loaded, which means that our partitioning was no longer optimal. Nevertheless, our solution seems to have behaved quite nicely even when (intuitively) load balancing could have been required.
     </div>
<p></p>
<h2>5-Code</h2>
<div>Finally, here's a link to <a href='http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/solution.zip'>our code</a></div>
<p></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2011/11/24/subarray-problem-a-static-numa-aware-approach/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Acceler8 is over, what an experience!</title>
		<link>http://software.intel.com/fr-fr/blogs/2011/08/01/acceler8-est-fini-quelle-exprience/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2011/08/01/acceler8-est-fini-quelle-exprience/#comments</comments>
		<pubDate>Mon, 01 Aug 2011 08:30:47 +0000</pubDate>
		<dc:creator>farcellier</dc:creator>
				<category><![CDATA[Acceler8]]></category>
		<category><![CDATA[ISN France]]></category>
		<category><![CDATA[programmation parallèle]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2011/08/01/acceler8-est-fini-quelle-exprience/</guid>
		<description><![CDATA[Tuesday morning, what a pleasant surprise it was to receive the prizes from the acceler8 contest. They had been shipped just the day before. After 2 months of intensive work, a page is turning. The acceler8 contest is well and truly over. It was an intense and enriching event. When we set out on this adventure, we did not think [...]]]></description>
			<content:encoded><![CDATA[<p>Tuesday morning, what a pleasant surprise it was to receive the prizes from the acceler8 contest.<br />
They had been shipped just the day before.</p>
<p><img alt="" src="https://lh3.googleusercontent.com/-fHuvZokFgLg/Ti6l9zoCP9I/AAAAAAAAAHI/79t5jEV_z2c/2011-07-26+10.37.40.jpg" class="aligncenter" width="640" height="480" /></p>
<p>After 2 months of intensive work, a page is turning. The acceler8 contest is well and truly over. It was an intense and enriching event. When we set out on this adventure, we did not think it would take us so far.</p>
<p>Parallelism is on everyone's lips today. However, we did not expect to discover such a rich world. The <a href="http://software.intel.com/fr-fr/articles/acceler8_recherche_nombres_premiers_particuliers_solution/">first problem</a>, beneath its apparent simplicity, turned out to be much tougher and spicier than we expected. Right up to the last half hour, we never stopped thinking about it and looking for ways to improve the execution time of our program.</p>
<p>The <a href="http://software.intel.com/fr-fr/articles/acceler8_recherche_nombres_premiers_particuliers_solution/">second</a>, more difficult, made us sweat more than once. The experience gained from the first proved formative, and it was only after long, sustained effort that we managed to deliver an efficient program.</p>
<p>These 2 netbooks are not just a reward. They are a reminder of the efforts we made to keep improving. They are also a reminder of the efforts we still have to make to improve further.</p>
<p><img src="https://lh3.googleusercontent.com/-shvjwprfobY/Ti6l9zI2ntI/AAAAAAAAAHM/p0DhyNFnuKA/s640/2011-07-26+12.34.16.jpg" alt="Second Eeepc from the acceler8 contest" /></p>
<p>By inviting us on this journey along the road of parallelism, Intel allowed us to make some headway in this field. Throughout the challenge, they guided us along the way. The sharing, whether with the organizers or with the other contestants, made this experience unique.</p>
<p>I hope that other challenges in equally specialized fields will be organized with the same passion and the same desire to let students discover areas that are sometimes left on the margins of school curricula.</p>
<p>Fabien</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2011/08/01/acceler8-est-fini-quelle-exprience/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Description of the new Haswell instructions</title>
		<link>http://software.intel.com/fr-fr/blogs/2011/06/20/la-description-des-nouvelles-instructions-de-haswell/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2011/06/20/la-description-des-nouvelles-instructions-de-haswell/#comments</comments>
		<pubDate>Mon, 20 Jun 2011 13:00:10 +0000</pubDate>
		<dc:creator>AnthonyC</dc:creator>
				<category><![CDATA[ISN France]]></category>
		<category><![CDATA[programmation parallèle]]></category>
		<category><![CDATA[AVX]]></category>
		<category><![CDATA[Haswell]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2011/06/20/la-description-des-nouvelles-instructions-de-haswell/</guid>
		<description><![CDATA[Intel has just made public the details of the next generation of the X86 architecture. Arriving first in our Intel microarchitecture code-named "Haswell" in 2013, the new instructions accelerate a broad category of applications and usage models. Download the Intel® Advanced Vector Extensions Programming Reference (319433-011). These build [...]]]></description>
			<content:encoded><![CDATA[<p>Intel has just made public the details of the next generation of the X86 architecture. Arriving first in our Intel microarchitecture code-named "Haswell" in 2013, the new instructions accelerate a broad category of applications and usage models. <a href="http://software.intel.com/file/36945">Download the Intel® Advanced Vector Extensions Programming Reference (319433-011)</a>.</p>
<p>These build on the instructions coming in the Ivy Bridge architecture, including the digital random number generator and the float16 accelerators, and extend the Intel Advanced Vector Extensions (Intel AVX) that Intel launched in 2011. </p>
<p>These instructions fall into the following categories:</p>
<p><strong>AVX2 - Integer data types expanded to 256-bit SIMD</strong>. AVX2's integer support is particularly useful for processing the visual data commonly encountered in consumer imaging and video processing. With Haswell, we have Intel® Advanced Vector Extensions (Intel® AVX) for floating point, and also AVX2 for integer data types.<br />
<img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/06/img1.png" alt="instructions" /></p>
<p>The bit manipulation instructions are very useful for compressed databases, hashing, large-number arithmetic, and a wide variety of general-purpose code.<br />
<img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/06/img2.png" alt="fze" /></p>
<p>Gather: useful for vectorizing code with non-adjacent data elements.<br />
Haswell gathers are masked for safety (like the conditional loads and stores introduced in Intel AVX), which favors their use in code with clipping or other conditionals.<br />
<img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/06/img3.png" alt="vector" /><br />
<strong>Any-to-Any permutes</strong> – Incredibly useful shuffling operations. Haswell adds support for DWORD and QWORD granularity across an entire 256-bit register.<br />
<img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/06/img4.png" alt="img" /> </p>
<p><strong>Vector-Vector Shifts</strong>: We have added shifts with vector shift controls. These are critical for vectorizing loops with variable shifts.<br />
<img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/06/img5.png" alt="o" /></p>
<p><strong>Floating Point Multiply Accumulate</strong> – This feature significantly increases peak flops and provides improved precision, which further benefits transcendental mathematics. These instructions are very widely used in high-performance computing and professional imaging, as well as in face recognition. They operate on scalar and 128-bit packed single- and double-precision data types, as well as on 256-bit packed single- and double-precision data types. [These instructions were described previously, in the initial Intel AVX specification.]<br />
<img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/06/img6.png" alt="avx" /></p>
<p>The vector instructions, which build on the extended (256-bit) register state added in Intel AVX, are supported by all systems that support Intel AVX.<br />
For developers, it is worth noting that the instructions span several CPUID leaves. You should be careful to check all applicable bits before using these instructions.<br />
<a href="http://software.intel.com/file/36945">Please read the specification</a> and stay tuned for supporting tools over the coming months. </p>
<p>Mark Buxton<br />
Software Engineer<br />
Intel Corporation</p>
<p>(This is a translation of the original article, which can be found here: http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available/ ) </p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2011/06/20/la-description-des-nouvelles-instructions-de-haswell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Become an Intel brand ambassador</title>
		<link>http://software.intel.com/fr-fr/blogs/2011/05/23/devenez-lambassadeur-de-la-marque-intel/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2011/05/23/devenez-lambassadeur-de-la-marque-intel/#comments</comments>
		<pubDate>Mon, 23 May 2011 09:46:41 +0000</pubDate>
		<dc:creator>Anthony Charbonnier (Intel)</dc:creator>
				<category><![CDATA[Gaming]]></category>
		<category><![CDATA[ISN France]]></category>
		<category><![CDATA[job]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2011/05/23/devenez-lambassadeur-de-la-marque-intel/</guid>
		<description><![CDATA[Become an Intel brand ambassador! We are looking for several people to staff the brand-new Intel showroom, which will open soon in a major Parisian department store. Do you like innovation and have a passion for new technologies? Then sign up for the casting by sending your CV and photos to chloe.janvier@pwpassociates.com. We will reply [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Become an Intel brand ambassador! </strong></p>
<p>We are looking for several people to staff the brand-new Intel showroom, which will open soon in a major Parisian department store.</p>
<p>Do you like innovation and have a passion for new technologies?</p>
<p>Then sign up for the casting by sending your CV and photos to chloe.janvier@pwpassociates.com. We will reply as soon as possible! </p>
<p>More information <a href="http://www.pwpassociates.com/intel/page02.php">here</a></p>
<p><a href="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/intel-e1306143833149.png"><img src="http://software.intel.com/fr-fr/blogs/wordpress/wp-content/uploads/intel-e1306143833149.png" alt="" title="intel" width="200" height="132" class="aligncenter size-full wp-image-213" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2011/05/23/devenez-lambassadeur-de-la-marque-intel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Best questions about parallel programming</title>
		<link>http://software.intel.com/fr-fr/blogs/2011/04/19/meilleures-questions-autour-de-la-programmation-parallle/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2011/04/19/meilleures-questions-autour-de-la-programmation-parallle/#comments</comments>
		<pubDate>Tue, 19 Apr 2011 11:18:25 +0000</pubDate>
		<dc:creator>Anthony Charbonnier (Intel)</dc:creator>
				<category><![CDATA[Acceler8]]></category>
		<category><![CDATA[ISN France]]></category>
		<category><![CDATA[programmation parallèle]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2011/04/19/meilleures-questions-autour-de-la-programmation-parallle/</guid>
		<description><![CDATA[Hello everyone. Thanks to the parallel programming contest, many very interesting threads have appeared on the forum. I will do my best to list them here so that you can find them more quickly. VinceRev shares his views and methods on cache access and processor architecture. See the forum thread on cache access and processor architecture [...]]]></description>
			<content:encoded><![CDATA[<p>Hello everyone </p>
<p>Thanks to the parallel programming contest, many very interesting threads have appeared on the forum.<br />
I will do my best to list them here so that you can find them more quickly.</p>
<p><em>VinceRev</em> shares his views and methods on <strong>cache access and processor architecture</strong>.<br />
See the forum thread on <a href="http://software.intel.com/fr-fr/forums/showthread.php?t=82178&amp;o=a&amp;s=lr">cache access and processor architecture</a> </p>
<p><em>Shaolan</em> opened a thread on <a href="http://software.intel.com/fr-fr/forums/showthread.php?t=82048&amp;o=a&amp;s=lr">discovering OpenMP</a>, which is a very interesting read.</p>
<p><em>Shaolan</em> again, this time with some thoughts on <a href="http://software.intel.com/fr-fr/forums/showthread.php?t=82117&amp;o=a&amp;s=lr">programming optimization</a></p>
<p>And finally, <em>VinceRev</em> offers us a very nice article on timing <a href="http://software.intel.com/fr-fr/articles/timer-temps-execution-C/">program execution at the cache level</a>.</p>
<p>This post will be updated with the best contributions as our users fill the forum! </p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2011/04/19/meilleures-questions-autour-de-la-programmation-parallle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel performance analyzers</title>
		<link>http://software.intel.com/fr-fr/blogs/2011/01/07/analyseur-de-performances-intel/</link>
		<comments>http://software.intel.com/fr-fr/blogs/2011/01/07/analyseur-de-performances-intel/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 15:33:52 +0000</pubDate>
		<dc:creator>AnthonyC</dc:creator>
				<category><![CDATA[ISN France]]></category>
		<category><![CDATA[France]]></category>
		<category><![CDATA[GPA]]></category>
		<category><![CDATA[visual computing]]></category>

		<guid isPermaLink="false">http://software.intel.com/fr-fr/blogs/2011/01/07/analyseur-de-performances-intel/</guid>
		<description><![CDATA[Win new customers and get your software ready for the laptops increasingly used by gamers on the go. The Intel® Graphics Performance Analyzers (Intel® GPA) software suite provides in-depth analysis to help you best optimize your games based on the Microsoft [...]]]></description>
			<content:encoded><![CDATA[<p>Win new customers and get your software ready for the laptops increasingly used by gamers on the go. The Intel® Graphics Performance Analyzers (Intel® GPA) software suite provides in-depth analysis to help you best optimize your games based on the Microsoft DirectX* APIs and running on computers equipped with Intel® HD Graphics. Use the graphics performance analyzers to port your applications to new environments or to develop innovative software for the future of mobile gaming. </p>
<ul>
<li><b>Support for the Intel® Core™ microarchitecture and Intel® HD Graphics.</b> Use a tool designed specifically to analyze and optimize games for the latest platforms, namely Intel® Core™ processors with Intel® HD Graphics.</li>
<li><b>In-depth, real-time analysis.</b> Identify bottlenecks, try out solutions, and see the results in real time.</li>
<li><b>Platform profiling.</b> View the performance of your games on the CPU and GPU to make sure you take full advantage of the available computing power.</li>
<li><b>Intuitive interface.</b> Access a complete suite of tools from a very easy-to-use, highly configurable interface.</li>
<li><b>Open, programmable libraries.</b> Adapt the tools to your own needs.</li>
</ul>
<h1 class="sectionHeading">Target new customers</h1>
<p>Grow your customer base by making sure your applications are optimized for the millions of PCs equipped with Intel HD Graphics. Get ready for an exponentially growing mobile world by optimizing your software for computers based on Intel technology. </p>
<h1 class="sectionHeading">Access a complete suite of tools</h1>
<p>The Intel Graphics Performance Analyzers offer a complete suite of tools for in-depth analysis of performance problems. These tools are designed to work the way you do, so that you can immediately put them to use in the best possible way. </p>
<h1 class="sectionHeading">Download the full product brief<br /></h1>
<p><a href="http://software.intel.com/file/29368">Intel® Graphics Performance Analyzers: Game Development for a Mobile World</a> [PDF 204KB]</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/fr-fr/blogs/2011/01/07/analyseur-de-performances-intel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

