<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Fri, 10 Feb 2012 14:31:34 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/parallel/type/code/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles Feed</title>
    <link>http://software.intel.com/en-us/articles/parallel/type/code/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Threading Challenge 2011 - Winning Entries</title>
      <description><![CDATA[ <strong><span class="sectionHeading">Threading Challenge 2011 - Entries Submitted (Code and Write-up)<br /></span><br /><br /></strong>
<p>Below you will find the winning entries by problem for the Threading Challenge 2011.  Please feel free to review and join us in the <strong>forum</strong> dedicated to each problem to discuss.<br /><br /><strong><span >Master Level<br /></span><br /></strong><strong>Problem 1 - P1:M1 Masyu Puzzle<br /><br />1st Place:    john_e_lilley         <a href="http://software.intel.com/file/38801">Code</a>     <a href="http://software.intel.com/file/38802">Write-up<br /></a>2nd Place:   Diao Rui                 <a href="http://software.intel.com/file/38803">Code</a>     <a href="http://software.intel.com/file/38804">Write-up<br /></a>3rd Place:    akki                       <a href="http://software.intel.com/file/38805">Code</a>     <a href="http://software.intel.com/file/38806">Write-up<br /></a></strong><br />Comment on the <a href="http://software.intel.com/en-us/forums/threading-challenge-2011-p1-m1-masyu-puzzle/"><strong>Masyu Puzzle Dedicated Forum</strong></a></p>
<p><strong>Problem 2 - P1:M2 Tiling Rectangles<br /></strong><strong><br /></strong></p>
<p><strong>1st Place:    akki                       </strong><a href="http://software.intel.com/file/38816"><strong>Code</strong></a><strong>      </strong><a href="http://software.intel.com/file/38817"><strong>Write-up<br /></strong></a><strong>2nd Place:   denghui0815         </strong><a href="http://software.intel.com/file/38818"><strong>Code</strong></a><strong>      </strong><a href="http://software.intel.com/file/38819"><strong>Write-up<br /></strong></a><strong>3rd Place:    protocolocon        </strong><a href="http://software.intel.com/file/38820"><strong>Code</strong></a><strong>      </strong><a href="http://software.intel.com/file/38821"><strong>Write-up<br /></strong></a><br />Comment on<strong> </strong><a href="http://software.intel.com/en-us/forums/threading-challenge-2011-p1-m2-tiling-rectangles/"><strong>Tiling Rectangles Dedicated Forum<br /></strong></a><br /><strong>Problem 3 - P1:M3 Parallelized Parser and Formula Interpreter</strong></p>
<p><br /><strong>1st Place:    akki                       <a href="http://software.intel.com/file/38822">Code</a>     <a href="http://software.intel.com/file/38823">Write-up<br /></a>2nd Place:   john_e_lilley         <a href="http://software.intel.com/file/38824">Code</a>     <a href="http://software.intel.com/file/38825">Write-up<br /></a>3rd Place:    denghui0815         <a href="http://software.intel.com/file/38826">Code</a>     <a href="http://software.intel.com/file/38827">Write-up<br /></a><br /></strong></p>
<p> Comment on the <a href="http://software.intel.com/en-us/forums/threading-challenge-2011-p1-m3-parallelized-parser-formula-interpreter/"><strong>Parallelized Parser and Formula Interpreter Dedicated Forum<br /></strong></a><br /><strong><span >Apprentice Level<br /><br /></span>Problem 1 - P1:A1  Maze of Life<br /><br />1st Place:    VoVanx86             <a href="http://software.intel.com/file/38810">Code<br /></a>2nd Place:   krivyakin              <a href="http://software.intel.com/file/38811">Code<br /></a>3rd Place:    jmfernandez        <a href="http://software.intel.com/file/38812">Code<br /></a><br /></strong>Comment on <a href="http://software.intel.com/en-us/forums/threading-challenge-2011-p1-a1-maze-of-life/"><strong>Maze of Life Dedicated Forum<br /></strong></a><br /><strong>Problem 2 - P1:A2  Sums of Consecutive Primes<br /><br />1st Place:    dotcsw                 <a href="http://software.intel.com/file/38814">Code<br /></a>2nd Place:   VoVanx86            <a href="http://software.intel.com/file/38813">Code<br /></a>3rd Place:    jmfernandez       <a href="http://software.intel.com/file/38815">Code<br /></a><br /></strong>Comment on the <strong><a href="http://software.intel.com/en-us/forums/threading-challenge-2011-p1-a2-consecutive-primes/">Sums of Consecutive Primes Dedicated Forum<br /></a></strong><br /><strong>Problem 3 - P1:A3  Running Numbers<br /><br />1st Place:    dotcsw                <a href="http://software.intel.com/file/38987">Code<br /></a>2nd Place:   vdave                 <a href="http://software.intel.com/file/38989">Code<br /></a>3rd Place:    kolkir                 <a href="http://software.intel.com/file/38990">Code</a></strong><br /><br />Comment on the <a href="http://software.intel.com/en-us/forums/threading-challenge-2011-p1-a3-running-numbers/"><strong>Running Numbers Dedicated Forum</strong></a></p> ]]></description>
      <link>http://software.intel.com/en-us/articles/threading-challenge-2011-winning-entries/</link>
      <pubDate>Thu, 06 Oct 2011 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/threading-challenge-2011-winning-entries/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/threading-challenge-2011-winning-entries/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>River Trail: Parallel Web Applications</title>
      <description><![CDATA[ <link media="screen" href="http://software.intel.com/media/gamedev/css/3302_Intel_VC_01.css?v=11" type="text/css" rel="stylesheet" />
<link media="screen" href="http://software.intel.com/file/23729" type="text/css" rel="stylesheet" />
<table width="100" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top">
<div id="left_container">
<div id="header_content"></div>
<div id="left_content_container2">
<div id="showcase_01">
<div >
<p>In a world where the web browser is the user's window into computing, browser applications must leverage all available computing resources to provide the best possible user experience. Today web applications do not take full advantage of parallel client hardware due to the lack of appropriate programming models.</p>
<p>ParallelJS technology (code named River Trail) puts the parallel compute power of the client's hardware into the hands of the web developer while staying within the safe and secure boundaries of the familiar JavaScript programming paradigm.</p>
<p>River Trail gently extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer. By leveraging multiple CPU cores and vector instructions, River Trail achieves significant speedups, up to an order of magnitude, over sequential JavaScript.</p>
<br /><a href="https://github.com/RiverTrail/RiverTrail/wiki"><img src="http://software.intel.com/file/25370" border="0" /></a>
<p> </p>
</div>
<div >
<p>
<object height="203" width="360" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" id="v_4436_2070" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000">
<param name="id" value="v_4436_2070" />
<param name="name" value="v_4436_2070" />
<param name="flashvars" value="file=http://software.intel.com/media/videos/2/9/6/4/7/2/c/IDF11_Labs_Parallel_Web_V1.mp4&amp;image=http://software.intel.com/media/videos/2/9/6/4/7/2/c/296472c9542ad4d4788d543508116cbc_player.jpg&amp;autostart=false&amp;bufferlength=5&amp;allowfullscreen=true&amp;plugins=http://software.intel.com/common/swf/listen&amp;title=Parallel+Web+Applications+at+IDF+2011" />
<param name="allowfullscreen" value="true" />
<param name="src" value="http://software.intel.com/common/swf/mediaplayer.swf" /><embed height="203" width="360" src="http://software.intel.com/common/swf/mediaplayer.swf" allowfullscreen="true" flashvars="file=http://software.intel.com/media/videos/2/9/6/4/7/2/c/IDF11_Labs_Parallel_Web_V1.mp4&amp;image=http://software.intel.com/media/videos/2/9/6/4/7/2/c/296472c9542ad4d4788d543508116cbc_player.jpg&amp;autostart=false&amp;bufferlength=5&amp;allowfullscreen=true&amp;plugins=http://software.intel.com/common/swf/listen&amp;title=Parallel+Web+Applications+at+IDF+2011" name="v_4436_2070" id="v_4436_2070" type="application/x-shockwave-flash"></embed>
</object>
</p>
<p ><a href="http://software.intel.com/en-us/videos/parallel-web-applications-at-idf-2011">Link to larger copy of the video </a><b>(click to view larger)</b> <br /><br /><b>Register: </b><a target="_blank" href="http://www.theregister.co.uk/2011/09/17/intel_parallel_javascript/">Intel extends JavaScript for parallel programming</a> <br /><b>My Broadband:</b> <a target="_blank" href="http://mybroadband.co.za/news/software/34032-multi-core-javascript-project-from-intel-now-downloadable.html">Multi-core JavaScript project from Intel now downloadable</a><br /><b>LAPTOP Magazine:</b> <a target="_blank" href="http://blog.laptopmag.com/parallel-js-promises-faster-more-fluid-browser-apps">Parallel JS Promises Faster, More Fluid Browser Apps</a><br /><b>InfoQ.com:</b> <a target="_blank" href="http://www.infoq.com/news/2011/09/javascript-parallel-processing">JavaScript Extension that Adds Parallel Processing Capabilities Unveiled by Intel</a><br /><b>iProgrammer:</b> <a target="_blank" href="http://www.i-programmer.info/news/167-javascript/3068-river-trail-intels-parallel-javascript.html">River Trail - Intel's parallel JavaScript</a></p>
</div>
<br clear="all" />
<p><img src="http://software.intel.com/file/38995" /></p>
</div>
</div>
</div>
</td>
</tr>
</tbody>
</table> ]]></description>
      <link>http://software.intel.com/en-us/articles/river-trail-parallel-web-applications/</link>
      <pubDate>Thu, 06 Oct 2011 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/river-trail-parallel-web-applications/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/river-trail-parallel-web-applications/</guid>
      <category>Parallel Programming</category>
      <category>Visual Computing</category>
      <category>Code &amp; Downloads</category>
    </item>
    <item>
      <title>Using Intel® TBB in network applications: Network Router emulator</title>
      <description><![CDATA[ <p><b>Introduction</b></p>
<p>Intel® Threading Building Blocks (TBB) is used in a wide range of applications. When performance matters and the target is a multi-core platform, TBB is a good addition to a C++ program. Network applications are usually heavily loaded: they process huge amounts of traffic under tight processing-time constraints. This article shows how TBB can be used in network packet processing software to improve its throughput and processing time.</p>
<p>As a sample project I've created a simplified Network Router emulator. A network router is a device that routes and transmits IP (Internet Protocol) packets in a local area network (LAN). It connects several PCs and gives them access to the Internet and to the internal network. The device has several internal network interfaces and one external interface.</p>
<p>The sample project emulates Network Router logic. It provides the following functionality:</p>
<ul>
<li>Input packets from file - the application is just a model, so there is no need for a real connection to a network interface. Reading from a file emulates reading from a network interface.</li>
<li>NAT - Network Address Translation. The router has only one external IP address, but packets must be delivered to several internal devices behind the router. NAT maps ports and IP addresses from external to internal and vice versa.</li>
<li>IP routing - delivering packets to the appropriate router NIC (Network Interface Controller) according to the destination IP.</li>
<li>Bandwidth management - some traffic is real-time, and it's critical to deliver these packets as quickly as possible (e.g. voice over IP). VoIP protocols carry telephone conversations, and delays would degrade quality. The router can prioritize these critical packets so that they are processed sooner.</li>
</ul>
<p>I've created two versions of the Network Router: serial and parallel. The latter uses Intel® Threading Building Blocks. I'll describe how TBB was used in the project and provide performance results for the parallelized program.</p>
<p><b>Network Router implementation</b></p>
<p>The network router emulator gets packets from a file and processes them. Packet processing includes bandwidth management, NAT translation and IP routing. Packets are processed by several program modules that are arranged sequentially, as in an assembly line. This is a common structure for packet processing applications. The input file is a text file in which each line represents one IP packet. A separate thread reads the packets in big chunks.</p>
<p>Intel® TBB has the tbb::pipeline class, which provides a high-level framework for this kind of program structure. It has filters that process packets at each stage. Each packet goes through the pipeline and is processed step by step by its filters. A single packet is processed sequentially: by the first filter, then the second, then the third, etc. However, the processing of one packet is independent of another, so filters can operate in parallel.</p>
<p ><br />Network Router scheme<br /><img height="256" width="531" src="http://software.intel.com/file/36534"  /></p>
<p><br /><br />Main function:</p>
<pre name="code" class="cpp">#include &lt;iostream&gt; 
#include &lt;sstream&gt;
#include &lt;fstream&gt;
#include &lt;vector&gt;
#include &lt;algorithm&gt;
#include &lt;ittnotify.h&gt;
#include &lt;tbb/pipeline.h&gt;
#include &lt;tbb/concurrent_hash_map.h&gt;
#include &lt;tbb/atomic.h&gt;
#include &lt;tbb/concurrent_queue.h&gt;
#include &lt;tbb/compat/thread&gt;
// Redirects calls to "new" and "delete" to TBB thread safe allocators
#include &lt;tbb/tbbmalloc_proxy.h&gt;

using namespace tbb;
using namespace std;

class bandwidth_manager_t;
class network_adress_translator_t;
class ip_router_t;
class compute_t;
typedef vector&lt;packet_trace_t&gt; packet_chunk_t;

int chunk_size = 1600;
concurrent_queue&lt;packet_chunk_t&gt; chunk_queue;
atomic&lt;bool&gt; stop_flag;

int main(int argc, char* argv[])
{
	ip_addr_t external_ip;
	nic_t external_nic;	
	nat_table_t nat_table;	// NAT table   
	ip_config_t ip_config;	// Router network configuration 					
	int ntokens = 24;	
	
	get_args (argc, argv);	
    ifstream config_file (config_file_name);

    if (!config_file) {
        cerr &lt;&lt; "Cannot open config file " &lt;&lt; config_file_name &lt;&lt; "\n";
        exit (1);
    }		
	if (! initialize_router (external_ip, external_nic, 
                            ip_config, config_file)) exit (1);	
	
	thread input_thread(input_function);

	// packet processing objects
	bandwidth_manager_t bwm;	
	network_adress_translator_t nat(external_ip, external_nic, nat_table);
	ip_router_t ip_router(external_ip, external_nic, ip_config);		

__itt_resume();
	atomic&lt;bool&gt; stop_pipeline;	// written and read by concurrent input-filter invocations
	stop_pipeline = false;
	
	parallel_pipeline(ntokens,		
		make_filter&lt;void, packet_chunk_t*&gt;(		// Input filter
			filter::parallel,
			[&amp;](flow_control&amp; fc)-&gt; packet_chunk_t*{				
				
				if (stop_pipeline){					
					fc.stop();
					return NULL;	// the value is ignored after stop(); avoids leaking a chunk
				}				
				packet_chunk_t* packet_chunk = new packet_chunk_t(chunk_size);
					
				if(!chunk_queue.try_pop(*packet_chunk)){				
					if (stop_flag) {
						stop_pipeline = true;
					}
				}				
				return packet_chunk;
			}
		)&amp;	// Bandwidth manager filter
		make_filter&lt;packet_chunk_t*, packet_chunk_t*&gt;(		
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk)-&gt; packet_chunk_t*{								
				
				for(size_t i=0; i&lt;packet_chunk-&gt;size(); i++){
					// Reference, so the assigned priority is stored in the chunk before sorting
					packet_trace_t&amp; packet = (*packet_chunk)[i];
					
					if (packet.nic == empty){
						break;
					}
					else{
						bwm.prioritize(packet);									
						compute_t compute;
						compute.work();						
					}										
				}
				std::sort(packet_chunk-&gt;begin(), packet_chunk-&gt;end(),
							packet_comparator);
				return packet_chunk;	
			}
		)&amp;	// NAT filter
		make_filter&lt;packet_chunk_t*, packet_chunk_t*&gt;(	
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk)-&gt; packet_chunk_t*{

				for(size_t i=0; i&lt;packet_chunk-&gt;size(); i++){	
					// Reference, so the NAT mapping is stored in the chunk
					packet_trace_t&amp; packet = (*packet_chunk)[i];
					if (packet.nic == empty)
						break;
					else{				
						nat.map(packet);
						compute_t compute;
						compute.work();	
					}
				}				
				return packet_chunk;
			}
		)&amp;	// IP routing filter
		make_filter&lt;packet_chunk_t*, packet_chunk_t*&gt;(		
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk)-&gt; packet_chunk_t*{			

				for(size_t i=0; i&lt;packet_chunk-&gt;size(); i++){						
					// Reference, so the routing decision is stored in the chunk
					packet_trace_t&amp; packet = (*packet_chunk)[i];
					
					if (packet.nic == empty)
						break;
					else{				
						ip_router.route(packet);
						compute_t compute;
						compute.work();	
					}
				}				
				return packet_chunk;
			}
		)&amp;	// Output filter
		make_filter&lt;packet_chunk_t*, void&gt;(	
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk){														
				
				for(int i=0; i&lt;packet_chunk-&gt;size(); i++){						
					packet_trace_t packet;
					packet = (*packet_chunk)[i];	
					compute_t compute;
					compute.work();	

					if (packet.nic == empty)
						break;
				}	
				// No output is required , just drop packets
				delete packet_chunk; 
			}
		)
	);	
__itt_pause();

	cout &lt;&lt; "\nAll packets are processed\n\n";		
	return 0;
}</pre>
<br />
<p>The first part is "preparation": creating objects, reading the command line, opening files and initializing. The configuration file contains the router interface information. The objects bwm, nat and ip_router are the packet processing objects. They use the containers nat_table and ip_config for storing the NAT and IP tables.</p>
<p>The core component of the Network Router is the pipeline. It is implemented using the tbb::parallel_pipeline() function, which takes the number of tokens and a list of filters as arguments. The element of work that is passed through the pipeline is a packet_chunk_t, handed from filter to filter by pointer. The ntokens parameter controls the maximum number of concurrently processed elements. It is set to 24 because the project was tested on a 24-core machine, and making it bigger would have no effect.</p>
<p>Pipeline filters do the actual work, which in this application is packet processing. Filters can be serial or parallel. The mode is controlled by a filter parameter, which is filter::parallel for all filters here. This means that every filter can process several elements at the same time.</p>
<p>The first filter extracts a packet chunk from chunk_queue and passes it to the second filter. The second filter performs bandwidth management on each packet in the chunk: the bwm module assigns priorities to packets according to protocol, and the packets in the chunk are then sorted by priority. This allows critical traffic to be processed as early as possible. Subsequent filters perform NAT mapping and IP routing. The last filter is the output, but for simplicity no real output is done. Packets are just dropped.</p>
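<p>The bandwidth management step described above can be sketched in plain standard C++. This is only an illustration, not the code of the sample project: packet_t, priority_for, by_priority and prioritize_chunk are hypothetical names, and the priority policy (rtp and sip first) is an assumption based on the VoIP example in the introduction.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for packet_trace_t (only the fields used here).
struct packet_t {
    std::string protocol;  // "rtp", "sip", "ftp", "http", ...
    int priority;          // smaller value = processed earlier (assumption)
};

// Hypothetical policy: real-time VoIP protocols first, everything else after.
int priority_for(const std::string& protocol) {
    if (protocol == "rtp" || protocol == "sip") return 0;
    return 1;
}

// Orders packets so that lower priority values come first.
bool by_priority(const packet_t& a, const packet_t& b) {
    return a.priority < b.priority;
}

// Assigns a priority to every packet in the chunk, then sorts the chunk.
// Packets are modified in place so the sort sees the assigned priorities.
void prioritize_chunk(std::vector<packet_t>& chunk) {
    for (std::size_t i = 0; i < chunk.size(); ++i) {
        chunk[i].priority = priority_for(chunk[i].protocol);
    }
    // stable_sort keeps the arrival order of packets with equal priority.
    std::stable_sort(chunk.begin(), chunk.end(), by_priority);
}
```

<p>The sketch uses std::stable_sort so that packets of the same priority keep their arrival order. The listing above uses std::sort with packet_comparator, whose definition is not shown in the article.</p>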
<p>A packet chunk is used as the pipeline token because it is big enough to amortize scheduling overhead. If single packets were passed through the pipeline, there would be too many transitions between threads, and the overhead would outweigh the benefit.</p>
<p>The __itt_resume() and __itt_pause() functions belong to the API of Intel® VTune<sup>TM</sup> Amplifier XE, which was used for performance measurements. These API functions mark the beginning and the end of the area of interest.</p>
<p>The compute object of type compute_t creates workload for the CPU. It just performs additional computations to simulate the computing done in real systems. The application doesn't perform the entire job needed for processing and routing packets in real-life network equipment. It is just a model framework of a real application, so on its own it would not use enough CPU. The compute_t::work() method runs an "N Queens" computation.</p>
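<p>The body of compute_t is not shown in this article. As a rough idea of what an "N Queens" workload can look like, here is a minimal backtracking sketch that counts the solutions for an n x n board. The function names and the choice to count solutions are assumptions made for illustration, not the sample project's implementation.</p>

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Returns true if a queen can be placed at (row, col) without being attacked
// by the queens already placed in rows 0..row-1.
// cols[r] holds the column of the queen placed in row r.
static bool is_safe(const std::vector<int>& cols, int row, int col) {
    for (int r = 0; r < row; ++r) {
        int c = cols[r];
        // Same column, or same diagonal (column distance equals row distance).
        if (c == col || std::abs(c - col) == row - r) return false;
    }
    return true;
}

// Places queens row by row and counts the complete placements.
static long place_queens(std::vector<int>& cols, int row, int n) {
    if (row == n) return 1;  // all rows filled: one solution found
    long count = 0;
    for (int col = 0; col < n; ++col) {
        if (is_safe(cols, row, col)) {
            cols[row] = col;
            count += place_queens(cols, row + 1, n);
        }
    }
    return count;
}

// Counts all solutions of the N Queens problem for an n x n board.
long count_n_queens(int n) {
    assert(n >= 0);
    std::vector<int> cols(n, 0);
    return place_queens(cols, 0, n);
}
```

<p>The running time grows quickly with n, which makes a function of this kind a convenient CPU-bound stand-in for real per-packet work.</p>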
<p>Opening and reading the input file is the job of a separate thread. It is instantiated using the std::thread class, which is part of the C++11 standard (the listing gets it from the tbb/compat/thread header).</p>
<p><b>Serial implementation</b></p>
<p>To understand the effect of parallelization, a serial version was created. It has a similar structure. The only difference is that parallel_pipeline is replaced with a simple while loop.</p>
<p >Network router serial scheme<br /><br /><img height="248" width="459" src="http://software.intel.com/file/36533" /></p>
<p>While loop (replacing parallel_pipeline):</p>
<pre name="code" class="cpp">__itt_resume();
	bool stop = false;

	while (!stop){
		packet_chunk_t packet_chunk(chunk_size);
		
		if(!chunk_queue.try_pop(packet_chunk)){				
			if (stop_flag) {
				stop = true;
			}
		}		
		
		for(size_t i=0; i &lt; packet_chunk.size(); i++){
			packet_trace_t&amp; packet = packet_chunk[i];	// reference: the priority is stored in the chunk
			bwm.prioritize(packet);	
			compute_t compute;
			compute.work();									
		}
		std::sort(packet_chunk.begin(), packet_chunk.end(), packet_comparator);
		for(size_t i=0; i &lt; packet_chunk.size(); i++){
			packet_trace_t&amp; packet = packet_chunk[i];	// reference: NAT and routing results are stored in the chunk
			nat.map(packet);
			compute_t compute;
			compute.work();		
			ip_router.route(packet);				
			compute.work();							
			compute.work();								
		}
	}
__itt_pause();</pre>
<p><br />There are four calls to compute.work(), the same number as in the TBB version. This is by far the most CPU-time-consuming function, so it's fair to have the same number of calls to it.</p>
<p><b>Data structures</b></p>
<p>Input file has the following format:</p>
<p class="code">eth3 104.44.44.10 10.230.30.03 4003 5003 ftp<br />eth3 104.44.44.10 10.230.30.03 4003 5003 rtp<br />eth0 134.77.77.30 104.44.44.10 2004 4003 sip<br />eth3 104.44.44.10 10.230.30.03 4003 5003 http</p>
<p>Each line represents one packet. It contains the network interface, the source and destination IP, the ports, and the protocol. A packet is stored in the packet_trace_t structure:</p>
<pre name="code" class="cpp">typedef struct {
	nic_t nic;			// network interface where packet arrived
	ip_addr_t destIp;		// destination IP
	ip_addr_t srcIp;		// source IP
	port_t destPort;		// destination port
	port_t srcPort;		// source port 
	protocol_t protocol;	// protocol type (rtp, ftp, http, sip, etc)
	int priority;			// packet priority
} packet_trace_t;
</pre>
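<p>The input thread reads packets with an extraction operator that is not shown in the article. A minimal sketch of such an operator for the line format above could look as follows. The field types (strings and ints) and the column order (interface, source IP, destination IP, source port, destination port, protocol) are assumptions, and packet_t and parse_packet_line are hypothetical names.</p>

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Assumed field types; the article does not show nic_t, ip_addr_t,
// port_t or protocol_t.
struct packet_t {
    std::string nic;
    std::string srcIp;
    std::string destIp;
    int srcPort;
    int destPort;
    std::string protocol;
    int priority;
    packet_t() : srcPort(0), destPort(0), priority(0) {}
};

// Reads one whitespace-separated packet record.
// Assumed column order: nic, source IP, destination IP,
// source port, destination port, protocol.
std::istream& operator>>(std::istream& in, packet_t& p) {
    packet_t tmp;
    if (in >> tmp.nic >> tmp.srcIp >> tmp.destIp
           >> tmp.srcPort >> tmp.destPort >> tmp.protocol) {
        p = tmp;  // commit only when the whole record was read
    }
    return in;
}

// Convenience helper: parses one line; returns false if it is malformed.
bool parse_packet_line(const std::string& line, packet_t& p) {
    std::istringstream in(line);
    return !(in >> p).fail();
}
```

<p>Reading into a temporary and committing only on success leaves the target packet untouched when a line is incomplete, so a caller can tell a failed read from a valid packet.</p>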
<br />The NAT table and the IP configuration table are stored in tbb::concurrent_hash_map. A packet chunk is stored in std::vector, and the chunk queue is of type tbb::concurrent_queue:<br /><br />
<pre name="code" class="cpp">typedef concurrent_hash_map&lt;port_t, address*, string_comparator&gt; nat_table_t; 
typedef concurrent_hash_map&lt;ip_addr_t, nic_t, string_comparator&gt; ip_config_t; 
typedef vector&lt;packet_trace_t&gt; packet_chunk_t;
concurrent_queue&lt;packet_chunk_t&gt; chunk_queue;
</pre>
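<p>To make the role of the NAT table concrete, the sketch below rewrites the destination of an inbound packet by looking up its destination port. It deliberately swaps in std::map for tbb::concurrent_hash_map to stay self-contained, so it is not thread-safe. endpoint_t, nat_table_sketch_t and map_inbound are hypothetical names, and the behavior of the real nat.map() method is not shown in the article.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical internal endpoint that an external port is mapped to.
struct endpoint_t {
    std::string ip;
    int port;
};

// External port -> internal endpoint. std::map stands in for
// tbb::concurrent_hash_map here and is NOT safe for concurrent writers.
typedef std::map<int, endpoint_t> nat_table_sketch_t;

// Rewrites the destination of an inbound packet addressed to the router's
// external IP. Returns false and leaves the packet untouched if the packet
// is not addressed to the router or no mapping exists for the port.
bool map_inbound(const nat_table_sketch_t& table,
                 const std::string& external_ip,
                 std::string& dest_ip, int& dest_port) {
    if (dest_ip != external_ip) return false;
    nat_table_sketch_t::const_iterator it = table.find(dest_port);
    if (it == table.end()) return false;
    dest_ip = it->second.ip;
    dest_port = it->second.port;
    return true;
}
```

<p>With an entry that maps external port 4003 to 10.230.30.03:5003 (illustrative values borrowed from the sample input), a packet addressed to the external IP on port 4003 would leave this function addressed to the internal device.</p>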
<br />Input file reading is done by a separate thread that executes input_function. The input_function opens the file and reads it. Reading is performed in chunks that are pushed to the chunk queue. TBB containers are thread-safe, so the main thread can read from the chunk queue at the same time without any additional manual synchronization. Input thread function:<br /><br />
<pre name="code" class="cpp">void input_function(){	
    ifstream in_file (in_file_name);
    if (!in_file) {
        cerr &lt;&lt; "Cannot open input file " &lt;&lt; in_file_name &lt;&lt; "\n";
        exit (1);
    }
	stop_flag = false;	
	
	while(in_file.good()){			
		packet_chunk_t packet_chunk(chunk_size);
								
		for(int i=0; i&lt;chunk_size; i++){
			packet_trace_t packet;
			// Stop at end of file; the remaining slots keep their default (empty) packets
			if (!(in_file &gt;&gt; packet)) break;
			packet_chunk[i] = packet;			
		}
		chunk_queue.push(packet_chunk);			
	}
	stop_flag = true;
}</pre>
<br />
<p><b>Performance measurements</b></p>
<p>The goals of this project were to achieve good performance and scalability by using TBB. The following setup was used for the measurements:</p>
<p>CPU: 4 Intel® Xeon X7460 processors, 2.66 GHz, 24 physical cores total <br />RAM: 16 GB <br />OS: Microsoft Windows Server® Enterprise 2008 SP2 <br />Workload: input file: 113405 packets (5.1 MB) <br />Measurement tool: Intel® VTune<sup>TM</sup> Amplifier XE 2011 <br />Analysis type: Concurrency with default settings</p>
Two tests were performed: one for the serial version and one for the parallel version. Below are the summaries from the two analyses. The serial version is on the left and the TBB version is on the right:<br /><br />
<p ><img height="326" width="599" src="http://software.intel.com/file/36538" /></p>
<br />
<p>The CPU time is similar in both versions. This is the sum of the CPU times of all cores of the system. But the elapsed time is very different. This is the wall-clock time that the application takes for processing. In the serial version it is close to the overall CPU time. In the TBB version it is 19 times smaller, so the application ran 19 times faster.</p>
CPU usage for serial version:<br /><br />
<p ><img height="265" width="766" src="http://software.intel.com/file/36535" /></p>
<br />CPU usage for TBB version:<br /><br />
<p ><img height="258" width="770" src="http://software.intel.com/file/36536" /></p>
<br /><br />
<p>The average number of utilized cores for the TBB version is 20.5, and for most of the processing time all 24 were used. This demonstrates that the application scales well and can use almost all cores of a multi-core system.</p>
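<p>The reported figures can be put side by side with a little arithmetic. The symbols are introduced here only for illustration: S is the speedup in elapsed time, p = 24 is the number of cores, E is the resulting parallel efficiency, and U is the average fraction of cores in use.</p>

```latex
\[
S = \frac{T_{\text{serial}}}{T_{\text{TBB}}} \approx 19, \qquad
E = \frac{S}{p} = \frac{19}{24} \approx 0.79, \qquad
U = \frac{20.5}{24} \approx 0.85
\]
```

<p>In other words, roughly four fifths of the ideal 24-fold speedup was reached in the measured region.</p>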
The bottom-up view of the serial application shows that almost all the time is spent in the computing module that simulates real workload:<br /><br />
<p ><img height="298" width="856" src="http://software.intel.com/file/36537" /></p>
<br /><br />In the TBB version the picture is very similar: the main hotspot is the same compute_t::do_work method. However, it's mostly marked in green, which means good CPU utilization. There are also more functions in the list because of the TBB constructs used:<br /><br />
<p ><img height="424" width="770" src="http://software.intel.com/file/36540" /></p>
<br /><br />
<p>The results show good performance for the TBB-based application. However, keep in mind the following conditions:</p>
<p>1) The Amplifier XE API functions __itt_resume() and __itt_pause() were used to bound the measured area. The results show the performance of tbb::parallel_pipeline for the TBB version and of the while loop for the serial version. Measurements of the whole application will give slightly different results.</p>
<p>2) A simulated job was used to load the CPU. The compute_t class computes the "N queens" problem. Real processing is different. If there were not enough work for the CPU, file input would consume relatively more time. So in a real application the scalability and performance gain can be worse.</p>
<p><strong>Conclusion</strong></p>
This sample project shows that TBB can be used to build network packet processing applications and demonstrates the applicability of tbb::pipeline. These approaches can be applied in IP routing switches, telecommunication servers (VoIP telephony, video conferencing), various gateways and proxies, etc. Like any heavily loaded application, network software can benefit from multi-threading, and Intel® Threading Building Blocks is a simple and effective way to manage parallelism in your application.
<div><br /></div>
<div>The full project source code:</div>
<div><a target="_blank" href="http://software.intel.com/file/36623">NetworkRouter.cpp</a></div> ]]></description>
      <link>http://software.intel.com/en-us/articles/network-router-emulator/</link>
      <pubDate>Mon, 23 May 2011 13:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/network-router-emulator/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/network-router-emulator/</guid>
      <category>Parallel Programming</category>
      <category>Tools</category>
      <category>Intel Software Network communities</category>
    </item>
    <item>
      <title>Threading Challenge 2010 Phase 1 - Additional Submitted Entries (Master Level)</title>
      <description><![CDATA[ <strong>Threading Challenge 2010 Phase 1 - Entries Submitted (Codes)<br /><br /><br /></strong>
<p>Below you will find some of the submitted entries by problem for Phase 1 of the Threading Challenge 2010.  Please feel free to review and join us in the <strong>forum</strong> dedicated to each problem to discuss.<br /><br /><strong>Master Level<br /><br />Problem 1 - Hosoya Index             <strong>Comment on </strong><a href="http://software.intel.com/en-us/forums/threading-challenge-2010-hosoya-index/"><strong>dedicated forum for Hosoya Index</strong><br /></a><br />Entry Submitted by Michael_Uelschen:     <a href="http://software.intel.com/file/33153">Code</a>        <a href="http://software.intel.com/file/33154">Write-up<br /></a><br />Entry Submitted by sukhaj:     <a href="http://software.intel.com/file/33155">Write-up<br /></a><br /></strong></p> ]]></description>
      <link>http://software.intel.com/en-us/articles/threading-challenge-2010-phase-1-additional-submitted-entries-master-level/</link>
      <pubDate>Wed, 22 Dec 2010 21:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/threading-challenge-2010-phase-1-additional-submitted-entries-master-level/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/threading-challenge-2010-phase-1-additional-submitted-entries-master-level/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Fireflies - Scalable Ambient Effects</title>
      <description><![CDATA[ <link media="screen" href="http://software.intel.com/media/gamedev/css/3302_Intel_VC_01.css?v=11" type="text/css" rel="stylesheet" />
<link media="screen" href="http://software.intel.com/file/23729" type="text/css" rel="stylesheet" />
<table width="100" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top">
<div id="left_container">
<div id="header_content"><a href="http://software.intel.com/en-us/visual-computing/" title="Visual Computing Developer Community"><img height="96" width="727" src="http://software.intel.com/file/20493/" border="0" /></a></div>
<div id="left_content_container2"><!-- START left content -->
<div id="showcase_01">
<div >
<h2>Scalable Ambient Effects (Fireflies)</h2>
<p>Fireflies is a tech sample demonstrating a scalable ambient effect. In this sample, the ambient effect is a swarm of fireflies that scatter and reform into a walking character. Using Intel TBB, the firefly flight trajectory calculations performed per frame are distributed across multiple threads. By changing the number of simulated fireflies programmatically the ambient effect can be scaled to better match the performance of the platform it is running on.</p>
<p><a href="http://software.intel.com/file/33362" title="Fireflies Source"><img src="http://software.intel.com/file/25370" border="0" /></a><br /><br /><a href="http://software.intel.com/file/33363" title="Fireflies Installer" class="filedownload"><img src="http://software.intel.com/file/25371" border="0" /></a></p>
</div>
<div >
<p>
<object height="203" width="360" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000">
<param name="flashvars" value="file=http://software.intel.com/media/videos/e/f/2/a/4/b/e/Eliezer_Payzer_Firefly_Demo_V5.mp4&amp;image=http://software.intel.com/media/videos/e/f/2/a/4/b/e/ef2a4be5473ab0b3cc286e67b1f59f44_player.jpg&amp;autostart=false&amp;bufferlength=5&amp;allowfullscreen=true&amp;plugins=http://software.intel.com/common/swf/listen&amp;title=Ambient+Scalable+Effects+Fireflies+Demo+" />
<param name="allowfullscreen" value="true" />
<param name="src" value="http://software.intel.com/common/swf/mediaplayer.swf" /><embed src="http://software.intel.com/common/swf/mediaplayer.swf" allowfullscreen="true" flashvars="file=http://software.intel.com/media/videos/e/f/2/a/4/b/e/Eliezer_Payzer_Firefly_Demo_V5.mp4&amp;image=http://software.intel.com/media/videos/e/f/2/a/4/b/e/ef2a4be5473ab0b3cc286e67b1f59f44_player.jpg&amp;autostart=false&amp;bufferlength=5&amp;allowfullscreen=true&amp;plugins=http://software.intel.com/common/swf/listen&amp;title=Ambient+Scalable+Effects+Fireflies+Demo+" type="application/x-shockwave-flash" height="203" width="360"></embed>
</object>
</p>
<center><a href="http://software.intel.com/en-us/videos/ambient-scalable-effects-fireflies-demo-1/?wapkw=(fireflies">Fireflies Video (larger screen)</a></center>
<p><b><br />Read:</b> <a href="http://software.intel.com/en-us/articles/scalable-ambient-effects/" title="Scalable Ambient Effects">Scalable Ambient Effects<br /></a><b>Blog Post:</b> <a href="http://software.intel.com/en-us/blogs/2010/12/06/multithreaded-man-explodes-into-fireflies/" title="Multithreaded Man Explodes Into Fireflies">Multithreaded, Man Explodes Into Fireflies!</a></p>
</div>
<br clear="all" />
<div>
<table bgcolor="#ffffff" width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td><img height="37" width="531" src="http://software.intel.com/file/25372" /></td>
<td></td>
</tr>
</tbody>
</table>
<table bgcolor="#ffffff" cellpadding="0" bordercolor="#ffffff" cellspacing="6" border="0">
<tbody>
<tr>
<td width="214" valign="top">
<div align="center"><a href="http://software.intel.com/file/32677"><img src="http://software.intel.com/file/32607" alt="Fireflies_screenshot1_web.jpg" /></a></div>
</td>
<td width="234" valign="top">
<div align="center"><a href="http://software.intel.com/file/32678" title="Fireflies image 2"><img src="http://software.intel.com/file/32608" alt="Fireflies_screenshot2_web.jpg" title="Fireflies_screenshot2_web.jpg" /></a></div>
</td>
<td width="256" valign="top">
<div align="center"><a href="http://software.intel.com/file/32680" title="Fireflies image 3"><img src="http://software.intel.com/file/32609" alt="Fireflies_screenshot3_web.jpg" title="Fireflies_screenshot3_web.jpg" /></a></div>
</td>
</tr>
<tr>
<td valign="top">
<table align="center" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<p><i>Fireflies flock to form a walking character</i></p>
</td>
</tr>
</tbody>
</table>
</td>
<td valign="top">
<table align="center" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<p><i>Fireflies scatter and flock</i></p>
</td>
</tr>
</tbody>
</table>
</td>
<td valign="top">
<div align="center">
<table align="center" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td width="161" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<p align="center">The sample can run in multithreaded as well as serial mode to better see the performance benefit of multithreading an ambient effect.</p>
</td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<br /><br /><!-- start of 3 column table -->
<table width="695" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="695" rowspan="2" valign="top">
<table width="695" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top"><img height="8" width="697" src="http://software.intel.com/file/22889" /></td>
</tr>
<tr>
<td valign="top" class="panel_bg_02">
<table width="695" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="12" rowspan="2"><img height="8" width="12" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
<td valign="top" height="4"><img height="8" width="8" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top">
<table width="100%" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td align="center" width="33%" valign="top" height="19" class="arrow"><span ><b>What is it?</b></span></td>
<td align="center" width="33%" valign="top" height="19" class="arrow"><span ><b>System Requirements</b></span></td>
<td align="center" width="33%" valign="top" height="19" class="arrow"><span ><b><a href="http://software.intel.com/en-us/articles/code/">Additional Code Samples</a></b></span></td>
</tr>
<tr>
<td align="left" width="33%" valign="top" height="19" class="arrow"></td>
<td align="left" width="33%" valign="top" height="19" class="arrow"></td>
<td align="left" width="33%" valign="top" height="19" class="arrow"></td>
</tr>
<tr>
<td align="left" width="33%" valign="top" height="19">
<ul>
<li>Threaded particle system using <a href="http://www.threadingbuildingblocks.org/">Intel® Threading Building Blocks</a></li>
<li>Scalable Ambient Effects </li>
</ul>
</td>
<td align="left" width="33%" valign="top" height="19"><ol type="1">
<li>CPU: Dual core or better (Intel® Core™ i5 or better suggested)</li>
<li>GFX: DX9c capable graphics card </li>
<li>OS: Microsoft Windows Vista* or Microsoft Windows 7*</li>
<li>MEM: 2 GB of RAM or better </li>
<li>Software: <ol type="1">
<li>DirectX SDK (June 2010 release or later)</li>
<li>Build with Microsoft Visual Studio 2008* w/SP1 or Visual Studio 2010*</li>
</ol></li>
</ol>
<p>* Other names and brands may be claimed as the property of others.</p>
</td>
<td align="left" width="33%" valign="top" height="19">
<ul>
<li><a href="http://software.intel.com/en-us/articles/tickertape/" title="TickerTape">TickerTape Demo</a></li>
<li><a href="http://software.intel.com/en-us/articles/smoke-technology-demo/" title="Smoke">Smoke Game Technology </a></li>
<li><a href="http://software.intel.com/en-us/articles/ocean-fog-using-direct3d-10/">OceanFog Using Direct3D 10 </a></li>
</ul>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<!--bottom border for large box-->
<tr>
<td valign="top"><img height="8" width="697" src="http://software.intel.com/media/gamedev/_images/footer-bg-01.gif" /></td>
</tr>
<!--end border-->
</tbody>
</table>
</td>
<td width="10" rowspan="2"><img height="10" width="10" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td width="344" valign="top"></td>
</tr>
</tbody>
</table>
<!-- end of 3 column table --><br /><br /></div>
</div>
</div>
</td>
<td valign="top" ><!-- RHC -->
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td align="center" width="215">
<table align="center" width="223" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td height="4"><img height="4" width="232" src="http://software.intel.com/file/20516" /></td>
</tr>
<tr>
<td>
<table align="center" width="223" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td align="center" valign="top"><a href="http://www.intelsoftwaregraphics.com/?lid=5ceakfXf8Ho=&amp;siteid=cqMoF5H/37o="><img height="71" width="223" src="http://software.intel.com/file/20512" alt="Intel Visual Adrenaline" border="0" title="Intel Visual Adrenaline" /></a></td>
</tr>
<tr>
<td valign="top" >
<table width="223" cellpadding="0" cellspacing="0" border="0" >
<tbody>
<tr>
<td width="11" height="8"></td>
<td align="center" width="10"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://software.intel.com/en-us/visual-computing/" title="Intel Adrenaline Developer Community" >Developer Community</a></td>
<td width="10"></td>
</tr>
<tr>
<td height="8"></td>
<td align="center"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://www.intel.com/cd/software/partner/asmo-na/eng/index.htm" title="Intel Adrenaline Software Partner Program" >Intel® Software Partner Program</a></td>
<td></td>
</tr>
<tr>
<td height="8"></td>
<td align="center"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://www.intel.com/Consumer/Game/index.htm" title="Intel Adrenaline Game On" >Game On</a></td>
<td></td>
</tr>
<tr>
<td height="8"></td>
<td align="center"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://www.intelsoftwaregraphics.com/?lid=5ceakfXf8Ho=&amp;siteid=cqMoF5H/37o=" title="Intel Adrenaline Showcase" >Showcase</a></td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td valign="top" height="7"><img height="7" width="223" src="http://software.intel.com/file/20515" /></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td valign="top" height="4"><img height="6" width="6" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td></td>
</tr>
</tbody>
</table>
<br /><center>
<table cellpadding="0" cellspacing="0" border="0" id="nav_table">
<tbody>
<tr>
<td>
<table width="190" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="9"></td>
<td>
<div align="center" ><b>A Scalable 3D <br />Particle System</b><br /><a href="http://software.intel.com/en-us/articles/tickertape/" title="TickerTape"><img src="http://software.intel.com/file/25664/" alt="Download PDF" border="0" /></a><br /><br /><b>Benefits of SIMD</b><br /><a href="http://software.intel.com/en-us/articles/tickertape-part-2/"><img src="http://software.intel.com/file/25665/" alt="Download PDF" border="0" /></a><br /><br /><b>Visual Adrenaline</b><br /><a href="http://software.intel.com/sites/billboard/"><img src="http://software.intel.com/file/25369" alt="Download PDF" border="0" /></a><br /></div>
</td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center><br /><center>
<table cellpadding="0" cellspacing="0" border="1" id="nav_table">
<tbody>
<tr>
<td>
<table width="190" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="9" class="right_container_hdr2"></td>
<td class="right_container_hdr2"><b>Intel Tools for Unreal Developers <br /><a href="http://software.intel.com/en-us/articles/epic-licenses-tbb-for-ue-licensees/">TBB for Unreal Engine</a></b></td>
<td class="right_container_hdr2"></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td width="9" class="right_container_hdr"></td>
<td class="right_container_hdr">
<h4>Related Links</h4>
</td>
<td class="right_container_hdr"></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td height="15"></td>
<td valign="middle"><a href="http://www.intel.com/software/graphics" title="Intel Visual Computing Home">Visual Computing Home</a></td>
<td></td>
</tr>
<tr>
<td></td>
<td>
<h3>Intel<sup>®</sup> Technologies</h3>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top"><a href="http://www.intel.com/software/sandybridge">Sandy Bridge</a><br /><a href="http://software.intel.com/en-us/articles/integrated-graphics/" title="Intel Visual Computing Technologies Integrated Graphics">Graphics</a><br /><a href="http://software.intel.com/en-us/articles/parallel-programming-vc/" title="Intel Visual Computing Technologies Parallel Programming">Parallel Programming</a></td>
<td></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td></td>
<td>
<h3>Focus Areas</h3>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top"><a href="http://software.intel.com/en-us/articles/game-dev/" title="Intel Game Development Focus Area">Game Development</a><br /><a href="http://software.intel.com/en-us/articles/artist-animator/" title="Intel Visual Computing Artist/Animator Focus Area">Artist/Animator</a><br /><a href="http://software.intel.com/en-us/articles/media/" title="Intel Visual Computing Media Focus Area">Media</a></td>
<td></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"></td>
</tr>
<tr>
<td></td>
<td>
<h3>Develop</h3>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top"><a href="http://software.intel.com/en-us/articles/tools-vc/" title="Intel Visual Computing Development Tools">Tools</a><br /><a href="http://software.intel.com/en-us/articles/code/" title="Intel Visual Computing Development Code">Code</a></td>
<td></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center><!--END right column Content --></td>
</tr>
</tbody>
</table> ]]></description>
      <link>http://software.intel.com/en-us/articles/fireflies/</link>
      <pubDate>Fri, 05 Nov 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/fireflies/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/fireflies/</guid>
      <category>Parallel Programming</category>
      <category>Tools</category>
      <category>Visual Computing</category>
      <category>Intel® Graphics Performance Analyzers (Intel® GPA)</category>
      <category>Code &amp; Downloads</category>
      <category>Game Development</category>
    </item>
    <item>
      <title>Intel® AVX C/C++ Intrinsics Emulation</title>
      <description><![CDATA[ <p>The Intel® AVX instruction set extension <a target="_blank" href="http://software.intel.com/en-us/avx/">[1]</a> will appear in the next-generation Intel microarchitecture, codenamed "Sandy Bridge". We chose to announce AVX early so that software vendors could have support in place by the time the hardware launches. Most software development platforms now support Intel AVX, including compilers and assemblers from Intel and Microsoft, GCC, and the UNIX binutils.</p>
<p>For early adopters, we added AVX support to the Intel® Software Development Emulator (Intel SDE) <a target="_blank" href="http://software.intel.com/en-us/articles/intel-software-development-emulator/">[2]</a>. It lets you run code containing actual AVX instructions and check its functional correctness before hardware is available.</p>
<p>Today we are adding another piece for developers who cannot yet use AVX-capable tools in their current development environment but plan to migrate later, or who work on a software platform that Intel SDE does not support. These developers can still start programming with Intel AVX using intrinsics.</p>
<p>We are providing a C and C++ header file that emulates the Intel AVX intrinsics. The emulation header is implemented with the intrinsics of earlier Intel instruction set extensions, up to Intel SSE4.2, so both your development environment and your hardware must support SSE4.2. <br /><br />To use it, include this file:</p>
<p>#include "avxintrin_emu.h"</p>
<p>instead of the usual:</p>
<p>#include &lt;immintrin.h&gt;</p>
<p><br />Alternatively, you can create your own immintrin.h file that simply includes avxintrin_emu.h. This avoids intrusive changes to the source base: you switch between real AVX code generation and emulation by changing the include directory path.</p>
<p>The emulation header primarily targets UNIX-like environments and was tested there with GCC and the Intel C/C++ compilers. On Microsoft Windows, the other tools (compilers, assemblers, and SDE) already provide strong AVX support, but the header can still be used on Windows with the Intel Compiler if desired.</p>
<p>Note that the AVX emulation header file is intended for checking the functional correctness of an AVX implementation; it is not recommended for long-term use or for release in a final product. Once your development environment and hardware support AVX, we recommend that you switch to the real AVX intrinsic header file.<br /><br />Although we did our best to debug it, this file must <em>not</em> be considered a reference functional implementation of the AVX instructions, or even bug-free. Please see the current version's limitations and caveats at the beginning of the file, and let us know about any issues you encounter.</p>
<p><b><br />Example</b></p>
<pre name="code" class="cpp:nogutter:nocontrols">#include &lt;stddef.h&gt;         // size_t
#include "avxintrin_emu.h"  // or #include &lt;immintrin.h&gt; for real AVX

void saxpy( float a, const float* x, const float* y, float* __restrict z, size_t len )
{
    size_t i = 0;
    __m256 a_ = _mm256_set1_ps( a );

    for ( size_t len16_ = len &amp; -16; i + 16 &lt;= len16_; i += 16 )
    {
        __m256 x1_ = _mm256_loadu_ps( x + i );
        __m256 x2_ = _mm256_loadu_ps( x + i + 8 );

        __m256 y1_ = _mm256_loadu_ps( y + i );
        __m256 y2_ = _mm256_loadu_ps( y + i + 8 );

        x1_ = _mm256_mul_ps( x1_, a_ );
        x2_ = _mm256_mul_ps( x2_, a_ );

        x1_ = _mm256_add_ps( x1_, y1_ );
        x2_ = _mm256_add_ps( x2_, y2_ );

        _mm256_storeu_ps( z + i     , x1_ );
        _mm256_storeu_ps( z + i + 8 , x2_ );
    }

    for ( ; i &lt; len; ++i )
        z[i] = x[i] * a + y[i];
}</pre>
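<p>The loop bound <em>len &amp; -16</em> in the example rounds <em>len</em> down to a multiple of 16, so the vector loop processes whole blocks of sixteen floats and the scalar loop processes the remainder. The following is a minimal sketch of that block/tail split in plain C++ without intrinsics. The names <em>vector_part</em> and <em>saxpy_blocked</em> are hypothetical and are not part of the emulation header.</p>

```cpp
#include <cassert>
#include <cstddef>

// Round len down to a multiple of 16 by clearing the low four bits.
// This is the same value that `len & -16` yields for a size_t len.
inline std::size_t vector_part(std::size_t len)
{
    return len & ~static_cast<std::size_t>(15);
}

// Scalar stand-in for the AVX loop: same block/tail structure, no intrinsics.
void saxpy_blocked(float a, const float* x, const float* y, float* z, std::size_t len)
{
    assert(x && y && z);
    const std::size_t body = vector_part(len);
    std::size_t i = 0;

    // Block part: sixteen elements per iteration (two 8-float AVX registers).
    for (; i < body; i += 16)
        for (std::size_t k = 0; k < 16; ++k)
            z[i + k] = x[i + k] * a + y[i + k];

    // Tail: the remaining len % 16 elements.
    for (; i < len; ++i)
        z[i] = x[i] * a + y[i];
}
```

<p>A scalar version like this can also serve as a reference when checking the results of the intrinsic version.</p>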
<p><br /><strong><br />References </strong></p>
<p>[1] Intel AVX - <a target="_blank" href="http://software.intel.com/en-us/avx/">http://software.intel.com/en-us/avx/</a></p>
<p>[2] Intel Software Development Emulator - <a target="_blank" href="http://software.intel.com/en-us/articles/intel-software-development-emulator/">http://software.intel.com/en-us/articles/intel-software-development-emulator/</a></p> ]]></description>
      <link>http://software.intel.com/en-us/articles/avx-emulation-header-file/</link>
      <pubDate>Wed, 23 Jun 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/avx-emulation-header-file/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/avx-emulation-header-file/</guid>
      <category>Parallel Programming</category>
      <category>Open Source</category>
      <category>What If Experimental Software</category>
      <category>Tools</category>
      <category>Intel® AVX</category>
      <category>Software News</category>
      <category>Code &amp; Downloads</category>
    </item>
    <item>
      <title>Adding Parallelism Sample Code</title>
      <description><![CDATA[ This is the sample code for the <em>Optimize an Existing Program by Adding Parallelism</em> Guide. Please see <strong>Article Attachments</strong> below.<br /><br />By installing or copying all or any part of the software components in this page, you agree to the terms of the <a href="http://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/">Intel Sample Source Code License Agreement</a>.<br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/adding-parallelism-sample-code/</link>
      <pubDate>Tue, 11 May 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/adding-parallelism-sample-code/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/adding-parallelism-sample-code/</guid>
      <category>Parallel Programming</category>
      <category>Intel® Threading Building Blocks Knowledge Base</category>
    </item>
    <item>
      <title>Practical investigation of critical sections</title>
      <description><![CDATA[ <p>We recently read the post "<a href="http://software.intel.com/en-us/articles/managing-lock-contention-large-and-small-critical-sections/">Managing Lock Contention: Large and Small Critical Sections</a>", in which the author discusses optimizing critical sections. We will not retell the post here, but note that the author gives examples where it is useful to merge several code fragments guarded by critical sections into one large critical section. For example, instead of</p>
<div id="IDAQIQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre>Begin Thread Function ()
  Initialize ()
  BEGIN CRITICAL SECTION 1
    UpdateSharedData1 ()
  END CRITICAL SECTION 1
  DoFunc1 ()
  BEGIN CRITICAL SECTION 2
    UpdateSharedData2 ()
  END CRITICAL SECTION 2
  DoFunc2 ()
End Thread Function ()
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<p>you may write it this way:</p>
<div id="IDAUIQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre>Begin Thread Function ()
  Initialize ()

  BEGIN CRITICAL SECTION 1
    UpdateSharedData1 ()
    DoFunc1 ()
    UpdateSharedData2 ()
  END CRITICAL SECTION 1

  DoFunc2 ()
End Thread Function ()
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<p>We decided to run some experiments to see whether the theory holds in practice and what performance gains to expect. For this purpose we wrote a test program containing several functions with different synchronization implementations.</p>
<p>Here is what the program does: it calculates two sums (msum1 and msum2). Because the calculation runs in parallel, writes to these variables must be protected. As in the example from the article "Managing Lock Contention: Large and Small Critical Sections", we first use two critical sections and then merge them into one. Note that after this step the calculation of the <i>n</i>-th root, a rather resource-intensive operation, ends up inside the critical section. We then try several further variants.</p>
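<p>As a point of comparison for the parallel variants, the computation itself can be sketched serially. This is a minimal sketch, not part of the original test program: the name <em>SerialReference</em> and the use of std::vector instead of a raw double** array are assumptions made for illustration.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Serial reference: for each row, m1 is the product of its elements and
// m2 is the h-th root of m1 (h = row length). Returns sum(m1) + sum(m2).
double SerialReference(const std::vector<std::vector<double> >& rows)
{
    double msum1 = 0.0;
    double msum2 = 0.0;
    for (std::size_t x = 0; x < rows.size(); ++x)
    {
        const std::size_t h = rows[x].size();
        assert(h > 0);  // the root 1.0 / h is undefined for an empty row
        double m1 = 1.0;
        for (std::size_t y = 0; y < h; ++y)
            m1 *= rows[x][y];
        msum1 += m1;
        msum2 += std::pow(m1, 1.0 / static_cast<double>(h));
    }
    return msum1 + msum2;
}
```

<p>No synchronization is needed here because only one thread touches msum1 and msum2. Floating-point addition is not associative, so a parallel variant may differ from this result in the last digits.</p>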
<p>Description of functions with various implementations of synchronization:</p>
<p>Foo1 - We use two critical sections:</p>
<div id="IDAAJQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre><span class="KEYWORD">#pragma</span> omp critical
{
  msum1 += m1;
}
<span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
<span class="KEYWORD">#pragma</span> omp critical
{
  msum2 += m2;
}
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<p>Foo2 - We use one critical section. The <i>n</i>-th root is calculated inside the critical section:</p>
<div id="IDAMJQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre><span class="KEYWORD">#pragma</span> omp critical
{
  msum1 += m1;
  <span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
  msum2 += m2;
}
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<p>Foo3 - We use one critical section. The <i>n</i>-th root is calculated outside the critical section:</p>
<div id="IDAWJQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre><span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
<span class="KEYWORD">#pragma</span> omp critical
{
  msum1 += m1;
  msum2 += m2;
}
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<p>Foo4 - We use atomic operations (<i>atomic</i> directive):</p>
<div id="IDAAKQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre><span class="KEYWORD">#pragma</span> omp atomic
msum1 += m1;

<span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);

<span class="KEYWORD">#pragma</span> omp atomic
msum2 += m2;
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<p>Foo5 - We use the <i>reduction</i> clause:</p>
<div id="IDAMKQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre><span class="KEYWORD">#pragma</span> omp parallel <span class="KEYWORD">for</span> reduction(+:msum1, msum2)
...
msum1 += m1;
<span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
msum2 += m2;
</pre>
</td>
</tr>
</tbody>
</table>
</div>
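<p>Conceptually, a reduction gives every thread a private copy of the variable and combines the copies when the loop ends. The sketch below writes that pattern out by hand for a single sum. It is an illustration under assumptions, not how any particular OpenMP runtime implements the clause, and the name <em>SumWithManualReduction</em> is hypothetical. If OpenMP is not enabled, the pragmas are ignored and the code runs serially with the same result.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-written counterpart of reduction(+:sum): each thread accumulates into
// a private partial sum, and the partial sums are merged once per thread.
double SumWithManualReduction(const std::vector<double>& values)
{
    // The loop index is an int in this sketch, so the size must fit in one.
    assert(values.size() < 0x7fffffff);
    const int n = static_cast<int>(values.size());
    double sum = 0.0;

    #pragma omp parallel
    {
        // Declared inside the parallel region, so private to each thread.
        double local = 0.0;

        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            local += values[i];

        // Entered once per thread rather than once per element.
        #pragma omp critical
        sum += local;
    }
    return sum;
}
```

<p>The shared variable is written once per thread instead of once per loop iteration, so threads contend for it far less often than in the critical-section and atomic variants.</p>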
<p>The full text of the program is given below. You may also <a href="http://www.viva64.com/external-pictures/TestCriticalSections.7z">download</a> the project for Visual Studio 2005.</p>
<div id="IDA0KQZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre><span class="KEYWORD">#include</span> &lt;stdio.h&gt;
<span class="KEYWORD">#include</span> &lt;tchar.h&gt;
<span class="KEYWORD">#include</span> &lt;windows.h&gt;
<span class="KEYWORD">#include</span> &lt;math.h&gt;
<span class="KEYWORD">#include</span> &lt;omp.h&gt;
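<span class="KEYWORD">#include</span> &lt;stdlib.h&gt;
<span class="KEYWORD">#include</span> &lt;exception&gt;
<span class="KEYWORD">#include</span> &lt;iostream&gt;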
<span class="KEYWORD">#include</span> &lt;<span class="KEYWORD">float</span>.h&gt;

<span class="KEYWORD">class</span> Timing {
<span class="KEYWORD">public</span>:
  <span class="KEYWORD">void</span> StartTiming();
  <span class="KEYWORD">void</span> StopTiming();
  <span class="KEYWORD">double</span> GetUserSeconds() <span class="KEYWORD">const</span> {
    <span class="KEYWORD">return</span> value;
  }
<span class="KEYWORD">private</span>:
  DWORD_PTR oldmask;
  <span class="KEYWORD">double</span> value;
  LARGE_INTEGER time1;
};

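<span class="KEYWORD">void</span> Timing::StartTiming()
{
  // Pin the thread to the first processor so that both counter reads come
  // from the same core. A mask of 0 is invalid and makes the call fail.
  // The previous mask (0 on failure) is kept so it can be restored later.
  oldmask = SetThreadAffinityMask(::GetCurrentThread(), 1);
  QueryPerformanceCounter(&amp;time1);
}

<span class="KEYWORD">void</span> Timing::StopTiming()
{
  LARGE_INTEGER performance_frequency, time2;
  QueryPerformanceFrequency(&amp;performance_frequency);
  QueryPerformanceCounter(&amp;time2);
  <span class="KEYWORD">if</span> (oldmask != 0)  // restore only if the mask was actually changed
    SetThreadAffinityMask(::GetCurrentThread(), oldmask);
  value = (<span class="KEYWORD">double</span>)(time2.QuadPart - time1.QuadPart);
  value /= performance_frequency.QuadPart;
}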

<span class="KEYWORD">double</span> **CreateArray(ptrdiff_t w,
                     ptrdiff_t h)
{
  <span class="KEYWORD">double</span> **array = (<span class="KEYWORD">double</span> **)
    malloc(w * <span class="KEYWORD">sizeof</span>(<span class="KEYWORD">double</span> *));
  <span class="KEYWORD">if</span> (array == NULL)
    <span class="KEYWORD">throw</span> std::exception(<span class="STRING">"Error allocate memory"</span>);
  <span class="KEYWORD">for</span> (ptrdiff_t x = 0; x != w; ++x)
  {
    array[x] =
      (<span class="KEYWORD">double</span> *)malloc(h * <span class="KEYWORD">sizeof</span>(<span class="KEYWORD">double</span>));
    <span class="KEYWORD">if</span> (array[x] == NULL)
      <span class="KEYWORD">throw</span> std::exception(<span class="STRING">"Error allocate memory"</span>);

    <span class="KEYWORD">for</span> (ptrdiff_t y = 0; y != h; ++y)
      array[x][y] = (rand() % 100) / 100.0;
  }
  <span class="KEYWORD">return</span> array;
}

<span class="KEYWORD">double</span> Foo1(<span class="KEYWORD">double</span> **array,
            ptrdiff_t w, ptrdiff_t h)
{
  <span class="KEYWORD">double</span> msum1 = 0;
  <span class="KEYWORD">double</span> msum2 = 0;

  <span class="KEYWORD">#pragma</span> omp parallel <span class="KEYWORD">for</span>
  <span class="KEYWORD">for</span> (ptrdiff_t x = 0; x &lt; w; x++)
  {
    <span class="KEYWORD">double</span> m1 = 1;
    <span class="KEYWORD">for</span> (ptrdiff_t y = 0; y != h; ++y)
      m1 *= array[x][y];

    <span class="KEYWORD">#pragma</span> omp critical
    {
      msum1 += m1;
    }
    <span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
    <span class="KEYWORD">#pragma</span> omp critical
    {
      msum2 += m2;
    }
  }

  <span class="KEYWORD">return</span> msum1 + msum2;
}

<span class="KEYWORD">double</span> Foo2(<span class="KEYWORD">double</span> **array,
            ptrdiff_t w, ptrdiff_t h)
{
  <span class="KEYWORD">double</span> msum1 = 0;
  <span class="KEYWORD">double</span> msum2 = 0;

  <span class="KEYWORD">#pragma</span> omp parallel <span class="KEYWORD">for</span>
  <span class="KEYWORD">for</span> (ptrdiff_t x = 0; x &lt; w; x++)
  {
    <span class="KEYWORD">double</span> m1 = 1;
    <span class="KEYWORD">for</span> (ptrdiff_t y = 0; y != h; ++y)
      m1 *= array[x][y];

    <span class="KEYWORD">#pragma</span> omp critical
    {
      msum1 += m1;
      <span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
      msum2 += m2;
    }
  }

  <span class="KEYWORD">return</span> msum1 + msum2;
}

<span class="KEYWORD">double</span> Foo3(<span class="KEYWORD">double</span> **array,
            ptrdiff_t w, ptrdiff_t h)
{
  <span class="KEYWORD">double</span> msum1 = 0;
  <span class="KEYWORD">double</span> msum2 = 0;

  <span class="KEYWORD">#pragma</span> omp parallel <span class="KEYWORD">for</span>
  <span class="KEYWORD">for</span> (ptrdiff_t x = 0; x &lt; w; x++)
  {
    <span class="KEYWORD">double</span> m1 = 1;
    <span class="KEYWORD">for</span> (ptrdiff_t y = 0; y != h; ++y)
      m1 *= array[x][y];

    <span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
    <span class="KEYWORD">#pragma</span> omp critical
    {
      msum1 += m1;
      msum2 += m2;
    }
  }

  <span class="KEYWORD">return</span> msum1 + msum2;
}

<span class="KEYWORD">double</span> Foo4(<span class="KEYWORD">double</span> **array,
            ptrdiff_t w, ptrdiff_t h)
{
  <span class="KEYWORD">double</span> msum1 = 0;
  <span class="KEYWORD">double</span> msum2 = 0;

  <span class="KEYWORD">#pragma</span> omp parallel <span class="KEYWORD">for</span>
  <span class="KEYWORD">for</span> (ptrdiff_t x = 0; x &lt; w; x++)
  {
    <span class="KEYWORD">double</span> m1 = 1;
    <span class="KEYWORD">for</span> (ptrdiff_t y = 0; y != h; ++y)
      m1 *= array[x][y];

    <span class="KEYWORD">#pragma</span> omp atomic
    msum1 += m1;

    <span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);

    <span class="KEYWORD">#pragma</span> omp atomic
    msum2 += m2;
  }

  <span class="KEYWORD">return</span> msum1 + msum2;
}

<span class="KEYWORD">double</span> Foo5(<span class="KEYWORD">double</span> **array,
            ptrdiff_t w, ptrdiff_t h)
{
  <span class="KEYWORD">double</span> msum1 = 0;
  <span class="KEYWORD">double</span> msum2 = 0;

  <span class="KEYWORD">#pragma</span> omp parallel <span class="KEYWORD">for</span> reduction(+:msum1, msum2)
  <span class="KEYWORD">for</span> (ptrdiff_t x = 0; x &lt; w; x++)
  {
    <span class="KEYWORD">double</span> m1 = 1;
    <span class="KEYWORD">for</span> (ptrdiff_t y = 0; y != h; ++y)
      m1 *= array[x][y];

    msum1 += m1;
    <span class="KEYWORD">double</span> m2 = pow(m1, 1.0 / h);
    msum2 += m2;
  }

  <span class="KEYWORD">return</span> msum1 + msum2;
}

<span class="KEYWORD">int</span> _tmain(<span class="KEYWORD">int</span>, _TCHAR*[])
{
  <span class="KEYWORD">const</span> ptrdiff_t W = 100;
  <span class="KEYWORD">const</span> ptrdiff_t H = 200;
  <span class="KEYWORD">double</span> **array = CreateArray(W, H);

  <span class="KEYWORD">const</span> ptrdiff_t TestsCount = 50000;
  Timing t;
  t.StartTiming();
  <span class="KEYWORD">double</span> result = 0;
  <span class="KEYWORD">for</span> (ptrdiff_t i = 0; i != TestsCount; ++i)
    result = Foo1(array, W, H);
  t.StopTiming();
  printf(<span class="STRING">"Foo1 - two critical sections\n"</span>);
  printf(<span class="STRING">"Foo1 return: %G\n"</span>, result);
  printf(<span class="STRING">"Foo1 time = %.3G seconds.\n\n"</span>,
    t.GetUserSeconds());

  t.StartTiming();
  <span class="KEYWORD">for</span> (ptrdiff_t i = 0; i != TestsCount; ++i)
    result = Foo2(array, W, H);
  t.StopTiming();
  printf(<span class="STRING">"Foo2 - one critical sections\n"</span>);
  printf(<span class="STRING">"Foo2 return: %G\n"</span>, result);
  printf(<span class="STRING">"Foo2 time = %.3G seconds.\n\n"</span>,
    t.GetUserSeconds());

  t.StartTiming();
  <span class="KEYWORD">for</span> (ptrdiff_t i = 0; i != TestsCount; ++i)
    result = Foo3(array, W, H);
  t.StopTiming();
  printf(<span class="STRING">"Foo3 - one critical sections + optimize\n"</span>);
  printf(<span class="STRING">"Foo3 return: %G\n"</span>, result);
  printf(<span class="STRING">"Foo3 time = %.3G seconds.\n\n"</span>,
    t.GetUserSeconds());

  t.StartTiming();
  <span class="KEYWORD">for</span> (ptrdiff_t i = 0; i != TestsCount; ++i)
    result = Foo4(array, W, H);
  t.StopTiming();
  printf(<span class="STRING">"Foo4 - atomic\n"</span>);
  printf(<span class="STRING">"Foo4 return: %G\n"</span>, result);
  printf(<span class="STRING">"Foo4 time = %.3G seconds.\n\n"</span>,
    t.GetUserSeconds());

  t.StartTiming();
  <span class="KEYWORD">for</span> (ptrdiff_t i = 0; i != TestsCount; ++i)
    result = Foo5(array, W, H);
  t.StopTiming();
  printf(<span class="STRING">"Foo5 - reduction\n"</span>);
  printf(<span class="STRING">"Foo5 return: %G\n"</span>, result);
  printf(<span class="STRING">"Foo5 time = %.3G seconds.\n\n"</span>,
    t.GetUserSeconds());
  <span class="KEYWORD">return</span> 0;
}
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<p>The program measures the execution time of each function. Here are the results we obtained when the application was compiled with Intel(R) C++ Compiler 11.1.071 and run on a two-core processor:</p>
<div id="IDAB1QZC">
<table width="98%" class="code">
<tbody>
<tr>
<td>
<pre>Foo1 - two critical sections
Foo1 time = 2.21 seconds.

Foo2 - one critical sections
Foo2 time = 1.66 seconds.

Foo3 - one critical sections + optimize
Foo3 time = 1.48 seconds.

Foo4 - atomic
Foo4 time = 1.69 seconds.

Foo5 - reduction
Foo5 time = 0.863 seconds.
</pre>
</td>
</tr>
</tbody>
</table>
</div>
<h2>Summary<a name="IDAE1QZC"></a></h2>
<p>The results show that the code with two critical sections is the slowest (2.21 seconds). Merging the two critical sections into one, with the <i>n</i>-th root calculation placed inside it, reduces the time to 1.66 seconds. As noted earlier, this calculation is rather expensive. Moving it out of the critical section saves a further 1.66 - 1.48 = 0.18 seconds. The measurements thus confirm in practice that reducing the number of critical sections, and the amount of work done inside them, gives a significant speed gain.</p>
<p>We also experimented with more efficient synchronization mechanisms. At first we assumed that atomic operations would provide an additional benefit. In practice, however, the function with two <i>atomic</i> directives and the function with one critical section take almost the same time (1.69 and 1.66 seconds respectively). Moreover, once the <i>n</i>-th root calculation is moved out of the critical section, that function is noticeably faster than the one with two <i>atomic</i> directives: 1.48 seconds for the optimized critical section versus 1.69 seconds for the two atomic operations.</p>
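<p>Both variants compared above still synchronize on every iteration of the outer loop. A minimal sketch of how that can be avoided by hand is given here: each thread keeps private partial sums and merges them into the shared totals once, after finishing its share of the loop. The function name <i>FooLocal</i> and the variables <i>local1</i> and <i>local2</i> are hypothetical and are not part of the measured program, so no timing is reported for it.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical variant (FooLocal is not part of the original program):
// each thread accumulates into private variables and enters a critical
// section only once, after its share of the outer loop is finished.
double FooLocal(double **array, std::ptrdiff_t w, std::ptrdiff_t h)
{
  assert(h > 0);  // 1.0 / h below requires a non-zero height
  double msum1 = 0;
  double msum2 = 0;

  #pragma omp parallel
  {
    double local1 = 0;  // thread-private partial sums
    double local2 = 0;

    #pragma omp for nowait
    for (std::ptrdiff_t x = 0; x < w; x++)
    {
      double m1 = 1;
      for (std::ptrdiff_t y = 0; y != h; ++y)
        m1 *= array[x][y];

      local1 += m1;
      local2 += std::pow(m1, 1.0 / h);
    }

    // One synchronized merge per thread instead of one per iteration.
    #pragma omp critical
    {
      msum1 += local1;
      msum2 += local2;
    }
  }

  return msum1 + msum2;
}
```

<p>If the code is compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. Conceptually, this private-copy-then-combine pattern is what the <i>reduction</i> clause expresses in a single line.</p>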
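<p>The <i>reduction</i> clause showed the best result (0.863 seconds), more than 2.5 times faster than the version with two critical sections. With <i>reduction</i>, each thread accumulates into its own private copy of the variable, and the copies are combined only once at the end of the loop.</p> ]]></description>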
      <link>http://software.intel.com/en-us/articles/practical-investigation-of-critical-sections/</link>
      <pubDate>Sat, 13 Mar 2010 14:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/practical-investigation-of-critical-sections/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/practical-investigation-of-critical-sections/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Threading Challenge 2009 - Phase 2 - #3:  Graph Coloring</title>
      <description><![CDATA[ <img src="http://software.intel.com/file/15368" alt="746_125.jpg" title="746_125.jpg" /><br /><br />Below you will find many of the entries received for our <strong>3rd problem of phase 2 - Graph Coloring</strong>.  Please feel free to review and join us in the <strong><a href="http://software.intel.com/en-us/forums/graph-coloring/">forum</a></strong> dedicated to this problem to discuss.<br /><br /><br /><span class="sectionHeading">Winning Submission:<br /></span><br /><br /><span class="sectionHeadingText">*akki:  <a href="http://software.intel.com/file/23129">Code</a> / <a href="http://software.intel.com/file/23130">write-up</a> <br /></span><br /><br /><span class="sectionHeading">Other Submissions:</span><br /><br /><br />*mdm100:  <a href="http://software.intel.com/file/24797">Code</a> / Write-up<br /><br />*BradleyKuszmaul:  <a href="http://software.intel.com/file/23684">Code</a> / <a href="http://software.intel.com/file/23685">Write-up</a><br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/threading-challenge-2009-phase-2-3-graph-coloring/</link>
      <pubDate>Thu, 21 Jan 2010 10:30:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/threading-challenge-2009-phase-2-3-graph-coloring/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/threading-challenge-2009-phase-2-3-graph-coloring/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Threading Challenge 2009 - Phase 2 - #2: Knights Tour</title>
      <description><![CDATA[ <img src="http://software.intel.com/file/15368" alt="746_125.jpg" title="746_125.jpg" /><br /><br />Below you will find many of the entries received for our <strong>2nd problem of phase 2 - Knights Tour</strong>.  Please feel free to review and join us in the <a href="http://software.intel.com/en-us/forums/strassens-algorithm/"><strong>forum</strong></a> dedicated to this problem to discuss.<br /><br /><br /><span class="sectionHeading">Winning Submission:</span><br /><br /><br /><span class="sectionHeadingText">*mdm100: </span><a href="http://software.intel.com/file/22720"><strong>Code</strong> </a>/ <a href="http://software.intel.com/file/22721"><strong>Write-up</strong></a><br /><br /><br /><br /><span class="sectionHeading">Other Submissions:<br /></span><br /><br />*BradleyKuszmaul:  <a href="http://software.intel.com/file/24372">Code</a> / <a href="http://software.intel.com/file/24371">Write-up</a><br /><br />*gk4v07:  <a href="http://software.intel.com/file/24799">Code</a> / <a href="http://software.intel.com/file/24373">Write-up</a><br /><br />*tiandy:  <a href="http://software.intel.com/file/24374">Code</a> / <a href="http://software.intel.com/file/24375">Write-up</a><br /><br />*dendhui0815:  <a href="http://software.intel.com/file/24798">Code</a> / <a href="http://software.intel.com/file/24304">Write-up (Mandarin)</a><br /><br />*Jonas D'Mentia:  <a href="http://software.intel.com/file/24306">Code</a> / <a href="http://software.intel.com/file/24305">Write-up</a><br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/threading-challenge-2009-phase-2-2-knights-tour/</link>
      <pubDate>Thu, 21 Jan 2010 10:30:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/threading-challenge-2009-phase-2-2-knights-tour/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/threading-challenge-2009-phase-2-2-knights-tour/</guid>
      <category>Parallel Programming</category>
    </item>
  </channel></rss>
