<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Sun, 12 Feb 2012 17:03:05 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/tools/type/code/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles Feed</title>
    <link>http://software.intel.com/en-us/articles/tools/type/code/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Ultrabook™ and the Intel® Energy Checker SDK</title>
      <description><![CDATA[ <h2 class="sectionHeading">Abstract</h2>
With the advent of the Ultrabook™<sup>1</sup>, the demand for applications that are power misers continues to rise. The Intel® Energy Checker SDK can be used to instrument an application and collect data that helps a developer pinpoint power-hungry features worth optimizing. This article gives an overview of the Intel Energy Checker SDK and discusses how it can be used to improve energy usage on an Ultrabook.<br /><br />
<h2 class="sectionHeading">More Work, Less Power</h2>
An Ultrabook™ needs to budget its power consumption very carefully to extend its useful time on battery, so applications that use less energy are preferred. Application developers often create their programs on desktop systems, where power and energy consumption matter less than raw performance. Applications should be designed not only to conserve power while active but also to minimize energy usage while idle. Idle behavior is often overlooked, yet improving it can greatly extend battery life. If power issues are ignored, running a program on an Ultrabook can bring unpleasant surprises for the user. Developers who test their application on an Ultrabook during development will gain insight into how well the program runs in a power-limited environment. An analysis tool such as the <a href="http://software.intel.com/en-us/articles/intel-energy-checker-sdk/">Intel® Energy Checker SDK</a> can be a powerful companion during the optimization phase for software designed for an Ultrabook.<br /><br />
<h2 class="sectionHeading">Energy Efficiency</h2>
Before explaining what Intel Energy Checker SDK contains, a discussion on Energy Efficiency (EE) is in order. This is a term that is used extensively in the Intel Energy Checker SDK. There is no universally accepted definition of EE, so for the purposes of this tool it is defined as:<br />
<p ><em>EE=Work/Energy</em></p>
<em>Work</em> is defined as the amount of “<em>useful work</em>” done by a software application. There is no concise, easy definition of the term <em>useful work</em> either, as what is considered <em>useful work</em> in one program may be quite different in another application. The developer is required to make that determination. For example, one might consider the areas of a movie player program where it provides the customer value (such as decoding the movie) as useful work whereas areas of the program that are accessing resources, waiting on input, or performing synchronization would not.<br /><br />
<h2 class="sectionHeading">Code Instrumentation</h2>
The first step in using Intel Energy Checker SDK to help determine an application’s EE is to create and use “counters” in the software to measure quantities of “useful work”. A counter is a 64-bit (8-byte) variable that keeps a running total of how many times a particular event occurs. In the “C” language, this is an unsigned long long data type. A developer can create one or more counters during the initialization portion of the software. Next, a container for the counters can be created, called a “Productivity Link” (PL)<sup>2</sup>. Each PL holds up to 512 counters, and up to 10 different PLs can be open at one time, although most software needs far fewer counters and PLs.<br /><br />During application runtime, values can be written to any counter in the PL, based on the developer’s requirements. Intel Energy Checker SDK can collect the information from the PLs in order to determine how much work was done.<br /><br />
<h2 class="sectionHeading">Energy Consumed</h2>
The second part of finding the EE of a software application is to measure how much energy was consumed while the program was running. To do this, Intel Energy Checker SDK uses two tools that are included in the SDK download: Energy Server (ESRV) and Temperature Server (TSRV). ESRV monitors energy and power consumption as reported by external power measurement tools, while TSRV monitors temperature-related information as reported by environmental probes. ESRV and TSRV counters can be accessed by any program using the Intel Energy Checker API. In addition to the counters created to measure quantities of work, the developer will want to add counters that collect information from ESRV and possibly TSRV. There are three different ways to set up ESRV:<br /><br /><ol>
<li>Use a power meter to collect actual “platform energy and power” information.<br /><br />There are several different power meters that work with the Intel Energy Checker SDK. Please consult the <em>Intel® Energy Checker SDK User Guide</em> included in the download or found on the <a href="http://software.intel.com/en-us/articles/intel-energy-checker-sdk/">Intel® Energy Checker SDK page</a> to determine which power meters will work and how they should be attached to the test system.<br /></li>
<li>Use <a href="http://software.intel.com/en-us/articles/intel-power-gadget/">Intel® Power Gadget</a> to collect “processor energy and power” usage information on the 2nd Generation Intel Core™ processor family. An external power meter that reports platform power can also be used together with Intel Power Gadget, which reports processor power. The blog <em>Accessing Intel® Power Gadget From Intel® Energy Checker SDK</em> by Intel engineer Jun De Vega discusses how to enable Intel® Power Gadget with Intel® Energy Checker.<br /></li>
<li>Choose the simulation method, which uses the CPU utilization percentage reported by the OS. This method does not require a hardware probe. The Intel Energy Checker SDK offers it as an option for all processors (rather than only the 2nd Generation Intel Core processor family, as with Intel Power Gadget) to enable users who do not have a power meter. The SDK includes a support library for accessing this metric.</li>
</ol>
<p ><img src="http://software.intel.com/file/41168" /><br /><br /><strong>Figure 1:</strong> Conceptual drawing of the Intel Energy Checker setup with an instrumented application, a power meter and environmental probes attached</p>
<h2 class="sectionHeading">Intel Energy Checker Extras</h2>
There are two companion tools that are bundled with the Intel Energy Checker SDK in addition to those already mentioned. The PL GUI Monitor is a user interface that displays Productivity Link (PL) counters in a running program that has already been instrumented with the Intel Energy Checker API. The PL CSV Logger<sup>3</sup> is an application that can collect and write PL counters to a CSV file for later analysis in a variety of spreadsheet applications.<br /><br />Included with the Intel Energy Checker SDK is the <em>Intel® Energy Checker SDK Companion Application User Guide</em> that discusses the features and capabilities of both of these tools.<br /><br />
<p ><img src="http://software.intel.com/file/41169" /><br /><br /><strong>Figure 2:</strong> PL GUI Monitor running while a picture is being rendered</p>
The entire Intel Energy Checker SDK includes other build, scripting, interoperability, and monitoring tools to help developers instrument code and collect energy metrics.<br /><br />A white paper entitled “<em>How Green Is Your Software?</em>” is available for download from the SDK site. This paper discusses approaches for making software power efficient. Look for it in the “Code, Resources and Documentation” section of the <a href="http://software.intel.com/en-us/articles/intel-energy-checker-sdk/">Intel Energy Checker SDK page</a>. Several blogs about Intel Energy Checker that were written by Intel Engineer Jamel Tayeb will also be helpful:<br /><br /><a href="http://software.intel.com/en-us/blogs/2010/04/15/using-the-intel-energy-checker-sdk-at-home/?wapkw=(Energy+Checker)">Using the Intel® Energy Checker SDK at Home</a><br /><br /><a href="http://software.intel.com/en-us/blogs/2010/02/19/creating-a-simple-device-library-for-intel-energy-checker-sdk/?wapkw=(Energy+Checker)">Creating a Simple Device Library for Intel® Energy Checker SDK</a><br /><br /><a href="http://software.intel.com/en-us/blogs/2010/03/30/measuring-the-energy-consumed-by-a-command-using-the-intel-energy-checker-sdk/?wapkw=(Energy+Checker)">Measuring the energy consumed by a command using the Intel® Energy Checker SDK</a><br /><br />All of these resources allow a developer to get started in gathering helpful information.<br /><br />
<h2 class="sectionHeading">Optimizing Applications for Ultrabooks</h2>
Once a program has been instrumented to collect counter information and an energy collection plan is in place (either simulation or power meter), the setup is complete. The developer will then be able to gather information about the application’s energy usage profile and to incorporate optimizations to improve results.<br /><br />There are several areas of optimization the Ultrabook developer can select for improvements:<br /><br />
<div ><ul>
<li>Consider modifying the application to be aware of the power status and to reduce energy consumption when the system is on battery.</li>
<li>Check the hardware and software power management options and choose a balanced power setting. This could be a recommended setting documented with the application.</li>
<li>Reduce power usage while the application is actively running or doing work. Compute-intensive parts of the program will likely benefit from multi-threading and vectorization techniques.</li>
<li>Reduce power usage while the application is idle. Minimizing the timer tick rate and scheduling periodic actions to occur within the same wakeup period are two examples of how to reduce idle power usage.</li>
</ul></div>
<br /><br />
<h2 class="sectionHeading">Summary</h2>
With the growth of Ultrabook devices, it will benefit program designers and developers to take a look at ways to save energy while providing a great user experience on an Ultrabook. Intel Energy Checker SDK can provide the means to identify the key areas of focus and confirm the positive results achieved after optimization. Long live Ultrabook!<br /><br />
<h2 class="sectionHeading">About the Author</h2>
<img src="http://software.intel.com/file/41170"  /> Judy Hartley is a Software Applications Engineer who has been working in the Software and Services Group since 2005. She has contributed to many software products and written about her experiences through blogs and whitepapers. Recently Judy has been working on Graphics and Power tools and training for future Intel processors.<br /><br  />
<hr />
<br /><sup>1</sup> Ultrabook is a trademark of Intel Corporation in the U.S. and/or other countries.<br /><br /><sup>2</sup> A Productivity Link is a term used by Intel Energy Checker to represent an arbitrary or logical collection of counters.<br /><br /><sup>3</sup> CSV is the acronym for Comma Separated Values.<br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/ultrabook-and-the-intel-energy-checker-sdk/</link>
      <pubDate>Tue, 24 Jan 2012 00:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/ultrabook-and-the-intel-energy-checker-sdk/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/ultrabook-and-the-intel-energy-checker-sdk/</guid>
      <category>Mobility</category>
      <category>What If Experimental Software</category>
      <category>Tools</category>
      <category>Intel Software Network communities</category>
      <category>Intel SW Partner program</category>
      <category>Code &amp; Downloads</category>
      <category>Power Efficiency</category>
      <category>Resources For Software Developers</category>
      <category>Ultrabook</category>
    </item>
    <item>
      <title>How to Automate Static Security Analysis with Intel(R) C++ Compiler for Linux*</title>
      <description><![CDATA[ <p>Automate the static security analysis check done by the Intel(R) C++ Compiler for Linux. Static security analysis is the process of finding errors and security weaknesses in software through detailed analysis of source code.<br /><br />An automated quality gate like this one can notably reduce code reviews efforts, and of course will decrease the likely of having bugs and security threats found once the product is in production. <br /><br />To automate the static security analysis as a quality gate in any project, execute the check without graphical user interface which requires human interaction.</p>
<p>For legacy projects, ask developers to submit new code only if it reduces the number of findings.<br />For projects written from scratch, allow no findings before new code is committed to the repository.<br /><br />When the check is enabled (<strong>-diag-enable sc3</strong>) and the code is compiled, a new folder is created in which the findings are stored in a custom XML format.</p>
<blockquote>
<p>$ file rXsc/data.X/rXsc.pdr<br />rXsc/data.X/rXsc.pdr: XML document text</p>
</blockquote>
<br />The XMLStarlet* (xmlstar) package can be used to list the findings easily, together with their location information (file, line and function). The package provides a command line tool for processing XML documents.<br /><br /><a href="http://xmlstar.sourceforge.net/">http://xmlstar.sourceforge.net</a><br /><br />The following command lists any findings, so it can be used to verify that no findings are reported before proceeding with the usual development cycle.<br /><br />
<blockquote>
<p>$ xml sel -t -m /diags/diag -v "concat(message/thread/stacktrace/loc/file, ':', message/thread/stacktrace/loc/line, ':', sc_verbose)" -n rXsc/data.0/rXsc.pdr <br />/home/$USER/work/$PROD/src/pool.c:157:pool.c(157): warning #12178: this value of "ret" isn't used in the program<br />/home/$USER/work/$PROD/src/pool.c:186:pool.c(186): error #12192: unreachable statement<br />/home/$USER/work/$PROD/src/pool.c:216:pool.c(216): warning #12135: procedure "pool_done" is never caled</p>
</blockquote>
]]></description>
      <link>http://software.intel.com/en-us/articles/how-to-automate-static-security-analysis-with-intelr-c-compiler-for-linux/</link>
      <pubDate>Fri, 13 Jan 2012 00:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/how-to-automate-static-security-analysis-with-intelr-c-compiler-for-linux/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/how-to-automate-static-security-analysis-with-intelr-c-compiler-for-linux/</guid>
      <category>Tools</category>
      <category>Intel Software Network communities</category>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Resources For Software Developers</category>
    </item>
    <item>
      <title>Using Intel® TBB in network applications: Network Router emulator</title>
      <description><![CDATA[ <p><b>Introduction</b></p>
<p>Intel® Threading Building Blocks (TBB) is used in a wide range of applications. If performance matters and the program runs on a multi-core platform, TBB is a good addition to a C++ program. Network applications are usually heavily loaded, since they process huge amounts of traffic under tight processing-time constraints. This article shows how TBB can be used in network packet processing software to improve throughput and reduce processing time.</p>
<p>As a sample project, I've created a simplified Network Router emulator. A network router is a device that routes and transmits IP (Internet Protocol) packets in a local area network (LAN). It connects several PCs and gives them access to the Internet and the internal network. The device has several internal network interfaces and one external interface.</p>
<p>The sample project emulates Network Router logic. It provides the following functionality:</p>
<ul>
<li>Packet input from a file - the application is just a model, so there is no need for a real connection to a network interface. Reading from a file emulates reading from a real network interface.</li>
<li>NAT - Network Address Translation. The router has only one external IP address, but packets should be delivered to several internal devices behind the router. NAT allows port and IP mapping from external to internal and vice versa.</li>
<li>IP routing - delivering packets to appropriate router NIC (Network Interface Controller) according to destination IP.</li>
<li>Bandwidth management - some traffic is real-time, and it is critical to deliver these packets as quickly as possible (e.g. voice over IP). VoIP protocols carry telephone conversations, and delays would degrade their quality. The router can prioritize these critical packets so they are processed sooner.</li>
</ul>
<p>I've created two versions of the Network Router: serial and parallel. The latter uses Intel® Threading Building Blocks. I'll describe how TBB was used in the project and present performance results for the parallelized program.</p>
<p><b>Network Router implementation</b></p>
<p>The Network Router emulator reads packets from a file and processes them. Packet processing includes bandwidth management, NAT translation and IP routing. Packets are processed by several program modules, ordered sequentially as in an assembly line. This is a common structure for packet processing applications. The input file is a text file in which each line represents one IP packet. A separate thread reads packets in large chunks.</p>
<p>Intel® TBB provides the tbb::pipeline class (and the tbb::parallel_pipeline function used in the code below), a high-level framework for this kind of program structure. A pipeline consists of filters, one per processing stage. Each packet goes through the pipeline and is processed step by step by its filters: first by the first filter, then the second, then the third, and so on. However, the processing of one packet is independent of another, so the filters can operate in parallel.</p>
<p ><br />Network Router scheme<br /><img height="256" width="531" src="http://software.intel.com/file/36534"  /></p>
<p><br /><br />Main function:</p>
<pre name="code" class="cpp">#include &lt;iostream&gt; 
#include &lt;sstream&gt;
#include &lt;fstream&gt;
#include &lt;vector&gt;
#include &lt;algorithm&gt;
#include &lt;ittnotify.h&gt;
#include &lt;tbb/pipeline.h&gt;
#include &lt;tbb/concurrent_hash_map.h&gt;
#include &lt;tbb/atomic.h&gt;
#include &lt;tbb/concurrent_queue.h&gt;
#include &lt;tbb/compat/thread&gt;
// Redirects calls to "new" and "delete" to TBB thread safe allocators
#include &lt;tbb/tbbmalloc_proxy.h&gt;

using namespace tbb;
using namespace std;

class bandwidth_manager_t;
class network_adress_translator_t;
class ip_router_t;
class compute_t;
typedef vector&lt;packet_trace_t&gt; packet_chunk_t;

int chunk_size = 1600;
concurrent_queue&lt;packet_chunk_t&gt; chunk_queue;
atomic&lt;bool&gt; stop_flag;

int main(int argc, char* argv[])
{
	ip_addr_t external_ip;
	nic_t external_nic;	
	nat_table_t nat_table;	// NAT table   
	ip_config_t ip_config;	// Router network configuration 					
	int ntokens = 24;	
	
	get_args (argc, argv);	
    ifstream config_file (config_file_name);

    if (!config_file) {
        cerr &lt;&lt; "Cannot open config file " &lt;&lt; config_file_name &lt;&lt; "\n";
        exit (1);
    }		
	if (! initialize_router (external_ip, external_nic, 
                            ip_config, config_file)) exit (1);	
	
	thread input_thread(input_function);

	// packet processing objects
	bandwidth_manager_t bwm;	
	network_adress_translator_t nat(external_ip, external_nic, nat_table);
	ip_router_t ip_router(external_ip, external_nic, ip_config);		

__itt_resume();
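	parallel_pipeline(ntokens,
		make_filter&lt;void, packet_chunk_t*&gt;(		// Input filter
			filter::parallel,
			[&amp;](flow_control&amp; fc)-&gt; packet_chunk_t*{
				// Read stop_flag BEFORE popping: the input thread pushes its
				// last chunk before it sets the flag, so an empty queue seen
				// after the flag was set means all input has been consumed.
				bool input_done = stop_flag;
				packet_chunk_t* packet_chunk = new packet_chunk_t(chunk_size);

				if(!chunk_queue.try_pop(*packet_chunk) &amp;&amp; input_done){
					delete packet_chunk;	// free the unused chunk
					fc.stop();
					return NULL;	// nothing is passed downstream after stop()
				}
				// If the queue was only momentarily empty, this chunk holds
				// empty packets, which the downstream filters skip.
				return packet_chunk;
			}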
		)&amp;	// Bandwidth manager filter
		make_filter&lt;packet_chunk_t*, packet_chunk_t*&gt;(		
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk)-&gt; packet_chunk_t*{								
				
				for(int i=0; i&lt;packet_chunk-&gt;size(); i++){
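					// Use a reference so any priority set by bwm.prioritize()
					// is stored in the chunk that std::sort reorders below.
					packet_trace_t&amp; packet = (*packet_chunk)[i];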
					
					if (packet.nic == empty){
						break;
					}
					else{
						bwm.prioritize(packet);									
						compute_t compute;
						compute.work();						
					}										
				}
				std::sort(packet_chunk-&gt;begin(), packet_chunk-&gt;end(),
							packet_comparator);
				return packet_chunk;	
			}
		)&amp;	// NAT filter
		make_filter&lt;packet_chunk_t*, packet_chunk_t*&gt;(	
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk)-&gt; packet_chunk_t*{

				for(int i=0; i&lt;packet_chunk-&gt;size(); i++){	
					// Reference: the NAT translation is applied to the chunk itself
					packet_trace_t&amp; packet = (*packet_chunk)[i];
					if (packet.nic == empty)
						break;
					else{				
						nat.map(packet);
						compute_t compute;
						compute.work();	
					}
				}				
				return packet_chunk;
			}
		)&amp;	// IP routing filter
		make_filter&lt;packet_chunk_t*, packet_chunk_t*&gt;(		
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk)-&gt; packet_chunk_t*{			

				for(int i=0; i&lt;packet_chunk-&gt;size(); i++){						
					packet_trace_t&amp; packet = (*packet_chunk)[i];	// route in place
					
					if (packet.nic == empty)
						break;
					else{				
						ip_router.route(packet);
						compute_t compute;
						compute.work();	
					}
				}				
				return packet_chunk;
			}
		)&amp;	// Output filter
		make_filter&lt;packet_chunk_t*, void&gt;(	
			filter::parallel,
			[&amp;](packet_chunk_t* packet_chunk){														
				
				for(int i=0; i&lt;packet_chunk-&gt;size(); i++){						
					packet_trace_t packet;
					packet = (*packet_chunk)[i];	
					compute_t compute;
					compute.work();	

					if (packet.nic == empty)
						break;
				}	
				// No output is required , just drop packets
				delete packet_chunk; 
			}
		)
	);	
__itt_pause();

	cout &lt;&lt; "\nAll packets are processed\n\n";		
	return 0;
}</pre>
<br />
<p>The first part is preparation: reading the command line, opening files, and creating and initializing objects. The configuration file contains information about the router's interfaces. The objects bwm, nat and ip_router perform the packet processing; they use the containers nat_table and ip_config to store the NAT and IP tables.</p>
<p>The core component of the Network Router is the pipeline. It is implemented with the tbb::parallel_pipeline() function, which takes the number of tokens and a chain of filters as arguments. The unit of work passed through the pipeline is a pointer to a packet_chunk_t. The ntokens parameter limits the number of elements processed concurrently. It is set to 24 because the project was tested on a 24-core machine, where a larger value would not be expected to help.</p>
<p>Pipeline filters do the actual work, which in this application is packet processing. A filter can be serial or parallel, as selected by its mode parameter. All filters here use filter::parallel, which means each filter can process several elements at the same time.</p>
<p>The first filter takes a packet chunk from chunk_queue and passes it to the second filter. The second filter performs bandwidth management on each packet in the chunk: the bwm module assigns priorities to packets according to their protocol, and the packets in the chunk are then sorted by priority. This allows critical traffic to be processed as early as possible. The next filters perform NAT mapping and IP routing. The last filter is the output stage, but for simplicity no real output is done; packets are simply dropped.</p>
<p>A packet chunk, rather than a single packet, is used as the pipeline token because it is large enough to amortize the overhead. If single packets were passed through the pipeline, there would be too many hand-offs between threads, and the overhead would outweigh the benefit of parallelism.</p>
<p>The __itt_resume() and __itt_pause() functions belong to the API of Intel® VTune<sup>TM</sup> Amplifier XE, which was used for the performance measurements. These calls mark the beginning and the end of the region of interest.</p>
<p>The compute object of type compute_t generates CPU load: it performs additional computation to simulate the processing done in real systems. The application does not perform all the work needed to process and route packets in real-life network equipment. It is only a model of a real application, so by itself it would not use enough CPU. The method compute_t::work() runs an "N Queens" computation.</p>
<p>Opening and reading the input file is the job of a separate thread. It is created with the std::thread class from the upcoming C++11 standard, provided here by TBB through the tbb/compat/thread header.</p>
<p><b>Serial implementation</b></p>
<p>To measure the effect of parallelization, a serial version was also created. It has a similar structure; the only difference is that parallel_pipeline is replaced with a simple while loop.</p>
<p >Network router serial scheme<br /><br /><img height="248" width="459" src="http://software.intel.com/file/36533" /></p>
<p>While loop (replacing parallel_pipeline):</p>
<pre name="code" class="cpp">__itt_resume();
	bool stop = false;

	while (!stop){
		packet_chunk_t packet_chunk(chunk_size);
		// Read stop_flag before popping, so the last chunk pushed by the
		// input thread cannot be missed.
		bool input_done = stop_flag;

		if(!chunk_queue.try_pop(packet_chunk)){
			if (input_done) {
				stop = true;
			}
			continue;	// queue momentarily empty: nothing to process yet
		}

		for(int i=0; i &lt; packet_chunk.size(); i++){
			packet_trace_t&amp; packet = packet_chunk[i];	// modify in place
			bwm.prioritize(packet);
			compute_t compute;
			compute.work();
		}
		std::sort(packet_chunk.begin(), packet_chunk.end(), packet_comparator);
		for(int i=0; i &lt; packet_chunk.size(); i++){
			packet_trace_t&amp; packet = packet_chunk[i];
			nat.map(packet);
			compute_t compute;
			compute.work();
			ip_router.route(packet);
			compute.work();
			compute.work();
		}
	}
__itt_pause();</pre>
<p><br />There are four calls to compute.work() per packet, the same number as in the TBB version. Because this is by far the most CPU-intensive function, using the same number of calls keeps the comparison fair.</p>
<p><b>Data structures</b></p>
<p>Input file has the following format:</p>
<p class="code">eth3 104.44.44.10 10.230.30.03 4003 5003 ftp<br />eth3 104.44.44.10 10.230.30.03 4003 5003 rtp<br />eth0 134.77.77.30 104.44.44.10 2004 4003 sip<br />eth3 104.44.44.10 10.230.30.03 4003 5003 http</p>
<p>Each line represents one packet and contains the network interface, the source and destination IP addresses, the source and destination ports, and the protocol. A packet is stored in the packet_trace_t structure:</p>
<pre name="code" class="cpp">typedef struct {
	nic_t nic;			// network interface where packet arrived
	ip_addr_t destIp;		// destination IP
	ip_addr_t srcIp;		// source IP
	port_t destPort;		// destination port
	port_t srcPort;		// source port 
	protocol_t protocol;	// protocol type (rtp, ftp, http, sip, etc)
	int priority;			// packet priority
} packet_trace_t;
</pre>
<br />The NAT table and the IP configuration table are stored in tbb::concurrent_hash_map containers. A packet chunk is stored in a std::vector, and the chunk queue is a tbb::concurrent_queue:<br /><br />
<pre name="code" class="cpp">typedef concurrent_hash_map&lt;port_t, address*, string_comparator&gt; nat_table_t; 
typedef concurrent_hash_map&lt;ip_addr_t, nic_t, string_comparator&gt; ip_config_t; 
typedef vector&lt;packet_trace_t&gt; packet_chunk_t;
concurrent_queue&lt;packet_chunk_t&gt; chunk_queue;
</pre>
<br />Input file reading is done by a separate thread that executes input_function, which opens the file and reads it in chunks that are pushed to the chunk queue. TBB containers are thread-safe, so the main thread can read from the chunk queue at the same time without any manual synchronization. Input thread function:<br /><br />
<pre name="code" class="cpp">void input_function(){	
    ifstream in_file (in_file_name);
    if (!in_file) {
        cerr &lt;&lt; "Cannot open input file " &lt;&lt; in_file_name &lt;&lt; "\n";
        exit (1);
    }
	stop_flag = false;	
	
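	bool more_input = true;
	while(more_input){
		// Slots that are not filled keep their value-initialized state,
		// which the pipeline filters are expected to treat as empty packets.
		packet_chunk_t packet_chunk(chunk_size);

		for(int i=0; i&lt;chunk_size; i++){
			packet_trace_t packet;
			if (!(in_file &gt;&gt; packet)) {	// end of file or malformed line
				more_input = false;
				break;
			}
			packet_chunk[i] = packet;
		}
		chunk_queue.push(packet_chunk);
	}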
	stop_flag = true;
}</pre>
<br />
<p><b>Performance measurements</b></p>
<p>The goals of this project were to achieve good performance and scalability by using TBB. For measurements the following setup was used:</p>
<p>CPU: 4 Intel® Xeon® X7460 processors, 2.66 GHz, 24 physical cores total <br />RAM: 16 GB <br />OS: Microsoft Windows Server® 2008 Enterprise SP2 <br />Workload: input file with 113405 packets (5.1 MB) <br />Measurement tool: Intel® VTune<sup>TM</sup> Amplifier XE 2011 <br />Analysis type: Concurrency, default settings</p>
Two tests were performed, one for the serial version and one for the parallel version. Below are the summaries of the two analyses, with the serial version on the left and the TBB version on the right:<br /><br />
<p ><img height="326" width="599" src="http://software.intel.com/file/36538" /></p>
<br />
<p>CPU time, the sum of the CPU time of all cores in the system, is similar for both versions. Elapsed time, the wall-clock time the application takes to do the processing, is very different. In the serial version it is close to the overall CPU time; in the TBB version it is 19 times smaller, so the application ran 19 times faster.</p>
CPU usage for serial version:<br /><br />
<p ><img height="265" width="766" src="http://software.intel.com/file/36535" /></p>
<br />CPU usage for TBB version:<br /><br />
<p ><img height="258" width="770" src="http://software.intel.com/file/36536" /></p>
<br /><br />
<p>The average number of utilized cores for the TBB version is 20.5, and all 24 cores were in use for most of the processing time. This shows that the application scales well and can use almost all cores of a multi-core system.</p>
The Bottom-up view of the serial application shows that almost all the time is spent in the computing module that simulates the real workload:<br /><br />
<p ><img height="298" width="856" src="http://software.intel.com/file/36537" /></p>
<br /><br />In the TBB version the picture is very similar: the main hotspot is the same compute_t::do_work method. However, it is mostly shown in green, which indicates good CPU utilization. The list also contains more functions because of the TBB constructs used:<br /><br />
<p ><img height="424" width="770" src="http://software.intel.com/file/36540" /></p>
<br /><br />
<p>The results show good performance for the TBB-based application. However, keep in mind the following conditions:</p>
<p>1) The Amplifier XE API functions __itt_resume() and __itt_pause() were used to bound the measured region, so the results show the performance of tbb::parallel_pipeline in the TBB version and of the while loop in the serial version. Measuring the whole application run would give somewhat different results.</p>
<p>2) A simulated job was used to load the CPU: the compute_t class solves the "N Queens" problem. Real processing is different. If there were less work for the CPU, file input would take a relatively larger share of the time, so in a real application the scalability and performance gain could be lower.</p>
<p><strong>Conclusion</strong></p>
This sample project shows how TBB can be used to compose network packet processing applications and how tbb::pipeline applies to them. These approaches can be applied in IP routing switches, telecommunication servers (VoIP telephony, video conferencing), various gateways and proxies, etc. Like any heavily loaded application, network software can benefit from multithreading, and Intel® Threading Building Blocks offers a simple and effective way to manage parallelism in your application.
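<p>A packet-processing pipeline of this kind typically has a serial input stage, a parallel processing stage, and a serial output stage that preserves packet order, which corresponds to the serial_in_order and parallel filter modes in TBB. The sketch below illustrates that shape without TBB: it uses std::async instead of tbb::parallel_pipeline, the Packet type and the read/work callbacks are hypothetical stand-ins for the classes in NetworkRouter.cpp, and max_in_flight plays roughly the role of the live-token limit passed to tbb::parallel_pipeline.</p>

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical packet type.
struct Packet {
    std::size_t seq;
    std::string payload;
};

// Stage 1 (read) runs serially, stage 2 (work) runs concurrently on up to
// max_in_flight packets, stage 3 collects results serially in input order.
std::vector<std::string> run_pipeline(
        std::function<std::optional<Packet>()> read,
        std::function<std::string(const Packet&)> work,
        std::size_t max_in_flight) {
    if (max_in_flight == 0) max_in_flight = 1;
    std::vector<std::string> out;
    std::deque<std::future<std::string>> in_flight;
    while (auto pkt = read()) {
        in_flight.push_back(std::async(std::launch::async, work, std::move(*pkt)));
        if (in_flight.size() >= max_in_flight) {
            out.push_back(in_flight.front().get());  // oldest first keeps order
            in_flight.pop_front();
        }
    }
    while (!in_flight.empty()) {                     // drain remaining packets
        out.push_back(in_flight.front().get());
        in_flight.pop_front();
    }
    return out;
}
```

<p>This version starts a new thread per packet, which is simple but costly; a task scheduler such as the one in TBB reuses a fixed set of worker threads instead.</p>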
<div><br /></div>
<div>The full project source code:</div>
<div><a target="_blank" href="http://software.intel.com/file/36623">NetworkRouter.cpp</a></div> ]]></description>
      <link>http://software.intel.com/en-us/articles/network-router-emulator/</link>
      <pubDate>Mon, 23 May 2011 13:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/network-router-emulator/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/network-router-emulator/</guid>
      <category>Parallel Programming</category>
      <category>Tools</category>
      <category>Intel Software Network communities</category>
    </item>
    <item>
      <title>Fireflies - Scalable Ambient Effects</title>
      <description><![CDATA[ <link media="screen" href="http://software.intel.com/media/gamedev/css/3302_Intel_VC_01.css?v=11" type="text/css" rel="stylesheet" />
<link media="screen" href="http://software.intel.com/file/23729" type="text/css" rel="stylesheet" />
<table width="100" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top">
<div id="left_container">
<div id="header_content"><a href="http://software.intel.com/en-us/visual-computing/" title="Visual Computing Developer Community"><img height="96" width="727" src="http://software.intel.com/file/20493/" border="0" /></a></div>
<div id="left_content_container2"><!-- START left content -->
<div id="showcase_01">
<div >
<h2>Scalable Ambient Effects (Fireflies)</h2>
<p>Fireflies is a tech sample demonstrating a scalable ambient effect. In this sample, the ambient effect is a swarm of fireflies that scatter and re-form into a walking character. Using Intel TBB, the per-frame firefly flight trajectory calculations are distributed across multiple threads. By changing the number of simulated fireflies programmatically, the ambient effect can be scaled to better match the performance of the platform it is running on.</p>
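<p>The per-frame work split described above can be sketched with standard threads. This is an illustration under stated assumptions, not the sample's code: it uses std::thread rather than Intel TBB (where tbb::parallel_for over a blocked_range would be the usual counterpart), and the Firefly struct, the attraction-toward-a-target steering rule, and the per-thread budget in firefly_count are all hypothetical.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical firefly state.
struct Firefly {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
};

// Illustrative steering: accelerate toward a target point, then integrate (Euler).
inline void update_one(Firefly& f, float tx, float ty, float tz, float dt) {
    const float k = 0.5f;  // attraction strength (assumed)
    f.vx += k * (tx - f.x) * dt;
    f.vy += k * (ty - f.y) * dt;
    f.vz += k * (tz - f.z) * dt;
    f.x += f.vx * dt;
    f.y += f.vy * dt;
    f.z += f.vz * dt;
}

// One frame: split the array into contiguous chunks, one per worker thread.
// Each firefly is written by exactly one thread, so no locking is needed.
void update_frame(std::vector<Firefly>& flies, float tx, float ty, float tz,
                  float dt, unsigned workers) {
    workers = std::max(1u, workers);
    const std::size_t n = flies.size();
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([&flies, begin, end, tx, ty, tz, dt] {
            for (std::size_t i = begin; i < end; ++i)
                update_one(flies[i], tx, ty, tz, dt);
        });
    }
    for (auto& t : pool) t.join();
}

// Scale the effect with the platform: a fixed budget of fireflies per hardware
// thread (pass std::thread::hardware_concurrency(), which may return 0).
std::size_t firefly_count(unsigned hw_threads, std::size_t per_thread) {
    return std::max(1u, hw_threads) * per_thread;
}
```

<p>Because the chunks never overlap, the result does not depend on the number of workers. Creating threads every frame keeps the sketch short, but a persistent worker pool such as the TBB scheduler avoids that per-frame cost.</p>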
<p><a href="http://software.intel.com/file/33362" title="Fireflies Source"><img src="http://software.intel.com/file/25370" border="0" /></a><br /><br /><a href="http://software.intel.com/file/33363" title="Fireflies Installer" class="filedownload"><img src="http://software.intel.com/file/25371" border="0" /></a></p>
</div>
<div >
<p>
<object height="203" width="360" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000">
<param name="flashvars" value="file=http://software.intel.com/media/videos/e/f/2/a/4/b/e/Eliezer_Payzer_Firefly_Demo_V5.mp4&amp;image=http://software.intel.com/media/videos/e/f/2/a/4/b/e/ef2a4be5473ab0b3cc286e67b1f59f44_player.jpg&amp;autostart=false&amp;bufferlength=5&amp;allowfullscreen=true&amp;plugins=http://software.intel.com/common/swf/listen&amp;title=Ambient+Scalable+Effects+Fireflies+Demo+" />
<param name="allowfullscreen" value="true" />
<param name="src" value="http://software.intel.com/common/swf/mediaplayer.swf" /><embed src="http://software.intel.com/common/swf/mediaplayer.swf" allowfullscreen="true" flashvars="file=http://software.intel.com/media/videos/e/f/2/a/4/b/e/Eliezer_Payzer_Firefly_Demo_V5.mp4&amp;image=http://software.intel.com/media/videos/e/f/2/a/4/b/e/ef2a4be5473ab0b3cc286e67b1f59f44_player.jpg&amp;autostart=false&amp;bufferlength=5&amp;allowfullscreen=true&amp;plugins=http://software.intel.com/common/swf/listen&amp;title=Ambient+Scalable+Effects+Fireflies+Demo+" type="application/x-shockwave-flash" height="203" width="360"></embed>
</object>
</p>
<center><a href="http://software.intel.com/en-us/videos/ambient-scalable-effects-fireflies-demo-1/">Fireflies Video (larger screen)</a></center>
<p><b><br />Read:</b> <a href="http://software.intel.com/en-us/articles/scalable-ambient-effects/" title="Scalable Ambient Effects">Scalable Ambient Effects</a><br /><b>Blog Post:</b> <a href="http://software.intel.com/en-us/blogs/2010/12/06/multithreaded-man-explodes-into-fireflies/" title="Multithreaded Man Explodes Into Fireflies">Multithreaded Man Explodes Into Fireflies!</a></p>
</div>
<br clear="all" />
<div>
<table bgcolor="#ffffff" width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td><img height="37" width="531" src="http://software.intel.com/file/25372" /></td>
<td></td>
</tr>
</tbody>
</table>
<table bgcolor="#ffffff" cellpadding="0" bordercolor="#ffffff" cellspacing="6" border="0">
<tbody>
<tr>
<td width="214" valign="top">
<div align="center"><a href="http://software.intel.com/file/32677"><img src="http://software.intel.com/file/32607" alt="Fireflies_screenshot1_web.jpg" /></a></div>
</td>
<td width="234" valign="top">
<div align="center"><a href="http://software.intel.com/file/32678" title="Fireflies image 2"><img src="http://software.intel.com/file/32608" alt="Fireflies_screenshot2_web.jpg" title="Fireflies_screenshot2_web.jpg" /></a></div>
</td>
<td width="256" valign="top">
<div align="center"><a href="http://software.intel.com/file/32680" title="Fireflies image 3"><img src="http://software.intel.com/file/32609" alt="Fireflies_screenshot3_web.jpg" title="Fireflies_screenshot3_web.jpg" /></a></div>
</td>
</tr>
<tr>
<td valign="top">
<table align="center" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<p><i>Fireflies flock to form a walking character</i></p>
</td>
</tr>
</tbody>
</table>
</td>
<td valign="top">
<table align="center" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<p><i>Fireflies scatter and flock</i></p>
</td>
</tr>
</tbody>
</table>
</td>
<td valign="top">
<div align="center">
<table align="center" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td width="161" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<p align="center">The sample can run in either multithreaded or serial mode, so the performance benefit of multithreading an ambient effect is easy to see.</p>
</td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<br /><br /><!-- start of 3 column table -->
<table width="695" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="695" rowspan="2" valign="top">
<table width="695" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td valign="top"><img height="8" width="697" src="http://software.intel.com/file/22889" /></td>
</tr>
<tr>
<td valign="top" class="panel_bg_02">
<table width="695" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="12" rowspan="2"><img height="8" width="12" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
<td valign="top" height="4"><img height="8" width="8" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top">
<table width="100%" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td align="center" width="33%" valign="top" height="19" class="arrow"><span ><b>What is it?</b></span></td>
<td align="center" width="33%" valign="top" height="19" class="arrow"><span ><b>System Requirements</b></span></td>
<td align="center" width="33%" valign="top" height="19" class="arrow"><span ><b><a href="http://software.intel.com/en-us/articles/code/">Additional Code Samples</a></b></span></td>
</tr>
<tr>
<td align="left" width="33%" valign="top" height="19" class="arrow"></td>
<td align="left" width="33%" valign="top" height="19" class="arrow"></td>
<td align="left" width="33%" valign="top" height="19" class="arrow"></td>
</tr>
<tr>
<td align="left" width="33%" valign="top" height="19">
<ul>
<li>Threaded particle system using <a href="http://www.threadingbuildingblocks.org/">Intel® Threading Building Blocks</a></li>
<li>Scalable Ambient Effects </li>
</ul>
</td>
<td align="left" width="33%" valign="top" height="19"><ol type="1">
<li>CPU: Dual core or better (Intel® Core™ i5 or better suggested)</li>
<li>GFX: DirectX 9.0c-capable graphics card</li>
<li>OS: Microsoft Windows Vista* or Microsoft Windows 7*</li>
<li>MEM: 2 GB of RAM or more</li>
<li>Software: <ol type="1">
<li>DirectX SDK (June 2010 release or later)</li>
<li>Build with Microsoft Visual Studio 2008* with SP1 or Visual Studio 2010*</li>
</ol></li>
</ol>
<p>* Other names and brands may be claimed as the property of others.</p>
</td>
<td align="left" width="33%" valign="top" height="19">
<ul>
<li><a href="http://software.intel.com/en-us/articles/tickertape/" title="TickerTape">TickerTape Demo</a></li>
<li><a href="http://software.intel.com/en-us/articles/smoke-technology-demo/" title="Smoke">Smoke Game Technology </a></li>
<li><a href="http://software.intel.com/en-us/articles/ocean-fog-using-direct3d-10/">OceanFog Using Direct3D 10</a></li>
</ul>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<!--bottom border for large box-->
<tr>
<td valign="top"><img height="8" width="697" src="http://software.intel.com/media/gamedev/_images/footer-bg-01.gif" /></td>
</tr>
<!--end border-->
</tbody>
</table>
</td>
<td width="10" rowspan="2"><img height="10" width="10" src="http://software.intel.com/media/gamedev/_images/blank.gif" /></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td width="344" valign="top"></td>
</tr>
</tbody>
</table>
<!-- end of 3 column table --><br /><br /></div>
</div>
</div>
</td>
<td valign="top" ><!-- RHC -->
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td align="center" width="215">
<table align="center" width="223" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td height="4"><img height="4" width="232" src="http://software.intel.com/file/20516" /></td>
</tr>
<tr>
<td>
<table align="center" width="223" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td align="center" valign="top"><a href="http://www.intelsoftwaregraphics.com/?lid=5ceakfXf8Ho=&amp;siteid=cqMoF5H/37o="><img height="71" width="223" src="http://software.intel.com/file/20512" alt="Intel Visual Adrenaline" border="0" title="Intel Visual Adrenaline" /></a></td>
</tr>
<tr>
<td valign="top" >
<table width="223" cellpadding="0" cellspacing="0" border="0" >
<tbody>
<tr>
<td width="11" height="8"></td>
<td align="center" width="10"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://software.intel.com/en-us/visual-computing/" title="Intel Adrenaline Developer Community" >Developer Community</a></td>
<td width="10"></td>
</tr>
<tr>
<td height="8"></td>
<td align="center"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://www.intel.com/cd/software/partner/asmo-na/eng/index.htm" title="Intel Adrenaline Software Partner Program" >Intel® Software Partner Program</a></td>
<td></td>
</tr>
<tr>
<td height="8"></td>
<td align="center"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://www.intel.com/Consumer/Game/index.htm" title="Intel Adrenaline Game On" >Game On</a></td>
<td></td>
</tr>
<tr>
<td height="8"></td>
<td align="center"><img height="5" width="5" src="http://software.intel.com/file/20514" /></td>
<td align="left"><a href="http://www.intelsoftwaregraphics.com/?lid=5ceakfXf8Ho=&amp;siteid=cqMoF5H/37o=" title="Intel Adrenaline Showcase" >Showcase</a></td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td valign="top" height="7"><img height="7" width="223" src="http://software.intel.com/file/20515" /></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td valign="top" height="4"><img height="6" width="6" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td></td>
</tr>
</tbody>
</table>
<br /><center>
<table cellpadding="0" cellspacing="0" border="0" id="nav_table">
<tbody>
<tr>
<td>
<table width="190" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="9"></td>
<td>
<div align="center" ><b>A Scalable 3D <br />Particle System</b><br /><a href="http://software.intel.com/en-us/articles/tickertape/" title="TickerTape"><img src="http://software.intel.com/file/25664/" alt="Download PDF" border="0" /></a><br /><br /><b>Benefits of SIMD</b><br /><a href="http://software.intel.com/en-us/articles/tickertape-part-2/"><img src="http://software.intel.com/file/25665/" alt="Download PDF" border="0" /></a><br /><br /><b>Visual Adrenaline</b><br /><a href="http://software.intel.com/sites/billboard/"><img src="http://software.intel.com/file/25369" alt="Download PDF" border="0" /></a><br /></div>
</td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center><br /><center>
<table cellpadding="0" cellspacing="0" border="1" id="nav_table">
<tbody>
<tr>
<td>
<table width="190" cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr>
<td width="9" class="right_container_hdr2"></td>
<td class="right_container_hdr2"><b>Intel Tools for Unreal Developers <br /><a href="http://software.intel.com/en-us/articles/epic-licenses-tbb-for-ue-licensees/">TBB for Unreal Engine</a></b></td>
<td class="right_container_hdr2"></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td width="9" class="right_container_hdr"></td>
<td class="right_container_hdr">
<h4>Related Links</h4>
</td>
<td class="right_container_hdr"></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td height="15"></td>
<td valign="middle"><a href="http://www.intel.com/software/graphics" title="Intel Visual Computing Home">Visual Computing Home</a></td>
<td></td>
</tr>
<tr>
<td></td>
<td>
<h3>Intel<sup>®</sup> Technologies</h3>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top"><a href="http://www.intel.com/software/sandybridge">Sandy Bridge</a><br /><a href="http://software.intel.com/en-us/articles/integrated-graphics/" title="Intel Visual Computing Technologies Integrated Graphics">Graphics</a><br /><a href="http://software.intel.com/en-us/articles/parallel-programming-vc/" title="Intel Visual Computing Technologies Parallel Programming">Parallel Programming</a></td>
<td></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"><img height="4" width="4" src="http://software.intel.com/file/20494" /></td>
</tr>
<tr>
<td></td>
<td>
<h3>Focus Areas</h3>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top"><a href="http://software.intel.com/en-us/articles/game-dev/" title="Intel Game Development Focus Area">Game Development</a><br /><a href="http://software.intel.com/en-us/articles/artist-animator/" title="Intel Visual Computing Artist/Animator Focus Area">Artist/Animator</a><br /><a href="http://software.intel.com/en-us/articles/media/" title="Intel Visual Computing Media Focus Area">Media</a></td>
<td></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"></td>
</tr>
<tr>
<td></td>
<td>
<h3>Develop</h3>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td valign="top"><a href="http://software.intel.com/en-us/articles/tools-vc/" title="Intel Visual Computing Development Tools">Tools</a><br /><a href="http://software.intel.com/en-us/articles/code/" title="Intel Visual Computing Development Code">Code</a></td>
<td></td>
</tr>
<tr>
<td colspan="3" valign="top" height="4"></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center><!--END right column Content --></td>
</tr>
</tbody>
</table> ]]></description>
      <link>http://software.intel.com/en-us/articles/fireflies/</link>
      <pubDate>Fri, 05 Nov 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/fireflies/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/fireflies/</guid>
      <category>Parallel Programming</category>
      <category>Tools</category>
      <category>Visual Computing</category>
      <category>Intel® Graphics Performance Analyzers (Intel® GPA)</category>
      <category>Code &amp; Downloads</category>
      <category>Game Development</category>
    </item>
    <item>
      <title>Intel® AVX C/C++ Intrinsics Emulation</title>
      <description><![CDATA[ <p>Intel® AVX instruction set extension <a target="_blank" href="http://software.intel.com/en-us/avx/">[1]</a> will appear in the next generation Intel microarchitecture codename ‘Sandy Bridge'. We chose to announce AVX early to get as much support from software vendors as possible by the hardware launch time. Now, most software development platforms are supporting Intel AVX, examples are compilers and assemblers from Intel, Microsoft and GCC as well as UNIX binutils.</p>
<p>For early adopters we introduced support of AVX in Intel® Software Development Emulator <a target="_blank" href="http://software.intel.com/en-us/articles/intel-software-development-emulator/">[2]</a>, it allows you to run and check functional correctness of the code with the actual AVX instructions before hardware is available.</p>
<p>Today we are adding another useful piece to help those who may not be able to use new tools supporting AVX in their current development environment but plan to migrate in the future or are using a software platform which is not supported by Intel SDE. These software developers can still start programming with Intel AVX using intrinsics.</p>
<p>Here we are providing the C and C++ header file which emulates Intel AVX intrinsics. The AVX emulation header file uses intrinsics for the prior Intel instruction set extensions up to Intel SSE4.2. SSE4.2 support in your development environment as well as hardware is required in order to use the AVX emulation header file. <br /><br />To use simply have this file included:</p>
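<pre name="code" class="cpp:nogutter:nocontrols">#include "avxintrin_emu.h"</pre>
<p>instead of the usual:</p>
<pre name="code" class="cpp:nogutter:nocontrols">#include &lt;immintrin.h&gt;</pre>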
<p><br />Alternatively, you can create your own immintrin.h file (which in turn includes avxintrin_emu.h) to avoid intrusive changes to the source base, and then switch between real AVX code generation and emulation simply by changing the include directory path.</p>
<p>The emulation header primarily targets UNIX-type environments and was tested there with GCC and the Intel C/C++ compilers. On the Microsoft Windows platform, other tools (compilers, assemblers, and SDE) already provide strong AVX support, but this header file can still be used on Windows with the Intel Compiler, if desired.</p>
<p>Note that the AVX emulation header file is designed for checking the functional correctness of an AVX implementation; it is not recommended for long-term use or for release in a final product. Once your development environment and hardware support AVX, we recommend switching to the real AVX intrinsics header file.<br /><br />Although we did our best to debug it, this file must <em>not</em> be considered a reference functional implementation of the AVX instructions, nor bug-free. Please see the limitations and caveats of the current version at the beginning of the file, and let us know about any issues you encounter.</p>
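<p>A minimal sketch of that include-path switch follows. It assumes a separate directory (called emu here, a hypothetical name) holding the wrapper and a copy of avxintrin_emu.h, and it assumes avxintrin_emu.h does not itself include &lt;immintrin.h&gt;. Sources keep writing #include &lt;immintrin.h&gt;; only the compiler command line changes. The GCC options shown (-I, -msse4.2, -mavx) are standard, while the file names are placeholders.</p>

```cpp
// emu/immintrin.h: hypothetical wrapper header. It shadows the compiler's
// <immintrin.h> because directories given with -I are searched before the
// system include directories.
#pragma once
#include "avxintrin_emu.h"

// Emulation build (needs SSE4.2, no AVX hardware):
//   g++ -msse4.2 -Iemu saxpy.cpp
//
// Real AVX build (wrapper directory left off the include path):
//   g++ -mavx saxpy.cpp
```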
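<p>To illustrate the general idea of building 256-bit operations from earlier extensions, here is a much-simplified sketch that represents one 8-float vector as two 128-bit SSE registers. It is not the contents of avxintrin_emu.h: the m256_emu type and the emu_* function names are hypothetical, and only a few operations are shown. It assumes an x86 target where &lt;xmmintrin.h&gt; is available (SSE is part of the x86-64 baseline).</p>

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical 256-bit vector emulated as two 128-bit SSE halves.
struct m256_emu {
    __m128 lo;  // elements 0..3
    __m128 hi;  // elements 4..7
};

inline m256_emu emu_set1_ps(float a) {
    return { _mm_set1_ps(a), _mm_set1_ps(a) };  // broadcast into both halves
}
inline m256_emu emu_loadu_ps(const float* p) {
    return { _mm_loadu_ps(p), _mm_loadu_ps(p + 4) };  // unaligned loads
}
inline void emu_storeu_ps(float* p, m256_emu v) {
    _mm_storeu_ps(p, v.lo);
    _mm_storeu_ps(p + 4, v.hi);
}
// Element-wise operations just apply the SSE instruction to each half.
inline m256_emu emu_add_ps(m256_emu a, m256_emu b) {
    return { _mm_add_ps(a.lo, b.lo), _mm_add_ps(a.hi, b.hi) };
}
inline m256_emu emu_mul_ps(m256_emu a, m256_emu b) {
    return { _mm_mul_ps(a.lo, b.lo), _mm_mul_ps(a.hi, b.hi) };
}
```

<p>Splitting into halves is straightforward for element-wise operations such as add and multiply; operations that move data between the two 128-bit halves (permutes, some shuffles, horizontal operations) need more work to emulate.</p>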
<p><b><br />Example</b></p>
<pre name="code" class="cpp:nogutter:nocontrols">#include "avxintrin_emu.h"  // #include &lt;immintrin.h&gt;
#include &lt;stddef.h&gt;         // size_t

// z[i] = a * x[i] + y[i]
void saxpy( float a, const float* x, const float* y, float* __restrict z, size_t len )
{
    size_t i = 0;
    __m256 a_ = _mm256_set1_ps( a );  // broadcast a into all 8 lanes

    // Main loop: 16 floats (two 256-bit vectors) per iteration.
    // len &amp; -16 clears the low 4 bits, rounding len down to a multiple of 16.
    for ( size_t len16_ = len &amp; -16; i + 16 &lt;= len16_; i += 16 )
    {
        __m256 x1_ = _mm256_loadu_ps( x + i );
        __m256 x2_ = _mm256_loadu_ps( x + i + 8 );

        __m256 y1_ = _mm256_loadu_ps( y + i );
        __m256 y2_ = _mm256_loadu_ps( y + i + 8 );

        x1_ = _mm256_mul_ps( x1_, a_ );
        x2_ = _mm256_mul_ps( x2_, a_ );

        x1_ = _mm256_add_ps( x1_, y1_ );
        x2_ = _mm256_add_ps( x2_, y2_ );

        _mm256_storeu_ps( z + i     , x1_ );
        _mm256_storeu_ps( z + i + 8 , x2_ );
    }

    // Scalar tail for the remaining len % 16 elements.
    for ( ; i &lt; len; ++i )
        z[i] = x[i] * a + y[i];
}</pre>
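<p>Because the emulation header is meant for checking functional correctness, it helps to compare saxpy() against a plain scalar loop. A minimal sketch follows; saxpy_ref and max_abs_diff are hypothetical helpers, not part of the header. A small tolerance rather than exact equality is appropriate, because a compiler may contract a * x[i] + y[i] into a fused multiply-add in one version and not in the other.</p>

```cpp
#include <cmath>
#include <cstddef>

// Plain scalar SAXPY: z[i] = a * x[i] + y[i], used as the reference result.
void saxpy_ref(float a, const float* x, const float* y, float* z, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        z[i] = a * x[i] + y[i];
}

// Largest absolute element-wise difference between two arrays.
float max_abs_diff(const float* p, const float* q, std::size_t len) {
    float m = 0.0f;
    for (std::size_t i = 0; i < len; ++i)
        m = std::fmax(m, std::fabs(p[i] - q[i]));
    return m;
}
```

<p>Choosing a length that is not a multiple of 16 (for example 37) exercises both the vector loop and the scalar tail of saxpy() in a single comparison.</p>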
<p><br /><strong><br />References </strong></p>
<p>[1] Intel AVX - <a target="_blank" href="http://software.intel.com/en-us/avx/">http://software.intel.com/en-us/avx/</a></p>
<p>[2] Intel Software Development Emulator - <a target="_blank" href="http://software.intel.com/en-us/articles/intel-software-development-emulator/">http://software.intel.com/en-us/articles/intel-software-development-emulator/</a></p> ]]></description>
      <link>http://software.intel.com/en-us/articles/avx-emulation-header-file/</link>
      <pubDate>Wed, 23 Jun 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/avx-emulation-header-file/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/avx-emulation-header-file/</guid>
      <category>Parallel Programming</category>
      <category>Open Source</category>
      <category>What If Experimental Software</category>
      <category>Tools</category>
      <category>Intel® AVX</category>
      <category>Software News</category>
      <category>Code &amp; Downloads</category>
    </item>
    <item>
      <title>Using the Microsoft* debug heap manager with memory error analysis of Intel® Parallel Inspector</title>
      <description><![CDATA[ <p>The Microsoft C runtime debug heap manager tracks/checks/reports a subset of the memory usage that memory error analysis of Intel Parallel Inspector tracks/checks/reports. </p>
<p>Using both of these technologies at the same time has the following implications...</p>
<ul>
<li>Binaries under analysis by Inspector may be interrupted by dialog boxes:
<ul>
<li>Press the "Ignore" button, and execution will continue (recommended). Note that you may have to press "Ignore" several times, because by default the dialog box appears repeatedly for each unique error detected.</li>
<li>Do not press the "Abort" button: it exits the application before Intel Parallel Inspector can give you a list of all memory errors, and Intel Parallel Inspector may report false positives because your application exited prematurely.</li>
<li>Do not press the "Retry" button (not recommended): the debugger will open and point you to assembly code that was generated as a result of running your application under the Inspector analysis engine, rather than to the assembly of your application.</li>
</ul>
</li>
<li>The same issue may be reported by both technologies.</li>
<li>Performance will suffer, as both technologies are tracking and checking memory usage.</li>
</ul>
<p>You may therefore want to turn off the debug heap manager provided by the Microsoft C runtime library.</p>
<p >The only way to fully turn off the debug heap manager is to:</p>
<ul >
<li>Use the release (non-debug) version of the Microsoft C runtime library by compiling with either /MD or /MT.</li>
</ul>
<p >Ideally, use /Od together with the /MD or /MT runtime library setting when running memory error analysis in Intel Parallel Inspector. By default, a "Debug" configuration in Visual Studio selects /MDd or /MTd rather than /MD or /MT, so you need to check these settings for each project in your solution. Note: this can be hard to achieve on large projects, because the same runtime library must be used throughout the entire application (all DLLs, LIBs, etc.).</p>
<p>Another way to work around this problem is to tell the debug version of the heap manager to disable heap checking and reporting (tracking still occurs with this method). This can be done with the _CrtSetDbgFlag API. The following code snippet turns these features off.</p>
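<p>Since checking every project by hand is tedious, a small guard in a header shared by all projects can flag builds that still use the debug CRT. This sketch rests on an assumption worth stating: with the Microsoft compiler, /MDd and /MTd define the _DEBUG macro while /MD and /MT do not, but a project can also define _DEBUG explicitly, so treat the check as a heuristic. The built_with_debug_crt name is hypothetical.</p>

```cpp
// Compile-time note: emitted only by the Microsoft compiler when _DEBUG is
// defined, which /MDd and /MTd do implicitly.
#if defined(_MSC_VER) && defined(_DEBUG)
#pragma message("Debug CRT (/MDd or /MTd): the CRT debug heap manager is active")
#endif

// The same condition as a constant, e.g. for logging at program start.
constexpr bool built_with_debug_crt() {
#if defined(_MSC_VER) && defined(_DEBUG)
    return true;
#else
    return false;
#endif
}
```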
<pre name="code" class="cpp:nogutter:nocontrols">#include &lt;crtdbg.h&gt;

int main()
{
    // Read the current flags without changing them.
    int oriDbgFlag = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
    int newDbgFlag = oriDbgFlag;

    newDbgFlag &amp;= ~_CRTDBG_ALLOC_MEM_DF;      // Turn this off (by default it is on)
    newDbgFlag &amp;= ~_CRTDBG_CHECK_ALWAYS_DF;   // Keep this off (the default); when on, _CrtCheckMemory runs on every allocation and free
    newDbgFlag &amp;= ~_CRTDBG_CHECK_CRT_DF;      // Not needed, as this is the default
    newDbgFlag &amp;= ~_CRTDBG_DELAY_FREE_MEM_DF; // Not needed, as this is the default
    newDbgFlag &amp;= ~_CRTDBG_LEAK_CHECK_DF;     // Not needed, as this is the default
    newDbgFlag = (newDbgFlag &amp; 0x0000FFFF) | _CRTDBG_CHECK_DEFAULT_DF; // Not needed, as this is the default

    _CrtSetDbgFlag(newDbgFlag);  // Apply the new flags; returns the previous state

    //...
    return 0;
}</pre>
<p >For more information, see the _CrtSetDbgFlag documentation on MSDN.</p>
<p>The debug heap manager of the Microsoft C runtime library may produce the following dialog boxes or messages, which the techniques above can suppress when running under analysis of Intel Parallel Inspector:</p>
<p >Client hook allocation failure at file</p>
<p >Client hook allocation failure %hs line</p>
<p >Invalid allocation size:</p>
<p >Error: memory allocation: bad memory block type.</p>
<p >Client hook re-allocation failure at file %hs line.</p>
<p >Client hook re-allocation failure</p>
<p >The Block at 0x%p was allocated by aligned routines, use _aligned_realloc()</p>
<p >The Block at 0x%p was allocated by aligned routines, use _aligned_free()</p>
<p >Client hook free failure.</p>
<p >HEAP CORRUPTION DETECTED: after %hs block (#%d) at 0x%p.</p>
<p >CRT detected that the application wrote to memory after end of heap buffer.</p>
<p >_heapchk fails with _HEAPBADBEGIN.</p>
<p >_heapchk fails with _HEAPBADNODE.</p>
<p >_heapchk fails with _HEAPBADEND.</p>
<p >_heapchk fails with _HEAPBADPTR.</p>
<p >_heapchk fails with unknown return value!</p>
<p >HEAP CORRUPTION DETECTED: before %hs block (#%d) at 0x%p.</p>
<p >CRT detected that the application wrote to memory before start of heap buffer.</p>
<p >HEAP CORRUPTION DETECTED: on top of Free block at 0x%p.</p>
<p >CRT detected that the application wrote to a heap buffer that was freed.</p>
<p >%hs located at 0x%p is %Iu bytes long.</p>
<p >Bad memory block found at 0x%p.</p>
<p >Detected memory leaks!</p>
<p >Damage before 0x%p which was allocated by aligned routine</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/using-the-microsoft-debug-heap-manager-with-memory-error-analysis-of-intel-parallel-inspector/</link>
      <pubDate>Thu, 06 May 2010 21:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/using-the-microsoft-debug-heap-manager-with-memory-error-analysis-of-intel-parallel-inspector/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/using-the-microsoft-debug-heap-manager-with-memory-error-analysis-of-intel-parallel-inspector/</guid>
      <category>Tools</category>
      <category>Intel® Parallel Inspector</category>
      <category>Intel® Parallel Inspector Knowledge Base</category>
      <category>Code &amp; Downloads</category>
    </item>
  </channel></rss>
