<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Wed, 25 Nov 2009 09:12:20 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/multi-core/type/technical-article/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles feed</title>
    <link>http://software.intel.com/en-us/articles/multi-core/technical-article//all</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>3D Modeling and Parallel Mesh Simplification</title>
      <description><![CDATA[ <p>By Christopher G. Healey</p>
<p class="sectionHeading">Introduction</p>
<p>3D geometric models are common in computer graphics—for example, in computer games, scientific simulations, or architectural design. These models are usually built as a collection of vertices connected together to form a polygon mesh. Complex models can contain tens of millions of polygons. For example, the power plant and DoubleEagle Tanker models (http://www.cs.unc.edu/~walk/models) available from the University of North Carolina at Chapel Hill contain 13 million and 82 million polygons, respectively.<br /><br />As you can imagine, managing large polygonal models within an application presents numerous challenges for developers—the most pressing of which is how quickly the mesh can be rendered. Interactive applications need to render at least 30 frames a second. A number of clever approaches like view culling, point-based rendering, and geometry caching have been designed to help with this. Another example involves global illumination algorithms like radiosity. The speed of these algorithms depends in large part on the size of the polygon mesh they take as input. Reducing the mesh's size can produce significant speed-ups in the algorithm.</p>
<p>This article describes a collection of mesh simplification algorithms. Mesh simplification reduces the number of polygons in the mesh, often dramatically, while remaining faithful to the topology (that is, the shape) of the original mesh. In this article, you’ll learn how to simplify your own meshes. You can apply this in any situation where smaller models would help, for example, to simplify character models in a game for better rendering speeds, or to reduce a complicated terrain model so it can be held entirely in memory.</p>
<p class="sectionHeading"><br />Basic Simplification Operations</p>
<p>Any differences between the original and the simplified mesh are called geometric error. You use geometric error to control a simplification algorithm. One option is to define an error threshold, then ask the algorithm to reduce the number of polygons as much as possible without producing geometric error above the set threshold. Another approach is to pick a desired polygon count, then ask the algorithm to reduce the mesh to this size while minimizing the amount of geometric error being produced.</p>
<p>Different algorithms reduce the number of polygons in a mesh in different ways. Two common simplification operations are the edge collapse and vertex collapse.</p>
<p class="sectionHeading"><br />Edge Collapse</p>
<p>An edge collapse (see Figure 1) takes an edge (<i>v</i>1, <i>v</i>2) in the mesh and collapses it to a single vertex <i>v</i>new. The edge and any triangles that use the edge are removed. Common locations for <i>v</i>new include either of the collapsed edge's endpoints (<i>v</i>new = <i>v</i>1 or <i>v</i>new = <i>v</i>2) or a position somewhere along the collapsed edge (<i>v</i>new = <i>v</i>1 + <i>w</i>(<i>v</i>2 − <i>v</i>1), 0 ≤ <i>w</i> ≤ 1).</p>
<p><img src="http://software.intel.com/file/23792/" /></p>
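<p>As a rough illustration, the edge collapse can be sketched on an indexed triangle list. This is a minimal sketch rather than a production routine: the function name <i>collapse_edge</i>, the tuple-based mesh layout, and the choice to reuse the slot of <i>v</i>1 for <i>v</i>new are all assumptions made for the example. The old <i>v</i>2 entry is simply left unused in the vertex list.</p>

```python
def collapse_edge(vertices, triangles, v1, v2, w=0.5):
    """Collapse edge (v1, v2) into v1, placed at v1 + w * (v2 - v1)."""
    a, b = vertices[v1], vertices[v2]
    new_vertices = list(vertices)
    new_vertices[v1] = tuple(p + w * (q - p) for p, q in zip(a, b))

    new_triangles = []
    for tri in triangles:
        # Redirect every reference to v2 so it points at the merged vertex.
        tri = tuple(v1 if i == v2 else i for i in tri)
        # Triangles that used the edge now repeat an index; drop them.
        if len(set(tri)) == 3:
            new_triangles.append(tri)
    return new_vertices, new_triangles


# Two triangles share edge (1, 2); the middle triangle touches only vertex 2.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
            (0.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
triangles = [(0, 1, 2), (0, 2, 3), (1, 4, 2)]
new_vertices, new_triangles = collapse_edge(vertices, triangles, 1, 2)
```

<p>In this example both triangles that used the edge disappear, and the remaining triangle is rewired to the merged vertex at the edge midpoint.</p>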
<p class="sectionHeading">Vertex Collapse</p>
<p>A vertex collapse (see Figure 2) combines two vertices (<i>v</i>1, <i>v</i>2) into a single vertex <i>v</i>new. Unlike an edge collapse, <i>v</i>1 and <i>v</i>2 do not need to be connected. Triangles connected to either <i>v</i>1 or <i>v</i>2 are updated to connect to <i>v</i>new. You can place <i>v</i>new somewhere along the line connecting <i>v</i>1 and <i>v</i>2, or in any other position that does not introduce an inconsistency such as the mesh folding over or polygons interpenetrating one another. Notice that an edge collapse is a special case of a vertex collapse: one where <i>v</i>1 and <i>v</i>2 are connected and <i>v</i>new is somewhere along the edge between <i>v</i>1 and <i>v</i>2.</p>
<p><img src="http://software.intel.com/file/23793/" /></p>
<p class="sectionHeading">Progressive Meshes</p>
<p>Edge collapses seem like a promising way to reduce a mesh's size. Collapsing an edge removes the triangles it shares, making the model smaller. The key question is: Which edges should we remove, and in what order? Progressive meshes (PM) solve exactly this problem. The PM algorithm simplifies or refines a mesh using three mesh modification operators: an edge collapse to simplify and reduce the size of the mesh, a vertex split to add detail and increase the size of the mesh, and an edge swap to improve the shape of the mesh [Hoppe 1993]. This article focuses on mesh simplification, so let’s concentrate on the edge collapse operator.</p>
<p>The PM algorithm simplifies an initial mesh <i>M<sup>n</sup></i> with <i>n</i> vertices into a progressive sequence of meshes (<i>M<sup>n−1</sup></i>, <i>M<sup>n−2</sup></i>, . . . <i>M<sup>1</sup></i>), each with one fewer vertex than the one before it. This makes it easy to select a mesh with a desired number of vertices. If you want a mesh with <i>m</i> vertices, simply choose <i>M<sup>m</sup></i> from the sequence. It’s also easy to locate a mesh with a maximum geometric error. Walk down the sequence of meshes until you find the first mesh <i>M<sup>m−1</sup></i> with more error than you want. The previous mesh <i>M<sup>m</sup></i> will be the smallest mesh with an error below your threshold. You can also use the sequence to animate the simplification or refinement of the mesh, a technique known as geomorphing.</p>
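<p>The walk down the sequence can be sketched as follows. The sketch assumes each mesh in the sequence is summarized by a hypothetical (vertex count, geometric error) pair, and that the error never decreases as vertices are removed; neither assumption comes from the PM papers themselves.</p>

```python
def smallest_mesh_within(sequence, max_error):
    """Return the vertex count of the smallest mesh whose error is acceptable.

    sequence: (vertex_count, error) pairs ordered from M^n down to M^1.
    Returns None when even the first mesh exceeds max_error.
    """
    chosen = None
    for vertex_count, error in sequence:
        if error > max_error:
            # This mesh, and by assumption every smaller one, is too coarse.
            break
        chosen = vertex_count
    return chosen


# Hypothetical error values for a six-vertex mesh.
sequence = [(6, 0.0), (5, 0.1), (4, 0.3), (3, 0.9), (2, 2.5), (1, 7.0)]
best = smallest_mesh_within(sequence, 0.5)
```

<p>With these made-up error values and a threshold of 0.5, the walk stops at the three-vertex mesh and selects the four-vertex mesh before it.</p>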
<p>Simplification is controlled by an energy function that balances two competing quality measures: how tightly a simplified mesh fits the original (<i>Edist</i>) and the number of vertices the simplified mesh contains (<i>Erep</i>). Improving one measure will normally make the other worse (for example, fewer vertices will reduce the fit to the original mesh). A spring measure <i>Espring</i> is also included in the energy function. Its job is to keep the mesh tight along its edges. Adding these three measures together produces the final energy function:</p>
<p><i>E</i> = <i>Edist</i> + <i>Erep</i> + <i>Espring</i></p>
<p>Given a mesh <i>M<sup>m</sup></i> with <i>V</i> = {<i> v1</i>, …, <i>vm</i> } vertices and edges between the vertices defined by <i>K</i>, <i>Edist</i>(<i>V,K</i>) measures tightness as the squared distance from <i>V </i>to the original mesh. <i>Erep</i> (<i>K</i>) is simply a weighted vertex count <i>crep m</i>. <i>crep </i>allows you to control how aggressively to simplify the mesh. A larger <i>crep </i>means the energy function will favor reducing vertices over maintaining a tight fit to the original mesh. Each edge collapse replaces an edge’s endpoints (<i>v</i>1, <i>v</i>2) with a single vertex <i>v</i>new, as Figure 3 shows.</p>
<p><img src="http://software.intel.com/file/23794/" /></p>
<p>To optimize <i>v</i>new, all other vertices are held constant. You derive a <i>v</i>new position that minimizes the squared distance to the original mesh through conjugate gradient iteration. You can localize distance calculations to triangles incident on either <i>v1</i> or <i>v2</i>. To reduce the number of iterations performed, you have three possible starting positions: <i>v1</i>, <i>v2</i>, and ½(<i>v1</i>+<i>v2</i>). The best overall position is then assigned to <i>v</i>new.</p>
<p>Figure 4 shows a simple pseudocode overview of the progressive mesh (PM) algorithm.</p>
<p> </p>
<pre name="code" class="cpp">start with initial K, V for mesh Mn<br />repeat {<br />	choose an edge collapse producing new connectivity K’<br />	optimize vnew position producing new vertices V’<br />	calculate E(K’,V’)<br />	if E(K’,V’) &lt; E(K,V)<br />		set K=K’, V=V’<br />}<br />until convergence<br /></pre>
<p> </p>
<p>Figure 4. Simple pseudocode of the PM algorithm</p>
<p>This code produces a sequence of edge collapses that continues until no improvements to the energy function can be found.</p>
<p>You can extend the PM algorithm to consider both geometric error and errors in surface attributes like color and texture patterns [Hoppe 1996]. You do this by updating the energy function to include penalties for collapsing edges that straddle different surface colors or different textures. Now, a simplified model maintains its shape <i>and </i>any color and texture patterns on its surface.</p>
<p>Figure 3a–c shows a geographic mesh of North America with frost coverage and rainfall measured at each vertex position. A modified PM algorithm that respects frost and rainfall values was used to reduce the mesh to 10% of its original vertices (Figure 3d–f) [Walter 2001]. Although 90% of the original vertices and their corresponding weather values have been removed, the simplified frost and rainfall patterns (Figure 3e–f) remain faithful to the full-resolution versions (Figure 3b–c).</p>
<p class="sectionHeading"><br />Simplification Using Quadric Errors</p>
<p>An alternative to collapsing an edge is to connect two vertices (a vertex collapse), then remove any triangles that reduce to lines or points. The quadric simplification approach uses this exact strategy [Garland 1997]. The algorithm chooses a set of vertex collapses<sup>**</sup> that reduce an initial mesh (<i>M<sup>n</sup></i>) into a sequence of meshes (<i>M<sup>n−1</sup></i>, <i>M<sup>n−2</sup></i>, . . . <i>M<sup>1</sup></i>). A quadric error metric (QEM) selects which vertices to collapse.</p>
<p>Unlike an edge collapse, a vertex collapse can replace two vertices that are not connected (Figure 2). This means disconnected regions of the mesh can be joined together. Whether this is what you want depends on your models’ domain.</p>
<p>The main problem is to decide which pairs of vertices to collapse to produce simplified meshes that closely approximate the original mesh. To do so, perform these steps:</p>
<ol>
<li>Identify all vertex pairs that could be collapsed.</li>
<li>Assign a cost to each of these collapses.</li>
<li>Place the collapses on a heap that is ordered by minimum cost. This means the collapse with the lowest cost will always be at the front of the heap.</li>
</ol>
<p>Once this is done, you can remove the vertex pair (<i>v</i>1, <i>v</i>2) with the minimum cost from the top of the heap and perform the collapse. Vertex pairs still on the heap that contain either <i>v</i>1 or <i>v</i>2 need to have their costs updated. This re-orders the heap, so the next best collapse is at the front. Continue retrieving, collapsing, and updating until your desired mesh size or error threshold is reached.</p>
<p>A quadric error matrix <i>Qi</i> at each vertex <i>v</i>i defines the cost of a collapse. <i>Qi</i> is built by examining each triangle that includes <i>v</i>i. Once you have QEMs <i>Q1</i> and <i>Q2</i> for vertex pair (<i>v</i>1, <i>v</i>2), you can approximate an error matrix for the collapsed vertex <i>vnew</i> as <i>Q</i> = <i>Q1</i> + <i>Q2</i>. The algorithm first tries to find a position for <i>vnew</i> that minimizes error. If the search fails, it looks for an optimal position along the edge <i>v1 v2</i>. As a fallback, the algorithm chooses the position with the smallest error from among <i>v1</i>, <i>v2</i>, and ½(<i>v1</i>+<i>v2</i>).</p>
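<p>To make the cost calculation concrete, here is a minimal sketch of the quadric bookkeeping. In Garland's formulation, each triangle's plane <i>ax</i> + <i>by</i> + <i>cz</i> + <i>d</i> = 0 (with a unit normal) contributes a 4×4 matrix, and the error at a position is the sum of squared distances to those planes. The function names and the plain nested-list matrices are assumptions for the example, and the sketch does not guard against degenerate (zero-area) triangles.</p>

```python
import math


def plane_of(a, b, c):
    """Return (nx, ny, nz, d) with a unit normal for the plane through a, b, c."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(x * x for x in n))  # zero for a degenerate triangle
    n = [x / length for x in n]
    d = -sum(n[i] * a[i] for i in range(3))
    return (n[0], n[1], n[2], d)


def quadric_of(plane):
    """Outer product p p^T: the quadric contributed by a single plane."""
    return [[pi * pj for pj in plane] for pi in plane]


def add_quadrics(q1, q2):
    """Q = Q1 + Q2, used to approximate the error matrix of the merged vertex."""
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]


def quadric_error(q, point):
    """Evaluate v^T Q v for the homogeneous point (x, y, z, 1)."""
    h = (point[0], point[1], point[2], 1.0)
    return sum(h[i] * q[i][j] * h[j] for i in range(4) for j in range(4))


floor = plane_of((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # plane z = 0
wall = plane_of((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))   # plane x = 0
q = add_quadrics(quadric_of(floor), quadric_of(wall))
error = quadric_error(q, (2.0, 5.0, 3.0))
```

<p>Here the point lies 3 units from the first plane and 2 units from the second, so the error evaluates to 3² + 2² = 13.</p>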
<p>Figure 5 shows the pseudocode for the quadric error algorithm.</p>
<p><sup>**</sup>In his paper, Garland refers to the vertex collapse operation as a <i>pair contraction.</i></p>
<p> </p>
<pre name="code" class="cpp">compute Qi for all vi in Mn<br />select as valid pairs<br />	all edges (v1,v2)<br />	any pair (v1,v2) where ‖v1−v2‖ &lt; t for threshold value t<br />compute optimal vnew, cost of vnew for each valid pair<br />place all pairs on a heap ordered by minimum cost<br />repeat {<br />	remove (v1,v2) from the top of the heap<br />	contract to position vnew by setting v1=vnew<br />	replace all occurrences of v2 in heap with v1<br />	update costs of valid pairs involving v1<br />}<br />until target vertex count or error threshold is achieved<br /></pre>
<p> </p>
<p>Figure 5. Pseudocode for Garland’s QEM</p>
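<p>The heap-driven loop in Figure 5 can be sketched with Python's <i>heapq</i> module. This is an illustration under stated assumptions, not Garland's implementation: the cost is a stand-in (squared distance between the pair) where a real version would evaluate the quadric error, the merged vertex is always placed at the midpoint, only edges are treated as valid pairs, and stale heap entries are skipped using a hypothetical per-vertex version counter instead of being updated in place.</p>

```python
import heapq


def simplify(vertices, edges, target_count):
    """Greedily contract the cheapest pair until target_count vertices remain."""
    pos = dict(enumerate(vertices))        # live vertices by index
    neighbors = {i: set() for i in pos}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    version = {i: 0 for i in pos}          # bumped whenever a vertex moves

    def cost(a, b):
        # Stand-in cost: squared distance. A QEM version would use v^T Q v.
        return sum((p - q) ** 2 for p, q in zip(pos[a], pos[b]))

    heap = [(cost(a, b), a, b, 0, 0) for a, b in edges]
    heapq.heapify(heap)

    while len(pos) > target_count and heap:
        _, a, b, va, vb = heapq.heappop(heap)
        # Lazy deletion: skip entries whose endpoints were removed or moved.
        if a not in pos or b not in pos or version[a] != va or version[b] != vb:
            continue
        # Contract b into a, placing the merged vertex at the midpoint.
        pos[a] = tuple((p + q) / 2 for p, q in zip(pos[a], pos[b]))
        version[a] += 1
        for n in neighbors.pop(b):
            neighbors[n].discard(b)
            if n != a:
                neighbors[n].add(a)
                neighbors[a].add(n)
        del pos[b]
        # Re-queue every pair involving the merged vertex with fresh costs.
        for n in neighbors[a]:
            heapq.heappush(heap, (cost(a, n), a, n, version[a], version[n]))
    return pos


# Four vertices along a line, reduced to two.
line = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
result = simplify(line, [(0, 1), (1, 2), (2, 3)], target_count=2)
```

<p>Lazy deletion keeps the sketch short; an implementation that updates costs in place, as the pseudocode describes, would need a heap that supports a decrease-key or re-ordering operation.</p>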
<p>As with PM, you can extend the quadric error metric to respect surface attributes, in particular colors, texture coordinates, and normal directions [Garland 1998].</p>
<p class="sectionHeading">Simplification Envelopes</p>
<p>When you simplify a mesh, how can you make sure the shape stays true to the original model? Simplification envelopes offer a simple and elegant way to do this [Cohen 1996]. A mesh is bounded on both sides by two envelopes: offset surfaces guaranteed to be no more than a maximum distance ε from any point on the original model. You then simplify the mesh by removing vertices, which creates holes in the mesh. If you can fill a hole without falling outside of the simplification envelopes, the removal is allowed. Vertices are removed until no more holes can be filled.</p>
<p>By varying ε, a sequence of simplified models can be constructed. Small ε produces a close match to the original model but with limited simplification. Large ε produces a significant reduction in vertex and polygon count but with a less accurate fit to the original model.</p>
<p>You build the simplification envelopes by constructing envelopes for each triangle in the mesh, then combining them. For an edge (<i>v</i>1, <i>v</i>2) and normals to the vertices (<i>n</i>1, <i>n</i>2), find the plane passing through the edge in the direction <i>n</i>1 at<i> v</i>1 and <i>n</i>2 at <i>v</i>2. Every triangle will have three such planes—one for each of its three edges. Extending the triangle’s edges a distance +ε and −ε along each plane produces two simplification envelopes—one above the triangle and one below it (see Figure 6). Envelopes for individual polygons are combined. Any intersections between polygon envelopes are corrected, producing a smooth surface bounding the object.</p>
<p><img src="http://software.intel.com/file/23795/" /></p>
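<p>A very rough way to picture the two envelopes is to displace every vertex by ε along its unit normal, once outward and once inward. The sketch below does only that. It assumes unit-length per-vertex normals are already available, uses a hypothetical function name, and skips the step that detects and corrects intersections between neighboring envelopes, so it is not a substitute for the full construction.</p>

```python
def offset_envelopes(vertices, normals, epsilon):
    """Return (outer, inner) vertex lists displaced by +epsilon and -epsilon."""
    outer = [tuple(p + epsilon * n for p, n in zip(v, nv))
             for v, nv in zip(vertices, normals)]
    inner = [tuple(p - epsilon * n for p, n in zip(v, nv))
             for v, nv in zip(vertices, normals)]
    return outer, inner


# A flat triangle in the z = 0 plane with normals pointing along +z.
tri_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tri_normals = [(0.0, 0.0, 1.0)] * 3
outer, inner = offset_envelopes(tri_vertices, tri_normals, 0.25)
```

<p>For this flat triangle, the two offset copies sit a distance ε above and below the original, and a candidate triangle would be accepted only if it stays between them.</p>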
<p>After you build the two envelopes, simplification can begin. You can use two different approaches to control simplification: a local method that uses vertex removal and a global method that uses overlap.</p>
<p>The local approach simplifies large models by minimizing preprocessing and considering only local changes to the mesh. Here, each vertex in the mesh is placed on a queue. A vertex is selected from the queue and removed from the mesh, producing a “hole.” The algorithm uses triangulation to try to fill the hole. Each new triangle is tested to ensure that it lies within the mesh’s envelopes. If the hole is filled successfully, the new triangles' vertices are saved in the queue. If the hole cannot be filled, the mesh is left unchanged. You continue until the queue is empty. At this point, no additional simplification is possible.</p>
<p>The global approach tries to remove as many vertices as possible at each step in the algorithm. To do this, the vertices of the original mesh are combined into all possible triangles that do not intersect the boundary envelopes. Count the number of mesh vertices each candidate triangle overlaps, place them on a heap in descending order of vertex overlap, then retrieve them one by one. For each candidate, remove all the triangles in the mesh that overlap the candidate, forming a large hole. As before, you use triangulation to try to fill the hole. Holes that you successfully fill are kept. This process continues until the heap is empty.</p>
<p>Figures 7 and 8 show pseudocode for the local and global algorithms, respectively.</p>
<p> </p>
<pre name="code" class="cpp">build simplification envelopes<br />assign all vi in Mn to a queue<br />repeat {<br />	retrieve vi from queue<br />	remove vi from mesh<br />	if resulting hole can be filled<br />		add new vj to queue<br />}<br />until queue is empty<br /></pre>
<p> </p>
<p>Figure 7. The local algorithm</p>
<p> </p>
<pre name="code" class="cpp">build simplification envelopes<br />for all vi in Mn {<br />	build all triangles tj within<br />	  envelopes<br />	calculate vertex overlap<br />	add tj to heap<br />}<br />repeat {<br />	remove tj from heap<br />	remove triangles tj overlaps<br />	if resulting hole can be filled<br />		keep filled hole<br />}<br />until heap is empty<br /></pre>
<p> </p>
<p>Figure 8. The global algorithm</p>
<p class="sectionHeading"><br />Parallel Mesh Simplification</p>
<p>Mesh simplifications—especially methods that perform local modifications to a mesh—seem like ideal candidates for parallelization. Various approaches have been suggested to divide simplification into independent subtasks that can be run in parallel. The potential advantages of parallel mesh simplification are two-fold.</p>
<ul>
<li>Dividing the problem across several independent processes reduces the time needed to complete the simplification to the time needed by the slowest process, plus the time needed to subdivide the mesh and combine the processes’ results.</li>
<li>The amount of memory required by each process is reduced, because it operates on a subset of the original mesh. So, when a mesh does not fit in main memory, you can split it into subsets that do, eliminating the need to swap data to and from disk.</li>
</ul>
<p>Both advantages can provide significant speed-ups, particularly for very large meshes.</p>
<p><img src="http://software.intel.com/file/23796/" /></p>
<p>Parallel simplification algorithms follow a common pattern for producing their final results (Figure 9):</p>
<ol>
<li>A parent processor partitions the original mesh into <i>p</i> sub-meshes <i>Si</i>.</li>
<li>Each <i>Si</i> is sent to one of <i>p</i> child processors.</li>
<li>Each child simplifies <i>Si</i> and returns it to the parent.</li>
<li>The parent combines the results to produce a final, simplified mesh.</li>
</ol>
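<p>The partition, simplify, and combine pattern can be sketched in a few lines. Several things here are assumptions for illustration: the sketch uses threads from Python's <i>concurrent.futures</i> module where the algorithms described in this article use separate processes or machines, it partitions by sorting triangles on their centroid's x coordinate, and the per-child <i>simplify</i> function is a trivial stand-in that only drops degenerate triangles. Boundary handling between sub-meshes is not addressed.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def partition(triangles, vertices, p):
    """Sort triangles by centroid x and cut the list into at most p runs."""
    def centroid_x(tri):
        return sum(vertices[i][0] for i in tri) / 3.0

    ordered = sorted(triangles, key=centroid_x)
    size = max(1, -(-len(ordered) // p))  # ceiling division, at least 1
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def parallel_simplify(triangles, vertices, p, simplify):
    """Parent: partition, hand each sub-mesh to a worker, combine the results."""
    sub_meshes = partition(triangles, vertices, p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        results = list(pool.map(simplify, sub_meshes))  # keeps sub-mesh order
    return [tri for part in results for tri in part]


def drop_degenerate(sub_mesh):
    # Stand-in for a real simplifier such as PM or QEM.
    return [tri for tri in sub_mesh if len(set(tri)) == 3]


vertices = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0),
            (3.0, 1.0, 0.0), (4.0, 0.0, 0.0)]
triangles = [(2, 3, 4), (0, 1, 2), (1, 1, 2), (1, 2, 3)]
combined = parallel_simplify(triangles, vertices, 2, drop_degenerate)
```

<p>Sorting on one axis is a deliberately simple partitioning choice. It produces sub-meshes of roughly equal size, but says nothing about how much work each one needs, which is where the load-balancing concerns discussed later come from.</p>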
<p class="sectionHeading"><br /><br />Parallel Simplification</p>
<p>A good example of the split-simplify-combine technique is a method that uses vertex decimation to simplify a mesh. Vertex decimation is directly compatible with both the PM and QEM simplification algorithms. Other local approaches (for example, the local simplification envelope technique) should also work with only minor modifications.</p>
<p>One recent algorithm [Rasch 2000] follows the standard pattern: it partitions the original mesh into sub-meshes and distributes each sub-mesh to a child processor for parallel simplification. A child processor can create its own “worker” processes. These workers simplify in parallel themselves, further improving performance.</p>
<p>The parent and child processes are usually run on separate machines. This means the processes have separate memory spaces. Rather than passing mesh information back and forth, you can use an external library (for example, the Adsmith-Library) to simulate a shared-memory environment. The library distributes the mesh data to all the processes, but presents it to each process as a single, large memory block.</p>
<p>The basic algorithm still leaves certain issues open. For example, if multiple workers are simplifying a sub-mesh in parallel, it’s important to make sure that two workers don't suggest different changes to the same area of the mesh. You could do this by asking each worker to lock the triangles that use the vertices it is trying to simplify. Similarly, simplifying boundary triangles can create cracks in the mesh if separate child processors make incompatible changes to edges shared across the boundary. Again, you can manage this by locking boundary triangles or by “re-connecting” mismatched boundaries in the parent processor, for example by using a zippering algorithm to “sew” the edges back together [Turk 1994].</p>
<p>This algorithm was evaluated on a variety of models ranging from 15,000 to 922,000 polygons. Small models show linear speed-up for up to four processors. For more than four processors, proper load balancing becomes difficult. Complex models that do not fit in main memory show super-linear speed-ups, because most of the time spent in the non-parallel case involves swapping data to and from disk. Measured against an estimate of how long a single processor would need if it had enough memory to store the entire model, a 36-processor environment shows a 26× improvement.</p>
<p class="sectionHeading"><br /><br />Distributed Simplification</p>
<p>Partition and reconstruction approaches that use different communication strategies also exist. One example uses a portable message-passing interface (Local Area Multicomputer-Message Passing Interface, or LAM-MPI) to communicate between processes, rather than assuming a large shared memory space [Dehne 2000]. A master process partitions the original mesh into sub-meshes of roughly equal size, then sends them to child processors for simplification. A child processor may create additional children to further distribute the simplification tasks. Results are sent back to the master to be combined into a final result.</p>
<p>Simplification times with 1, 2, 4, 8, and 16 processors have been measured, together with the time to transfer data between the master and the children and the difference in time Δ<i>S</i> between the slowest and fastest child processes. Simplification times show a near-linear speed-up as the number of processors increases. As with the shared memory algorithm, speed-ups also improve as the size of the model increases. Transfer times, even for 16 processors, do not exceed 5% of the total execution time. Δ<i>S</i>, however, can be as high as 50% for certain tests, which indicates a significant load imbalance. Unfortunately, partitioning the original mesh to produce perfect load balancing is extremely difficult, and the time needed to do it may be more than the time it saves during simplification.</p>
<p>A recent re-evaluation of the message passing algorithm confirms the speed-up in execution times. Geometric error and execution times for 1, 8, and 16 processors were recorded and compared to the QEM algorithm [Tang 2007, Yongquan 2009]. Parallel results are similar to QEM, although in all cases QEM produces smaller absolute errors than the binary space-partitioning simplification used by the message passing algorithm. Execution times show a speed-up of approximately 7× for the 8-processor environment and approximately 11× for 16 processors, reaffirming the effectiveness of a parallel implementation.</p>
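<p>One way to read these speed-up figures is as parallel efficiency, the speed-up divided by the number of processors. The short calculation below uses only the approximate numbers quoted above; the function name is an assumption for the example.</p>

```python
def parallel_efficiency_percent(speedup, processors):
    """Speed-up as a percentage of the ideal (linear) speed-up."""
    return 100.0 * speedup / processors


# Approximate speed-ups quoted in the text: processors -> speed-up.
reported = {8: 7.0, 16: 11.0}
efficiency = {p: parallel_efficiency_percent(s, p) for p, s in reported.items()}
# efficiency[8] is 87.5 and efficiency[16] is 68.75
```

<p>The drop from roughly 88% to roughly 69% is consistent with the load-imbalance and communication costs described earlier growing as more processors are added.</p>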
<p class="sectionHeading"><br />Parallel Hierarchical Level-of-Detail</p>
<p>Another way to simplify a model before rendering is by creating hierarchical level-of-detail trees (HLODs) in parallel. HLODs are like any other tree structure. They have a root, internal nodes, and leaf nodes. The original, full-detail model is stored at the root. You simplify by performing the following steps:</p>
<ol>
<li>Simplify the original model stored at the root.</li>
<li>At the same time, subdivide the original model into multiple sub-model parts.</li>
<li>Create a child node to hold each sub-model.</li>
<li>Ask the child nodes to simplify their sub-model.</li>
</ol>
<p>Simplification in the child nodes is more aggressive than the simplification at the root node. This means the child nodes will be a more simplified version of the model.<br /><br />You perform exactly the same "subdivide and simplify" strategy on internal nodes to further simplify the model. When finished, you have an HLOD tree [Cozzi 2008]. Nodes farther down the tree are more simplified. If you take all the leaf nodes and combine their sub-models, you recreate the original model at the highest level of simplification.</p>
<p>Vertex clustering is used to perform simplification. Vertex clustering subdivides a mesh into cells. You choose a single vertex <i>v</i>new to replace all the vertices {<i>v</i>1, ..., <i>v</i>m} in each cell. Triangles attached to {<i>v</i>1, ..., <i>v</i>m} are updated to use <i>v</i>new. Triangles that do not degenerate into lines or points are kept. This strategy works well, since you don't need vertices that are outside the sub-model you are simplifying.</p>
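<p>A minimal sketch of vertex clustering on a uniform grid follows. The grid cell size, the function name, and the choice of the cell's mean position for <i>v</i>new are assumptions made for the example; other choices of representative vertex are equally valid.</p>

```python
def cluster_vertices(vertices, triangles, cell_size):
    """Merge all vertices that fall in the same grid cell into one vertex."""
    cell_of = {}   # grid cell -> index of its representative vertex
    members = {}   # representative index -> positions merged into it
    remap = []     # old vertex index -> representative index
    for v in vertices:
        cell = tuple(int(c // cell_size) for c in v)
        rep = cell_of.setdefault(cell, len(cell_of))
        members.setdefault(rep, []).append(v)
        remap.append(rep)

    # Place each representative at the mean of the vertices in its cell.
    new_vertices = [tuple(sum(axis) / len(pts) for axis in zip(*pts))
                    for _, pts in sorted(members.items())]

    new_triangles = []
    for tri in triangles:
        tri = tuple(remap[i] for i in tri)
        if len(set(tri)) == 3:  # skip triangles collapsed to a line or point
            new_triangles.append(tri)
    return new_vertices, new_triangles


vertices = [(0.25, 0.25, 0.0), (0.75, 0.25, 0.0), (1.5, 0.5, 0.0),
            (0.5, 1.5, 0.0), (1.5, 1.5, 0.0)]
triangles = [(0, 1, 2), (1, 2, 3), (2, 4, 3)]
new_vertices, new_triangles = cluster_vertices(vertices, triangles, 1.0)
```

<p>In this example the first two vertices share a cell and are merged, so the first triangle degenerates and is dropped while the other two are re-indexed.</p>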
<p>Building HLODs offers not just one, but two opportunities to parallelize. First, the simplification in different nodes can be run in parallel. Second, the recursive steps in the HLOD can also be run in parallel. That is, whenever a subtree is created, you can assign it to a separate thread for further processing.</p>
<p><img src="http://software.intel.com/file/23797/" /></p>
<p>You can use a hybrid approach that combines both of these advantages (Figure 10) by subdividing the model into eight sub-models (an octree), then processing pairs of subtrees in parallel. The root and each subtree create a separate thread to perform simplification so that subdividing and simplifying can run in parallel. Although you might expect more threads to produce even better results, runtime examination shows that the cost of thread management begins to outweigh the savings from parallel execution, particularly when all the cores in the CPU are busy with other tasks.</p>
<p>An example scenario simplified a city model with 5,646,041 triangles into HLODs of height six. Using separate threads for either simplification or subtree processing produced improvements of around 15-20%. Combining these steps using a hybrid parallel approach sped up the algorithm, generating a savings of approximately 40%. By increasing the number of cores in the CPU and optimizing thread management, you should see even larger improvements.</p>
<p class="sectionHeading"><br />Parallel View-dependent Mesh Refinement</p>
<p>Parallel methods are not limited to simplifying a mesh. For instance, it is possible to refine a PM in parallel based on where a viewer is looking. The goal is to dynamically reduce detail in areas of the mesh that are small or hidden and maintain detail in areas that are in sight. You can do this by building a “vertex hierarchy” of all possible mesh refinements for <i>M<sup>n</sup></i>. You then use the view frustum, surface orientation, and screen-space size (in pixels) to choose an appropriate level of detail for different regions in the mesh [Hu 2009].</p>
<p>The vertex hierarchy contains the positions of the <i>n</i> vertices in the original mesh, plus their new positions as they are moved during edge collapses. A “vertex frontier” within the hierarchy contains the vertices that are currently being used to render the mesh. At each timestep, you process the frontier's vertices in parallel, searching for locations where new edge collapses or vertex splits are needed because of changes in where the viewer is located and where they are looking. Updates are sent to a triangle buffer that keeps a list of triangles to render.</p>
<p>You can perform load balancing by measuring how long it took to complete each algorithm step during previous passes. Given a fixed time budget (for example, a minimum frames-per-second rendering requirement), you complete as many steps as possible. Partial steps are held and finished during the next timestep.</p>
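<p>The time-budget idea can be sketched as a simple scheduler that uses the durations measured in earlier passes as estimates. The step names and timings below are hypothetical, and the sketch defers whole steps to the next timestep where the approach described above can also pause a step partway through.</p>

```python
def steps_within_budget(pending_steps, measured_ms, budget_ms):
    """Return (steps to run now, steps deferred to the next timestep)."""
    run_now, used = [], 0.0
    for index, step in enumerate(pending_steps):
        estimate = measured_ms[step]  # duration observed in earlier passes
        if used + estimate > budget_ms:
            return run_now, pending_steps[index:]
        run_now.append(step)
        used += estimate
    return run_now, []


# Hypothetical step names with timings (in ms) from a previous pass.
measured_ms = {"update_frontier": 4.0, "split": 9.0,
               "collapse": 6.0, "rebuild_buffer": 5.0}
pending = ["update_frontier", "split", "collapse", "rebuild_buffer"]
run_now, deferred = steps_within_budget(pending, measured_ms, budget_ms=20.0)
```

<p>With a 20 ms budget, the first three steps (19 ms in total) run in this timestep and the last one is carried over.</p>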
<p>Results show that very large models can be simplified and rendered at interactive speeds, even for an error threshold of at most 1 pixel. For example, a 10,000,000-triangle model is simplified to approximately 390,000 triangles and rendered in 22 ms. As the number of cores increases, so too should the performance of the algorithm, because more vertices on the vertex frontier can be updated during each processing step.</p>
<p class="sectionHeading"><br />Code Resources</p>
<p>Numerous source code implementations exist for each of the three simplification algorithms. Here are a few examples:</p>
<ul>
<li><a target="_blank" href="http://hezhao.net/project/progressive-meshes.html">Progressive meshes</a></li>
<li><a target="_blank" href="http://mgarland.org/software/qslim.html">Quadric simplification</a></li>
<li><a target="_blank" href="http://www.cs.unc.edu/~geom/envelope.html">Simplification envelopes</a></li>
</ul>
<p class="sectionHeading"><br />Summary</p>
<p>Polygon meshes have long been a standard method for representing the shape and surface properties of three-dimensional models. You can use simplification algorithms to tune the number of polygons in your models, both to maintain a desired level of detail and to minimize the number of polygons you need to render.</p>
<p>Numerous simplification algorithms have been proposed that apply basic operations like edge and vertex collapses. Each algorithm reduces the size of a mesh using cost and objective functions to search for a simplified result with a specific polygon count or a guaranteed error tolerance.</p>
<p>More recent work has described ways to parallelize the simplification algorithms. The parallel algorithms community uses a partition, simplify, and recombine approach to process subsets of the original mesh simultaneously. Computer graphics researchers have suggested other uses of parallel processing—for example, to dynamically update a simplified mesh based on a changing view location. Results from both areas show that parallel implementations can provide dramatic improvements, both in execution time and in the number of resources needed to complete the simplification task.</p>
<p class="sectionHeading"><br />About the Author</p>
<p>Christopher G. Healey received a Bachelor’s degree in Math from the University of Waterloo in Waterloo, Canada, and an M.Sc. and Ph.D. from the University of British Columbia in Vancouver, Canada. Following a postdoctoral fellowship at the University of California at Berkeley, he joined the Department of Computer Science at North Carolina State University, where he is currently an Associate Professor. His research interests include visualization, graphics, visual perception, and areas of applied mathematics, databases, artificial intelligence, and aesthetics related to visual analysis and data management.</p>
<p class="sectionHeading"><br />References</p>
<p>Cohen, Varshney, Manocha, Turk, Weber, Agarwal, Brooks, and Wright (1996). Simplification envelopes. In: <i>ACM SIGGRAPH ’96 Conference Proceedings</i>. New Orleans, Louisiana; pp. 119–28.</p>
<p>Cozzi, P. and Loo, B. T. (2008). Parallel Processing City Models for Real-Time Visualization. http://www.seas.upenn.edu/~pcozzi/PreprocessingCityModels.pdf.</p>
<p>Dehne, Langis, and Roth (2000). Mesh simplification in parallel. In: <i>Proceedings 4th International Conference on Algorithms and Architectures for Parallel Processing.</i> Hong Kong; pp. 281–90.</p>
<p>Garland and Heckbert (1997). Surface simplification using quadric error metrics. In: <i>ACM SIGGRAPH ’97 Conference Proceedings.</i> Los Angeles, Cal.; pp. 209–16.</p>
<p>Garland and Heckbert (1998). Simplifying surfaces with color and texture using quadric error metrics. In: <i>Proceedings IEEE Visualization ’98</i>, Research Triangle Park, NC; pp. 263–9.</p>
<p>Hoppe, DeRose, Duchamp, McDonald, and Stuetzle (1993). Mesh optimization. In: <i>ACM SIGGRAPH ’93 Conference Proceedings</i>, Anaheim, Cal.; pp. 19–26.</p>
<p>Hoppe (1996). Progressive meshes. In: <i>ACM SIGGRAPH ’96 Conference Proceedings,</i> New Orleans, Louisiana; pp. 99–108.</p>
<p>Hu, Sander, and Hoppe (2009). Parallel view-dependent refinement of progressive meshes. In: <i>Proceedings of the 2009 Symposium on Interactive 3D Graphics.</i> Boston, Mass.; pp. 169–76.</p>
<p>Rasch and Schmidt (2000). Parallel mesh simplification. In: <i>Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Algorithms (PDPTA 2000)</i>. Las Vegas, Nev.</p>
<p>Tang, Jia, and Li (2007). Simplification algorithm for large polygon model in distributed environment. In: <i>Lecture Notes in Computer Science: Advanced Intelligent Computing Theories and Applications with Aspects of Theoretical and Methodological Issues</i>. Heidelberg, Germany: Springer-Verlag; pp. 960–9.</p>
<p>Turk and Levoy (1994). Zippered polygon meshes from range images. In: <i>ACM SIGGRAPH ’94 Conference Proceedings</i>. Orlando, Flo.; pp. 311–8.</p>
<p>Walter and Healey (2001). Attribute preserving dataset simplification. In: <i>Proceedings IEEE Visualization 2001</i>, San Diego, Cal.; pp. 113–20.</p>
<p>Yongquan, Nan, Pengdong, Chu, Jintao, and Rui (2009). A parallel memory efficient framework for out-of-core mesh simplification. In: <i>Proceedings 11th International Conference on High Performance Computing and Communications (HPCC-09). </i>Seoul, Korea; pp. 666–71.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-modeling-and-parallel-mesh-simplification</link>
      <pubDate>Mon, 23 Nov 2009 16:30:16 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/3d-modeling-and-parallel-mesh-simplification#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-modeling-and-parallel-mesh-simplification</guid>
      <category>Visual Computing</category>
      <category>Game Development</category>
    </item>
    <item>
      <title>Bringing Enhanced Video to Audiences Everywhere</title>
      <description><![CDATA[ <p>Digital video is lighting up screens and capturing the imaginations of computer users everywhere, a trend that places unusually high demands on both computer platforms and software applications. The latest product releases from one of the most agile and innovative software producers in this space, Sonic Solutions, respond to this trend in a direct and forthright way, delivering a series of tuned and optimized applications that capitalize on the forward-looking platform technologies from Intel. The result is enhanced video experiences for consumers and a collection of software applications designed to speed through as many frames per second as is physically possible on a given machine. To some degree, the outstanding results achieved by Sonic in its development efforts can be traced to a rich collaborative history between Sonic and Intel and the hard work of engineers on the staffs of both companies. <br /><br />This solution brief highlights the tools and techniques that proved useful in the development process and describes some of the areas where substantial performance gains and productivity boosts were achieved.</p>
<p><b>Techniques for Optimizing and Tuning Video Applications</b></p>
<p>The collaborative relationship between Sonic Solutions and Intel extends back several years and has yielded significant benefits in numerous areas and on multiple platforms. Among the recent platforms from Intel that have been targeted for upcoming software releases from Sonic are:</p>
<ul>
<li>Intel® X58 Express chipset</li>
<li>Intel® Core™2 Duo processor</li>
<li>Intel® Centrino®2 processor technology</li>
<li>Intel® Core™ 2 Extreme processor</li>
<li>Intel® Core™ i7 processor</li>
<li>Intel® Core™ i7 mobile processor</li>
</ul>
<p>Sonic Solutions’ integration of the Intel® Media Software Development Kit (Intel® Media SDK) into Roxio CinePlayer* and Roxio Creator* 2010 provides the advantages of hardware acceleration on today’s graphics platforms with support for future graphics platforms, as well. The video copy-and-conversion feature in Creator 2010 was also optimized to enhance performance and responsiveness through the use of multi-threading. Intel provided assistance in this engineering effort, and Sonic Solutions used a number of tools from the Intel® Software Development Products offering, including the Intel Media SDK, Intel® Threading Building Blocks, Intel® VTune™ Performance Analyzer, Intel® Thread Profiler, and others. The Intel Media SDK, in particular, delivered a productivity boost and the reliability of proven, refined codec performance to the Sonic development team.</p>
<p><b>Roxio CinePlayer* Enhancements</b></p>
<p>Providing a cost-effective way to decode and play back DVDs on PCs, Roxio CinePlayer has built a solid customer base in the consumer market for its high-quality video and crisp audio delivery. The Windows Vista*-compatible player provides full support for Dolby Digital 5.1 Surround Sound and handles InterActual* content embedded on DVDs. During the engagement with Intel, engineers focused on enhancing playback of high-definition (HD) video content when running on computers equipped with the Intel® G45 Express chipset. To this end, the Intel Media SDK proved useful in providing developers streamlined access to the hardware acceleration features offered by a key component of the Intel G45 Express chipset: the Intel® Graphics Media Accelerator X4500HD (Intel® GMA X4500HD) with built-in support for 1080p HD video playback.</p>
<p>The hardware acceleration helps provide an enhanced end-user experience through faster decoding of MPEG-2, AVC, and VC1 video content—whether from DVD or streamed from the Internet.</p>
<p><a href="http://software.intel.com/file/23874">Download Full PDF</a> (428kb)</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/bringing-enhanced-video-to-audiences-everywhere</link>
      <pubDate>Mon, 23 Nov 2009 11:27:13 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/bringing-enhanced-video-to-audiences-everywhere#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/bringing-enhanced-video-to-audiences-everywhere</guid>
      <category>Visual Computing</category>
      <category>Media</category>
    </item>
    <item>
      <title>Interview with Anatoliy Kuznetsov, the author of BitMagic C++ Library</title>
      <description><![CDATA[ <h2>Abstract</h2>
<p>In this article, Anatoliy Kuznetsov answers questions about the open-source BitMagic C++ Library.</p>
<h2>Introduction</h2>
<p>While regularly browsing websites related to 64-bit programming, I often came across references to the BitMagic C++ Library and realized that it had benefited a lot from moving to 64 bits. I decided to contact the author and ask him for an interview about his research and development work.</p>
<p><strong>Questions are asked by: Andrey Karpov</strong> - in-house developer at OOO "Program Verification Systems", currently working on the <a href="http://www.viva64.com/pvs-studio/">PVS-Studio</a> tool designed for verifying modern C++ applications.</p>
<p><strong>The answers are given by: Anatoliy Kuznetsov</strong> - chief software engineer at NCBI; developer of the  open source  <a href="http://bmagic.sourceforge.net/">BitMagic C++ Library</a>.</p>
<p><strong>Anatoliy, please tell us a few words about  yourself. What projects are you involved in?</strong></p>
<p>I am a chief software  engineer, at present I am working for the bio-molecular data discovery and  visualization team at <a href="http://www.ncbi.nlm.nih.gov/">NCBI</a> (National  Center for Biotechnology Information). Apart from my primary job, I am the  chief developer and architect of the open-source BitMagic C++ Library.</p>
<p>I am a planning  engineer by education, a graduate of the Lobachevskiy University of Nizhniy  Novgorod.</p>
<p><strong>What is BitMagic?</strong></p>
<p>BitMagic has been developed  as a universal template library for processing compressed bit vectors. The  library solves several tasks:</p>
<p>It provides a bit container that is compatible with the STL in spirit. This means the container must support iterators and memory allocators and must interact with algorithms and other STL containers.</p>
<p>The library can efficiently operate on very  long and sparse vectors.</p>
<p>It enables the serialization of vectors so that they can be written to a database or sent over a network.</p>
<p>A developer is provided with a set of  algorithms to implement set-theory operations and calculate distances and  similarity metrics in multidimensional binary spaces.</p>
<p>Much consideration is given to optimization for  popular fast calculation systems such as <a href="http://www.viva64.com/go.php?url=272">SSE</a>.</p>
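<p>As a rough illustration of the set-theory operations and similarity metrics listed above, the following sketch computes intersection size, Hamming distance, and Jaccard similarity over plain, uncompressed 64-bit words. It does not use BitMagic itself: the <code>metrics_sketch</code> namespace, the <code>Words</code> alias, and all function names are hypothetical and exist only for this example.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace metrics_sketch {

// A plain, uncompressed bit set stored as 64-bit words (illustration only).
using Words = std::vector<std::uint64_t>;

// Number of 1 bits in one word, via the portable std::bitset counter.
inline std::size_t popcount64(std::uint64_t w) {
    return std::bitset<64>(w).count();
}

// |A AND B|: how many elements the two sets share.
std::size_t count_and(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i) n += popcount64(a[i] & b[i]);
    return n;
}

// |A OR B|: size of the union.
std::size_t count_or(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i) n += popcount64(a[i] | b[i]);
    return n;
}

// |A XOR B|: the Hamming distance between the two sets.
std::size_t hamming_distance(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i) n += popcount64(a[i] ^ b[i]);
    return n;
}

// Jaccard similarity |A AND B| / |A OR B|.
// Two empty sets are treated as identical (similarity 1.0) by convention here.
double jaccard(const Words& a, const Words& b) {
    const std::size_t uni = count_or(a, b);
    if (uni == 0) return 1.0;
    return static_cast<double>(count_and(a, b)) / static_cast<double>(uni);
}

}  // namespace metrics_sketch
```

<p>For the sets {0, 1, 3} and {1, 2}, stored as the single words 0xB and 0x6, the intersection has 1 element, the Hamming distance is 3, and the Jaccard similarity is 0.25.</p>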
<p><strong>What tasks addressed by BitMagic make it so  attractive for developers?</strong></p>
<p>The library turned out to be a fairly general-purpose solution, so it is not easy to list all of its possible applications. At present, it delivers the most compelling results in the following areas:</p>
<p>Building bit and inverted indexes for full-text search systems and accelerating relational algebra operations (AND, OR, JOIN, etc.).</p>
<p>Development of non-standard extensions and  indexes for existing databases (Oracle Cartridges, MS SQL extended stored  procedures). Generally, such extensions help integrate scientific, geographic  and other non-standard data into the database.</p>
<p>Data mining algorithms development.</p>
<p>In-memory indexes and databases development.</p>
<p>Development of precise access isolation systems  with a large number of objects (security enhanced databases with the isolation of  access to specific fields and columns).</p>
<p>Task management systems (for the computation  cluster), real-time task state tracking systems, and the storage of task states  described as Finite State Machines.</p>
<p>Tasks related to the representation and storage  of strongly connected graphs.</p>
<p><strong>What can you tell us about the history of BitMagic development? What motivated you to create it?</strong></p>
<p>For a long time my  colleagues and I had worked on tasks related to large databases, analysis and  visualization systems. The initial working version demonstrating bit-vector capabilities  was produced by Maxim Shemanaryov (he is the developer of Antigrain Geometry, a  wonderful 2D vector graphics library: <a href="http://www.antigrain.com">http://www.antigrain.com</a>). Then, some ideas for an equivalent  representation of data sets were described by Koen Van Damm, an engineer from  Europe, who worked on programming language parsers for the verification of complex  systems. There were other sources as well. I decided to bring them all together  in the form of a library suitable for repeated use in various projects.</p>
<p><strong>What license is BitMagic distributed under? Where can I download it?</strong></p>
<p>The library is free for commercial and non-commercial use and is available as source code. The only limitation is the requirement to give credit to the library and its authors if you use it in a released product.</p>
<p>You can check out all the  materials here: <a href="http://bmagic.sourceforge.net">http://bmagic.sourceforge.net</a>.</p>
<p><strong>Am I right in saying that BitMagic gains a significant advantage when compiled as a 64-bit version?</strong></p>
<p>Yes indeed, the  library uses a series of optimization methods to accelerate processing in  64-bit systems or SIMD-enabled systems (128-bit <a href="http://www.viva64.com/go.php?url=273">SSE2</a>).</p>
<p>Here are some factors  leading to a faster execution of algorithms:</p>
<p>a wide machine word (logical operations are  performed over a wide word);</p>
<p>the programmer (and the compiler) has access to additional registers, so the shortage of registers is less critical (this is an inherited disadvantage of the x86 architecture);</p>
<p>memory alignment often makes the operation  faster (128-bit alignment of addresses provides good results);</p>
<p>and of course it’s possible to place more  objects and data for processing in a program’s memory. That’s a great benefit of  the 64-bit version, which is clear to everyone.</p>
<p>At the moment the fastest method is to use 128-bit SSE2 optimizations in a 64-bit program. This mode combines the doubled number of <a href="http://www.viva64.com/terminology/x86.html">x86</a> registers with a wide machine word for logical operations.</p>
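<p>To make the "wide machine word" point concrete, here is a minimal sketch (not BitMagic code) that ANDs two bit sets one 64-bit word at a time, so that each loop iteration combines 64 bits. The <code>wide_sketch</code> namespace, the helper names, and the flat <code>std::vector</code> layout are assumptions made for this illustration; the library's real block structure and SSE2 paths are more involved.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace wide_sketch {

using Words = std::vector<std::uint64_t>;

// Allocate enough zeroed 64-bit words to hold `bit_count` bits.
Words make_bits(std::size_t bit_count) {
    return Words((bit_count + 63) / 64, std::uint64_t{0});
}

// Turn on the bit at position `pos`.
void set_bit(Words& w, std::size_t pos) {
    assert(pos / 64 < w.size());
    w[pos / 64] |= std::uint64_t{1} << (pos % 64);
}

// Report whether the bit at position `pos` is on.
bool test_bit(const Words& w, std::size_t pos) {
    assert(pos / 64 < w.size());
    return ((w[pos / 64] >> (pos % 64)) & 1u) != 0;
}

// dst &= src. Each iteration processes 64 bits with one machine-word AND
// when the program is built for a 64-bit target.
void and_inplace(Words& dst, const Words& src) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] &= src[i];
    }
}

}  // namespace wide_sketch
```

<p>With bits 10, 100, and 10000 set in one vector and only bits 10 and 100 set in the other, <code>and_inplace</code> leaves bits 10 and 100 set and clears bit 10000, using 157 word operations instead of 10,001 single-bit ones.</p>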
<p>64-bit systems and  programs are going through a real Renaissance. The migration to 64-bits will be  faster than the move from 16 to 32. The launch of 64-bit Windows versions in  the mainstream market and the availability of related tools (like the one your  company is developing) will stimulate this process. In the environment of  increasingly complex systems and larger amounts of code, such tools as <a href="http://www.viva64.com/pvs-studio/">PVS-Studio</a> will be of great  practical utility as they help reduce labor costs and the time to market.</p>
<p><strong>Please tell us about the compression methods  used in BitMagic.</strong></p>
<p>The current 3.6.0 version  of the library uses several compression methods.</p>
<p>"Bitvectors" in memory are split into blocks. A block that is completely empty or completely full is not allocated. This lets the programmer set bits in a wide range very far from zero: setting bit 100,000,000 does not cause the explosion in memory use that is common for vectors with a two-dimensional linear model.</p>
<p>Blocks in memory can have an equivalent  representation in the form of areas – the so-called gaps. Actually that’s a sort  of RLE coding. Unlike RLE, our library doesn't lose the capability to execute logical  operations or access random bits.</p>
<p>When serializing "bitvectors", a set of different methods is used: conversion to lists of integer numbers (representing nulls or ones) and coding of those lists by the Elias Gamma Coding method. With these methods we do lose random bit access, but that is not critical for disk writes when weighed against the savings in storage and input-output overhead.</p>
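<p>The serialization idea described above (turn the bits into a list of integers, then code the integers with Elias gamma) can be sketched as follows. This is an illustration under stated assumptions, not BitMagic's actual on-disk format: the <code>gamma_sketch</code> namespace, the <code>Runs</code> structure, and the function names are hypothetical, bits are held in a <code>std::vector&lt;bool&gt;</code> for clarity rather than speed, and the decoder assumes well-formed input.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace gamma_sketch {

using Bits = std::vector<bool>;

// Run-length view of a bit sequence: the value of the first run plus
// the length of every run. Runs alternate between 0s and 1s.
struct Runs {
    bool first_bit = false;
    std::vector<std::uint32_t> lengths;  // each length is >= 1
};

// Convert a bit sequence into alternating run lengths.
Runs to_runs(const Bits& bits) {
    Runs r;
    if (bits.empty()) return r;
    r.first_bit = bits[0];
    std::uint32_t len = 1;
    for (std::size_t i = 1; i < bits.size(); ++i) {
        if (bits[i] == bits[i - 1]) {
            ++len;
        } else {
            r.lengths.push_back(len);
            len = 1;
        }
    }
    r.lengths.push_back(len);
    return r;
}

// Rebuild the original bit sequence from its runs.
Bits from_runs(const Runs& r) {
    Bits bits;
    bool value = r.first_bit;
    for (std::uint32_t len : r.lengths) {
        bits.insert(bits.end(), len, value);
        value = !value;
    }
    return bits;
}

// Elias gamma code for value >= 1: N zero bits, then the N+1 binary
// digits of the value (most significant first), where N = floor(log2(value)).
void gamma_encode(std::uint32_t value, Bits& out) {
    assert(value >= 1);
    const std::uint64_t v = value;  // widened so the shifts below stay defined
    int top = 0;                    // index of the highest set bit
    while ((v >> (top + 1)) != 0) ++top;
    for (int i = 0; i < top; ++i) out.push_back(false);
    for (int i = top; i >= 0; --i) out.push_back(((v >> i) & 1u) != 0);
}

// Decode one gamma-coded value starting at `pos`; advances `pos` past it.
// Assumes the stream is well formed (no bounds checking in this sketch).
std::uint32_t gamma_decode(const Bits& in, std::size_t& pos) {
    int zeros = 0;
    while (!in[pos]) {
        ++zeros;
        ++pos;
    }
    std::uint32_t value = 0;
    for (int i = 0; i <= zeros; ++i) {
        value = (value << 1) | (in[pos++] ? 1u : 0u);
    }
    return value;
}

}  // namespace gamma_sketch
```

<p>Short runs get short codes (a run of length 1 costs a single bit), which is why this coding suits dense or sparse vectors with many small gaps. As the interview notes, the price is that a bit can no longer be located without decoding the runs before it.</p>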
<p><strong>Could you give some code examples demonstrating  possible uses of BitMagic?</strong></p>
<p>The first example simply  creates 2 vectors, initializes them and performs the logical operation AND.  Further, the enumerator class is used for the iteration and printing of values  saved in the vector.</p>
<pre name="code" class="cpp">#include &lt;iostream&gt;
#include "bm.h"
using namespace std;
int main(void)
{
    bm::bvector&lt;&gt;   bv;    
    bv[10] = true; bv[100] = true; bv[10000] = true;
    bm::bvector&lt;&gt;   bv2(bv);    
    bv2[10000] = false;
    bv &amp;= bv2;
    bm::bvector&lt;&gt;::enumerator en = bv.first();
    bm::bvector&lt;&gt;::enumerator en_end = bv.end();
    for (; en &lt; en_end; ++en) {
        cout &lt;&lt; *en &lt;&lt; endl;
    }
    return 0;
}
</pre>
<p>The next example  demonstrates vector serialization and the use of compression.</p>
<pre name="code" class="cpp">#include &lt;stdlib.h&gt;
#include &lt;iostream&gt;
#include "bm.h"
#include "bmserial.h"
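using namespace std;
// Upper bound on the bit indexes to fill. The value is an assumption:
// the listing uses MAX_VALUE without defining it.
const unsigned MAX_VALUE = 1000000;
// This procedure creates a very dense bitvector.
// The resulting set will consist mostly of ON (1) bits
// interrupted by small gaps of 0 bits.
//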
void fill_bvector(bm::bvector&lt;&gt;* bv)
{
    for (unsigned i = 0; i &lt; MAX_VALUE; ++i) {
        if (rand() % 2500) {
            bv-&gt;set_bit(i);
        }
    }
}
void print_statistics(const bm::bvector&lt;&gt;&amp; bv)
{
    bm::bvector&lt;&gt;::statistics st;
    bv.calc_stat(&amp;st);
    cout &lt;&lt; "Bits count:" &lt;&lt; bv.count() &lt;&lt; endl;
    cout &lt;&lt; "Bit blocks:" &lt;&lt; st.bit_blocks &lt;&lt; endl;
    cout &lt;&lt; "GAP blocks:" &lt;&lt; st.gap_blocks &lt;&lt; endl;
    cout &lt;&lt; "Memory used:"&lt;&lt; st.memory_used &lt;&lt; endl;
    cout &lt;&lt; "Max.serialize mem.:" &lt;&lt; 
            st.max_serialize_mem &lt;&lt; endl &lt;&lt; endl;
}
unsigned char* serialize_bvector(
  bm::serializer&lt;bm::bvector&lt;&gt; &gt;&amp; bvs, 
  bm::bvector&lt;&gt;&amp; bv)
{
    // It is recommended to optimize the
    // vector before serialization.
    bv.optimize();  
    bm::bvector&lt;&gt;::statistics st;
    bv.calc_stat(&amp;st);
    cout &lt;&lt; "Bits count:" &lt;&lt; bv.count() &lt;&lt; endl;
    cout &lt;&lt; "Bit blocks:" &lt;&lt; st.bit_blocks &lt;&lt; endl;
    cout &lt;&lt; "GAP blocks:" &lt;&lt; st.gap_blocks &lt;&lt; endl;
    cout &lt;&lt; "Memory used:"&lt;&lt; st.memory_used &lt;&lt; endl;
    cout &lt;&lt; "Max.serialize mem.:" &lt;&lt; 
             st.max_serialize_mem &lt;&lt; endl;
    // Allocate serialization buffer.
    unsigned char*  buf = 
        new unsigned char[st.max_serialize_mem];
    // Serialization to memory.
    unsigned len = bvs.serialize(bv, buf, 0);
    cout &lt;&lt; "Serialized size:" &lt;&lt; len &lt;&lt; endl &lt;&lt; endl;
    return buf;
}
int main(void)
{
    bm::bvector&lt;&gt;   bv1;    
    bm::bvector&lt;&gt;   bv2;
    // set DGAP compression mode ON
    bv2.set_new_blocks_strat(bm::BM_GAP);  
    fill_bvector(&amp;bv1);
    fill_bvector(&amp;bv2);
    // Prepare a serializer class.
    // For best performance, create the serializer
    // once and reuse it
    // (saves a lot of memory allocations).
    //
    bm::serializer&lt;bm::bvector&lt;&gt; &gt; bvs;
    // the next settings provide the lowest serialized size
    bvs.byte_order_serialization(false);
    bvs.gap_length_serialization(false);
    bvs.set_compression_level(4);
    unsigned char* buf1 = serialize_bvector(bvs, bv1);
    unsigned char* buf2 = serialize_bvector(bvs, bv2);
    // Serialized bvectors (buf1 and buf2) are now ready to be
    // saved to a database or file, or sent over a network.
    // ...
    // Deserialization.
    bm::bvector&lt;&gt;  bv3;
    // As a result of deserialization bv3
    // will contain all bits from
    // bv1 and bv2:
    //   bv3 = bv1 OR bv2
    bm::deserialize(bv3, buf1);
    bm::deserialize(bv3, buf2);
    print_statistics(bv3);
    // After a complex operation 
    // we can try to optimize bv3.
    bv3.optimize();
    print_statistics(bv3);
    delete [] buf1;
    delete [] buf2;
    return 0;
}
</pre>
<p><strong>What are your plans for the further development of BitMagic?</strong></p>
<p>We want to implement some new vector compression methods that support parallel data processing.</p>
<p>Given the wide adoption of Intel Core i5-i7-i9, it would be reasonable to release an SSE 4.2 version of the library. Intel has added some interesting features that can be used very efficiently. The most promising is hardware support for counting set bits (Population Count).</p>
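<p>For context on why a hardware Population Count matters, the sketch below shows a software bit counter of the kind the instruction can replace. It is an illustration only, not code from the library: the <code>popcnt_sketch</code> namespace and function names are hypothetical. Compilers targeting SSE 4.2 typically expose the instruction through intrinsics such as <code>_mm_popcnt_u64</code>; the sketch avoids them so that it builds on any platform.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace popcnt_sketch {

// Software fallback: each iteration clears the lowest set bit,
// so the loop runs once per 1 bit in the word.
std::size_t popcount_loop(std::uint64_t w) {
    std::size_t n = 0;
    while (w != 0) {
        w &= w - 1;
        ++n;
    }
    return n;
}

// Portable standard-library route, used here as a cross-check.
std::size_t popcount_bitset(std::uint64_t w) {
    return std::bitset<64>(w).count();
}

// Total number of 1 bits in a block of `n` words.
std::size_t count_block(const std::uint64_t* words, std::size_t n) {
    assert(words != nullptr || n == 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += popcount_loop(words[i]);
    }
    return total;
}

}  // namespace popcnt_sketch
```

<p>A dense word costs the software loop up to 64 iterations, whereas a hardware population count handles the whole word in a single instruction, which is what makes it attractive for counting bits across large vectors.</p>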
<p>We are experimenting with nVidia CUDA and other GPGPUs. Today's graphics cards can perform integer and logical operations, and their resources can be harnessed for algorithms working on data sets and compression.</p>
<h2>References</h2>
<ol>
<li>Elias Gamma encoding of bit-vector Delta gaps (D-Gaps). <a href="http://bmagic.sourceforge.net/dGap-gamma.html">http://bmagic.sourceforge.net/dGap-gamma.html</a> </li>
<li>Hierarchical Compression. <a href="http://bmagic.sourceforge.net/hCompression.html">http://bmagic.sourceforge.net/hCompression.html</a> </li>
<li>D-Gap Compression. <a href="http://bmagic.sourceforge.net/dGap.html">http://bmagic.sourceforge.net/dGap.html</a> </li>
<li>64-bit Programming And Optimization. <a href="http://bmagic.sourceforge.net/bm64opt.html">http://bmagic.sourceforge.net/bm64opt.html</a> </li>
<li>Optimization of memory allocations. <a href="http://bmagic.sourceforge.net/memalloc.html">http://bmagic.sourceforge.net/memalloc.html</a> </li>
<li>Bitvector as a container. <a href="http://bmagic.sourceforge.net/enum.html">http://bmagic.sourceforge.net/enum.html</a> </li>
<li>128-bit SSE2 optimization. <a href="http://bmagic.sourceforge.net/bmsse2opt.html">http://bmagic.sourceforge.net/bmsse2opt.html</a> </li>
<li>Using BM library in memory saving mode. <a href="http://bmagic.sourceforge.net/memsave.html">http://bmagic.sourceforge.net/memsave.html</a> </li>
<li>Efficient distance metrics. <a href="http://bmagic.sourceforge.net/distopt.html">http://bmagic.sourceforge.net/distopt.html</a></li>
</ol> ]]></description>
      <link>http://software.intel.com/en-us/articles/BitMagic-Library</link>
      <pubDate>Fri, 20 Nov 2009 03:30:48 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/BitMagic-Library#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/BitMagic-Library</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Intel® Media Software Development Kit (Intel® Media SDK)</title>
      <description><![CDATA[ <b>Q1: What did Intel announce?</b><br /><b>A1:</b> At the Intel Developer Forum, Intel announced the availability of the Intel® Media Software Development Kit (SDK), a tool specifically designed for software developers of media applications for video playback and encoding. Intel Media SDK enables developers to take advantage of hardware acceleration in Intel platforms and future-proofs the software by allowing developers to program once and enjoy performance gains on future Intel processors and graphics chipsets.<br /><br /><b>Q2: What are the benefits of using Intel Media SDK?</b><br /><b>A2:</b> Developers using Intel Media SDK no longer have to write separate code paths to tap into platform-specific hardware acceleration to improve video performance. The Intel Media SDK features a single API that streamlines workflow and exploits hardware acceleration capabilities within Intel hardware. Additionally, applications integrating Intel Media SDK today will take advantage of hardware acceleration capabilities of future graphics processing solutions without rewriting the program code.<br /><br /><b>Q3: Is the Intel Media SDK setting the stage for future multi-core products?</b><br /><b>A3:</b> Intel Media SDK helps developers produce future-proofed code by using a single API that supports today's hardware as well as hardware that will be available in the future. This API is available for developers to evaluate now through membership in Intel's Visual Adrenaline developer program at www.intel.com/software/media. <br /><br /><b>Q4: Are there developers currently using Intel Media SDK to optimize their media applications?</b><br /><b>A4:</b> Yes, Intel has worked with a number of media application developers in the Alpha and Beta phases leading to this announcement. Developers including Corel*, CyberLink*, and Sonic* are just a few of the companies working with Intel to develop and refine this tool. 
These same developers have made public their intentions to use the Intel Media SDK in future products.<br /><br /><b>Q5: What platforms does Intel Media SDK support?</b><br /><b>A5:</b> Intel Media SDK supports a broad selection of hardware platforms including Intel Integrated Graphics chipsets (starting with Intel® G45/GM45 Express Chipsets), Intel® Architecture Processors (for software-based encode and decode) and third-party graphics platforms (through DLL extensions). Intel® Media SDK supports the Windows* Vista operating system (32-bit and 64-bit) with Windows* 7 support planned for a future release.<br /><br /><b>Q6: Does the Intel Media SDK tool work with non-Intel graphics?</b><br /><b>A6:</b> The API within the Intel Media SDK is extensible, allowing development teams to create dynamic link libraries (DLL) that support platform-specific implementations, including hardware from third party vendors.<br /><br /><b>Q7: What is the pricing for the Intel Media SDK?</b><br /><b>A7:</b> The Intel Media SDK is a free download to members of the Visual Adrenaline Developer program. The Visual Adrenaline Developer program is a no-cost, individual membership offering visual computing developers insight on Intel's software tools, community interaction and developer support. For more information on the Visual Adrenaline developer program visit: <a href="http://intel.com/software/graphics/">http://intel.com/software/graphics/</a>. 
For information on downloading the Intel Media SDK, visit <a href="http://www.intel.com/software/media/">http://www.intel.com/software/media/</a>.<br /><br /><b>Q8: What video codecs does Intel Media SDK support?</b><br /><b>A8:</b> Intel Media SDK supports encoding of AVC/H.264/MPEG-4 Part 10 (an international standard for compressing/decompressing video) and MPEG-2 video, and decoding of AVC/H.264, MPEG-2 video, and VC-1.<br /><br /><b>Q9: Does Intel Media SDK support any video pre-processing features?</b><br /><b>A9:</b> Yes, Intel Media SDK supports pre-processing functions, including: Inverse Telecine, Scene Detection, Deinterlacing, Denoising, Resizing and Color conversion.<br /><br /><b>Q10: How does Intel Media SDK work when no hardware acceleration is available?</b><br /><b>A10:</b> When no hardware is available to accelerate decode or encode, Intel Media SDK falls back to a software implementation. This software functions on legacy and non-Intel CPUs; it is designed not to degrade performance on those CPUs, but it is not optimized for performance there. The performance-optimized path targets Intel processors, is fully threaded, and builds on the heritage of software encoders and decoders in the Intel® Integrated Performance Primitives (Intel® IPP) product.<br /><br /><b>Q11: Does Intel Media SDK software fall back on use of Intel® Streaming SIMD Extensions (Intel® SSE)?</b><br /><b>A11:</b> All instruction-level optimizations for software fallback originate from Intel® Integrated Performance Primitives (Intel® IPP). Intel guarantees functionality on competitive processors; however, performance optimizations are designed for Intel processors. 
Intel Media SDK 1.0 supports instructions up to Intel® Streaming SIMD Extensions 4.1 (Intel® SSE 4.1).<br /><br /><b>Q12: Is H.264 optimized for transmission latency?</b><br /><b>A12:</b> Intel Media SDK is optimized for offline video editing, transcoding, and video playback usages; it is not optimized for streaming or video conferencing usage models where latency would be a focus.<br /><br /><b>Q13: Why limit support to the Microsoft Windows* operating environments?</b><br /><b>A13:</b> Intel is first supporting the Microsoft Windows Vista operating system (32-bit and 64-bit) with Microsoft Windows* 7 support planned for a future release. Intel continues to monitor customer feedback and will factor customer needs into future product plans.<br /><br /><b>Q14: Does the Intel Media SDK support Apple Mac OS and Apple platforms based on Intel hardware?</b><br /><b>A14:</b> For the initial version of Intel Media SDK, Intel chose to support Windows Vista, with Microsoft Windows 7 support to follow, in order to reach a broad customer base with the first release. Intel continues to monitor customer feedback and will factor customer needs into future product plans.
<p><br /><strong>Q15: How can I use my own SW library for encoding or decoding instead of using the Intel supplied library?</strong><br /><strong>A15:</strong> Replace libmfxsw32.dll or libmfxsw64.dll with your own SW library DLL. Use the same name, and place it in your unique install directory.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-media-software-development-kit-intel-media-sdk</link>
      <pubDate>Tue, 17 Nov 2009 10:43:59 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/intel-media-software-development-kit-intel-media-sdk#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-media-software-development-kit-intel-media-sdk</guid>
      <category>Visual Computing</category>
      <category>Media</category>
    </item>
    <item>
      <title>No Moving Parts:  The Promise of Solid-State Drives</title>
      <description><![CDATA[ <em>Take away the moving parts, wrap the idea in silicon, polish until all rough edges are smooth, and suddenly you've got a new model for storage devices that topples previous performance, power efficiency, and reliability benchmarks. As a number of high-profile game developers are discovering, </em><a href="http://www.intel.com/go/ssd"><em>Intel® SATA Solid-State Drives (Intel® SSDs)</em></a><em> move data swiftly, delivering massive volumes of input and output operations per second. The result: huge productivity boosts for programmers and blazingly fast load times for gamers.<br /><br /></em>
<h1 class="sectionHeading">Download PDF</h1>
<a href="http://software.intel.com/file/23877/">No Moving Parts: The Promise of Solid-State Drives</a><br /><br /><br />
<h1 class="sectionHeading">Get Into the Game Faster</h1>
Processor advances, such as the recently released <a href="http://www.intel.com/products/processor/corei7ee">Intel® Core™ i7 processor Extreme Edition</a>, have reset expectations for mobile gaming, desktop gaming, and digital content creation. Intel SSD technology offers a natural complement to the latest processor improvements, boosting storage solution performance to a level more appropriate to the latest processor advances. The trick to achieving outstanding performance: eliminate all moving parts and redefine storage paradigms.<br /><br />The physical act of rotating platters and aligning a magnetic read head with a track and sector, as is done with conventional hard disk drives (HDDs), creates lag time and reduces the device's lifetime, since moving parts inevitably wear out over time. By relying entirely on NAND flash technology, SSDs avoid the key negative aspects of conventional HDDs, increasing reliability and operational life in the process. Because no physical movement is required to access the wholly electronic storage areas of the SSD, access to data is substantially faster.<br /><br />Another consideration is power consumption. "It takes a lot of power to keep conventional hard disk drives spinning," Alan Frost, Marketing Programs Manager of Intel's Solid-State Drive Group, said, "and then to move the arm back and forth across the surface to find the information that you need. A solid-state drive has no moving parts. It uses less power. It is significantly faster. In a laptop configuration, you get better battery life. The visual computing hook (especially for game developers, but this could also be for artists, for digital content people, for video editing, or whatever) is that the system responds significantly faster. 
If you have a large amount of code to compile, your compile times go down."<br /><br />Developers doing rendering work, using art programs or lighting tools, typically get a lot of coffee breaks waiting for scenes to render or animations to complete. SSDs shave minutes off the wait time, delivering significantly better performance and much faster rendering times when large-volume data accesses are required.<br /><br />"Everyone across the game-development tool chain is constantly looking for ways to improve their productivity," Frost said. "Intel SSDs are the perfect complement to balancing the performance of Core i7 and Xeon 5500 systems for developers, artists, and testers. And, as every aspect of computing continues to skew toward mobility, Intel SSDs deliver better battery life and the reliability needed for performance on the go."<br /><br />
<h1 class="sectionHeading">Gauging Reactions</h1>
A seed program to put select SSD units in the hands of game developers and digital content creators has yielded early feedback for the technology. Design parameters for the SSDs favor either desktop and laptop uses, or enterprise configurations. It's important to select an SSD model targeting the correct end use. At the recent NAB trade show, Michael Katz, who handles visual computing developer relations at Intel, noted the keen interest in ramping up the use of SSDs as replacements to hard drives. "We have two classes of drives. The first is a mainstream drive, which is the <a href="http://www.intel.com/design/flash/nand/mainstream/index.htm">X18-M and the X25-M</a>. The '18' means it is a 1.8-inch size for a laptop. The '25' is a 2.5-inch drive, which fits in notebook or desktop computers. And then we have the enterprise drives, including the <a href="http://www.intel.com/design/flash/nand/extreme/index.htm">X25-E</a>, which use a different flash memory cell technology to achieve faster performance."<br /><br />Katz noted that the Intel SSD units are fully plug-compatible with the standard spinning hard drive. "You can take out your hard drive and put in an SSD," he said. "The SSD uses the same interface (the SATA 2 interface) that a standard hard drive uses, so it is 100 percent compatible. The operating system doesn't know it is an SSD. It just uses it as a hard drive. When a drive is used in a server, there are higher requirements for read-and-write cycles, which typically are better handled by the X25-E SSD. The requirements for a server deployment are different from those of an everyday workstation."<br /><br />The first reaction to the performance of Intel SSDs is typically enthusiastic. The seed program to solicit impressions from some leading game development companies yielded a number of positive comments. 
Tim Sweeney, CEO and founder of Epic Games, said, "I experienced one of the first-generation SSDs from another vendor, and its nearly quarter-second random-access write performance left me quite jaded about SSD technology. So, when the first Intel SSD arrived, I plugged it into my laptop and was immediately astonished at the performance. The machine booted more than three times faster, and in the course of running applications, I never experienced an I/O-related delay. Between the performance and the spooky lack of hard drive chattering, it was really a game-changing experience."<br /><br />Is the Intel SSD a good fit for the game-development environment? "Game development is enormously taxing on a PC's I/O resources because we work with enormous quantities of data: game content, compiled code, and so on," Sweeney said. "The use of these resources is largely random-access, with frequent writes, which is the worst case for 'Rusty Spinning Media' technology like hard drives. Intel's new SSDs provide an enormous boost for overall I/O performance, and for random-access writes in particular. When compiling thousands of source code files or loading game content spanning hundreds of megabytes, the opportunity for increased performance is dramatic."<br /><br />Bartosz Kijanka, VP of Engineering at Gas Powered Games, summed up his impressions after testing: "Very, very nice. I think we'd need a beefy RAIDed SCSI 320 drive array to match this kind of performance, at many times the cost. I can see the future now, and it has no moving parts."<br /><br />For another perspective from the development front, Michael Antonov, CTO of Scaleform Corporation, oversees the development and production of a vector graphics engine used in many award-winning game titles. As he describes it, "Scaleform GFx* is a solution used to display Adobe Flash*-based user interfaces, HUDs, and animated textures in games on all PC and console platforms. 
The major benefit of Scaleform GFx* is that it allows developers to rapidly create live, animated user interfaces by working in Flash Studio*."<br /><br />Antonov also had the opportunity to test drive an Intel SSD during the seed program. "My first impression of Intel's SSD is that it is a high-performing drive solution that is available at very competitive prices. It performs well in many benchmarks. SSD greatly improves productivity when developing on the go. It beats any laptop drive in performance several times while using less power, which is a great thing. Fast disk performance is important for developers because of the extensive disk I/O that takes place during compilation, linking, and file searches. If these tasks are reduced, productivity improves. I wouldn't want to have a laptop without an SSD and Intel's SSD offers a good combination of capacity, performance, and cost."<br /><br />The growing role of SSDs in the game-development world and the potential for escalating productivity in the process has captured the imaginations of performance-bent coders and software companies engaged in data-intensive processes. The operative word is: <i>zoom</i>.<br /><br />
<h1 class="sectionHeading">Balancing Performance Across the Whole System</h1>
Straight-line processor performance has been the metric by which system performance has typically been measured, but this approach overlooks other aspects of system performance, such as the speed of I/O operations with the storage system. That speed makes a substantial difference in overall system behavior and responsiveness. <br /><br />For example, virus-scanning software methodically searches through files on a drive to ferret out viruses and worms, which has a large impact on system performance. Many gamers routinely turn it off while playing games or installing game patches. The fast access an SSD provides makes it possible to perform virus-scanning operations transparently without bogging down overall system performance. Dedicated gamers can play and scan at the same time. This kind of responsiveness, of course, translates well to a whole range of activities centered on drive access, such as cataloging photographs, installing applications, copying and exporting contacts or e-mail messages, and multitasking with Microsoft Office* applications. <br /><br />The net result of relying on an SSD is a marked improvement in overall system performance, as shown by the industry-standard benchmark results depicted in Figure 1. <br /><br /><img src="http://software.intel.com/file/23773" /><br /><br />
<h1 class="sectionHeading">Comparing Memory Cell Technologies</h1>
Specific differences in the design technologies of SSDs affect their suitability for desktop or server applications. Both single-level cell (SLC) and multi-level cell (MLC) SSDs use non-volatile solid-state NAND flash memory for storing data, but key differences affect the usage models:<br /><br /><b><a href="http://www.intel.com/design/flash/nand/extreme/index.htm">Intel® X25-E Solid-State Drives</a></b><br /><b>SLC SSDs:</b> With SLC technology, each cell exists in one of two voltage states and so stores a single bit. This approach results in faster write operations and enhanced reliability. SLC SSDs have greater longevity and can sustain substantial numbers of write cycles.<br /><br />SLC SSD characteristics favor enterprise applications, in which speed and reliability are paramount. Typical applications include online transaction processing, database operations, data warehousing, enterprise resource planning, and business intelligence.<br /><br /><b><a href="http://www.intel.com/design/flash/nand/mainstream/index.htm">Intel® X25-M Solid-State Drives</a></b><br /><b>MLC SSDs:</b> With MLC technology, each cell can be in one of four voltage states and so stores two bits. MLC SSDs store information at greater densities and more cost-effectively. Intel MLC SSDs have less longevity than their SLC counterparts, but will provide five years of useful life in client PC use.<br />Because of their lower cost and higher capacity, MLC SSDs favor read-intensive enterprise applications, such as streaming video, and all client PC applications.<br /><br />These characteristics have a strong influence on the success or failure of a particular type of deployment. For example, using MLC SSDs for typical enterprise applications in which performance is important may lead to disappointing results. 
Similarly, the higher initial cost of SLC SSDs may make them unsuitable for mainstream desktop applications in which cost is a strong consideration. <br /><br />
<h1 class="sectionHeading">Mobile Computing Perks</h1>
Power efficiency is particularly important in mobile computing. The advantages of SSDs (reduced power requirements, durability, and responsiveness) resonate well with mobile users who want all of these qualities in their storage system. With data loss a common concern for mobile users, the ability of the SSD to withstand shock and vibration far beyond the limits of a conventional HDD is a definite plus.<br /><br />Testing conducted by Intel in the latter part of 2008 identified these key benefits of the SSD over conventional HDDs:<br /><br />
<ul>
<li>Improved system performance (up to 50 percent improvement in system-level benchmarks)</li>
<li>Improved battery life (systems ran about 50 to 70 minutes longer between charges)</li>
<li>Decreased support costs</li>
<li>Quicker installation times for operating systems and applications</li>
<li>No need for disk defragmentation software</li>
</ul>
For more details about the proof-of-concept study conducted by Intel, refer to the white paper <a href="http://www.intel.com/go/ssd">Improving the Mobile Experience with Solid-State Drives</a>.<br /><br /><img src="http://software.intel.com/file/23774" /><br /><br /><b>Figure 2</b> The compact size of Intel® Solid-State Drive solutions inspires new form-factor product designs.<br /><br />
<h1 class="sectionHeading">Enterprise Advantages</h1>
Another proof-of-concept study conducted by Intel examined the ways in which SSDs contribute to the operational efficiency of the enterprise.<br /><br />The test results and analysis went beyond benchmarking performance, power, and reliability and assessed the cost factors associated with using SSDs, which are currently more expensive than conventional HDDs, within the enterprise. The study determined that when used to replace data HDDs in arrays, SSDs could deliver improved total cost of ownership, while also providing higher performance for many different I/O-intensive applications. In situations where the SSDs replaced operating system drives in servers, the study noted substantially faster performance for typical support tasks, such as builds and patches, as well as lower power consumption.<br /><br />In conclusion, the study's authors noted, "Obtaining the maximum benefit from SSDs requires a shift in how we think about disk performance. Current write caching assumptions may not apply, disk fragmentation is no longer an issue, and current RAID approaches, designed to improve performance with high-latency HDDs, may be less effective with SSDs."<br /><br />For more information about SSD enterprise applications, refer to the white paper <i><a href="http://www.intel.com/go/ssd">Solid-State Drives in the Enterprise: A Proof of Concept</a></i>.<br /><br />
<h1 class="sectionHeading">The Last Word</h1>
With an estimated life expectancy of 1.2 million hours mean time between failures (MTBF), an operating shock capability of 1,000G for 0.5 ms, and a typical active power consumption of 150 mW, Intel SSD solutions deliver the goods for an expanding user community. For game development and digital content creation, these drives are the ideal performance match for the latest generation of Intel® Core™ processors, elevating storage performance to a level on par with multi-core processing performance capabilities.<br /><br />"Intel SSDs are for everyone who needs performance," Tim Sweeney said. "I can't imagine installing a computer without an Intel SSD as the boot drive. Sure, if you need multi-terabyte storage to store your collection of . . . legally purchased . . . music and DVD movies, then you'd augment that with a traditional hard drive for mass storage of streaming media. But, for applications and data that you work with every day, Intel Solid-State Drives are the right technology."<br /><br /><br /><br />Capture the buzz. Subscribe to <a href="http://www.intelsoftwaregraphics.com/?lid=XE2kehZk8mw=&amp;siteid=cqMoF5H/37o=">Intel® Software Dispatch for Visual Adrenaline</a>. (Did we mention it's fun, informative, visually stimulating, free, and you can unsubscribe at any time?)<br /><br /><br />
<p>*Other names and brands may be claimed as the property of others.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/no-moving-parts-the-promise-of-solid-state-drives</link>
      <pubDate>Tue, 17 Nov 2009 09:58:52 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/no-moving-parts-the-promise-of-solid-state-drives#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/no-moving-parts-the-promise-of-solid-state-drives</guid>
      <category>Visual Computing</category>
    </item>
    <item>
      <title>Building the Moblin Media Player with the Intel Compiler</title>
      <description><![CDATA[ <meta content="en-gb" />
<meta content="text/html; charset=windows-1252" />
<h3><span style="font-size: x-small;"><span style="font-family: Arial"><strong>Building the Moblin Media Player Gui with the Intel Compiler</strong></span></span></h3>
<h3><span style="font-size: x-small; font-family: Arial;"><strong class="sectionHeadingText">Introduction</strong></span></h3>
<p class="sectionBody">The media player project is known as Hornsey and depends on clutter, clutter-gst, bickley, nbtk, bognor-regis, libunique, libstartup-notification and gtk+-2.0.</p>
<p><span class="sectionBody">For the build environment I am using the Moblin2 development image installed on a dual-core laptop.  </span></p>
<p><span style="font-size: x-small;"><span class="sectionBody">In this paper, I am only building the top level of the player, and not the underlying library dependencies.</span></span></p>
<h3><span style="font-size:10.0pt;font-family:Arial"><strong class="sectionHeadingText">Setting up the build environment </strong></span></h3>
<p class="MsoNormal">Download and install the Moblin development image from here:</p>
<p class="MsoNormal"><span style="font-size: x-small; font-family: Arial;"><a href="http://moblin.org/documentation/moblin-sdk/download-development-images" style="color: blue; text-decoration: underline; text-underline: single">http://moblin.org/documentation/moblin-sdk/download-development-images</a></span></p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Create a USB or CD and install. Instructions here:</p>
<p class="MsoNormal"><span style="font-size: x-small; font-family: Arial;"><a href="http://moblin.org/documentation/test-drive-moblin/using-moblin-live-image" style="color: blue; text-decoration: underline; text-underline: single">http://moblin.org/documentation/test-drive-moblin/using-moblin-live-image</a></span></p>
<p class="MsoNormal"> </p>
<h3><span style="font-size: x-small;"><span style="font-family: Arial"><strong class="sectionHeadingText">Building the media player gui</strong></span></span></h3>
<h3><span style="font-size: x-small;"><span style="font-family: Arial"><strong class="sectionHeadingText">Downloading the sources</strong></span></span></h3>
<p class="MsoNormal"><span class="sectionBody">Boot up Moblin2<br /><br />In a directory of your choice, get the Hornsey code:</span></p>
<p class="MsoNormal"><span class="sectionBody">    git clone git://git.moblin.org/hornsey</span></p>
<p class="sectionBody"> </p>
<p class="MsoNormal"><span style="font-size: x-small;"><span class="sectionBody">Directory structure should look similar to this:</span></span></p>
<p class="MsoNormal"><span style="font-size: x-small;"><span class="sectionBody">    &lt;dev dir&gt;/hornsey</span></span></p>
<p class="MsoNormal"><span class="sectionBody">Go into the hornsey directory and change the branch to match  the moblin dev image<br />e.g.  </span></p>
<p class="MsoNormal"><span style="font-size: x-small;"><span class="sectionBody">        </span></span><span style="font-family: Courier New;"><span class="sectionBody"> cd hornsey<br />         git checkout -b moblin-2.0 origin/moblin-2.0<br /></span></span><span style="font-size:10.0pt;font-family:Arial"><br /></span><span style="font-style: normal; font-variant: normal; font-weight: normal; font-family: Arial"><span class="sectionBody"> </span></span><span class="sectionBody">You can list the available branches with:<br />       </span><span style="font-size: x-small;"><span style="font-family: Arial"><br /><span class="sectionBody">    </span></span><span class="sectionBody">    </span></span><span class="sectionBody">git branch -a <br /> </span></p>
<h3><span style="font-size:10.0pt;font-family:Arial"><strong class="sectionHeadingText">Building the sources</strong></span></h3>
<p><span style="font-size: x-small; font-family: Arial;"><b class="sectionHeadingText">Choosing which compiler </b></span></p>
<p><span style="font-size: x-small; font-family: Arial;"><b class="sectionHeadingText">Building with GCC</b></span></p>
<p><span class="sectionBody">For our installation, we'll point the $PREFIX environment variable at a custom directory so as not to overwrite any existing installation:</span></p>
<p><span class="sectionBody">     export PREFIX=~/dv/hornsey/gcc<br />    ./autogen.sh --prefix=$PREFIX</span></p>
<p><span style="font-size: x-small; font-family: Arial;"><b>Building with ICC</b></span></p>
<p><span class="sectionBody">To build the player with the Intel compiler, use the following commands:</span></p>
<p><span class="sectionBody">    export CC=icc<br />    export CXX=icc<br />    export CFLAGS="-O2 -g"<br /> </span></p>
<p><span class="sectionBody">   export PREFIX=~/dv/hornsey/icc<br />  ./autogen.sh --prefix=$PREFIX</span></p>
<p><span style="font-size: x-small; font-family: Arial;"><b class="sectionHeadingText">Continuing the build</b></span></p>
<p><span class="sectionBody">When autogen has completed you should get a message similar to the display below </span></p>
<p class="MsoNormal"><span class="sectionBody"> </span></p>
<p class="MsoNormal"><span class="sectionBody">      config.status: config.h is unchanged<br />      config.status: executing depfiles commands<br />      config.status: executing default-1 commands<br />      config.status: executing po/stamp-it commands<br />      Now type `make' to compile hornsey</span></p>
<p class="MsoNormal"><span class="sectionBody"> </span></p>
<p><span class="sectionBody">Now build the source by calling make <br /></span></p>
<p><span class="sectionBody">        <br />        make clean<br /><br />        make<br /><br />The progress of the make will be reported, the last few lines looking similar to this:</span></p>
<p><span class="sectionBody">          CC    hrn-controls.o<br />          CC    nbtk-im-label.o<br />          CC    hornsey<br />        Making all in data<br />        Making all in po<b> </b></span></p>
<p class="sectionHeadingText"><b>Installing the </b><b>media player</b></p>
<p>To install the new media player, call:</p>
<p>      make install</p>
<p><span style="font-size: x-small; font-family: Arial;">Check the contents of the install directory:
<p class="MsoNormal">ls -R  $PREFIX</p>
<p class="MsoNormal">    /home/sblairch/dv/hornsey/gcc:<br />                bin  share</p>
<p class="MsoNormal">    /home/sblairch/dv/hornsey/gcc/bin:<br />                hornsey</p>
<p>    /home/sblairch/dv/hornsey/gcc/share:<br />                applications  dbus-1  hornsey  icons  locale</p>
<h3><span style="font-size: x-small;"><span class="sectionHeadingText">Running the Media Player</span></span></h3>
<p><span style="font-family: Arial;">The media player can be invoked by simply calling the executable  &lt;install dir&gt;/bin/hornsey</span></p>
</span></p> ]]></description>
      <link>http://software.intel.com/en-us/articles/building-the-moblin-media-player-with-the-intel-compiler</link>
      <pubDate>Tue, 17 Nov 2009 00:33:48 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/building-the-moblin-media-player-with-the-intel-compiler#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/building-the-moblin-media-player-with-the-intel-compiler</guid>
      <category>Mobility</category>
      <category>Tools</category>
      <category>Intel® Atom™ Software Developer Community</category>
      <category>MID</category>
    </item>
    <item>
      <title>Annoying error of Uninitialized Memory Access</title>
      <description><![CDATA[ While analyzing a large project at the mi4 level of Intel® Parallel Inspector, a user may get many errors reported as 'Uninitialized Memory Access' that might be considered false positives. In some cases these errors do not reflect a real problem in the application, and a large number of them can be annoying. Before getting rid of these errors with the Suppressions mechanism, let’s consider the problem. The result of an mi4-level memory analysis of a simple program is shown in the picture below. Inspector raises the Uninitialized Memory Access error, blaming the memcpy() function, which reads the uninitialized memory pointed to by foo.  <br /><br />
<div><img alt="u1.JPG" title="u1.JPG" src="http://software.intel.com/file/22800" /></div>
<br /><br /> Looking at the sample code, we can conclude that the flagged read operation does not affect correctness of the whole application, although it is not good engineering practice.  <br /><br />
<div>
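<pre name="code" class="cpp:nogutter:nocontrols">#include &lt;stdlib.h&gt;  /* malloc, free */
#include &lt;string.h&gt;  /* memcpy */

int main() {
	char *foo = (char*)malloc(100);  /* never written: contents are indeterminate */
	char *bar = (char*)malloc(100);
	memcpy(bar, foo, 100);           /* reads the uninitialized bytes of foo */
	
	free(bar);
	free(foo);
	return 0;
}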
</pre>
</div>
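<br /><br />If the source can be changed, initializing the buffer addresses the report at its origin instead of hiding it. The following is a minimal sketch of that idea, not part of the original sample: the function name copy_initialized and its return value are hypothetical, and it assumes that zero-filled memory from calloc() is what a memory checker treats as initialized.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the article): copies a zero-initialized
// source buffer, so every byte memcpy() reads has a defined value.
// Returns the sum of the copied bytes, or -1 if an allocation fails.
int copy_initialized(std::size_t n) {
    // calloc() zero-fills the block; malloc() leaves it indeterminate.
    char *foo = static_cast<char *>(std::calloc(n, 1));
    char *bar = static_cast<char *>(std::malloc(n));
    if (foo == nullptr || bar == nullptr) {
        std::free(foo);
        std::free(bar);
        return -1;
    }

    std::memcpy(bar, foo, n);  // reads only initialized bytes

    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += bar[i];  // every index stays inside the n-byte buffer
    }

    std::free(bar);
    std::free(foo);
    return sum;
}
```

Calling copy_initialized(100) mirrors the buffer size used in the sample above. Suppression remains the practical choice when the flagged code belongs to a third party or cannot be modified.<br /><br />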
<br /> <br /> Note that Inspector does not report this error at the mi1, mi2, or mi3 levels. At mi4, however, a user might want to hide such reports to avoid cluttering the list of real errors. There is an easy way to do that with suppressions.  <br /> In the Details view of the results, select the 'Read' observation for this problem, right-click to open the context menu, and choose 'Suppress...'.  <br /><br />
<div><img alt="u2.JPG" title="u2.JPG" src="http://software.intel.com/file/22801" /></div>
<br /><br /> In the private suppression dialog, create a filter with the 'Uninitialized memory access' problem and the 'Read' description. For all other columns (module, function, etc.) you may set * (all), depending on the scope of interest. The setting is saved in a .sup file, which can be reused with any other project if made public (Tools &gt; Options &gt; Intel Parallel Inspector &gt; General &gt; Suppressions). <br /><br />
<div><img alt="u3.JPG" title="u3.JPG" src="http://software.intel.com/file/22796" /></div>
<br /><br />This can also be set by selecting 'Private Suppression: Delete problems' in the Configure Analysis dialog box before you click on 'Run Analysis'.<br /><br />
<div><img alt="u4.JPG" title="u4.JPG" src="http://software.intel.com/file/22797" /></div>
<br /><br />The error will not appear after level mi4 analysis is completed.  <br /><br />
<div><img alt="u5.JPG" title="u5.JPG" src="http://software.intel.com/file/22802" /></div>
<br /><br /> Where a statement is added to the code that uses the memory in a way that might affect correctness, Inspector should report the error regardless of whether 'Uninitialized memory access' is suppressed. Consider the following sample code, in which a printf() call that sends a byte read through bar to the output is added to the initial sample. <br /><br />
<div>
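<pre name="code" class="cpp:nogutter:nocontrols">#include &lt;stdio.h&gt;   /* printf */
#include &lt;stdlib.h&gt;  /* malloc, free */
#include &lt;string.h&gt;  /* memcpy */

int main() {
	char *foo = (char*)malloc(100);
	char *bar = (char*)malloc(100);
	memcpy(bar, foo, 100);
	printf("%c\n", bar[100]); /* index 100 is one byte past the end of bar */
	free(bar);
	free(foo);
	return 0;
}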
</pre>
</div>
<br /><br /> Inspector will report an ‘Invalid memory access’ error at each of the levels mi2 to mi4 and flag the source line containing the printf() call, because bar[100] lies one byte past the end of the 100-byte buffer. <br /><br />
<div><img alt="u6.JPG" title="u6.JPG" src="http://software.intel.com/file/22803" /></div>
<div><br /></div> ]]></description>
      <link>http://software.intel.com/en-us/articles/error-uninitialized-memory-access</link>
      <pubDate>Fri, 13 Nov 2009 06:17:29 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/error-uninitialized-memory-access#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/error-uninitialized-memory-access</guid>
      <category>Tools</category>
      <category>Intel® Parallel Inspector</category>
    </item>
    <item>
      <title>Optimizations for MSC.Software SimXpert* using Intel® Threading Building Blocks (Intel® TBB)</title>
      <description><![CDATA[ <h1 class="sectionHeading">Executive Summary</h1>
<a target="_blank" href="http://www.mscsoftware.com/Contents/Products/CAE-Tools/SIMXpert.aspx">MSC.Software SimXpert</a>* is a fully integrated simulation environment for performing multidiscipline-based analysis. It provides an interactive, graphical interface designed to facilitate the engineer's job of performing end-to-end simulations, including native computer-aided design (CAD) import, pre-processing, model setup, solving, post-processing, and reporting. Intel and MSC.Software collaborated on threading SimXpert, resulting in a significant performance improvement in the tool and increased productivity for users. Intel TBB was the method selected because of its compatibility with SimXpert, which is a multi-platform application written in C++. SimXpert has many features beyond those of typical high-performance computing (HPC) number-crunching applications, including complex database-style manipulations of geometry data and simulation results, complex memory allocation operations, reliance on extensive enterprise-class infrastructure C++ code, an overall visualization pipeline architecture that fits the Intel TBB pipeline parallel pattern, and OpenGL* rendering. Threading with Intel TBB was added incrementally, starting with an initial 72 key engineering operations, followed by the code responsible for producing graphical primitives for fringe plots. Measurements for seven very large customer simulation models on a two-socket (2S) Intel® Xeon® processor 5100 series platform (4 threads) showed scaling between 3.8X and 3.9X for the engineering calculations. For the fringe plot optimizations, a speedup ranging from 3% to 44% was achieved. Going forward, MSC.Software will continue the incremental threading approach; next steps will be threading the remaining plot types, with possible future implementation of the Intel TBB pipeline to overlap processing with I/O.<br /><br />
<h1 class="sectionHeading">Introduction</h1>
To address increasing customer model sizes and align with the multi-core processor roadmaps of hardware vendors, MSC.Software engaged with Intel to thread SimXpert. Intel® Software College provided training for a group of MSC.Software engineers on threading for multi-processor architectures and Intel® Threading Tools (Intel® Thread Checker, Intel® Thread Profiler, and Intel TBB). A multi-phased, incremental threading approach was defined for the project. For Phase One, MSC.Software identified 72 engineering operations in the post-processing portion of SimXpert that are responsible for the calculation of various engineering quantities, e.g., von Mises, Principal, Tresca, and Maximum Shear stresses. Intel prototyped the engineering operations and investigated both Intel TBB and OpenMP* for the threading implementation. Intel TBB was selected as the best method due to its compatibility with all supported platforms. Its performance was also slightly better than that of OpenMP. For Phase Two, code responsible for producing graphical primitives was threaded, which improved performance for fringe plots. This whitepaper discusses the details of these threading implementation phases, the results achieved, and plans for additional threading for SimXpert in future phases.<br /><br />
<h1 class="sectionHeading">Background/Workloads measured</h1>
Once the finite element model has been analyzed, the results can be accessed by SimXpert for post-processing. It was the Post-Processing Component (PPC) of SimXpert that Intel and MSC.Software targeted for threading. This "module" allows the expert analyst to view selected results in a variety of ways such as fringe, deformation, contour, vector, and tensor plots, identify problems, and redesign areas of a structure if necessary. Performance for both threading phases was measured for fringe plots using large simulation models provided by MSC.Software customers. These models represent typical use cases from customers in the Aerospace, Automotive, and General Manufacturing industries. The numerical and graphical loading that occurs is due to several critical factors.<br /><br />
<ul>
<li>Free faces (Figure 1) are the internal and external faces of the model's finite elements where a fringe plot is rendered.</li>
<li>The clustering of the finite element IDs for the elements whose free faces are being rendered directly affects the result data retrieval time.</li>
<li>The dimensionality of the data (i.e. scalar, vector, tensor data type) directly affects the number of data values that are retrieved for post-processing.</li>
<li>The complexity of the engineering derivation that is applied to the initial analysis data to transform it from either a vector or tensor data type to a scalar data type for fringe plot rendering also plays a role.</li>
</ul>
<br /><br /><img src="http://software.intel.com/file/23672" /><br /><br /><b>Figure 1 - Free Face Rendering on the Model's Finite Elements</b><br /><br />
<h1 class="sectionHeading">Threading SimXpert - Phase One</h1>
The initial targets for threading SimXpert were 72 engineering calculations in the Post-Processing Component (PPC) portion of SimXpert. Transformations were required in the original serial code before it could be parallelized with tbb::parallel_for.<br /><br /><b>Original Serial Code</b><br /><br />
<pre name="code" class="cpp">  // Each iteration depends on the pointers advanced by the previous one.
  for (size_t i=0; i&lt;Size;++i) { 
         deriveFunc(ptr_inArray,ptr_outArray); 
         ptr_inArray += inStride;
         ptr_outArray += outStride; 
  }
</pre>
<br /><br /><b>Transformation to make arrays random access containers</b><br /><br />
<pre name="code" class="cpp">// Each iteration computes its own addresses from i, so iterations are independent.
for (size_t i=0;i&lt;Size; ++i) {
     deriveFunc(&amp;ptr_inArray[i * inStride],
			&amp;ptr_outArray[i * outStride]);
}
</pre>
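<br /><br />The reason this transformation matters can be shown with a small self-contained sketch. It is illustrative only: the body of deriveFunc, the three-component record layout, and the helper names derive_serial, derive_range, and chunks_match_serial are assumptions, not SimXpert code. Once each iteration derives its addresses from the index alone, any sub-range can be processed on its own and in any order, which is the shape of loop body that a range-based parallel loop such as tbb::parallel_for over a tbb::blocked_range expects.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the article's deriveFunc (its real body is not shown):
// reduces one 3-component input record to a single scalar output.
void deriveFunc(const double *in, double *out) {
    out[0] = in[0] + in[1] + in[2];
}

// Original form: each iteration relies on the pointers left behind by
// the previous one, so the iterations must run in order.
void derive_serial(const double *in, double *out, std::size_t size,
                   std::size_t inStride, std::size_t outStride) {
    for (std::size_t i = 0; i < size; ++i) {
        deriveFunc(in, out);
        in += inStride;
        out += outStride;
    }
}

// Transformed form: iteration i computes its addresses from i alone,
// so any sub-range [begin, end) can be processed independently.
void derive_range(const double *in, double *out, std::size_t begin,
                  std::size_t end, std::size_t inStride,
                  std::size_t outStride) {
    for (std::size_t i = begin; i < end; ++i) {
        deriveFunc(in + i * inStride, out + i * outStride);
    }
}

// Processes the data in chunks, last chunk first, and reports whether
// the result matches the in-order serial loop.
bool chunks_match_serial(std::size_t size, std::size_t chunk) {
    if (chunk == 0) {
        return false;  // a zero-length chunk would never make progress
    }
    const std::size_t inStride = 3, outStride = 1;
    std::vector<double> in(size * inStride);
    for (std::size_t k = 0; k < in.size(); ++k) {
        in[k] = static_cast<double>(k);
    }

    std::vector<double> expected(size * outStride, 0.0);
    std::vector<double> actual(size * outStride, 0.0);
    derive_serial(in.data(), expected.data(), size, inStride, outStride);

    for (std::size_t end = size; end > 0;) {
        const std::size_t begin = end > chunk ? end - chunk : 0;
        derive_range(in.data(), actual.data(), begin, end, inStride, outStride);
        end = begin;
    }
    return expected == actual;
}
```

In a threaded build, the chunk boundaries would come from the scheduler rather than from a fixed chunk size, and derive_range would be called with the begin and end of each range handed to the loop body.<br /><br />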
<br /><br />After completing the transformations, tbb::parallel_for was integrated into the application. MSC.Software relied heavily on other threading tools such as Intel® Thread Checker and Intel® Thread Profiler to ensure correctness and optimum performance. This code represented only 7.4% of the total runtime for SimXpert, but threading resulted in an average of 4.9% improvement in overall performance. Table 1 shows the scaling that was achieved on a 2S 3.0GHz Intel® Xeon® processor 5100 series platform/8GB with Red Hat Linux* 4 update 3.<br /><br />
<table border="0" cellpadding="0" cellspacing="0" class="tableformat1">
<tbody>
<tr>
<td><b>Plot</b></td>
<td><b>File Name/Entity Count</b></td>
<td><b>Serial Time (sec)</b></td>
<td><b>Parallel Time (sec)</b></td>
<td><b>Speedup Factor (Serial Time/Parallel Time)</b></td>
<td><b>Serial Process Time (sec)</b></td>
<td><b>Parallel Process Time (sec)</b></td>
<td><b>%Process Speedup (s-p)/s</b></td>
<td><b>%Time spent in numeric operations</b></td>
</tr>
<tr>
<td>Fringe - Stress, Max Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0kst0.xdb/624924</td>
<td>0.765</td>
<td>0.196</td>
<td>3.903</td>
<td>10.22</td>
<td>9.65</td>
<td>5.579</td>
<td>7.48</td>
</tr>
<tr>
<td>Fringe - Stress, Mid Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0kst0.xdb/624924</td>
<td>0.763</td>
<td>0.195</td>
<td>3.904</td>
<td>10.209</td>
<td>9.635</td>
<td>5.623</td>
<td>7.47</td>
</tr>
<tr>
<td>Fringe - Stress, Min Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0kst0.xdb/624924</td>
<td>0.762</td>
<td>0.197</td>
<td>3.873</td>
<td>10.208</td>
<td>9.636</td>
<td>5.604</td>
<td>7.46</td>
</tr>
<tr>
<td>Fringe - Stress, Tresca Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0kst0.xdb/624924</td>
<td>0.767</td>
<td>0.196</td>
<td>3.905</td>
<td>10.228</td>
<td>9.675</td>
<td>5.410</td>
<td>7.50</td>
</tr>
<tr>
<td>Fringe - Stress, Max Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0ust0.xdb/605288</td>
<td>0.696</td>
<td>0.180</td>
<td>3.874</td>
<td>9.573</td>
<td>9.152</td>
<td>4.401</td>
<td>7.27</td>
</tr>
<tr>
<td>Fringe - Stress, Mid Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0ust0.xdb/605288</td>
<td>0.691</td>
<td>0.181</td>
<td>3.820</td>
<td>9.553</td>
<td>9.110</td>
<td>4.641</td>
<td>7.24</td>
</tr>
<tr>
<td>Fringe - Stress, Min Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0ust0.xdb/605288</td>
<td>0.693</td>
<td>0.179</td>
<td>3.879</td>
<td>9.556</td>
<td>9.114</td>
<td>4.626</td>
<td>7.25</td>
</tr>
<tr>
<td>Fringe - Stress, Tresca Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0ust0.xdb/605288</td>
<td>0.693</td>
<td>0.178</td>
<td>3.886</td>
<td>9.584</td>
<td>9.105</td>
<td>4.998</td>
<td>7.23</td>
</tr>
<tr>
<td>Fringe - Stress, Max Shear Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0ust0.xdb/605288</td>
<td>0.695</td>
<td>0.180</td>
<td>3.861</td>
<td>9.554</td>
<td>9.099</td>
<td>4.766</td>
<td>7.27</td>
</tr>
<tr>
<td>Fringe - Stress, Max Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0jst0.xdb/2394421</td>
<td>2.883</td>
<td>0.731</td>
<td>3.942</td>
<td>39.068</td>
<td>37.007</td>
<td>5.275</td>
<td>7.38</td>
</tr>
<tr>
<td>Fringe - Stress, Mid Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0jst0.xdb/2394421</td>
<td>2.888</td>
<td>0.730</td>
<td>3.956</td>
<td>39.090</td>
<td>36.945</td>
<td>5.486</td>
<td>7.39</td>
</tr>
<tr>
<td>Fringe - Stress, Min Princ Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0jst0.xdb/2394421</td>
<td>2.880</td>
<td>0.730</td>
<td>3.947</td>
<td>39.086</td>
<td>36.816</td>
<td>5.808</td>
<td>7.37</td>
</tr>
<tr>
<td>Fringe - Stress, Tresca Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0jst0.xdb/2394421</td>
<td>2.874</td>
<td>0.730</td>
<td>3.937</td>
<td>37.996</td>
<td>36.833</td>
<td>3.061</td>
<td>7.56</td>
</tr>
<tr>
<td>Fringe - Stress, Max Shear Avg Meth=Avg/Derive, Extrap Meth=Avg</td>
<td>xx0jst0.xdb/2394421</td>
<td>2.894</td>
<td>0.732</td>
<td>3.952</td>
<td>39.433</td>
<td>38.277</td>
<td>2.932</td>
<td>7.34</td>
</tr>
<tr>
<td>Average</td>
<td></td>
<td></td>
<td></td>
<td>3.90</td>
<td></td>
<td></td>
<td>4.872</td>
<td>7.37</td>
</tr>
<tr>
<td>Minimum</td>
<td></td>
<td></td>
<td></td>
<td>3.82</td>
<td></td>
<td></td>
<td>2.932</td>
<td>7.23</td>
</tr>
<tr>
<td>Maximum</td>
<td></td>
<td></td>
<td></td>
<td>3.96</td>
<td></td>
<td></td>
<td>5.808</td>
<td>7.56</td>
</tr>
</tbody>
</table>
<br /><b>Table 1 - Summary for Plots Where Serial Time in Numeric Operations Was Greater than 0.5 Seconds</b><br /><br />
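The derived columns in Table 1 follow directly from the timings. The sketch below recomputes them for one row, assuming that the two timing columns preceding the speedup factor are the serial and threaded times spent in numeric operations; the struct and function names are illustrative and do not come from SimXpert.<br /><br />

```cpp
// Timings for one plot, in seconds. Field names are hypothetical.
struct PlotTiming {
    double serial_numeric_sec;    // serial time in numeric operations (assumed column)
    double parallel_numeric_sec;  // threaded time in numeric operations (assumed column)
    double serial_process_sec;    // total serial process time (s)
    double parallel_process_sec;  // total parallel process time (p)
};

// Speedup factor = serial time / parallel time.
double speedup_factor(const PlotTiming& t) {
    return t.serial_numeric_sec / t.parallel_numeric_sec;
}

// % process speedup = (s - p) / s, expressed in percent.
double process_speedup_pct(const PlotTiming& t) {
    return 100.0 * (t.serial_process_sec - t.parallel_process_sec) / t.serial_process_sec;
}

// Share of the serial process time spent in numeric operations, in percent.
double numeric_share_pct(const PlotTiming& t) {
    return 100.0 * t.serial_numeric_sec / t.serial_process_sec;
}
```

For the first row (0.765, 0.196, 10.22, 9.65) these functions give roughly 3.90, 5.6% and 7.5%, close to the published values; the small differences are presumably due to rounding of the published timings. The comparison also shows why the whole-process gain is modest: only about 7.5% of the process time was in the threaded numeric code.<br /><br />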
<h1 class="sectionHeading">Threading SimXpert - Phase Two</h1>
A key goal for the user experience with SimXpert is quick post-processing of analysis result data. Post-processing analysis involves transforming the initial analysis data to the final numerical form specified by the engineer, then mapping it to its graphical primitive representation. For example, an engineer may want to direct SimXpert to render color fringe plots of von Mises, Maximum Principal, and Maximum Shear stress to investigate the performance of the simulation model relative to its applied loading. Figure 2 shows a fringe plot of the von Mises stress distribution across a simple connecting rod model.<br /><br /><img src="http://software.intel.com/file/23673" /><br /><br /><b>Figure 2 - Fringe Plot of von Mises stress</b><br /><br />Phase Two for SimXpert applied threading to the portion of the code responsible for graphical primitive production for fringe plots. This code accounted for approximately 35% of the total plot time. As a proof of concept, MSC.Software and Intel prototyped the threaded code and saw scaling up to 3.2X on 4 cores. The method produces graphics primitives and packages them into containers. The program flow was modified as follows:<br /><br />
<table border="0" cellpadding="0" cellspacing="0" class="tableformat1">
<tbody>
<tr>
<td><b>Serial</b></td>
<td><b>Parallel</b></td>
</tr>
<tr>
<td>Iterate over all elements/faces</td>
<td>Divide face/element iteration over multiple threads with tbb::parallel_for</td>
</tr>
<tr>
<td>Allocate (or reallocate) memory as needed for containers</td>
<td>Local storage holds elements in each Intel TBB task</td>
</tr>
<tr>
<td>Do calculations on each element and produce graphical primitives</td>
<td>Serial code works on the local containers without modification</td>
</tr>
<tr>
<td>Copy primitives into container (flat array) using memcpy</td>
<td>Partial results in each local container safely get combined into tbb::concurrent_vector</td>
</tr>
<tr>
<td>Sequentially bump container pointer, stored in a member variable</td>
<td></td>
</tr>
</tbody>
</table>
<br /><b>Table 2 - Serial versus Parallel program flow</b><br /><br />Performance improvements were observed when the models ran on a 2S 2.66GHz Intel® Xeon® processor 5100 series platform/8GB Memory/Windows* XP Professional X64 Edition Version 2003 SP2 (specifics in Table 3):<br /><br />
<ul>
<li>3D solid finite element simulation model representing the casting of a V6 engine block (modelsec.xdb) with 98,814 free faces and a 358.7MB file size achieved a 28% performance improvement</li>
<li>3D solid finite element simulation model representing a turbine blade (xx0kst0.xdb) with 65,416 free faces and a 513.3MB file size achieved between 3 and 10% performance improvement for various plots</li>
<li>3D solid finite element simulation model representing the casting of a kitchen appliance housing (xx0ust0.xdb) with 90,460 free faces and a 281.7MB file size achieved between 6 and 26% performance improvement for various plots</li>
<li>2D and 3D finite element simulation model representing a car chassis (xx0o.xdb) with 1,209,323 free faces and a 438.8MB file size achieved between 19 and 27% performance improvement for various plots</li>
<li>3D solid finite element simulation model representing the central hub of an aircraft propeller (xx0fst0.xdb) with 89,935 free faces and a 165.2MB file size achieved between 10 and 30% performance improvement for various plots</li>
<li>3D solid finite element simulation model representing the casting of a straight 6 engine block (xx0jst0.xdb) with 461,808 free faces and a 1028.5MB file size achieved between 15 and 44% performance improvement for various plots</li>
</ul>
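The parallel flow in Table 2 can be sketched as follows. This is a minimal illustration, not the SimXpert code: it uses std::thread and a mutex-guarded std::vector as stand-ins for tbb::parallel_for and tbb::concurrent_vector so that it compiles without Intel TBB, and Primitive, make_primitive, and build_primitives are hypothetical names.<br /><br />

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a graphics primitive produced from one face.
struct Primitive {
    std::size_t face_id;
    float value;
};

// Hypothetical per-face calculation standing in for the unchanged serial code.
inline Primitive make_primitive(std::size_t face_id) {
    return Primitive{face_id, static_cast<float>(face_id) * 0.5f};
}

// Splits [0, face_count) into contiguous chunks, one per worker thread.
// Each worker fills a private container, then appends it to the shared
// result under a lock. The order of chunks in the result is not deterministic.
std::vector<Primitive> build_primitives(std::size_t face_count, unsigned worker_count) {
    if (worker_count == 0) worker_count = 1;
    std::vector<Primitive> combined;
    combined.reserve(face_count);
    std::mutex combined_mutex;

    const std::size_t chunk = (face_count + worker_count - 1) / worker_count;
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w) {
        const std::size_t begin = std::min(face_count, static_cast<std::size_t>(w) * chunk);
        const std::size_t end = std::min(face_count, begin + chunk);
        if (begin == end) continue;  // nothing left for this worker
        workers.emplace_back([begin, end, &combined, &combined_mutex] {
            std::vector<Primitive> local;  // thread-private storage
            local.reserve(end - begin);
            for (std::size_t f = begin; f < end; ++f) {
                local.push_back(make_primitive(f));  // serial code, unmodified
            }
            // Combine the partial result into the shared container.
            std::lock_guard<std::mutex> lock(combined_mutex);
            combined.insert(combined.end(), local.begin(), local.end());
        });
    }
    for (std::thread& t : workers) t.join();
    return combined;
}
```

The point of the design is that the per-element code never touches shared state: it works on a local container exactly as the serial version did, and synchronization happens once per chunk rather than once per primitive.<br /><br />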
<br />
<table border="0" width="740" cellpadding="0" cellspacing="0" class="tableformat1">
<tbody>
<tr>
<td><b>Workload / Description</b></td>
<td><b>file size/# free faces</b></td>
<td width="50"><b>Fringe - Eigen Vectors, Translational - % speedup</b></td>
<td width="50"><b>Fringe - Stress, Von Mises Avg Meth=Avg/ Derive, Extrap Meth=Avg - % speedup</b></td>
<td width="50"><b>Fringe - Stress, Max Princ Avg Meth=Avg/ Derive, Extrap Meth=Avg - % speedup</b></td>
<td width="50"><b>Fringe - Stress, Tresca Avg Meth=Avg/ Derive, Extrap Meth=Avg - % speedup</b></td>
<td width="50"><b>Fringe - Stress, Octal Avg Meth=Avg/ Derive, Extrap Meth=Avg - % speedup</b></td>
<td width="50"><b>Fringe - Stress, Inv 1 Avg Meth=Avg/ Derive, Extrap Meth=Avg - % speedup</b></td>
<td width="50"><b>Fringe - Stress, Max Shear Avg Meth=Avg/ Derive, Extrap Meth=Avg - % speedup</b></td>
<td width="50"><b>Fringe - Disp Trans, Mag - % speedup</b></td>
</tr>
<tr>
<td>modelsec (engine block)</td>
<td>358.7 MB/98,814</td>
<td>28.092</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>xx0kst0 (turbine blade)</td>
<td>513.3 MB/65,416</td>
<td></td>
<td>3.729</td>
<td>8.103</td>
<td>8.968</td>
<td>3.373</td>
<td>2.926</td>
<td>8.199</td>
<td>10.334</td>
</tr>
<tr>
<td>xx0ust0 (housing)</td>
<td>281.7 MB/90,460</td>
<td></td>
<td>8.422</td>
<td>13.03</td>
<td>10.733</td>
<td>6.024</td>
<td>6.143</td>
<td>10.628</td>
<td>26.203</td>
</tr>
<tr>
<td>xx0o (car chassis)</td>
<td>438.8 MB/1,309,323</td>
<td></td>
<td>19.437</td>
<td>19.791</td>
<td>20.03</td>
<td>19.647</td>
<td>19.671</td>
<td>19.902</td>
<td>27.543</td>
</tr>
<tr>
<td>xx0fst0 (propeller hub)</td>
<td>165.2 MB/83,935</td>
<td></td>
<td>10.013</td>
<td>10.863</td>
<td>10.73</td>
<td>10.472</td>
<td>10.867</td>
<td>11.007</td>
<td>29.824</td>
</tr>
<tr>
<td>xx0jst0 (straight 6-cyl engine block)</td>
<td>1028.5 MB/461,808</td>
<td></td>
<td>15.061</td>
<td>18.856</td>
<td>18.988</td>
<td>15.636</td>
<td>15.463</td>
<td>19.585</td>
<td>43.807</td>
</tr>
</tbody>
</table>
<br /><b>Table 3 - Speedup for fringe plot optimization (average of three runs)</b><br /><br />
<h1 class="sectionHeading">MSC.Software Testimonial</h1>
"We are very pleased with the progress we have made in a very short time incorporating parallel algorithms from Intel® TBB in our SimXpert code. Intel's technical leadership and assistance facilitated our decision to introduce multithreaded versions of SimXpert moving forward, and we're pleased to see the breadth of TBB parallel algorithms, such as Parallel For and Parallel Pipeline, that are available for our future consideration. <br /><br />"We were very excited to see the near theoretical performance scaling that was achieved by applying multithreading to the post processing portion of SimXpert that is responsible for calculation of various engineering quantities such as von Mises, Principal, Tresca, and Maximum Shear stresses. Performing tests with seven very large customer simulation models, on a machine equipped with two Intel® Core™ 2 Duo processors, we found that the scaling of the engineering calculations ranged from 3.86 to 3.9. Achieving these near perfect results has increased the excitement within SimXpert development team to expand the use of multi-threading throughout the product."<br /><br />George Truesdell<br />Manager, Product Development<br /><br />
<h1 class="sectionHeading">Next Steps</h1>
In future releases (following SimXpert R4), the remaining plot types will be threaded. The Intel TBB pipeline will also be evaluated for overlapping processing and buffered I/O. Intel engineers have prototyped an Intel TBB pipeline that uses the engineering calculations from Phase One. Intel Thread Profiler identified buffer thrash in this initial implementation. Once that was fixed, the desired scalability was achieved: matching the pipeline token count to the hardware thread count produced "laminar" scheduling and eliminated buffer thrash, resulting in 3.9X scaling on 4 cores and 7.5X to 7.8X scaling on 8 cores.<br /><br /><b>Concept</b><br /><br /><img src="http://software.intel.com/file/23674" /><br /><br /><img src="http://software.intel.com/file/23675" /><br /><br />
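The idea of capping in-flight work at the hardware thread count can be illustrated without Intel TBB. The sketch below is not the prototype described above and makes no claim about how it was written: each "token" is modelled as one worker thread that owns a single reusable buffer, so no more than token_count items are ever in flight. All names (PipelineStats, run_with_tokens, default_token_count) are hypothetical.<br /><br />

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct PipelineStats {
    std::size_t items_processed;
    unsigned peak_in_flight;  // highest number of items processed concurrently
};

// Processes item indices [0, item_count) with at most `token_count` items in flight.
PipelineStats run_with_tokens(std::size_t item_count, unsigned token_count,
                              std::size_t buffer_len) {
    if (token_count == 0) token_count = 1;
    std::atomic<std::size_t> next_item{0};
    std::atomic<std::size_t> processed{0};
    std::atomic<unsigned> in_flight{0};
    std::atomic<unsigned> peak{0};

    auto worker = [&] {
        std::vector<double> buffer(buffer_len);  // allocated once per token, then reused
        for (;;) {
            const std::size_t item = next_item.fetch_add(1);
            if (item >= item_count) break;
            const unsigned now = in_flight.fetch_add(1) + 1;
            unsigned seen = peak.load();
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {
                // retry until the recorded peak is at least `now`
            }
            for (std::size_t i = 0; i < buffer.size(); ++i) {
                buffer[i] = static_cast<double>(item + i);  // placeholder for real work
            }
            in_flight.fetch_sub(1);
            processed.fetch_add(1);
        }
    };

    std::vector<std::thread> tokens;
    for (unsigned t = 0; t < token_count; ++t) tokens.emplace_back(worker);
    for (std::thread& t : tokens) t.join();
    return PipelineStats{processed.load(), peak.load()};
}

// Token count matched to the hardware thread count.
unsigned default_token_count() {
    const unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    return hw == 0 ? 1u : hw;
}
```

A call such as run_with_tokens(n, default_token_count(), len) keeps the number of live buffers equal to the number of hardware threads. In Intel TBB the corresponding setting is the maximum number of live tokens supplied when the pipeline is run.<br /><br />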
<h1 class="sectionHeading">Conclusion</h1>
The MSC.Software project to add threading to SimXpert was successful, resulting in a significant performance improvement in SimXpert and a faster turnaround time for end-users, leading to increased productivity. SimXpert was one of the first commercial applications to release with Intel TBB. Intel TBB was an ideal tool for this project since SimXpert is a multi-platform application written in C++ that has many features beyond typical HPC number-crunching applications. In addition, the code of SimXpert was well suited to the incremental threading approach that MSC.Software chose. For Phase One, measurements for seven very large customer simulation models on a 2S Intel® Xeon® processor 5100 series platform (4 threads) showed scaling between 3.8X and 3.9X for the engineering calculations. For Phase Two, optimizations for fringe plots resulted in a speedup ranging from 3% to 44% for measured workloads. MSC.Software plans to continue with the incremental threading approach for the remaining plot types, and to investigate the Intel TBB pipeline for overlapping processing and I/O.<br /><br />
<h1 class="sectionHeading">MSC.Software SimXpert</h1>
MSC.Software's SimXpert* is a fully integrated simulation environment for performing multidiscipline-based analysis. It provides an interactive, graphical interface designed to facilitate the engineer's job of performing end-to-end simulations including native CAD import, pre-processing, model set up, solving, post-processing, and reporting. Designed for both analysts and design engineers, SimXpert scales across multiple engineering physics, offering a state-of-the-art, easy-to-use graphical interface for performing multidiscipline simulations, either coupled or chained, at any stage of the design process. The SimXpert scalable workspaces include a variety of discipline solutions including structural linear and nonlinear analysis, thermal, multibody dynamics, and explicit crash simulation. With built-in, bi-directional CAD associativity, engineers gain superior interoperability between SimXpert and multiple CAD systems including V5, Pro/Engineer, and Unigraphics. Additionally, SimXpert provides a unique CAE graphical template builder and runner to allow analysts to quickly automate a variety of steps during analysis such as model set up, pre-processing, post-processing or other mundane, time-consuming tasks.<br /><br />
<h1 class="sectionHeading">About the Authors</h1>
<b>Kathy Carver</b> joined Intel in 1992 and is currently an application engineer in Intel's Software and Services Group (SSG) working on optimizing CAE applications to take advantage of Intel's latest hardware and software innovations. Previously at Intel, she worked on pre-silicon validation of the first Intel® Itanium® processor and on development tools for Intel's Supercomputer Systems Division (SSD). She holds a BS in Computer Science from Western Kentucky University, Bowling Green, KY.<br /><br /><b>Mark Lubin</b> is a Parallel Applications Engineer at Intel within the Software and Services Group (SSG) where he is working on optimizing HPC applications. Prior to joining Intel, Mark did his postdoctoral research at UCSD, where he developed quantum molecular dynamics computer models and software for parallel computers. He has published over 15 peer-reviewed publications. Mark received his M.S. in EE from Moscow Institute of Electronic Technology, Russia. He received his Ph.D. in physics from the University of Central Florida.<br /><br /><b>Bonnie Aona</b> is a software engineer in the Intel Compilers and Languages Group within the Software and Services Group (SSG) working on optimizing and testing applications to take advantage of the latest Intel software and hardware innovations to achieve high performance and parallelism. Bonnie's career leverages Software Quality Assurance and program management with software design for complex high performance applications for computer graphics, real-time systems, scientific research, manufacturing, e-Commerce, aerospace and healthcare. She holds master's degrees in Electrical and Computer Engineering from University of California at Davis.<br /><br /><br />*Other names and brands may be claimed as the property of others. ]]></description>
      <link>http://software.intel.com/en-us/articles/optimizations-for-mscsoftware-simxpert-using-intel-threading-building-blocks-intel-tbb</link>
      <pubDate>Thu, 12 Nov 2009 15:31:02 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/optimizations-for-mscsoftware-simxpert-using-intel-threading-building-blocks-intel-tbb#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/optimizations-for-mscsoftware-simxpert-using-intel-threading-building-blocks-intel-tbb</guid>
      <category>Visual Computing</category>
    </item>
    <item>
      <title>The Cost Benefit Case for Database Migration to Intel Servers</title>
      <description><![CDATA[ <p style="text-align: center;"><i>Value Proposition For Migration:<br />Cost/Benefit Case For IBM DB2 9.7 And Intel Xeon Processor 5500 And 7400 Series-Based Servers</i></p>
<p style="text-align: center;"><img src="http://software.intel.com/file/23676" /></p>
<p><b>Consolidation Opportunities</b></p>
<p>At year-end 2004, the typical U.S. Fortune 500 corporation contained fewer than 300 server database instances. By the end of 2009, the number will have increased to more than 2,000. Similar trends have occurred in midsize businesses, in the public sector, and in other types of organizations worldwide. The fastest rates of growth have been among databases deployed on small x86 servers.</p>
<p>Multiplication of server databases has contributed to “server sprawl,” resulting in low levels of utilization, unnecessary duplication of resources, and inflation of system administration and facilities costs.</p>
<p>Although server consolidation has become pervasive, to date it has been more commonly applied to application and infrastructure servers, rather than database servers. Database consolidation has often raised complex performance issues, making it more difficult to plan for and prepare for initiatives.</p>
<p>One implication is that, in many organizations, the potential for database server consolidation has been little exploited. At a time of economic pressures, it is an obvious area of potential cost savings. Key technology shifts have made consolidation increasingly viable. More powerful multicore processors, along with the growing sophistication of server and database platforms, are creating new opportunities.</p>
<p>This report examines the cost savings that may be realized by upgrading and consolidating IBM DB2 databases. Three-year costs are compared for the following:</p>
<ul>
<li>2005 technologies: DB2 Version 8.2 is deployed on xSeries 335 two-socket servers with single-core Intel Xeon processors and the Windows Server 2003 operating system.</li>
<li>Current technologies: DB2 Version 9.7 is deployed on (1) IBM System x3550 M2 two-socket servers with quad-core Intel Xeon 5500 processors, and (2) IBM System x3850 M2 four-socket servers with six-core Intel Xeon 7400 processors. The Windows 2008 operating system is employed on both System x platforms. </li>
</ul>
<p>Savings are realized in a number of areas, including hardware maintenance, support for DB2 databases and Windows operating systems, and system administration and energy costs.</p>
<p>Calculations for DB2 9.7 deployed on System x3550 M2 and x3850 M2 servers allow for transition costs. These include acquisition and installation of new servers, along with database consolidation, staff retraining and related costs.</p>
<p><b>Cost Comparisons</b></p>
<p>Comparisons are based on six installations with between 25 and 231 DB2 instances employed for a variety of applications in manufacturing, aerospace, government, IT services, insurance and financial services organizations.</p>
<p>Numbers of instances, servers and full time equivalent (FTE) system administration (sysadmin) personnel for use of 2005 technologies are based on user-supplied data. Although organizations employed a variety of two-socket x86 servers, installed bases were normalized to use of DB2 8.2 and IBM xSeries 335 server models for calculation purposes.</p>
<p>Scenarios were then developed for migration of DB2 instances to the latest DB2 Version 9.7 and consolidation of these to System x3550 M2 and x3850 M2 servers. Scenarios draw upon the experiences of more than 30 organizations that have conducted DB2 consolidation initiatives. They are consistent with “best practice” norms for the numbers of instances and workloads that may run on these platforms.</p>
<p>DB2 instances include mixes of DB2 Enterprise Edition and Workgroup Edition, while servers are configured with Enterprise and Standard Editions of Windows Server 2003 and (for Current Technologies scenarios) Windows Server 2008.</p>
<p>Software support costs include IBM Software Maintenance (SWMA) and Microsoft Software Assurance for DB2 and Windows Server licenses respectively. Hardware, maintenance and software support costs are calculated based on “street” prices; i.e., discounted prices paid by the organizations upon which installations are based.</p>
<p>Current Technologies scenarios do not include use of virtualization tools such as VMware and Microsoft Hyper-V. Although these may be employed to support multiple database instances, organizations that contributed to this report were able to achieve high levels of database consolidation without them.</p>
<p><b>Download the rest of the PDF <a href="http://software.intel.com/file/23677">here</a>.</b></p> ]]></description>
      <link>http://software.intel.com/en-us/articles/the-cost-benefit-case-for-database-migration-to-intel-servers</link>
      <pubDate>Wed, 11 Nov 2009 17:39:10 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/the-cost-benefit-case-for-database-migration-to-intel-servers#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/the-cost-benefit-case-for-database-migration-to-intel-servers</guid>
      <category>Xeon</category>
    </item>
    <item>
      <title>Updated Tools Spice Up New Ghostbusters* Game</title>
      <description><![CDATA[ <p><b>Terminal Reality used the popular Intel® Graphics Performance Analyzers to bring the newest <i>Ghostbusters*</i> game to life.</b></p>
<p>Admit it. For the past 25 years, you ain’t been afraid of no ghosts. Thanks to a certain movie about four intrepid heroes, we all know a well-aimed proton stream and a handy trap can bag any ghoul within range. The venerable <i>Ghostbusters*</i> franchise has spun out at least eight different video games since 1984, each taking advantage of the movie’s supernatural feel and sci-fi effects. The newest version, <i>Ghostbusters: The Video Game</i>, has received good reviews since its release earlier in 2009, thanks in no small part to its updated effects.</p>
<p>As Mark Randel, president and chief technology officer of Terminal Reality, Inc. described it in his <a href="http://software.intel.com/en-us/blogs/author/mark-randel/">blog</a>, “The results of having a massively parallel game engine were stunning. When we finally got rendering and simulation of the game in parallel in the last weeks of <i>Ghostbusters</i>, the game became solely render-bound. Jobs were totally asynchronous, and we were able to fully utilize three to four cores. When there wasn’t any action in the game, the game was waiting on the vertical blank. With a lot of action, the job model allowed the heavy lifting to be absorbed over as many processors as the system had.”</p>
<p>The game is published by Atari, which wanted a great mainstream game to reach the largest possible target market. Atari pushed the team to make sure the game was optimized for integrated graphics systems, in order to maximize their investment and ensure good performance. The developers at Terminal Reality took extensive advantage of Intel® Graphics Performance Analyzers (Intel® GPA) and their membership in the Intel® Software Partner Program to bring out the best special effects required to chase down vapors, slimers, and poltergeists. Intel’s tools helped identify a performance bottleneck so the game could be optimized for desktops and laptops that use Intel® Graphics processors. And once performance problems are solved for the Intel® Graphics world, they are essentially solved for the rest of the graphics universe.</p>
<p>Thanks to fine-tuning for multi-core and extensive testing for bottlenecks, <i>Ghostbusters: The Video Game</i> really shines, especially on the newest Intel-based systems. What follows is a step-by-step analysis performed on an exceptionally low-performing scene in <i>Ghostbusters: The Video Game</i> by a team consisting of both Intel and Terminal Reality developers. The team’s comprehensive work is a model for anyone who wants to troubleshoot similar game-performance issues.</p>
<p><b>Optimizing a Slow Game Scene</b><br />Jeff LaFlam and Shankar Swamy, application engineers with the Intel® Visual Computing Enabling Team, worked with Mark Randel, president and chief technology officer of Terminal Reality, Inc., to detect and analyze a serious bottleneck in a specific scene in <i>Ghostbusters</i>. The scene ran so slowly, with a barely acceptable frame rate, that the gameplay visibly stuttered. This scene had stymied progress in optimizing the game’s overall performance.</p>
<p>The troublesome scene contains about 200,000 books in a library where two human characters and a “ghost” character might interact. When the characters are fully outside the library they cannot see the books; hence, there is no need for the game to render the books. However, as a character enters the library, the books are gradually exposed to the viewer and displayed in the gameplay scene.</p>
<p>The team of LaFlam, Swamy, and Randel analyzed this scene to determine solutions for increasing the frame rate.</p>
<p><b>Step 1: Visually Analyze the Scene</b><br />The team began by visually analyzing the entire scene sequence in order to determine a direction for further investigation.</p>
<p style="text-align: center;"><img width="436" src="http://software.intel.com/file/23640" height="344" /></p>
<p>The team observed that when a character was staring at the wall and the books were partially exposed, the frame rate was very low and the scene stuttered (Fig. 1). When they then advanced the scene and moved a character closer to the wall but with no books visible in the scene, the frame rate did not change noticeably. This indicated to the team that the books were being rendered in the scene even when they were not visible.</p>
<p><b>Step 2: Render with Z-Test Disabled</b><br />The goal of the second step in the analysis was to determine how many occluded objects were being rendered in the library scene. This was done by rendering all the objects in the scene with the Z-test disabled.</p>
<p style="text-align: center;"><img width="451" src="http://software.intel.com/file/23641" height="371" /></p>
<p>In Figure 2, notice that the character is standing very close to the wall and staring directly at it. Prior to optimizing this scene, during normal gameplay (with the Z-test enabled), the books shown would not be visible because of the direction the character is looking. However, because the team disabled the Z-test for Figure 2, all the books being rendered by the game are also now visible.</p>
<p>This confirmed that the books were being rendered all the time, even when they were completely occluded during normal gameplay. Of course, only the books that are visible to the characters at any point in the gameplay need to be rendered.</p>
<p><b>Step 3: Conduct a Single-Frame Analysis</b></p>
<p>The team wanted to investigate other possible hot spots in the scene by using the Intel® GPA Frame Analyzer.</p>
<p style="text-align: center;"><img width="423" src="http://software.intel.com/file/23642" height="329" /></p>
<p>According to the Intel® GPA Frame Analyzer, the Library scene had 12,564 Draw() calls (Fig. 3). However, other scenes in the game typically had about 3,000 Draw() calls, and those scenes had higher frame rates. The conclusion was that there were too many Draw() calls in the Library scene, indicating to the team that further testing should be aimed at reducing the number of Draw() calls in the troublesome scene. The team also wanted to investigate how many of these calls were coming from the rendering of the books.</p>
<p><b>Step 4: Estimate the Cost of Rendering the Books</b><br />The team placed the camera in front of a wall that had no objects behind it. Because this is a third-person view game, the characters in the Library scene are still rendered—as they should be. However, the books, which are now behind the camera, are invisible and should not be submitted for rendering due to the game’s culling algorithm.</p>
<p>The team wanted a reliable estimate of the cost of rendering the books. By submitting the scene to the Intel GPA Frame Analyzer (Fig. 4), the team discovered the scene had 14,731 Draw() calls, confirming that the books were quite expensive to render. In fact, the overhead of rendering the books was significant enough that it lowered the frame rate even when the books were occluded yet still rendered.</p>
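<p>The expectation in Step 4, that objects behind the camera are never submitted, corresponds to a check along the following lines. The game's actual culling algorithm is not described in this article, so this is only an illustrative sketch: Vec3, dot, and is_behind_camera are hypothetical names, and the view direction is assumed to be unit length.</p>

```cpp
struct Vec3 {
    float x, y, z;
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true when a bounding sphere lies entirely behind the camera plane.
// `forward` is assumed to be a unit-length view direction.
bool is_behind_camera(const Vec3& camera_pos, const Vec3& forward,
                      const Vec3& center, float radius) {
    const Vec3 to_center{center.x - camera_pos.x,
                         center.y - camera_pos.y,
                         center.z - camera_pos.z};
    // Signed distance of the sphere center along the view direction.
    const float depth = dot(to_center, forward);
    // Even the point of the sphere nearest the view direction is behind the eye.
    return depth + radius < 0.0f;
}
```

<p>An object for which this returns true can be skipped before any Draw() call is issued. A sphere that straddles the camera plane is kept, which errs on the side of rendering.</p>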
<p><b>Step 5: Verify the Potential Gains</b><br />Next, the team included a software switch in the graphical user interface (GUI) that allowed them to completely turn off rendering for all the books (whether visible or occluded). They then rendered the scene by dynamically turning this switch on and off, allowing them to determine the change in frame rate when books were rendered versus when they weren’t.</p>
<p>When book rendering was turned off, the frame rate increased by approximately 2.5 times, as shown by the data from the Intel® GPA System Analyzer within the red oval in Figure 5. This indicated that the cost of rendering the books in the scene was quite high.</p>
<p style="text-align: center;"><img src="http://software.intel.com/file/23643" /></p>
<p>At this point in the analysis, the obvious options for increasing the performance of this scene were either:<br />  •  Don’t render the books that aren’t visible in the scene, or <br />  •  Reduce the number of books in the scene.</p>
<p><b>Step 6: A Third Solution is Created</b></p>
<p>When the Intel team shared their findings with the developers at Terminal Reality, Mark Randel suggested, and implemented, a third solution: a “pixel height test.”<br /><br />Figure 6 shows the idea behind the pixel height test. The circles in Figure 6 represent the bounding sphere of an object and indicate the pixel coverage on the screen required for that object, both when the object is close to the camera and when it is farther away.</p>
<p>Applied to the objects in a scene, the pixel height test determines which objects contribute less than one full pixel to the displayed frame. To approximate the pixel coverage, the test computes the object's height in screen space, in pixels. This testing code is executed on the processor. If the pixel height of an object is less than one pixel, the object is not submitted for rendering.</p>
<p>In the troublesome Library scene, the fact that the objects (books) all had identical dimensions—because they are instantiations of a single object—made the test easier and faster to run because the bounding spheres for all tested objects (books) were identical.</p>
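<p>The idea can be sketched in C++ as follows. This is not Terminal Reality's code, which the article does not show; it is a minimal illustration that assumes a perspective camera with a known vertical field of view and approximates the projected height of a bounding sphere from its radius and its depth along the view direction. The names <code>Vec3</code>, <code>Camera</code>, <code>view_depth</code>, <code>pixel_height</code>, and <code>select_for_rendering</code> are all hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical camera description; a real engine would supply its own.
struct Camera {
    Vec3  position;
    Vec3  forward;          // unit-length view direction
    float fov_y_radians;    // vertical field of view
    float viewport_height;  // render target height in pixels
};

// Distance of a point from the camera, measured along the view direction.
float view_depth(const Camera& cam, const Vec3& p) {
    return (p.x - cam.position.x) * cam.forward.x +
           (p.y - cam.position.y) * cam.forward.y +
           (p.z - cam.position.z) * cam.forward.z;
}

// Approximate on-screen height, in pixels, of a bounding sphere.
float pixel_height(const Camera& cam, const Vec3& center, float radius) {
    assert(cam.viewport_height > 0.0f && radius >= 0.0f);
    const float depth = view_depth(cam, center);
    // Sphere reaches (or is behind) the camera plane: the projection is not
    // meaningful, so report full height and leave it to other culling.
    if (depth <= radius) return cam.viewport_height;
    // World-space height covered by the viewport at this depth.
    const float view_height = 2.0f * depth * std::tan(cam.fov_y_radians * 0.5f);
    return (2.0f * radius / view_height) * cam.viewport_height;
}

// Indices of instances to submit for rendering. All instances share one
// radius (as the books do), so the test reduces to one depth threshold.
// Instances behind the camera are kept here and left to frustum culling.
std::vector<std::size_t> select_for_rendering(const Camera& cam,
                                              const std::vector<Vec3>& centers,
                                              float shared_radius,
                                              float min_pixels = 1.0f) {
    assert(min_pixels > 0.0f);
    // Depth at which a sphere of this radius shrinks to min_pixels tall.
    const float max_depth = shared_radius * cam.viewport_height /
                            (min_pixels * std::tan(cam.fov_y_radians * 0.5f));
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < centers.size(); ++i) {
        if (view_depth(cam, centers[i]) <= max_depth) kept.push_back(i);
    }
    return kept;
}
```

<p>Because every instance shares one radius, the sketch computes the cutoff depth once and then needs only a dot product and a comparison per book, which is one plausible reading of why identical bounding spheres made the test cheaper to run.</p>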
<p style="text-align: center;"><img src="http://software.intel.com/file/23644" /></p>
<p><b>Step 7: The Results of the Pixel Height Test</b><br />Figure 7 shows the result of implementing the pixel height test on the Library scene in <i>Ghostbusters</i>. Using the software switch created by Randel, developers were able to turn the test on and off. When the pixel height test is running, objects (books) that are less than one pixel in height in the scene are not rendered. As shown by the data in the green oval in Figure 7, where the test was turned on, the frame rate of this scene doubled when the books less than one pixel in height were not rendered.</p>
<p>The data in Figure 7 also shows that the overall usage of the graphics resources went up with the test enabled, indicating that the game was now using those resources more effectively.<br /><br />Figures 8 and 9 are the screen captures of the scene before and after the test was enabled. There is no visual difference between the two renderings, because no visible object was affected by the change.</p>
<p>When the team first started this analysis, the scene was rendering so slowly that it was considered the major issue preventing the game from being highly playable. Based on a thorough analysis and the implementation of the pixel height test that followed, the scene ended up rendering at double the original frame rate. Other scenes in the game enjoy even higher frame rates.</p>
<p style="text-align: center;"><img src="http://software.intel.com/file/23645" /></p>
<p><b><br />New Features for Intel® GPA, Version 2.1</b><br />As good as the Intel® GPA tool was for the development of the latest <i>Ghostbusters</i> game, several new features have been added subsequent to that project. Randel reports that he is finally enjoying a little downtime after working since 2006 on <i>Ghostbusters</i>, but he’s already looking forward to the next project. “It will be really nice to have the new Intel GPA tools,” he said recently. “There are still a few more things we can do to add those key details to a highly believable, fully destructible environment.”</p>
<p>Here are some of the key new features that have been added to the Intel GPA to make it even easier to find and quickly address performance issues in games, as well as debug rendering problems:</p>
<p><b>• Pixel History</b><br />Pixel history is a great new feature in Intel GPA that provides a wealth of information on any pixel in any render target. A zoom feature (using the mouse wheel) was also added for a more exact selection of a particular pixel of interest. To select a pixel, simply left-click a pixel in any render target. After a pixel is selected, the history of all GPU operations (draw calls, clears, and so on) that affected that pixel is displayed in the pixel history tab, which is automatically opened. This lets you see exactly which draw calls affected that pixel location for the render target from which it was selected. For each draw call in the list, the number of times the pixel was touched and the final pixel color are also displayed. If the pixel was rejected, for example if Z-test was enabled, the reason for the rejection is noted as well.</p>
<p>Pixel history enables two key use cases: visual debug and overdraw analysis. The visual debug workflow allows you to diagnose why a pixel was rendered incorrectly. It also shows which draw call in the history caused the selected pixel to be the color that it is. The overdraw analysis workflow depicts how much overdraw exists at any pixel location and specifically which draw calls contribute to it.</p>
<p><b>• Overdraw Visualization per Render Target</b><br />The Intel GPA render target viewer has a new overdraw visualization mode. When enabled, each render target is visualized in gray scale. Overdraw corresponds to lighter pixels in the gray-scale visualization. By enabling this mode, you can immediately see which portions of the render target are being written to most often.<br /><br />Intel GPA also allows you to combine the usage of both pixel history and overdraw visualization. This allows you to seamlessly find overdraw hotspots with the visualization and then immediately select any of the hot pixels to understand which draw calls are contributing to overdraw at that location.</p>
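<p>To make the gray-scale mapping concrete, here is a small C++ sketch of one way such a visualization could be computed. It is not how Intel GPA implements the feature, which the article does not describe; it assumes a per-pixel count of writes is already available and scales it so that the most-written pixel is white. The function names <code>overdraw_to_gray</code> and <code>hottest_pixel</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map per-pixel write counts to 8-bit gray levels. The pixel written most
// often becomes white (255); pixels never written stay black (0).
std::vector<std::uint8_t> overdraw_to_gray(
        const std::vector<std::uint32_t>& write_counts) {
    std::vector<std::uint8_t> gray(write_counts.size(), 0);
    if (write_counts.empty()) return gray;
    const std::uint32_t max_count =
        *std::max_element(write_counts.begin(), write_counts.end());
    if (max_count == 0) return gray;  // nothing was drawn
    for (std::size_t i = 0; i < write_counts.size(); ++i) {
        // 64-bit intermediate avoids overflow for very large counts.
        gray[i] = static_cast<std::uint8_t>(
            (static_cast<std::uint64_t>(write_counts[i]) * 255u) / max_count);
    }
    return gray;
}

// Index of the pixel with the most writes: a starting point for inspecting
// which draw calls touched it.
std::size_t hottest_pixel(const std::vector<std::uint32_t>& write_counts) {
    assert(!write_counts.empty());
    return static_cast<std::size_t>(
        std::max_element(write_counts.begin(), write_counts.end()) -
        write_counts.begin());
}
```

<p>In this sketch, the index returned by <code>hottest_pixel</code> plays the role of the hot pixel you would click to open its pixel history.</p>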
<p><b>• Vertex Shader and Pixel Shader Durations</b><br />Shader durations are now enabled as metrics for all DirectX* devices. These metrics are available in three places: the bar chart graph at the top of the user interface, the scene overview spreadsheet view on the left, and the details tab on the right.</p>
<p>With the bar chart, you can now select any metric for the x- and y-axes. For example, you can configure vertex shader duration on the x-axis and pixel shader duration on the y-axis. By looking at the shape of each rectangle in the bar chart, you can now compare two metrics at the same time. Within the scene overview, you can view these new metrics in spreadsheet form by clicking the Customize button, and then selecting any metrics of your choice. Finally, the details tab always lists all possible metrics and enables you to view their values summed across the current draw call selection set.</p>
<p><b>• Single Step Frame</b><br />Intel GPA has a new single step feature that enables better control over the frame to be captured and analyzed. When using the System Analyzer, simply press the pause button to pause the game in real time, then press the single step button as many times as needed to reach a frame of interest. The capture button can be pressed at any time.<br /><br /><strong>• In-Game Hot Key<br /></strong>The new hot-key feature allows easy frame captures on a single computer while playing the game. Simply launch the game using Intel GPA, run it full screen, and then press CTRL+SHIFT+C (or configure any keys you want to use) for each frame you want to capture. When you are ready to analyze, close the game, and then open the Frame Analyzer on the same computer or a remote system for analysis.</p>
<p><b>• Export Metrics to a CSV File</b><br />With CSV (comma separated value) file export, detailed frame performance data can be saved and later pulled into Microsoft Excel* or any other program that can process CSV files. This feature allows you to track game performance changes over time, compare game performance with various game options enabled, or even compare game performance on various graphics cards—all at a per-draw level of detail.</p>
<p>Because this feature is draw call selection set-based, you can select the draw calls you are interested in (or the whole frame) and export only those calls, so you don’t have to wade through large amounts of data to find the details you want.</p>
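<p>Once the data is in a CSV file, a few lines of code are enough to compare captures. The C++ sketch below totals one metric column across all exported draw calls. The column name <code>gpu_duration_us</code> is an assumption made for illustration, not the actual header written by Intel GPA, and the parser handles only plain comma-separated fields without quoting. The function names are hypothetical as well.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split one CSV line on commas (no quoted-field support in this sketch).
std::vector<std::string> split_csv_line(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, ',')) fields.push_back(field);
    return fields;
}

// Sum the named column over every data row; the first row is the header.
double sum_column(std::istream& csv, const std::string& column) {
    std::string line;
    if (!std::getline(csv, line)) throw std::runtime_error("empty CSV");
    const std::vector<std::string> header = split_csv_line(line);
    std::size_t index = header.size();
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i] == column) index = i;
    }
    if (index == header.size()) {
        throw std::runtime_error("column not found: " + column);
    }
    double total = 0.0;
    while (std::getline(csv, line)) {
        const std::vector<std::string> fields = split_csv_line(line);
        // Rows that are too short to contain the column are skipped.
        if (index < fields.size()) total += std::stod(fields[index]);
    }
    return total;
}
```

<p>Running <code>sum_column</code> on two exports, for example one captured with a game option enabled and one with it disabled, gives a quick per-frame comparison without opening a spreadsheet.</p>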
<p><b><br />Conclusion</b><br />Intel GPA tools help game developers make sure that performance issues don’t detract from a game’s entertainment value. Developers can run code experiments that measure and report performance results in real time. Intel GPA provides open, accessible libraries that can both customize tools for specific needs and pull data for deeper analysis. Better use of screen real estate avoids the intrusive display overlay of other interfaces, and the ability to share captured frames with team members increases the efficiency of optimization.</p>
<p>Thanks to the Intel® GPA tools, developers can learn more about what’s going on “behind the curtain” in their games. The new features take an already strong engineering toolset and turn it into a formidable asset manager. Thanks to interaction with game developers around the world, Intel continues to fine-tune these tools. Although priced at USD 299, the Intel GPA tools are free to anyone willing to take the time to fully register. Go to www.intel.com/software/gpa and grab the tools and the documentation, read the case studies and white papers, and get involved in the developer forums. Your game’s performance, and its fun factor, are at stake.</p>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/updated-tools-spice-up-new-ghostbusters-game</link>
      <pubDate>Tue, 10 Nov 2009 15:29:24 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/updated-tools-spice-up-new-ghostbusters-game#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/updated-tools-spice-up-new-ghostbusters-game</guid>
      <category>Visual Computing</category>
      <category>Game Development</category>
    </item>
  </channel></rss>