Recent posts
https://software.intel.com/en-us/recent/334389
Fast Random Number Generator on the Intel® Pentium® 4 Processor
https://software.intel.com/en-us/articles/fast-random-number-generator-on-the-intel-pentiumr-4-processor
<p><strong>by Kipp Owens,</strong> Applications Engineer &<br /><strong>Rajiv Parikh,</strong> Sr. Applications Engineer<br />Software Solutions Group, Intel Corporation.</p>
<hr /><h3>Abstract</h3>
<p><em>This paper shows how to easily speed up a commonly used pseudo-random number generation algorithm by taking advantage of the <strong>Streaming SIMD Extensions 2 (SSE2)</strong> instruction set on the Intel® Pentium® 4 processor. The paper includes code that uses SSE2 intrinsics (Intrinsics) to generate pseudo-random integers. Some quantitative analysis is also presented for comparison with the original algorithm.</em></p>
<hr /><h3>Introduction</h3>
<p>Random number generators are used in many applications today, ranging from simple programs such as lottery-ticket pickers to complex applications such as cryptography. The most commonly used algorithm is the Linear Congruential Generator (LCG), which is also the algorithm used in most C library <em>rand()</em> functions. This paper gives a brief overview of the original LCG algorithm and its implementation; it assumes the reader is familiar with general C and assembly programming. Next, it shows the SSE2-based solution that 'vectorizes' the algorithm. The paper also compares the original algorithm with the new one, showing some tradeoffs and applications. All performance data presented was collected on an Intel® Pentium® 4 Processor 3.06 GHz with HT Technology enabled. The Intel® C compiler 7.0 under Microsoft* Visual Studio* 6.0 was used for all compilations. All SSE2 coding was implemented using Intrinsics (avoiding pure assembly), as supported by the Intel C compiler.</p>
<hr /><h3>LCG Algorithm</h3>
<p>The classic LCG algorithm is widely used in many applications. The most general form is:</p>
<blockquote>
<p>X<sub>n</sub> = (<em>a</em> X<sub>n-1</sub> + <em>c</em>) mod <em>m</em> &nbsp; (1)</p>
<p>Where X<sub>n</sub> depends on X<sub>n-1</sub>, scalars <em>a</em> and <em>c</em>, and the final value is taken modulo <em>m</em>. Usually <em>m</em> is a power of two, so a simple mask can be used. This returns a random value of <em>n = log </em><sub>2</sub><em> m</em> bits. The selection of values <em>a</em> and <em>c</em> for any given <em>n</em> is the subject of many large studies, which will not be discussed here. A list of values can be found in several of the listed references. We chose to use the same <em>a</em> and <em>c</em> values for <em>fast_rand()</em> as the standard math library routine <em>rand()</em> uses (discussed in the Cycle Length Analysis section below).<br /><br />The algorithm for the <em>rand()</em> function in the C library is, however, a slight modification of the above. The equation is the same, but the value returned to the calling function is shifted right by 16 bits to reduce low-order bit correlation and masked with 7FFFh, which eliminates the sign bit. The <em>fast_rand()</em> function shown below implements the same scalar LCG function that <em>rand()</em> uses. It uses an unsigned 32-bit integer, but to be compatible with <em>rand()</em> the range is reduced by shifting and masking out the upper bit. Listing 1 shows the code for <em>fast_rand()</em>.</p>
</blockquote>
<p><strong>Listing 1: Fast_rand() function code in C</strong></p>
<table style="width:100%" border="0"><tbody><tr><td>
<pre><pre class="brush: cpp">static unsigned int g_seed;
// Seeds the generator.
inline void fast_srand( int seed )
{
    g_seed = seed;
}
// fast_rand returns one integer in the same output range as the C library rand().
inline int fast_rand()
{
    g_seed = (214013*g_seed+2531011);  // LCG step; unsigned overflow acts as the modulus
    return (g_seed>>16)&0x7FFF;        // drop the 16 low-order bits, keep 15 bits
}
</pre></pre>
</td>
</tr></tbody></table><p> </p>
<p>All the code in this paper is compiled using the Intel® C/C++ compiler version 7.0, with no optimization flags. The <em>rand_sse()</em> function implements a vectorized version of this <em>fast_rand()</em> function, in which the integer math operations are done four at a time using the SIMD architecture. Because C does not allow returning an array of values, the function takes a pointer to the result array as a parameter, so its usage model differs from the routines above. Listing 2 shows the code for <em>rand_sse()</em> in C (using Intrinsics). Notice also the <em>COMPATABILITY</em> flag, for applications that require the same numeric range and reduced low-order bit correlation as the <em>rand()</em> function; it enables the shift-right and mask operation.</p>
<p><strong>Listing 2: Listing of SSE2 implementation of RNG</strong></p>
<table style="width:100%" border="0"><tbody><tr><td>
<pre><pre class="brush: cpp">/////////////////////////////////////////////////////////////////////////////
// The Software is provided "AS IS" and possibly with faults.
// Intel disclaims any and all warranties and guarantees, express, implied or
// otherwise, arising, with respect to the software delivered hereunder,
// including but not limited to the warranty of merchantability, the warranty
// of fitness for a particular purpose, and any warranty of non-infringement
// of the intellectual property rights of any third party.
// Intel neither assumes nor authorizes any person to assume for it any other
// liability. Customer will use the software at its own risk. Intel will not
// be liable to customer for any direct or indirect damages incurred in using
// the software. In no event will Intel be liable for loss of profits, loss of
// use, loss of data, business interruption, nor for punitive, incidental,
// consequential, or special damages of any kind, even if advised of
// the possibility of such damages.
//
// Copyright (c) 2003 Intel Corporation
//
// Third-party brands and names are the property of their respective owners
//
///////////////////////////////////////////////////////////////////////////
// Random Number Generation for SSE / SSE2
// Source File
// Version 0.1
// Author Kipp Owens, Rajiv Parikh
////////////////////////////////////////////////////////////////////////
#ifndef RAND_SSE_H
#define RAND_SSE_H
#include "emmintrin.h"
#define COMPATABILITY
//define this if you wish to return values similar to the standard rand();
void srand_sse( unsigned int seed );
void rand_sse( unsigned int* );
__declspec( align(16) ) static __m128i cur_seed;
void srand_sse( unsigned int seed )
{
cur_seed = _mm_set_epi32( seed, seed+1, seed, seed+1 );
}
inline void rand_sse( unsigned int* result )
{
__declspec( align(16) ) __m128i cur_seed_split;
__declspec( align(16) ) __m128i multiplier;
__declspec( align(16) ) __m128i adder;
__declspec( align(16) ) __m128i mod_mask;
__declspec( align(16) ) __m128i sra_mask;
__declspec( align(16) ) __m128i sseresult;
__declspec( align(16) ) static const unsigned int mult[4] =
{ 214013, 17405, 214013, 69069 };
__declspec( align(16) ) static const unsigned int gadd[4] =
{ 2531011, 10395331, 13737667, 1 };
__declspec( align(16) ) static const unsigned int mask[4] =
{ 0xFFFFFFFF, 0, 0xFFFFFFFF, 0 };
__declspec( align(16) ) static const unsigned int masklo[4] =
{ 0x00007FFF, 0x00007FFF, 0x00007FFF, 0x00007FFF };
adder = _mm_load_si128( (__m128i*) gadd);
multiplier = _mm_load_si128( (__m128i*) mult);
mod_mask = _mm_load_si128( (__m128i*) mask);
sra_mask = _mm_load_si128( (__m128i*) masklo);
cur_seed_split = _mm_shuffle_epi32( cur_seed, _MM_SHUFFLE( 2, 3, 0, 1 ) );
cur_seed = _mm_mul_epu32( cur_seed, multiplier );
multiplier = _mm_shuffle_epi32( multiplier, _MM_SHUFFLE( 2, 3, 0, 1 ) );
cur_seed_split = _mm_mul_epu32( cur_seed_split, multiplier );
cur_seed = _mm_and_si128( cur_seed, mod_mask);
cur_seed_split = _mm_and_si128( cur_seed_split, mod_mask );
cur_seed_split = _mm_shuffle_epi32( cur_seed_split, _MM_SHUFFLE( 2, 3, 0, 1 ) );
cur_seed = _mm_or_si128( cur_seed, cur_seed_split );
cur_seed = _mm_add_epi32( cur_seed, adder);
#ifdef COMPATABILITY
// Reduce each result to a 15-bit value in the same range as rand():
// shift right by 16, then mask with 0x7FFF.
sseresult = _mm_srai_epi32( cur_seed, 16);
sseresult = _mm_and_si128( sseresult, sra_mask );
_mm_storeu_si128( (__m128i*) result, sseresult );
return;
#endif
// Without COMPATABILITY, the full 32-bit values are returned.
_mm_storeu_si128( (__m128i*) result, cur_seed);
return;
}
#endif // RAND_SSE_H
</pre></pre>
</td>
</tr></tbody></table><p> </p>
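<p>To check the SSE2 listing against something easier to read, the following is a minimal scalar sketch of what the four lanes appear to compute, assuming the intrinsics behave as documented: each lane is an independent LCG using the matching entries of <em>mult</em> and <em>gadd</em>. The type <em>ScalarRandSse</em> and its members are hypothetical names introduced for illustration; they are not part of the original source.</p>

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of the four SIMD lanes in Listing 2.
// Not part of the original source; intended only as a cross-check.
struct ScalarRandSse {
    std::array<std::uint32_t, 4> state{};

    // Mirrors srand_sse(): _mm_set_epi32(seed, seed+1, seed, seed+1)
    // places seed+1 in lanes 0 and 2 and seed in lanes 1 and 3.
    void seed(std::uint32_t s) {
        state[0] = s + 1u;
        state[1] = s;
        state[2] = s + 1u;
        state[3] = s;
    }

    // Mirrors rand_sse(): every lane advances its own LCG modulo 2^32.
    // With compat == true the result is reduced as under COMPATABILITY.
    std::array<std::uint32_t, 4> next(bool compat) {
        static constexpr std::uint32_t mult[4] = {214013u, 17405u, 214013u, 69069u};
        static constexpr std::uint32_t gadd[4] = {2531011u, 10395331u, 13737667u, 1u};
        std::array<std::uint32_t, 4> out{};
        for (int i = 0; i < 4; ++i) {
            state[i] = state[i] * mult[i] + gadd[i];  // unsigned wraparound = mod 2^32
            out[i] = compat ? ((state[i] >> 16) & 0x7FFFu) : state[i];
        }
        return out;
    }
};
```

<p>Seeding both versions with the same value and comparing the four outputs of each call is one way to confirm that a port of <em>rand_sse()</em> to another compiler still behaves the same.</p>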
<hr /><h3>Characterization Of Each LCG RNG</h3>
<p>In this section, some of the high level characteristics of both implementations will be examined. First is the cycle length. This is the number of values returned before the entire sequence starts repeating. Second is 'uniformity' of the distribution of the numbers. This roughly measures whether the numbers are skewed towards any particular value or set of values. It is measured with respect to the output value range. Third, the speed of generating and filling a large array will be measured and compared to evaluate the throughput. It is this last characteristic that should improve with the <em>rand_sse()</em> implementation using SSE2.</p>
<h3>Cycle Length Analysis</h3>
<p>The cycle length of any LCG random number generator depends entirely on the constants chosen for equation 1 above. The cycle can never exceed the modulus (<em>m</em>). For our application (using 32-bit numbers) we chose <em>m</em> to be 2^32, so that values range from 0 to 2^32-1 (the largest 32-bit unsigned integer possible). This lets the natural overflow of the 32-bit unsigned integer act as the modulus. With a modulus selected, choosing the constants <em>a</em> and <em>c</em> becomes a bit easier. To reach the theoretical maximum of <em>m</em> numbers per cycle, one must follow very precise rules when selecting <em>a</em> and <em>c</em>.<br /><br />The cycle of an LCG random number generator of the form X<sub>n</sub> = (a X<sub>n-1</sub> + c) mod m is of length <em>m</em> if and only if the following three conditions are met:</p>
<ol><li><em>c</em> is relatively prime to <em>m</em></li>
<li><em>a</em>-1 is a multiple of every prime number <em>p</em> that divides <em>m</em></li>
<li><em>a</em>-1 is a multiple of 4 when <em>m</em> is a multiple of 4</li>
</ol><p> </p>
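<p>For a power-of-two modulus these conditions become very simple to test. The sketch below assumes <em>m</em> = 2^32 (the modulus implied by 32-bit unsigned wraparound): the only prime dividing <em>m</em> is 2 and <em>m</em> is a multiple of 4, so the rules reduce to "<em>c</em> is odd" and "<em>a</em>-1 is a multiple of 4". The function name <em>has_full_period_pow2</em> is hypothetical.</p>

```cpp
#include <cstdint>

// Hypothetical helper: full-period test for an LCG with modulus m = 2^k, k >= 2.
// Condition 1 (c relatively prime to m)       -> c must be odd.
// Conditions 2 and 3 (a-1 divisible by 2, 4)  -> a-1 must be a multiple of 4.
constexpr bool has_full_period_pow2(std::uint32_t a, std::uint32_t c) {
    return (c % 2u == 1u) && ((a - 1u) % 4u == 0u);
}
```

<p>All four constant pairs used in Listing 2 pass this check, while a pair such as <em>a</em> = 3, <em>c</em> = 1 does not.</p>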
<p><em>fast_rand()</em> is implemented using the first set of constants in Table 1 below. The result is a random number generator whose sequence repeats only after 2^32 numbers have been generated (all <em>m</em> possible values, zero included). In the case of <em>rand_sse()</em> we could use the same formula and constants for each of the four simultaneous calculations completed using SIMD, but that would result in the same 2^32 length cycle. On the other hand, if four sets of constants were used, all of which satisfied the rules above, one could extend the length of the generator's cycle to the sum of the four cycles contained within it. Fortunately, there is more than one set of constants to choose from that fit within our constraints. We selected the four in the table below.</p>
<p><strong>Table 1: Constants and resulting cycles</strong></p>
<table border="1" style="width:100%"><tbody><tr><td><strong>a</strong></td>
<td><strong>c</strong></td>
<td><strong>m</strong></td>
<td><strong>cycle</strong></td>
</tr><tr><td>214013</td>
<td>2531011</td>
<td>2^32</td>
<td>2^32</td>
</tr><tr><td>17405</td>
<td>10395331</td>
<td>2^32</td>
<td>2^32</td>
</tr><tr><td>214013</td>
<td>13737667</td>
<td>2^32</td>
<td>2^32</td>
</tr><tr><td>69069</td>
<td>1</td>
<td>2^32</td>
<td>2^32</td>
</tr></tbody></table><p> </p>
<p>With four different LCG functions running, each with a cycle of 2^32, the cycle length for 32-bit random numbers is extended to 4*2^32 or 2^34 for <em>rand_sse()</em>. Alternatively, one could use <em>rand_sse()</em> to generate 64-bit or 128-bit random numbers (with cycles of 2^33 and 2^32 respectively).</p>
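<p>Walking a full 2^32 cycle takes a while, but the same rules can be observed on a scaled-down modulus. The following is a sketch under that assumption: it counts the steps an LCG modulo 2^bits needs to return to its seed. The helper <em>cycle_length</em> and its parameters are hypothetical and are not part of the original code.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of steps until an LCG modulo 2^bits returns
// to its seed. Requires an odd multiplier, which makes the update a
// bijection, so the starting value is always revisited.
std::uint64_t cycle_length(std::uint32_t a, std::uint32_t c, unsigned bits,
                           std::uint32_t seed = 0u) {
    assert(bits >= 1u && bits <= 24u);  // keep the brute-force walk small
    assert(a % 2u == 1u);
    const std::uint32_t mask = (1u << bits) - 1u;
    const std::uint32_t start = seed & mask;
    std::uint32_t x = start;
    std::uint64_t steps = 0;
    do {
        x = (a * x + c) & mask;  // 32-bit wraparound followed by the mask = mod 2^bits
        ++steps;
    } while (x != start);
    return steps;
}
```

<p>With a 16-bit modulus, constants that satisfy the three conditions visit all 65536 values before repeating, whereas a multiplier such as 3 (where <em>a</em>-1 is not a multiple of 4) falls into a shorter cycle.</p>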
<h3>Uniformity Analysis</h3>
<p>So the cycle length of <em>rand_sse()</em> is four times larger than that of <em>fast_rand()</em>, but is it "as good?" Numerous tests have been invented to check the quality of random number generators. While these tests do warrant a closer look, they are beyond the scope of this paper. That said, to give some indication of the quality of <em>rand_sse()</em>, we have included two of the most basic of such tests: 1-D and 2-D distribution.</p>
<h3>1-D Distribution</h3>
<p>The distribution of numbers produced by an ideal random number generator should be uniform from 0 to the maximum number produced; in our case this is <em>m</em>. Statistically, one would therefore expect a mean of <em>m</em>/2 and a standard deviation of <em>m</em>/(2√3).<br /><br />To test how close both the <em>fast_rand()</em> and <em>rand_sse()</em> functions come to producing this distribution, we produced 1 million random numbers with each generator and then sorted them into one hundred equal bins. The resulting histogram of each function can be seen in Figure 1 and Figure 2 below, and the measured mean and standard deviation along with the percent error of each function can be found in Table 2.</p>
<p><strong>Figure 1: 1-D Distribution of fast_rand()</strong><br /><img src="/sites/default/files/m/d/4/1/d/8/38033_fastgen_rannum_fig1.gif" border="0" /></p>
<p><strong>Figure 2: 1-D Distribution of rand_sse()</strong><br /><img src="/sites/default/files/m/d/4/1/d/8/38034_fastgen_rannum_fig2.gif" border="0" /></p>
<p><strong>Table 2: Statistics for fast_rand() and rand_sse() </strong></p>
<table border="1" style="width:100%"><tbody><tr><td><strong></strong></td>
<td><strong>mean</strong></td>
<td><strong>standard deviation</strong></td>
</tr><tr><td><strong>ideal uniform distribution</strong></td>
<td>(2^32)/2 = 2147483648</td>
<td>(2^32)/(2√3) = 1239850262</td>
</tr><tr><td><strong>rand_sse</strong></td>
<td>2146969795</td>
<td>1239930971</td>
</tr><tr><td><strong>rand_sse %error</strong></td>
<td>0.024%</td>
<td>0.007%</td>
</tr><tr><td><strong>fast_rand</strong></td>
<td>2146454860</td>
<td>1239659448</td>
</tr><tr><td><strong>fast_rand %error</strong></td>
<td>0.048%</td>
<td>0.015%</td>
</tr></tbody></table><p></p>
<p> </p>
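<p>The kind of measurement summarized in Table 2 can be sketched as follows. This is an illustration only, assuming the raw 32-bit state of the scalar LCG is used as the sample and that a uniform distribution on [0, <em>m</em>) has mean <em>m</em>/2 and standard deviation <em>m</em>/(2√3); the names <em>Stats</em>, <em>lcg_stats</em>, <em>ideal_mean</em> and <em>ideal_stddev</em> are hypothetical, and the figures it produces are not expected to match the table exactly.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

struct Stats { double mean; double stddev; };

// Hypothetical helper: mean and standard deviation of n raw 32-bit outputs
// of the scalar LCG (a = 214013, c = 2531011, modulus 2^32).
Stats lcg_stats(std::uint32_t seed, std::size_t n) {
    std::uint32_t x = seed;
    double sum = 0.0, sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        x = 214013u * x + 2531011u;
        const double v = static_cast<double>(x);
        sum += v;
        sum_sq += v * v;
    }
    const double mean = sum / static_cast<double>(n);
    const double var = sum_sq / static_cast<double>(n) - mean * mean;
    return {mean, std::sqrt(var)};
}

// Ideal values for a uniform distribution on [0, m).
inline double ideal_mean(double m)   { return m / 2.0; }
inline double ideal_stddev(double m) { return m / (2.0 * std::sqrt(3.0)); }
```

<p>Comparing the measured values against <em>ideal_mean(4294967296.0)</em> and <em>ideal_stddev(4294967296.0)</em> gives the percent error in the same way as the table.</p>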
<h3>2-D Distribution</h3>
<p>2-D distribution is really just an extension of 1-D. Two random numbers are generated to represent an x and a y coordinate pair. Ideally, the random numbers should be spread evenly throughout the 2-D space ranging from 0 to m in both the x and y directions. Shown in histogram form, the ideal 2-D distribution would appear as a perfect cube. Below both the <em>fast_rand()</em> and <em>rand_sse()</em> 2-D histograms can be seen to approach this state.</p>
<p><strong>Figure 3: 2-D Distribution of fast_rand()</strong><br /><img src="/sites/default/files/m/d/4/1/d/8/38035_fastgen_rannum_fig3.gif" border="0" /></p>
<p><strong>Figure 4: 2-D Distribution of rand_sse()</strong><br /><img src="/sites/default/files/m/d/4/1/d/8/38036_fastgen_rannum_fig4.gif" border="0" /></p>
<p> </p>
<h3>Throughput Analysis</h3>
<p>Last, and for many users most important, is the speed of the random number generator. Speed was tested by timing how long each random number generator took to produce one billion random numbers. The functions tested were the standard math library function <em>rand()</em>, <em>fast_rand()</em>, and <em>rand_sse()</em>. The results of each test, including the speedup relative to <em>rand()</em>, are listed in Table 3 below.</p>
<p><strong>Table 3: Time to compute one billion random numbers</strong></p>
<table border="1" style="width:70%"><tbody><tr><td><strong>Function</strong></td>
<td><strong>Time (sec)</strong></td>
<td><strong>Speedup</strong></td>
</tr><tr><td>rand()</td>
<td>10.03</td>
<td>1.00</td>
</tr><tr><td>fast_rand()</td>
<td>4.99</td>
<td>2.01</td>
</tr><tr><td>rand_sse()</td>
<td>1.83</td>
<td>5.48</td>
</tr></tbody></table><p> </p>
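<p>The paper does not show its timing harness. A minimal sketch of one way to take such a measurement, assuming a C++11 compiler and <em>std::chrono::steady_clock</em> as the timer, is shown below; <em>TimedRun</em> and <em>time_fast_rand</em> are hypothetical names, and the checksum exists only so that the compiler cannot discard the loop.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

struct TimedRun { double seconds; std::uint32_t checksum; };

// Hypothetical harness: times n steps of the scalar generator from Listing 1.
TimedRun time_fast_rand(std::size_t n) {
    std::uint32_t seed = 1u, checksum = 0u;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        seed = 214013u * seed + 2531011u;
        checksum ^= (seed >> 16) & 0x7FFFu;  // consume the result so the work is kept
    }
    const auto t1 = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(t1 - t0).count(), checksum};
}
```

<p>Dividing the elapsed time of one generator by that of another gives a speedup figure of the kind reported in Table 3; absolute times will of course differ by processor and compiler settings.</p>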
<hr /><h3>Conclusion</h3>
<p>Based on a raw speed comparison, <em>rand_sse()</em> is the clear winner, computing one billion random numbers 2.73 times faster than <em>fast_rand()</em> and 5.48 times faster than the standard <em>rand()</em> function. This performance is achieved without compromising on cycle length or uniformity. In fact, in both cases, <em>rand_sse()</em> shows a relative improvement over the scalar implementations.<br /><br />In addition to its significantly longer cycle and significantly quicker production than <em>fast_rand()</em> and <em>rand()</em>, <em>rand_sse()</em> is also more flexible. One could use <em>rand_sse()</em> to produce 32-bit, 64-bit, 96-bit, or 128-bit random values.<br /><br />With its significantly faster performance and flexibility over the scalar implementations, <em>rand_sse()</em> could prove useful in a wide variety of applications today.</p>
<hr /><h3>Related Resources</h3>
<p><em>Reference Materials</em></p>
<ul><li><a href="/en-us/articles/intel-compilers/" rel="nofollow">Intel® C++ Compiler for Windows* Product Information</a></li>
<li><a href="http://developer.intel.com/products/processor/manuals/index.htm">Intel® 64 and IA-32 Architectures Software Developer's Manuals</a></li>
<li><a href="http://www.amazon.com/Numerical-Recipes-C-Scientific-Computing/dp/0521431085/ref=sr_11_1?ie=UTF8&qid=1235595142&sr=11-1" rel="nofollow">Numerical Recipes in C</a>, Second Edition, Press, et al., ISBN 0-521-43108-5</li>
<li><a href="http://cssvc.ecsu.edu/computational_physics/monte12/P001.html" rel="nofollow">Computation Physics/Carleton University/ Random Numbers</a>*</li>
<li>A collection of selected pseudorandom number generators with linear structures, Karl Entacher; <a href="http://random.mat.sbg.ac.at/results/karl/server/node3.html" rel="nofollow">Linear Congruential Generator</a>*</li>
<li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.1597" rel="nofollow">Parallel Monte Carlo Methods for Derivative Security Pricing</a>* (PDF), Giorgio Pauletto</li>
<li><a href="http://www.itl.nist.gov/div898/handbook/index.htm" rel="nofollow"><em>NIST/SEMATECH e-Handbook of Statistical Methods</em></a>*</li>
</ul><p><em>Developer Centers</em></p>
<ul><li>Pentium® 4 Processor Performance Optimization</li>
<li><a href="http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html">Intel® Xeon® Processor Performance Optimization</a></li>
</ul><p> </p>
<hr /><h3>About the Authors</h3>
<p><strong>Rajiv D. Parikh</strong>, Sr. Applications Engineer, Intel Corporation, Folsom, CA. Mr. Parikh has held various technical positions at Intel since joining in 1993. His focus has been multi-processor systems and software engineering. He has a Masters and Bachelors of Science in Computer and Electrical Engineering from Virginia Tech.</p>
<p><strong>Kipp Owens,</strong> Applications Engineer, Intel Corporation, Folsom, CA. Mr. Owens has worked predominantly in the area of software optimization. He has a Bachelors of Science in Electrical Engineering from Purdue University.</p>
<hr />Tue, 27 Mar 12 07:54:52 -0700 Christopher Owens (Intel) 143227
Z Fighting Code Sample
https://software.intel.com/en-us/articles/z-fighting-code-sample
<p>by <strong>Matt McClellan</strong> and <strong>Kipp Owens</strong><br />
Intel Corporation<br />
Client Enabling Technology</p>
<hr /><h3>Why Should I Care About This Code Sample?</h3>
<p>This source code demonstrates how game developers can work around any Z fighting issue they may find. The sample source demonstrates several techniques for working around Z fighting issues in DirectX* 9.0.<br /><br /><strong>Target Audience</strong><br /><br />
Game developers<br />
General interest developers<br /><br /><strong>Sample Category</strong><br /><br />
Complete project including source files, headers and executable.<br /><br /><strong>Implementation Language</strong><br /><br />
C++<br /><br /><strong>Operating Systems:</strong> Microsoft Windows* 2000 and Windows XP</p>
<hr /><h3><a href="/en-us/articles/z-fighting-code-sample-license" rel="nofollow">Download Source Code</a></h3>
<h3>Whitepaper: <a href="/en-us/articles/alternatives-to-using-z-bias-to-fix-z-fighting-issues" rel="nofollow">Alternatives to Using Z-Bias to Fix Z-Fighting Issues</a></h3>
Thu, 09 Feb 12 15:13:00 -0800 Christopher Owens (Intel) 140653
Alternatives to Using Z-Bias to Fix Z-Fighting Issues
https://software.intel.com/en-us/articles/alternatives-to-using-z-bias-to-fix-z-fighting-issues
<h3>Introduction</h3>
<p><strong>by Matt McClellan and Kipp Owens</strong><br /><br />
There are many instances in 3D applications where two polygons may lie on the same plane, as in the cases of effects like bullet holes or posters on walls. Because these polygons lie on the same plane, they share the same z-buffer values, and this can result in "z-fighting" issues, where results vary based on rendering order. In the past, DirectX* allowed developers to resolve z-fighting issues by applying a “z-bias” to co-planar polygons. While applying a z-bias is an effective solution, it does not generate the same results on all graphics hardware.<br /><br />
Unfortunately, in versions of DirectX prior to DirectX 9, different graphics vendors interpreted the DirectX specification details on the z-bias feature differently. As a result, different graphics vendors applied slightly different algorithms to address the application of z-bias. Worse, a legacy of applications taking these different driver behaviors into account meant that the different hardware vendors could not subsequently change their driver behaviors to be more consistent.<br /><br />
For the developer community, this ambiguity resulted in a lot of custom ‘tweaking’ of the z-bias values and subsequent testing across a wide array of hardware; in short, each piece of hardware might behave differently. Microsoft has resolved this issue in DirectX 9 by replacing z-bias with "depth-bias," in hopes of providing a more predictable and uniform technique for removing z-fighting issues.<br /><br />
The D3DRS_ZBIAS render state was used on DirectX 8 and earlier applications, while the D3DRS_DEPTHBIAS render state is used in DirectX 9. This article outlines three alternatives to using the legacy method of D3DRS_ZBIAS.</p>
<div><img width="461" height="394" src="/sites/default/files/m/d/4/1/d/8/168844_168844.gif" border="0" /></div>
<div><strong>Figure 1. Z-fighting from co-planar polygons.</strong></div>
<p>Figure 1 above shows the effects of z-fighting on co-planar polygons. <strong>ZFightingDemo</strong> is a modified version of Billboard, available in the DirectX SDK, that demonstrates z-fighting.</p>
<hr /><h3>Alternative Method 1: Projection Matrix</h3>
<p><br />
The first method considered here is the use of a new projection matrix. This new projection matrix is built with its clipping planes pushed out (away from the viewer); in the sample below only the far plane actually moves, because the near-plane bias is 0.0. The new, 'closer' projection matrix is loaded after the 'far' object is rendered and before the object or objects that the developer would like to appear in front. The desired 'front' objects are effectively placed closer to the viewer in the z-buffer, but their location in the view space is not noticeably changed. The sample code below accomplishes this technique. In this case it is applying a z-bias to the posters.<br /><br />
The following code snippet shows the Projection Matrix alternative to using a DirectX z-bias call:</p>
<table border="0"><tbody><tr><td>
<pre>// ZFighting Solution #1 - Projection Matrix
D3DXMATRIX mProjectionMat;        // Holds default projection matrix
D3DXMATRIX mZBiasedProjectionMat; // Holds ZBiased projection matrix
// Globals used for Projection matrix
float g_fBaseNearClip = 1.0f;
float g_fBaseFarClip  = 100.0f;
// Clip-plane biases. With these values only the far plane is pushed out;
// the near plane is left unchanged.
float g_fNearClipBias = 0.0f;
float g_fFarClipBias  = 0.5f;
// Projection Matrix work around
// Best if calculations are done outside of the render function.
// The "zbiased" projection has its clipping planes pushed out by the
// biases above. The aspect ratio is recovered from the default matrix.
D3DXMatrixPerspectiveFovLH( &mZBiasedProjectionMat, D3DX_PI/4,
                            (mProjectionMat._22/mProjectionMat._11),
                            g_fBaseNearClip + g_fNearClipBias,
                            g_fBaseFarClip + g_fFarClipBias );
. . .
// Original projection is loaded
m_pd3dDevice->SetTransform( D3DTS_PROJECTION, &mProjectionMat );
// Billboards are rendered...
// The "zbiased" projection is loaded ...
m_pd3dDevice->SetTransform( D3DTS_PROJECTION, &mZBiasedProjectionMat );
// Posters are rendered...
// Original projection is reloaded...
m_pd3dDevice->SetTransform( D3DTS_PROJECTION, &mProjectionMat );
. . .</pre>
<br /><br />
</td>
</tr></tbody></table><p>While some adjustments to the projection matrix may still be necessary to get the desired results, this technique is more consistent across a variety of graphics hardware. The result of the alternate solution is pictured below:<br />
</p>
<div><img width="462" height="391" src="/sites/default/files/m/d/4/1/d/8/168846_168846.gif" border="0" /><div><strong>Figure 2. Z-fighting resolved with projection modification solution.</strong></div>
</div>
<hr /><h3>Alternative Method 2: Viewport</h3>
<p><br />
The viewport method is similar to the projection matrix method, in that it effectively pushes the selected object nearer to the user in the z-buffer. The viewport method achieves this resolution by loading a new viewport object with new minimum and maximum z-values. The sample of code below accomplishes this by applying a z-bias to the posters, so that they are correctly displayed on the billboards.<br /><br />
The following code snippet shows the viewport alternative to using a DirectX z-bias call:</p>
<table border="0"><tbody><tr><td>
<pre>// ZFighting Solution #2 - Viewport
D3DVIEWPORT9 mViewport;    // Holds viewport data
D3DVIEWPORT9 mNewViewport; // Holds new viewport data
// Global used for Viewport
// Hard coded for ZBIAS of 1 using this formula:
//   MinZ - 256/(2^24-1) and
//   MaxZ - 256/(2^24-1)
// 2^24 comes from the number of Z bits, and the 256 works
// for Intel® Integrated Graphics, but can be any
// multiple of 16.
float g_fViewportBias = 0.0000152588f;
// Viewport work around
m_pd3dDevice->GetViewport(&mViewport);
// Copy old Viewport to new
mNewViewport = mViewport;
// Change by the bias
mNewViewport.MinZ -= g_fViewportBias;
mNewViewport.MaxZ -= g_fViewportBias;
. . .
// Original viewport is loaded …
m_pd3dDevice->SetViewport(&mViewport);
// Billboards are rendered …
// The new viewport is loaded …
m_pd3dDevice->SetViewport(&mNewViewport);
// Posters are rendered …
// Original viewport is reloaded …
m_pd3dDevice->SetViewport(&mViewport);
. . .</pre>
<br /><br />
</td>
</tr></tbody></table><p>Again, some adjustments to the new viewport values may still be necessary to get the desired results, but this technique is more consistent across a variety of graphics hardware than using z-bias. The hard-coded example above is the equivalent of D3DRS_ZBIAS = 1. The result of the alternate solution is pictured below:<br />
</p>
<div><img width="462" height="397" src="/sites/default/files/m/d/4/1/d/8/168847_168847.gif" border="0" /><div><strong>Figure 3. Z-fighting resolved with viewport modification solution.</strong></div>
</div>
<hr /><h3>Alternative Method 3: Depth Bias</h3>
<p><br />
The last method addressed in this article uses the DirectX 9 Depth Bias method to solve z-fighting. A check is needed to verify that the graphics card is capable of performing depth bias. Intel Integrated Graphics will support depth bias in the next graphics core, code-named Grantsdale. After checking the cap bits to verify that depth bias is supported, this technique merely requires setting D3DRS_SLOPESCALEDEPTHBIAS and D3DRS_DEPTHBIAS to the proper values to get the desired effect.<br /><br />
The following code snippet shows the depth-bias alternative to using a DirectX z-bias call:</p>
<table border="0"><tbody><tr><td>
<pre>BOOL m_bDepthBiasCap; // TRUE, if device has DepthBias Caps
// Globals used for Depth Bias
float g_fSlopeScaleDepthBias = 1.0f;
float g_fDepthBias = -0.0005f;
float g_fDefaultDepthBias = 0.0f;
// Check for devices which support the new depth bias caps.
// Assigning the result directly keeps the flag defined when the caps are missing.
m_bDepthBiasCap = ((pCaps->RasterCaps & D3DPRASTERCAPS_SLOPESCALEDEPTHBIAS) &&
                   (pCaps->RasterCaps & D3DPRASTERCAPS_DEPTHBIAS)) ? TRUE : FALSE;
// Billboards are rendered...
// DepthBias work around
if ( m_bDepthBiasCap ) // TRUE, if DepthBias supported
{
    // Used to determine how much bias can be applied
    // to co-planar primitives to reduce z fighting
    // bias = (max * D3DRS_SLOPESCALEDEPTHBIAS) + D3DRS_DEPTHBIAS,
    // where max is the maximum depth slope of the triangle being rendered.
    // F2DW (defined elsewhere in the sample) is expected to pass the
    // float's bit pattern as a DWORD.
    m_pd3dDevice->SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(g_fSlopeScaleDepthBias));
    m_pd3dDevice->SetRenderState(D3DRS_DEPTHBIAS, F2DW(g_fDepthBias));
}
// Posters are rendered...
if ( m_bDepthBiasCap ) // TRUE, if DepthBias supported
{
    // DepthBias work around
    // set it back to zero (default)
    m_pd3dDevice->SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(g_fDefaultDepthBias));
    m_pd3dDevice->SetRenderState(D3DRS_DEPTHBIAS, F2DW(g_fDefaultDepthBias));
}
. . .</pre>
<br /><br />
</td>
</tr></tbody></table><p>Like the other methods (and like the original z-bias), some tweaking may be necessary, but using D3DRS_SLOPESCALEDEPTHBIAS and D3DRS_DEPTHBIAS is a relatively consistent technique for resolving z-fighting issues across a wide selection of graphics devices. The figure below shows the result of this alternate solution:<br />
</p>
<div><img width="637" height="532" src="/sites/default/files/m/d/4/1/d/8/168849_168849.gif" border="0" /><div><strong>Figure 4. Z-fighting solved with depth bias solution.</strong></div>
As Figure 4 shows, care should be taken when adjusting D3DRS_SLOPESCALEDEPTHBIAS and D3DRS_DEPTHBIAS. They can be very sensitive and can lead to other issues, such as the problem shown below for distant objects:<br />
<div><img width="209" height="200" src="/sites/default/files/m/d/4/1/d/8/168850_168850.gif" border="0" /></div>
<div><strong>Figure 5. Depth-bias solution possible issue: unwanted overlapping polygons.</strong></div>
</div>
<hr /><h3>Conclusion</h3>
<p><br />
Z-fighting is an inevitable issue when dealing with co-planar polygons. The three methods shown in this paper – loading a new projection matrix, loading a new viewport, and using the new DirectX 9 Depth Bias – can all be used as alternatives to z-bias with broad success. These techniques cannot eliminate the need for solid testing, but they can limit the amount of tweaking that is required as new problems arise stemming from the inconsistent behavior of z-bias.<br /><br />
The included sample code provides the developer with simple examples of z-bias alternatives that can be used to eliminate z-fighting.</p>
<hr /><h3>Additional Resources</h3>
<p>The following materials will be of particular interest to the audience of this article:</p>
<ul><li><a href="http://msdn.microsoft.com/en-us/library/bb318771(VS.85).aspx" rel="nofollow">Microsoft DirectX*</a> contains the DirectX* runtime and software required to create DirectX-compliant applications.</li>
<li><a href="http://developer.amd.com/samples/Pages/default.aspx" rel="nofollow">ATI Depth Bias Example*</a> is a sample application that demonstrates the use of depth bias to eliminate z-fighting.</li>
<li><a href="http://www.intel.com/software/media/">Intel® Digital Media Developer Center</a> provides insight, tips, tools, and training to create top-notch media applications.</li>
<li><a href="/en-us/gamedev" rel="nofollow">Intel® Games Developer Center</a> presents coding resources for game developers on Intel® architecture.</li>
</ul><hr /><h3>About the Authors</h3>
<p><img align="left" src="/sites/default/files/m/d/4/1/d/8/196087_196087.jpg" border="0" /><strong>Matt McClellan</strong> is an applications engineer with Intel Corporation in Folsom, CA. He has held various positions since joining Intel in 1996. His current focus is predominantly in the area of software optimization. He has a Bachelor of Science in Computer Engineering from California State University, Sacramento.</p>
<p><strong>Kipp Owens</strong> (photo not available) is an applications engineer with Intel Corporation in Folsom, CA.</p>
<p> </p>
Wed, 19 Aug 09 15:46:39 -0700 Christopher Owens (Intel) 140651