Tips For Developers: The Deadly Light


June 20, 2009 12:00 AM PDT


by Yakov Sumygin


Tips on optimizing applications for Intel GMA X3000/X3100
  • The latest Intel video adapters are limited in pixel rendering speed (fill rate), so the user should be able to change game settings such as resolution, shadow support, and shadow quality. For the same reason, complex single-pass shaders have an advantage over multi-pass shaders: render several light sources in a single pass, select the most important light sources, and disable the sources that don’t affect the object.
  • Use Early Z rejection. For multi-pass algorithms, it is preferable to first render all visible objects into the Z-buffer with rendering to the frame buffer disabled, then change the Z-test to equal and render lighting only for the objects that are actually visible. This matters especially because GMA X3000 has an Early Z function that can discard a pixel that failed the Z-test before it is sent to the shader or to the render target. For further performance gains, objects can be sorted and rendered front to back.
  • GMA X3000 supports Occlusion Query, i.e. the display adapter can be asked whether an object is visible. Render the object’s bounding box with writes to the color buffer and the Z-buffer disabled, then process the query results. The object’s mesh should be complex enough to justify the additional rendering for the test.
  • To discard unlit pixels early, the light source’s model can be rendered into the stencil buffer. For an omni light the model is a sphere; for a spot light it is a sphere section. Lit pixels can be marked using Carmack’s Z-fail algorithm for shadow rendering. The two-sided stencil, now supported by almost all display cards, is especially useful here.
  • Intel’s integrated adapters use main memory, so the load on memory must be kept low. To this end, use compressed textures where possible, create resources in the D3DPOOL_MANAGED or D3DPOOL_DEFAULT pool, and let the user change texture quality by using levels of detail for textures and objects.
  • Use an efficient method for culling hidden areas, such as portals or Binary Space Partitioning (BSP).
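
The first tip (select the most important light sources and disable those that don’t affect the object) can be sketched on the CPU side as follows. This is only an illustration: the `Light` structure, the linear falloff, and the `SelectLights` function are assumptions made for the example and are not taken from the engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Vec3 { float x, y, z; };

struct Light {
    Vec3  position;
    float radius;     // distance at which the light stops contributing
    float intensity;  // brightness at the light's position
};

static float Distance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Rough importance of a light for an object's bounding sphere: intensity scaled
// by a linear falloff, or 0 when the light's range does not reach the object.
static float Importance(const Light& l, const Vec3& center, float objRadius) {
    const float d = Distance(l.position, center) - objRadius;
    if (l.radius <= 0.0f || d >= l.radius) return 0.0f;
    return l.intensity * (1.0f - std::max(d, 0.0f) / l.radius);
}

// Returns the indices of at most maxLights lights that affect the object,
// most important first. Lights with zero importance are dropped.
std::vector<std::size_t> SelectLights(const std::vector<Light>& lights,
                                      const Vec3& center, float objRadius,
                                      std::size_t maxLights) {
    assert(objRadius >= 0.0f);
    typedef std::pair<float, std::size_t> Scored;
    std::vector<Scored> scored;
    for (std::size_t i = 0; i < lights.size(); ++i) {
        const float w = Importance(lights[i], center, objRadius);
        if (w > 0.0f) scored.push_back(Scored(w, i));
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const Scored& a, const Scored& b) { return a.first > b.first; });
    if (scored.size() > maxLights) scored.resize(maxLights);

    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < scored.size(); ++i) result.push_back(scored[i].second);
    return result;
}
```

The selected indices would then be passed to a single-pass shader as its light list; the limit `maxLights` corresponds to the number of sources the shader can handle in one pass.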

 


Deferred Shading

Until recently, all graphics engines used forward rendering. To render an object lit by several light sources, the object was rendered once per source with additive (ADD) blending enabled. Shadows were also calculated for each light source, and light wasn’t rendered in shadowed areas. Bump maps were usually used, as well as specular maps (maps of reflected light). A prime example of such an engine is id Tech 4, earlier called the Doom 3 engine. The game shows the disadvantages of this approach: if bump mapping is disabled, very simplified level geometry becomes visible. Moreover, the engine limits the number of light sources per object.
All this is caused by the significant load on vertex transformation and rasterization.

The solution to this problem lies in Deferred Shading.

The essence of the algorithm is to store an object’s geometric data in several special textures, which are then used for lighting. As a result, each object is rendered only once.

Now let us look at it step by step:

  • Create 3 float32 textures (supported starting with GeForce 6xxx)
  • In each iteration of the rendering loop, set these textures as the current render targets using MRT (Multiple Render Targets)
  • Render all visible objects into these textures. In detail: the coordinates of a point in view space go to the first texture (why view space is explained below). The normal at the point, also in view space, goes to the second texture. The object’s diffuse texture goes to the third texture. The w components of the textures remain free and can be used for gloss maps and other additional data.
  • Lighting step. For each visible light source, a screen rectangle covering the source is rendered with a special shader, with the textures filled in the previous step bound as inputs. The shader extracts the position, normal, and albedo (diffuse texture), then applies the standard per-pixel lighting operations: diffuse lighting, the specular component, and light attenuation.
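
The lighting step can be illustrated with a CPU-side sketch of what the shader computes for one texel of the three textures. It is a sketch under assumptions, not the engine’s shader: the structure names, the linear attenuation, and the Blinn-Phong specular term are chosen for the example, and the gloss value is assumed to be stored in a free w component.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  operator+(Vec3 a, Vec3 b)  { return Vec3{a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  operator-(Vec3 a, Vec3 b)  { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  operator*(Vec3 a, float s) { return Vec3{a.x * s, a.y * s, a.z * s}; }
static float Dot(Vec3 a, Vec3 b)        { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  Normalize(Vec3 v) {
    const float len = std::sqrt(Dot(v, v));
    return len > 0.0f ? v * (1.0f / len) : v;
}

// One texel read back from the three render targets (hypothetical layout).
struct GBufferTexel {
    Vec3  position;  // view space
    Vec3  normal;    // view space, unit length
    Vec3  albedo;    // diffuse texture color
    float gloss;     // taken from a free w component
};

struct PointLight {
    Vec3  position;  // view space
    Vec3  color;
    float radius;    // light has no effect beyond this distance
};

// Diffuse + specular + attenuation for one texel and one light.
Vec3 ShadeTexel(const GBufferTexel& g, const PointLight& light, float specPower) {
    assert(light.radius > 0.0f);
    const Vec3  toLight = light.position - g.position;
    const float dist    = std::sqrt(Dot(toLight, toLight));
    const float atten   = std::max(0.0f, 1.0f - dist / light.radius);  // linear falloff (assumption)
    if (atten <= 0.0f) return Vec3{0.0f, 0.0f, 0.0f};

    const Vec3  L     = Normalize(toLight);
    const float nDotL = std::max(0.0f, Dot(g.normal, L));

    // In view space the camera is at the origin, so the view vector is -position.
    const Vec3  V    = Normalize(g.position * -1.0f);
    const Vec3  H    = Normalize(L + V);
    const float spec = nDotL > 0.0f
        ? std::pow(std::max(0.0f, Dot(g.normal, H)), specPower) * g.gloss
        : 0.0f;

    const Vec3 diffuse = Vec3{g.albedo.x * light.color.x,
                              g.albedo.y * light.color.y,
                              g.albedo.z * light.color.z} * nDotL;
    return (diffuse + light.color * spec) * atten;
}
```

In the real pipeline the result is accumulated into the frame buffer with additive blending, one rectangle per light.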

 


Advice on optimization and details:
  • Using view space: a point is stored in view space for more accurate and optimized calculation.
  • Render the light source’s model into the stencil buffer before calculating the illumination, to mark the pixels lit by this source. This works like stencil shadows (Carmack’s algorithm) with a simple pixel shader. For instance, for a spherical light source a sphere is rendered into the stencil.
  • Shadow maps for soft shadows. When using shadow maps, the problem of omni (spherical) light sources has to be solved. I solved it by replacing an omni source with 6 spot sources. For softness, a simple PCF is used for now: adjacent pixels of the shadow map are blended.
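
The simple PCF mentioned in the last item can be sketched on the CPU as a 3x3 average of depth comparisons. The `ShadowMap` structure, the clamped addressing, and the depth bias are assumptions made for the example; in the engine this would be done in the pixel shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side shadow map: depth from the light, stored row by row.
struct ShadowMap {
    int width;
    int height;
    std::vector<float> depth;

    // Reads a texel, clamping coordinates to the map edges.
    float At(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return depth[static_cast<std::size_t>(y) * width + x];
    }
};

// Returns the lit fraction in [0, 1]: 1 is fully lit, 0 is fully shadowed.
// Each of the 9 neighbouring texels is depth-tested separately and the
// results are averaged, which softens the shadow edge.
float SamplePCF3x3(const ShadowMap& map, int x, int y, float receiverDepth, float bias) {
    assert(map.width > 0 && map.height > 0);
    assert(map.depth.size() == static_cast<std::size_t>(map.width) * map.height);
    int lit = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            if (receiverDepth - bias <= map.At(x + dx, y + dy)) ++lit;
        }
    }
    return lit / 9.0f;
}
```

The returned fraction multiplies the light’s contribution, so pixels near a shadow boundary receive a partial amount of light.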

 


Convenient multithreading

Convenient and safe multithreading can be achieved by dividing an application into several modules that communicate with each other using messages. Windows uses this model, but it isn’t convenient enough: a message handler has to be written, identifiers have to be created, and so on. I have written a base subsystem class in which boost functors serve as messages. Functors make it possible to store a function call together with its parameters and execute it later. This is a kind of RPC (remote procedure call) based on C++. To call a function from another thread it is enough to write, for example, physics.post_msg(&CPhysics::SetPlayerPosition, position); the call is placed in a queue, and the message queue is executed in the next iteration of the target thread. The function’s parameter is copied automatically, so there are no problems with synchronizing access to memory. The system can also be extended to transmit function calls over the network.


Advice on optimization and details:
  • If multithreading isn’t needed, this system can be used in a single-threaded application. In my engine the choice is static, i.e. switching between the modes requires recompilation. This is done for performance reasons: in single-threaded mode the compiler optimizes the code and there is almost no function call overhead.
  • Whether the subsystem runs in a separate thread depends on a special compilation key.
  • It looks like this:

 

void AddMessage(boost::function0<bool> func)
{
    //try to get access to resource
    Lock();
    m_queue.push_back(func);

    //free resource
    Unlock();
}

 

 

Executing the message queue:

bool execute()
{
    //try to get access to resource
    Lock();

    //copy messages to reserve array
    std::vector< boost::function0<bool> > queue = m_queue;
    m_queue.clear();

    //free resource
    Unlock();

    //execute messages outside the lock
    bool b = true;
    BOOST_FOREACH(boost::function0<bool>& msg, queue)
        b &= msg();
    return b;
}

 

  • The system is based entirely on C++ and Boost, which makes it portable to other platforms.
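
The post_msg call described above could look roughly like the sketch below. To keep the example self-contained it uses the standard library (std::function, std::bind, std::mutex) instead of Boost, and the `CSubsystem` and `CPhysics` bodies shown here are assumptions for illustration, not the engine’s actual classes.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical base class: a queue of deferred calls returning bool.
class CSubsystem {
public:
    typedef std::function<bool()> Message;

    // Packs a member function and COPIES of its arguments into a message.
    // T must be the derived subsystem class the call is addressed to.
    template <class T, class... Params, class... Args>
    void post_msg(bool (T::*method)(Params...), Args&&... args) {
        assert(method != nullptr);
        T* self = static_cast<T*>(this);
        AddMessage(std::bind(method, self, std::forward<Args>(args)...));
    }

    void AddMessage(Message msg) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push_back(std::move(msg));
    }

    // Called once per iteration of the subsystem's own thread.
    bool execute() {
        std::vector<Message> pending;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            pending.swap(m_queue);  // take the messages, leave the queue empty
        }
        bool ok = true;
        for (Message& msg : pending) ok = msg() && ok;  // run outside the lock
        return ok;
    }

private:
    std::mutex           m_mutex;
    std::vector<Message> m_queue;
};

struct Vec3 { float x, y, z; };

// Hypothetical subsystem used only to demonstrate the call.
class CPhysics : public CSubsystem {
public:
    bool SetPlayerPosition(const Vec3& p) { m_player = p; return true; }
    Vec3 m_player = {0.0f, 0.0f, 0.0f};
};
```

Because the argument is copied when the message is created, the caller can change or destroy its own `position` immediately after posting; the physics thread only ever sees its private copy when it runs `execute()`.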

 


Support for a wide range of display adapters

To support different hardware, rendering must go through an interface, i.e. an abstract base class, with separate implementations of that interface for adapters with different capabilities. The engine supports 3 lighting methods: Deferred Shading for modern adapters with Shader Model 3.0 support such as Intel X3500 (G35), Doom 3-style forward lighting for the previous generation of adapters, and lighting with precalculated light maps.
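
A minimal sketch of such an interface is shown below. All names here (`ILighting`, `AdapterCaps`, `CreateLighting`) are hypothetical, and the capability check for the middle path is an assumption; only the Shader Model 3.0 requirement for Deferred Shading comes from the text above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical capability summary filled in from the device caps at startup.
struct AdapterCaps {
    int  shaderModel;        // highest supported shader model, 0 if none
    bool perPixelLighting;   // assumed flag: adapter can run the Doom 3-style path
};

// Abstract base class: the rest of the engine talks only to this interface.
class ILighting {
public:
    virtual ~ILighting() {}
    virtual std::string Name() const = 0;
};

class CDeferredLighting : public ILighting {
public:
    std::string Name() const { return "deferred"; }
};

class CForwardLighting : public ILighting {
public:
    std::string Name() const { return "forward"; }
};

class CLightmapLighting : public ILighting {
public:
    std::string Name() const { return "lightmap"; }
};

// Picks the most capable lighting method the adapter can run.
std::unique_ptr<ILighting> CreateLighting(const AdapterCaps& caps) {
    assert(caps.shaderModel >= 0);
    if (caps.shaderModel >= 3)
        return std::unique_ptr<ILighting>(new CDeferredLighting());
    if (caps.perPixelLighting)
        return std::unique_ptr<ILighting>(new CForwardLighting());
    return std::unique_ptr<ILighting>(new CLightmapLighting());
}
```

The selection happens once at startup, so the per-frame code never needs to check the adapter’s capabilities again.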


Per-pixel lighting without pixel shaders

Modern lighting consists of the following components: per-pixel attenuation, per-pixel bump mapping, and shadows. All of this can be achieved even if an adapter doesn’t support pixel shaders. Attenuation can be achieved by rendering two attenuation textures into the alpha buffer, stencil shadows don’t require pixel shaders, and per-pixel bump mapping can be implemented using the built-in DirectX Dot3 texture operation (D3DTOP_DOTPRODUCT3 flag). All this is supported by Intel’s early generations of integrated graphics adapters such as Intel® 82810, 82815, 82830M, 82845G.


Mobile gaming support

To support gaming on a mobile device, the Intel Laptop Gaming TDK library can be very useful. It gathers functions for monitoring power level, connection quality, and processor load behind a friendly interface. I have added a simple manager that makes these functions even easier to use.

Here is the external interface:

//wrapper for Intel Mobile TDK
class CManagerMobile: public CSingletonAccessor
{
public:
    CManagerMobile();

public:
    bool Init();
    bool Update (sys_time time);
    float GetPower();
    float GetProcessorUtil();
    CONNECTIVITY_INFO& GetConnectivityInfo();

protected:
    IntelLaptopGamingTDKInterface* m_interface;
};

void CManagerMobile::UpdatePower()
{
    //Get the power source
    m_power_source = m_interface->GetPowerSrc();

    if (m_power_source == AC_Power)
    {
        m_battery_mode = false;
    }
    else if (m_power_source == Battery_Power)
    {
        m_battery_mode = true;
    }
    else
    {
        return;
    }

    //on battery: read the remaining charge and reduce the load
    if ( m_battery_mode )
    {
        m_battery_life_time = m_interface->GetSecBatteryLifeTimeRemaining();
        m_battery_remaining = m_interface->GetPercentBatteryLife();

        //Min physics
        //Min effects

        //check connection
        if ( !m_mbConnected || m_bWiredConnection )
        {
            if (m_interface->IsWirelessAdapterEnabled()
                && !m_bDisableWirelessQnAsked)
            {
                //Suggest to turn off wireless adapter....
                //Never ask the question again
                m_bDisableWirelessQnAsked = true;
            }
        }
    }
}

 

Upon initialization the manager determines the number of processors; on each update it reads the power level and connection quality. When power drops to the threshold level, the manager shows a corresponding message window.

It is also possible to lower the load on the graphics adapter and the processor when the user unplugs the mains cable from the laptop, for instance by lowering the quality of shadows and disabling certain effects.

Advice on optimization: querying the processor load is very slow. Frames per second drop approximately three-fold, so this function should be called no more than once every 3 seconds.
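
One way to enforce such a limit is to cache the last value and repeat the slow query only after the interval has elapsed. The sketch below is an assumption-based illustration: `CThrottledQuery` is a hypothetical helper, and the actual TDK call would be supplied as the `query` callback.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: caches the result of a slow query and refreshes it
// at most once per minIntervalSec seconds.
class CThrottledQuery {
public:
    CThrottledQuery(std::function<float()> query, double minIntervalSec)
        : m_query(std::move(query)), m_interval(minIntervalSec),
          m_lastTime(0.0), m_value(0.0f), m_hasValue(false) {
        assert(minIntervalSec >= 0.0);
    }

    // nowSec is the current time in seconds (for example, the game clock).
    float Get(double nowSec) {
        if (!m_hasValue || nowSec - m_lastTime >= m_interval) {
            m_value    = m_query();   // the expensive call
            m_lastTime = nowSec;
            m_hasValue = true;
        }
        return m_value;               // otherwise the cached value is returned
    }

private:
    std::function<float()> m_query;
    double m_interval;
    double m_lastTime;
    float  m_value;
    bool   m_hasValue;
};
```

With an interval of 3.0 seconds, calling `Get` every frame triggers the slow query only on the first call and then once every 3 seconds.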


A technique for moving bots in the shadows

In the Deadly Light project it was necessary to implement the movement of characters that cannot tolerate light. To this end I decided to calculate how brightly a character would be lit at every way point along its route. The brightness of the way points is calculated when a level is loaded.

Brightness is calculated as follows:

  • For each way point, all light sources that illuminate it are determined.
  • For each light source, a ray is cast from the way point to the light; if the ray intersects something, the point is in shadow with respect to that source.
  • The illumination levels from all sources are summed.
  • For static sources the illumination level is stored; for dynamic sources it is recalculated each time.
  • The illumination level is calculated at the points of the character’s bounding box.
  • When a route for a zombie is calculated (A*), moving to lit points is given higher weights so that the zombie avoids lit routes. The zombie will now avoid torchlight and lit areas, but if necessary it can dash through these areas at the expense of its health.
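
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Waypoint` and `Light` structures, the linear falloff, and the `blocked` callback standing in for the engine’s ray cast are all hypothetical, brightness is sampled at a single point rather than at the bounding box points, and the search is Dijkstra’s algorithm (A* with a zero heuristic) rather than the engine’s A*.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct Light { Vec3 position; float radius; float intensity; };
struct Waypoint { Vec3 position; std::vector<int> neighbors; float brightness; };

// Stand-in for the engine's ray cast: returns true if the segment is blocked.
typedef std::function<bool(const Vec3&, const Vec3&)> RayBlocked;

static float Dist(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Sums the contribution of every light that reaches each way point unoccluded.
void ComputeBrightness(std::vector<Waypoint>& points, const std::vector<Light>& lights,
                       const RayBlocked& blocked) {
    for (Waypoint& wp : points) {
        wp.brightness = 0.0f;
        for (const Light& l : lights) {
            const float d = Dist(wp.position, l.position);
            if (d >= l.radius) continue;                     // out of range
            if (blocked(wp.position, l.position)) continue;  // in shadow
            wp.brightness += l.intensity * (1.0f - d / l.radius);
        }
    }
}

// Cheapest route where stepping onto a lit way point costs extra.
// Returns way point indices from start to goal, or an empty vector.
std::vector<int> FindDarkPath(const std::vector<Waypoint>& points, int start, int goal,
                              float lightPenalty) {
    const int n = static_cast<int>(points.size());
    assert(start >= 0 && start < n && goal >= 0 && goal < n);
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<float> cost(points.size(), inf);
    std::vector<int> prev(points.size(), -1);
    typedef std::pair<float, int> Entry;  // (cost so far, way point index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > open;
    cost[start] = 0.0f;
    open.push(Entry(0.0f, start));
    while (!open.empty()) {
        const Entry e = open.top();
        open.pop();
        const int cur = e.second;
        if (e.first > cost[cur]) continue;  // outdated queue entry
        if (cur == goal) break;
        for (int next : points[cur].neighbors) {
            const float step = Dist(points[cur].position, points[next].position)
                             + lightPenalty * points[next].brightness;
            if (cost[cur] + step < cost[next]) {
                cost[next] = cost[cur] + step;
                prev[next] = cur;
                open.push(Entry(cost[next], next));
            }
        }
    }
    std::vector<int> path;
    if (cost[goal] == inf) return path;
    for (int i = goal; i != -1; i = prev[i]) path.insert(path.begin(), i);
    return path;
}
```

A finite `lightPenalty` matches the behaviour described in the last item: lit way points are expensive rather than forbidden, so the zombie still crosses a lit area when no dark route exists.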