Performance per What?

In this short note, we argue that performance per watt, which is often cited in the graphics hardware industry, is not a particularly useful unit for power efficiency in scientific and engineering discussions. We argue that joules per task and watts are more reasonable units. We show a concrete example where nanojoules per pixel is much more intuitive, easier to compute aggregate statistics from...
Authored by TOMAS A. (Intel) Last updated on 06/07/2017 - 12:26
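
As a rough illustration of the unit the note argues for, the sketch below converts average power and frame time into nanojoules per pixel and then aggregates over several frames. The function name and all measurements are hypothetical and are not taken from the note.

```python
def nanojoules_per_pixel(avg_power_watts, frame_time_s, pixels):
    """Energy spent per pixel, in nanojoules (energy = power * time)."""
    energy_joules = avg_power_watts * frame_time_s
    return energy_joules / pixels * 1e9


# Hypothetical measurements: (average power in W, frame time in s, pixels rendered)
frames = [
    (2.0, 0.010, 1_000_000),
    (3.0, 0.020, 1_000_000),
]

# Per-frame cost: roughly 20 and 60 nJ/pixel for the numbers above.
per_frame = [nanojoules_per_pixel(p, t, n) for p, t, n in frames]

# Aggregate cost: total energy divided by total pixels.
total_energy_joules = sum(p * t for p, t, _ in frames)
total_pixels = sum(n for _, _, n in frames)
overall = total_energy_joules / total_pixels * 1e9

print(per_frame, overall)
```

Because energy is additive, the aggregate is simply total joules over total pixels. With equal pixel counts per frame, as assumed here, it equals the plain average of the per-frame values.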

A Compressed Depth Cache

We propose a depth cache that keeps the depth data in compressed format, when possible. Compared to previous work, this requires a more flexible cache implementation, where a tile may occupy a variable number of cache lines depending on whether it can be compressed or not. The advantage of this is that the effective cache size increases proportionally to the compression ratio. We show that the...
Authored by Jon Hasselgren (Intel) Last updated on 06/07/2017 - 12:26
Blog post

Case Study: How Intel® GPA Measurements Alerted Me to Greatly Improve the FPS of my Windows* 8 Store App: The DispatcherTimer

Last year, I wrote a blog post about creating your own simple collision detection code. I implemented this for a children's math game I created. You can refer to my blog here

Last updated on 06/14/2017 - 16:35

Stochastic Depth Buffer Compression using Generalized Plane Encoding

In this paper, we derive compact representations of the depth function for a triangle undergoing motion or defocus blur. Unlike a static primitive, where the depth function is planar, the depth function is a rational function in time and the lens parameters. Furthermore, we show how these compact depth functions can be used to design an efficient depth buffer compressor/decompressor, which...
Authored by Magnus Andersson (Intel) Last updated on 06/01/2017 - 11:21

A Sort-based Deferred Shading Architecture for Decoupled Sampling

Stochastic sampling in time and over the lens is essential to produce photo-realistic images, and it has the potential to revolutionize real-time graphics. In this paper, we take an architectural view of the problem and propose a novel hardware architecture for efficient shading in the context of stochastic rendering. We replace previous caching mechanisms by a sorting step to extract coherence,...
Authored by Franz Clarberg (Intel) Last updated on 06/01/2017 - 11:21

Asynchronous Adaptive Anti-Aliasing using Shared Memory

Edge aliasing continues to be one of the most prominent problems in real-time graphics, e.g., in games. We present a novel algorithm that uses shared memory between the GPU and the CPU so that these two units can work in concert to solve the edge aliasing problem rapidly. Our system renders the scene as usual on the GPU with one sample per pixel. At the same time, our novel edge aliasing...
Authored by TOMAS A. (Intel) Last updated on 06/07/2017 - 12:30

Dynamic Stackless Binary Tree Traversal

A fundamental part of many computer algorithms involves traversing a binary tree. One notable example is traversing a space-partitioning acceleration structure when computing ray-traced images. Traditionally, the traversal requires a stack to be temporarily stored for each ray, which results in both additional storage and memory bandwidth usage. We present a novel algorithm for traversing a binary...
Authored by TOMAS A. (Intel) Last updated on 06/07/2017 - 12:24
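
To make the idea of traversing without a per-traversal stack concrete, here is a minimal sketch of a well-known parent-pointer technique. It is not the algorithm proposed in the paper, whose details are not given in the abstract above; the `Node` class and `traverse_preorder` function are hypothetical names used only for illustration.

```python
class Node:
    """Binary tree node that also stores a pointer to its parent."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def traverse_preorder(root):
    """Visit every node in pre-order using only the current and previous node."""
    visited = []
    node, prev = root, None
    while node is not None:
        if prev is node.parent:
            # Arrived from above: visit, then descend (left first), or go back up.
            visited.append(node.value)
            nxt = node.left or node.right or node.parent
        elif prev is node.left:
            # Returned from the left subtree: try the right one, else go up.
            nxt = node.right or node.parent
        else:
            # Returned from the right subtree: go up.
            nxt = node.parent
        prev, node = node, nxt
    return visited


tree = Node(1, Node(2, Node(4), Node(5)), Node(3, None, Node(6)))
print(traverse_preorder(tree))  # [1, 2, 4, 5, 3, 6]
```

The trade-off in this sketch is one extra pointer per node in exchange for constant traversal state, which is the kind of storage and bandwidth concern the abstract raises.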

Theory and Analysis of Higher-Order Motion Blur Rasterization

A common assumption in motion blur rendering is that the triangle vertices move in straight lines. In this paper, we focus on scenarios where this assumption is no longer valid, such as motion due to fast rotation and other non-linear characteristics. To that end, we present a higher-order representation of vertex motion based on Bezier curves, which allows for more complex motion paths, and we...
Authored by Jacob Munkberg (Intel) Last updated on 06/07/2017 - 12:27
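
As a small sketch of what a Bezier-based vertex path can look like, the code below evaluates a vertex position at a time t in [0, 1] with de Casteljau's algorithm. The function name and control points are hypothetical; the paper's actual representation and rasterization method are not described in the abstract above.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve of any degree at parameter t (de Casteljau)."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        # Linearly interpolate between each pair of neighbouring points.
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


# Hypothetical quadratic path for one vertex over the shutter interval.
path = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 0.0)]
print(bezier_point(path, 0.5))  # (1.0, 1.0, 0.0)
```

With only two control points the same function reduces to the straight-line motion that the abstract calls the common assumption.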

Advances in Real-Time Rendering in Games: Pixel Synchronization: Solving old graphics problems with new data structures

In this SIGGRAPH 2013 session, we will introduce a new synchronization primitive for pixel shaders that enables a whole new way of attacking graphics problems on the GPU. The new method is easy to use, requires a fixed amount of memory and provides stable and consistent performance. Several applications will be detailed and demonstrated in real-time, including programmable blending, single pass...
Authored by Marco Salvi (Intel) Last updated on 04/26/2017 - 15:48

Hot3D: Haswell Processor Graphics

This talk will detail new graphics hardware and software capabilities introduced in the 4th generation of Intel® Core™ Processors, codename “Haswell”.
Authored by Marco Salvi (Intel) Last updated on 06/07/2017 - 12:28