Processor graphics—the integration of graphics functionality into the CPU—will soon be found in computers everywhere.
At the 2010 Intel Developer Forum, Intel announced the 2nd Generation Intel® Core™ processor (formerly code-named Sandy Bridge). This processor is Intel’s first new microarchitecture utilizing 32nm technology. Its smaller transistors and new architectural design result in higher performance at lower power.
One key feature of this architecture is the tighter integration of visual functionality, including graphics and media, into the processor. This integration is often called processor graphics. The ring architecture that connects the x86 processor cores together is now connected to the processor graphics. This ring interconnect enables high-speed and low-latency communication between the processor cores, processor graphics, and other integrated components, such as the memory controller and the display. Basically, the processor cores and graphics engine communicate through a shared cache, creating some interesting possibilities for tight integration between the CPU and GPU.
The integration of processor components also brings improvements to Intel® Turbo Boost Technology. Dubbed Intel® Turbo Boost Technology 2.0, this improved feature enables the processor to adjust the processor core and processor graphics frequencies to increase performance while staying within the allotted power/thermal budget. This means the processor can increase individual core speed or graphics speed as the workload dictates.
During performance analysis, it’s important to pay attention to dynamic frequencies. A typical game loads both the CPU and the GPU, and the processor finds a good balance between them. However, if you are playing back a captured frame, the CPU workload might be lower than in the live game, and the graphics dynamic frequency might affect your results. The relative cost of work within the frame should scale consistently, but the absolute time to complete the frame might not be what you expect.
Because Intel Turbo Boost Technology 2.0 is controlled automatically by the processor, developers cannot control it directly. However, understanding how it works is important. Most games I have investigated benefit from the graphics dynamic frequency scaling.
The addition of Intel® Advanced Vector Extensions (Intel® AVX) is another interesting 2nd Generation Intel Core processor feature. Intel AVX extends single instruction, multiple data (SIMD) instructions from 128 bits to 256 bits. For applications that are floating-point intensive, Intel AVX enables a single instruction to work on eight single-precision floating-point values at a time instead of the four that current 128-bit SIMD provides. (It’s important to note that other hardware vendors have also announced support for Intel AVX.¹)
Most developers will use the latest Microsoft Visual Studio* compiler or the Intel® C/C++ Compiler to take advantage of Intel AVX. But for the clock-counting, bit-shifting, tech-heads out there, you can learn more at the Intel AVX Web site. (It even includes an emulator.)
The best way to work with Intel AVX is through intrinsics, which are supported by both the Microsoft and Intel® compilers. Anyone familiar with programming Streaming SIMD Extensions (SSE) or Sony PlayStation* 3’s SPUs will be good “frenemies” with intrinsics. Intrinsics are compiler-specific functions that usually compile down to highly efficient inline machine instructions. Because the compiler has a strong understanding of intrinsics, the code it generates is often faster than hand-written inline assembly. Intrinsics are the best way to write high-throughput, compute-intensive code on the CPU and are an acquired taste, like wine or Remedy Entertainment’s Alan Wake*.
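To make the eight-floats-per-instruction idea concrete, here is a minimal sketch of adding two float arrays with AVX intrinsics. The function name AddArrays is hypothetical. The sketch assumes the build enables AVX (for example, /arch:AVX in Visual Studio or -mavx in GCC); if it does not, only the scalar loop is compiled. It also does not check at run time whether the CPU and operating system support AVX, which shipping code would need to do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics (_mm256_*)
#endif

// Hypothetical helper: returns out[i] = a[i] + b[i] for equally sized arrays.
std::vector<float> AddArrays(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;

#if defined(__AVX__)
    // One 256-bit register holds eight 32-bit floats, so each pass of this
    // loop adds eight elements with a single _mm256_add_ps.
    for (; i + 8 <= a.size(); i += 8) {
        __m256 va = _mm256_loadu_ps(a.data() + i);  // unaligned load
        __m256 vb = _mm256_loadu_ps(b.data() + i);
        _mm256_storeu_ps(out.data() + i, _mm256_add_ps(va, vb));
    }
#endif

    // Scalar loop: handles the tail when AVX is enabled (fewer than eight
    // elements left), or the whole array when it is not.
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

The unaligned load and store variants are used so that the sketch works on any std::vector. Aligned data and the aligned intrinsics may perform better, but that is a tuning choice outside this example.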
Intel engineers and performance-hungry developers are already exploring the ways Intel AVX can benefit game and graphics applications. For example, my coworker, Stan Melax, just wrote a great article presenting a programming pattern to improve the performance of geometry computations by transposing packed 3D data on-the-fly.
Intel AVX is interesting, but let’s shift our focus to the processor graphics engine.
The 2nd Generation Intel Core processor family is Microsoft DirectX* (DX) 10.1 compatible. I like the DX11 multi-threaded API, so most of my current code is DX11 with the proper DX10.1 “feature level” set for 2nd Generation Intel Core processors. With full DX10.1 support, you’ll encounter no major surprises when programming for Intel’s processor graphics. However, you’ll need to keep a few things in mind.
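As a rough illustration of that setup, the sketch below creates a device through the DX11 API while listing 10.1 as an acceptable feature level. D3D11CreateDevice attempts the levels in the order given and reports which one it obtained. The helper names CreateDeviceForProcessorGraphics and FeatureLevelName are hypothetical, the choice of fallback levels is an assumption, and the Direct3D portion is guarded so that it only compiles on Windows.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")  // MSVC-specific way to link Direct3D 11
#endif

// Hypothetical helper: readable name for a feature level value.
// The numeric values match the D3D_FEATURE_LEVEL enum in the DirectX headers.
std::string FeatureLevelName(unsigned level) {
    switch (level) {
        case 0xb000: return "11.0";
        case 0xa100: return "10.1";
        case 0xa000: return "10.0";
        default:     return "other";
    }
}

#ifdef _WIN32
// Hypothetical helper: creates a hardware device at the highest feature
// level available from the list below. On success the caller owns *device
// and *context and must Release() them.
bool CreateDeviceForProcessorGraphics(ID3D11Device** device,
                                      ID3D11DeviceContext** context,
                                      D3D_FEATURE_LEVEL* obtained) {
    assert(device && context && obtained);
    // Levels are attempted in array order, highest first.
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0,
        D3D_FEATURE_LEVEL_10_1,
        D3D_FEATURE_LEVEL_10_0,
    };
    HRESULT hr = D3D11CreateDevice(
        nullptr,                    // default adapter
        D3D_DRIVER_TYPE_HARDWARE,
        nullptr,                    // no software rasterizer module
        0,                          // no creation flags
        requested,
        sizeof(requested) / sizeof(requested[0]),
        D3D11_SDK_VERSION,
        device,
        obtained,
        context);
    return SUCCEEDED(hr);
}
#endif
```

After a successful call, FeatureLevelName(static_cast<unsigned>(obtained)) gives a value you can log. The application can then branch on the obtained level to avoid features that require 11.0.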
The memory layout for processor graphics is different than it is for a discrete card. Graphics applications often check for the amount of available free video memory early in execution. As a result of the dynamic allocation of graphics memory performed by the processor graphics (which includes Intel® HD Graphics), you need to know the total amount of memory that is truly available to the graphics device. Memory checks that supply only the amount of “local” or “dedicated” graphics memory available do not supply an appropriate value for these devices.
All video memory on processor graphics and on earlier generations of Intel® integrated graphics (including Intel® Graphics Media Accelerator Series 3 and 4) uses Dynamic Video Memory Technology (DVMT). DVMT memory is considered “local memory,” so “non-local video memory” will show as ZERO (0). Do not use this zero value to determine compatibility with Accelerated Graphics Port (AGP) or PCI Express*.
To accurately detect the amount of memory available, you’ll need to check the total amount of video memory. The Microsoft DirectX SDK (June 2010) includes the VideoMemory sample code and describes five commonly used methods to detect the total amount of video memory. Applications targeting Microsoft Windows Vista* and Microsoft Windows* 7 should reference GetVideoMemoryViaDXGI. For Microsoft Windows XP applications, GetVideoMemoryViaWMI is a good starting place. For more information, see the Microsoft sample code site.
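The following is a minimal sketch of a DXGI-based check, not a copy of the SDK sample. It reads the three memory fields that DXGI reports in DXGI_ADAPTER_DESC for the first adapter. The VideoMemoryInfo struct and both function names are hypothetical, and treating the sum of the three fields as the “total” is an assumption of this sketch; consult GetVideoMemoryViaDXGI in the SDK for the reference approach. The DXGI portion is guarded so that it only compiles on Windows.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <dxgi.h>
#pragma comment(lib, "dxgi.lib")  // MSVC-specific way to link DXGI
#endif

// Hypothetical struct mirroring the memory fields of DXGI_ADAPTER_DESC.
struct VideoMemoryInfo {
    std::uint64_t dedicatedVideo;   // DedicatedVideoMemory
    std::uint64_t dedicatedSystem;  // DedicatedSystemMemory
    std::uint64_t sharedSystem;     // SharedSystemMemory
};

// Assumption: "total" memory is the sum of all three fields, so a device
// that reports little or no dedicated memory is not rejected outright.
std::uint64_t TotalVideoMemory(const VideoMemoryInfo& info) {
    return info.dedicatedVideo + info.dedicatedSystem + info.sharedSystem;
}

#ifdef _WIN32
// Hypothetical helper: fills *out for adapter 0; returns false on failure.
bool QueryPrimaryAdapterMemory(VideoMemoryInfo* out) {
    assert(out != nullptr);
    IDXGIFactory* factory = nullptr;
    if (FAILED(CreateDXGIFactory(__uuidof(IDXGIFactory),
                                 reinterpret_cast<void**>(&factory)))) {
        return false;
    }
    bool ok = false;
    IDXGIAdapter* adapter = nullptr;
    if (SUCCEEDED(factory->EnumAdapters(0, &adapter))) {
        DXGI_ADAPTER_DESC desc;
        if (SUCCEEDED(adapter->GetDesc(&desc))) {
            out->dedicatedVideo  = desc.DedicatedVideoMemory;
            out->dedicatedSystem = desc.DedicatedSystemMemory;
            out->sharedSystem    = desc.SharedSystemMemory;
            ok = true;
        }
        adapter->Release();
    }
    factory->Release();
    return ok;
}
#endif
```

A game would compare TotalVideoMemory(info) against its minimum requirement rather than checking the dedicated field alone.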
The best way to get started with Intel® processor graphics is to read the Intel Graphics Developer’s Guide. This guide focuses on the 2nd Generation Intel Core processor family, but you’ll find useful information covering all Intel® graphics.
Processor graphics excels at more than just 3D graphics and includes hardware dedicated to accelerating media processing. The easiest way to utilize this hardware is with the Intel® Media SDK.
The Intel Media SDK has functions to streamline and simplify video encoding, decoding, and preprocessing operations. Support is provided for encoding (including H.264 and MPEG-2 formats) and decoding (including H.264, MPEG-2, and VC-1 formats). The Intel Media SDK has software fallbacks for any missing hardware acceleration, helping make it platform-agnostic. Platforms that lack hardware acceleration still utilize the optimized (and threaded) software-based video encoding and decoding.
With nearly a million PCs shipped each day, the available market for processor graphics is growing quickly. It’s worthwhile to understand and validate on processor graphics. Soon, processor graphics will be everywhere.
About the Author
Orion Granatir works as a senior software engineer in Intel’s Visual Computing Software Development team, which is just a fancy way of saying “Orion works on video games at Intel.” Most of his current work focuses on optimizing for the latest Intel® technology, including Intel® Streaming SIMD Extensions, Intel® Advanced Vector Extensions, multi-threading, and processor graphics. While in this role Orion has presented at Game Developers Conference (GDC) 2008, Gamefest 2008, GDC Online 2008, GDC 2009, GDC 2010, and the Intel Developer Forum. Orion also writes a column that is published on Gamasutra.com. Prior to joining Intel, Orion worked on several Sony PlayStation* 3 titles as a senior programmer with Insomniac Games. The game titles he has worked on include Resistance*: Fall of Man and Ratchet and Clank Future*: Tools of Destruction.
Sign up today for Intel® Visual Adrenaline magazine: http://va.softwaredispatch.intel.com/