A Look at Sandy Bridge: Integrating Graphics into the CPU

Processor graphics will soon be found in computers everywhere. Intel calls this new capability Intel® HD Graphics; at AMD, it’s called AMD Fusion*. Both names refer to the integration of graphics functionality into the CPU.

At CES this year, Intel introduced the 2nd Generation Intel® Core™ processors (codenamed “Sandy Bridge”). This processor is Intel’s first new microarchitecture utilizing 32nm technology. Smaller transistors and new architectural design result in higher performance at lower power.

One advantage of working at Intel is having early access to leading-edge technology. I’m writing this blog on my Sandy Bridge machine. Right now it may be another big dev box on my desk, but this chip is now reaching numerous mainstream laptops. The processor graphics are pretty good. It runs one of my favorite games, Bioware’s Mass Effect* 2, at a solid fps in glorious detail (by the way, my mission was truly suicidal).

One key feature of this architecture is the tighter integration of graphics into the processor. The ring architecture that connects the processor cores (x86 cores) together is now connected to the processor graphics. This ring interconnect enables high-speed and low-latency communication between the processor cores, processor graphics, and other integrated components, such as the memory controller and the display. Basically, the processor cores and graphics engine communicate through a shared cache, creating some interesting possibilities for tight CPU/GPU interaction. We’ll explore some of these topics in future articles.

The integration of processor components also provides some new improvements to Intel® Turbo Boost Technology. Intel® Turbo Boost Technology enables the processor to adjust the processor core and processor graphics frequencies to increase performance while staying within the allotted power and thermal budget. This means the processor can increase individual core speed or graphics speed as the workload dictates.

During performance analysis, it’s important to pay attention to dynamic frequencies. A typical game loads both the CPU and the GPU, and the processor finds a good balance between them. However, if you are playing back a captured frame, the CPU workload might be lower than in the running game, so the graphics frequency might differ and affect your results. The costs within the frame should still scale relative to one another, but the total time to complete the frame might not match what you expect.
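One way to keep captured-frame numbers comparable is to look at each draw call's share of the frame rather than its absolute time. The sketch below is only an illustration: `relativeCosts` is a hypothetical helper, the timings are assumed to come from whatever profiler you use, and it assumes a frequency change scales every draw call by roughly the same factor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: converts per-draw-call timings (in milliseconds)
// into fractions of the total frame time. If a frequency change scales
// every timing by the same factor, these fractions stay the same.
std::vector<double> relativeCosts(const std::vector<double>& drawTimesMs) {
    double total = 0.0;
    for (double t : drawTimesMs) {
        assert(t >= 0.0);  // durations should never be negative
        total += t;
    }

    std::vector<double> shares;
    shares.reserve(drawTimesMs.size());
    for (double t : drawTimesMs) {
        // An all-zero (or empty) frame yields zero shares.
        shares.push_back(total > 0.0 ? t / total : 0.0);
    }
    return shares;
}
```

If the shares match between two captures but the absolute frame times differ, a different graphics frequency is a likely suspect.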

Since Intel® Turbo Boost Technology is automatically controlled by the CPU, a developer cannot directly control it. However, it’s important to understand how it works. Most games I have investigated benefit from the graphics turbo scaling.

The addition of Intel® Advanced Vector Extensions (Intel® AVX) is another interesting Sandy Bridge feature. AVX extends SIMD (single instruction multiple data) instructions from 128 bits to 256 bits. For applications that are floating-point intensive, AVX enables a single instruction to work on eight single-precision floating-point values at a time instead of the four that current 128-bit SIMD provides.

It’s important to note other hardware vendors have also announced support for AVX.

Most developers will just use the latest Microsoft® Visual Studio compiler or Intel’s C/C++ Compiler to take advantage of AVX. But, for the clock-counting, bit-shifting tech heads out there, you can learn more at Intel’s AVX website.

The best way to work with AVX is through intrinsics, which are supported by both the Microsoft and Intel compilers. Anyone familiar with programming SSE or the PlayStation* 3’s SPUs will be good “frenemies” with intrinsics. Intrinsics are compiler-specific functions that usually compile down to highly efficient inlined machine instructions. Since the compiler has a strong understanding of intrinsics, it will often generate code that runs faster than inline assembly. Intrinsics are the best way to write high-throughput, compute-intensive code on the CPU (intrinsics are an acquired taste, like wine or Remedy Entertainment’s Alan Wake*).
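To give a taste of what intrinsics look like, here is a minimal sketch that adds two float arrays eight elements at a time. The function name `addArrays` is mine, not from any library. The AVX path is compiled only when the compiler defines `__AVX__` (for example with /arch:AVX or -mavx); otherwise the plain loop does all the work.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics (__m256, _mm256_*)
#endif

// Illustrative helper: out[i] = a[i] + b[i] for i in [0, n).
void addArrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__AVX__)
    // One 256-bit register holds eight single-precision floats.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned load of 8 floats
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    // Scalar loop handles the tail (or everything if AVX is not enabled).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load and store keep the sketch simple. A build meant to run on older hardware would also need a runtime check that both the CPU and the OS support AVX before taking this path.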

Intel engineers and performance-hungry developers are already exploring ways AVX can benefit game and graphics applications. For example, my coworker Stan Melax just wrote a great article presenting a programming pattern to improve the performance of geometry computations by transposing packed 3D data on-the-fly.

AVX is interesting, but let’s shift our focus to the graphics engine.

Remember, Sandy Bridge is a DirectX 10.1 part. I like the DX11 multithreaded API, so most of my current code is DX11 with the proper DX10.1 “feature level” set for Sandy Bridge. With full DX10.1 support, there are no major surprises when programming for Intel graphics. However, there are a few things to keep in mind.
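For anyone who hasn't used feature levels yet, here is a rough sketch of creating a DX11 device that falls back to 10.1. The helper names `featureLevelName` and `createDevice` are mine. The Direct3D part is Windows-only and needs d3d11.lib; the list of requested levels is just an example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Illustrative helper for logging. The numeric values match the
// D3D_FEATURE_LEVEL enum (11_0 = 0xb000, 10_1 = 0xa100, 10_0 = 0xa000).
std::string featureLevelName(unsigned int level) {
    switch (level) {
        case 0xb000: return "11.0";
        case 0xa100: return "10.1";
        case 0xa000: return "10.0";
        default:     return "other";
    }
}

#ifdef _WIN32
#include <d3d11.h>  // link with d3d11.lib

// Creates a D3D11 device on the default hardware adapter. The runtime
// picks the first level in `requested` that the hardware supports.
bool createDevice(ID3D11Device** device, ID3D11DeviceContext** context) {
    assert(device != nullptr && context != nullptr);
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0,
        D3D_FEATURE_LEVEL_10_1,
        D3D_FEATURE_LEVEL_10_0,
    };
    D3D_FEATURE_LEVEL obtained = D3D_FEATURE_LEVEL_10_0;
    HRESULT hr = D3D11CreateDevice(
        nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
        requested, sizeof(requested) / sizeof(requested[0]),
        D3D11_SDK_VERSION, device, &obtained, context);
    if (FAILED(hr)) {
        return false;
    }
    std::printf("Feature level: %s\n",
                featureLevelName(static_cast<unsigned int>(obtained)).c_str());
    return true;
}
#endif
```

The rest of the renderer can then branch on the level it was actually given, instead of assuming DX11 hardware.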

The memory layout for processor graphics is different from that of a discrete card. Graphics applications often check for the amount of available free video memory early in execution. Because Intel® HD Graphics devices allocate graphics memory dynamically (based on application requests), you need to know the total amount of memory that is truly available to the graphics device. Memory checks that supply only the amount of “local” or “dedicated” graphics memory available do not supply an appropriate value for the Intel® HD Graphics devices.

All video memory on Intel® HD Graphics and earlier generations, including Intel® Graphics Media Accelerator Series 3 and 4, uses Dynamic Video Memory Technology (DVMT). DVMT memory is considered “local memory.” “Non-local video memory” is reported as zero (0). This value should not be used to determine compatibility with Accelerated Graphics Port (AGP) or PCI Express*.

To accurately detect the amount of memory available, you’ll need to check the total video memory availability. The Microsoft DirectX* SDK (June 2010) includes the VideoMemory sample code and describes five commonly used methods to detect the total amount of video memory. Applications targeting Microsoft Windows Vista* and Microsoft Windows* 7 should reference GetVideoMemoryViaDXGI. For Microsoft Windows* XP applications, GetVideoMemoryViaWMI is a good starting place. For more information, see the Microsoft sample code site.
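As a rough idea of what a DXGI-based check can look like, here is a minimal sketch. It is not the SDK sample itself: `AdapterMemory`, `totalGraphicsMemory`, and `queryPrimaryAdapterMemory` are names I made up, and treating the total as the sum of the three DXGI memory pools is my assumption. The DXGI part is Windows-only and needs dxgi.lib.

```cpp
#include <cassert>
#include <cstdint>

// Sizes in bytes, mirroring three fields of DXGI_ADAPTER_DESC.
struct AdapterMemory {
    std::uint64_t dedicatedVideo;   // DedicatedVideoMemory
    std::uint64_t dedicatedSystem;  // DedicatedSystemMemory
    std::uint64_t sharedSystem;     // SharedSystemMemory
};

// Assumption: the memory the graphics device can use is the sum of all
// three pools, not just the dedicated portion.
std::uint64_t totalGraphicsMemory(const AdapterMemory& m) {
    return m.dedicatedVideo + m.dedicatedSystem + m.sharedSystem;
}

#ifdef _WIN32
#include <dxgi.h>  // link with dxgi.lib

// Fills `out` for the first adapter; returns false on any failure.
bool queryPrimaryAdapterMemory(AdapterMemory* out) {
    assert(out != nullptr);
    IDXGIFactory* factory = nullptr;
    if (FAILED(CreateDXGIFactory(__uuidof(IDXGIFactory),
                                 reinterpret_cast<void**>(&factory)))) {
        return false;
    }
    bool ok = false;
    IDXGIAdapter* adapter = nullptr;
    if (SUCCEEDED(factory->EnumAdapters(0, &adapter))) {
        DXGI_ADAPTER_DESC desc = {};
        if (SUCCEEDED(adapter->GetDesc(&desc))) {
            out->dedicatedVideo = desc.DedicatedVideoMemory;
            out->dedicatedSystem = desc.DedicatedSystemMemory;
            out->sharedSystem = desc.SharedSystemMemory;
            ok = true;
        }
        adapter->Release();
    }
    factory->Release();
    return ok;
}
#endif
```

On processor graphics the dedicated figure alone can look tiny, which is exactly why a check against only that number can wrongly reject the device.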

The best way to get started with Intel processor graphics is to check out the Intel Graphics Developer’s Guides.

With nearly a million PCs shipped each day, the available market of processor graphics is growing quickly. It’s worthwhile to understand and validate on processor graphics. Soon, processor graphics will be everywhere.