The first step was to ensure that the VTune Amplifier timeline would be appropriately annotated with markers indicating frames, as well as event regions marking the start and end of the loading screen and the actual gameplay. While it is possible to use pause and resume annotations to ensure that data is not collected outside the area of interest, we opted not to do so in this case. Once the game was compiled with the frame and event annotations, we began with a microarchitecture exploration analysis (although a hotspots analysis would have been equally appropriate).
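The article does not show the annotation code itself, so the following is only a minimal sketch of how frame and event annotations might be added with the ITT API (`ittnotify.h`) that ships with VTune Amplifier. The domain name `"Game.Main"`, the event names `"Loading"` and `"Gameplay"`, the `USE_ITT` build macro, the helper classes, and the two counters are all hypothetical and exist only for illustration; the `__itt_*` calls are the ones the ITT API provides. Without `USE_ITT` defined, the annotations compile away to nothing.

```cpp
#include <cstring>

#ifdef USE_ITT
#include <ittnotify.h>  // ITT API header; link against the ittnotify library
#endif

// Demonstration-only counters (assumption) so the sketch can be checked
// without VTune Amplifier attached.
static int g_frames_marked = 0;
static int g_open_regions = 0;

#ifdef USE_ITT
// Frames are grouped under a domain; the name here is hypothetical.
static __itt_domain* g_domain = __itt_domain_create("Game.Main");
#endif

// Marks one frame on the timeline for the lifetime of the object.
class FrameMarker {
public:
    FrameMarker() {
#ifdef USE_ITT
        __itt_frame_begin_v3(g_domain, nullptr);
#endif
    }
    ~FrameMarker() {
#ifdef USE_ITT
        __itt_frame_end_v3(g_domain, nullptr);
#endif
        ++g_frames_marked;
    }
};

// Marks a named event region (for example, the loading screen or gameplay).
class EventRegion {
public:
    explicit EventRegion(const char* name) {
#ifdef USE_ITT
        event_ = __itt_event_create(name, static_cast<int>(std::strlen(name)));
        __itt_event_start(event_);
#else
        (void)name;  // unused when annotations are compiled out
#endif
        ++g_open_regions;
    }
    ~EventRegion() {
#ifdef USE_ITT
        __itt_event_end(event_);
#endif
        --g_open_regions;
    }

private:
#ifdef USE_ITT
    __itt_event event_;
#endif
};

// Hypothetical session: a loading phase followed by annotated gameplay frames.
// Returns the number of frames marked during this call.
int run_annotated_session(int loading_steps, int gameplay_frames) {
    const int before = g_frames_marked;
    {
        EventRegion loading("Loading");
        for (int i = 0; i < loading_steps; ++i) {
            // Load assets here; no frames are marked while loading.
        }
    }
    {
        EventRegion gameplay("Gameplay");
        for (int i = 0; i < gameplay_frames; ++i) {
            FrameMarker frame;
            // Update and render one frame here.
        }
    }
    return g_frames_marked - before;
}
```

Calling `run_annotated_session(2, 5)` would mark five frames inside the gameplay region and none inside the loading region. The pause and resume annotations mentioned above correspond to `__itt_pause()` and `__itt_resume()` in the same API; they are left out here to match the choice described in the text.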
The annotations create marks on the timeline in the VTune Amplifier results. The yellow blocks along the very top are frames; the blue chart below them represents the frame rate. Each row below that is a thread, with running threads marked in green and CPU time in brown. Two long brackets run along the top of the topmost thread bar. These are the event regions: the yellow one on the left is the loading screen, and the green one on the right is the gameplay.
The game’s usage pattern is characterized by a long loading time, some very slow frames on first entering the game, and then a relatively stable staccato frame pattern through the rest of the gameplay. Zooming in and filtering on a representative section in the middle of the gameplay makes this pattern of short frames with large gaps between them more obvious. It also shows that a significant amount of the game time is spent “[Outside any known module].”
Since the frames themselves are very short, we filter into one of the gaps between them to display only data associated with this delay. Many of the identifiably gameplay-related functions drop away, leaving the unknown module data and lower-level functions taking up most of the time.
Since the majority of activity in the delays between frames comes from outside the pyrogenesis code, we ran a CPU/GPU Concurrency analysis to determine whether the GPU was active during these times. However, the GPU showed the same staccato pattern, indicating that the game was not waiting on the GPU.
Next, the large number of threads with very little activity prompted us to run a Threading analysis. Sure enough, there were many thread transitions, denoted with yellow lines, occurring in the spaces between frames. Two sync object entries accounted for the majority of the wait time.