A few weeks ago I posted several blogs to the Atom Developer site with useful information about optimizing for small mobile form factor devices. I want to mention them briefly here so that the broader audience knows they are there, and to give a heads-up about the Atom Developer focused site. (New blogs are now auto-posted as necessary, but these were posted before that system was in place, hence this notice.)
- PROGRAMMING: Automated tuning features in DB2 9.7 outperformed expert IBM engineers.
- PERFORMANCE: Intel Xeon processor 5500 series delivers 9x the performance compared to installed single-core servers.1
- STORAGE: Compression enhancements to indexes and temporary tables increased available space by 60%.2
- SETUP: Right out of the box, DB2 9.7 running on Intel Xeon processor 5500 series based servers delivered 78% better performance than on previous-generation Intel processors.
In 2.1, GPA lets you assign any available metric to both the X and Y axes of the bar chart, so you can see the relationship between two per-draw-call metrics at the same time. For example, you can select vertex shader duration for the X-axis and pixel shader duration for the Y-axis.
With the bar chart configured this way, the wider a bar is, the more vertex shader heavy that draw call is; the taller it is, the more pixel shader heavy.
See the screenshots below for a view of this feature in action...
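The width-versus-height reading described above can also be expressed in a few lines of code. This is only a sketch of the idea, not anything GPA exposes: the function name, the draw call names, and the durations are all made up for illustration.

```python
def classify_draw(vs_duration_us, ps_duration_us):
    """Label a draw call by which shader stage dominates.

    vs_duration_us plays the role of bar width (X-axis),
    ps_duration_us plays the role of bar height (Y-axis).
    """
    if vs_duration_us > ps_duration_us:
        return "vertex shader heavy"   # wider than tall
    if ps_duration_us > vs_duration_us:
        return "pixel shader heavy"    # taller than wide
    return "balanced"


# Hypothetical per-draw-call durations in microseconds: (vertex, pixel).
draws = {
    "terrain": (120, 40),
    "particles": (15, 300),
    "ui": (10, 10),
}

labels = {name: classify_draw(vs, ps) for name, (vs, ps) in draws.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

In the chart itself you get the same information at a glance from the shape of each bar, without computing anything.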
GPA 2.1 includes a feature that lets you better view render targets and textures. Say you have a texture with a very narrow dynamic range, where all values are nearly white. When you select this texture to view, by default it looks white. The GPA histogram feature lets you not only see where the data falls in any buffer, but also increase the dynamic range of the buffer for viewing purposes.
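To make the idea concrete, here is a minimal sketch of what stretching a narrow range for viewing can look like. It is a plain linear remap written for illustration, under the assumption of 8-bit values; it is not a description of how GPA implements the feature, and the function names and texel values are hypothetical.

```python
def histogram(values, bins=4, max_value=255):
    """Count how many values fall into each of `bins` equal-width buckets."""
    counts = [0] * bins
    width = (max_value + 1) / bins
    for v in values:
        counts[min(int(v / width), bins - 1)] += 1
    return counts


def stretch_range(values, out_max=255):
    """Linearly remap values so the smallest becomes 0 and the largest out_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A perfectly flat buffer has no range to stretch.
        return [0 for _ in values]
    scale = out_max / (hi - lo)
    return [round((v - lo) * scale) for v in values]


# Hypothetical texels that all look "white" at first glance.
texels = [250, 251, 252, 253, 254, 255]
stretched = stretch_range(texels)

print(histogram(texels))     # everything lands in the brightest bucket
print(histogram(stretched))  # the detail is now spread across the range
```

The remap only changes how the data is displayed; the underlying buffer contents are the same, which is why this kind of view is safe to use while debugging.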
Pixel history has been the most requested feature from the gaming community since the launch of Intel GPA. The great news is that we have added this feature to our currently available 2.1 release. The cool thing about how Intel GPA Frame Analyzer implements pixel history is that you can take a history from not only a normal render target, but also from a render target that has a visualization enabled!
An important issue to be aware of when measuring application performance on a virtualized system is that of time drift.
A key responsibility of every VMM (hypervisor) is distributing clock ticks generated by the hardware to each VM (guest OS) running on the system. Likewise, it is the responsibility of each VM to process each clock tick when it arrives so that system time is maintained in the expected fashion.
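A small model shows why this matters for measurements. The sketch below assumes a guest that keeps time purely by counting ticks of a fixed period, and simply asks what happens if some ticks are never processed. The function names and the numbers are illustrative assumptions, not measurements from any particular hypervisor.

```python
def guest_elapsed_ms(ticks_processed, tick_period_ms):
    """Time the guest believes has passed, based only on ticks it processed."""
    return ticks_processed * tick_period_ms


def drift_ms(wall_elapsed_ms, ticks_processed, tick_period_ms):
    """How far the guest clock lags behind real (wall clock) time."""
    return wall_elapsed_ms - guest_elapsed_ms(ticks_processed, tick_period_ms)


# Hypothetical run: 10 seconds of real time with a 1 ms tick,
# during which the guest only processed 9,800 of the 10,000 ticks.
wall_ms = 10_000
processed = 9_800

lag = drift_ms(wall_ms, processed, 1)
relative_error = lag / wall_ms

print(f"guest clock lags by {lag} ms ({relative_error:.1%} of the interval)")
```

In this model a benchmark timed inside the guest would report 9.8 seconds for work that really took 10, so the application would look faster than it is. Timing against a reference outside the guest is one way to check for this.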
Schedule instructions properly for optimal performance on the Intel® Itanium® processor. Optimal scheduling will minimize the chances of implicit stops or unexpected dispersal-related stalls.
Observe the following heuristics whenever possible, which are based on best-known methods for instruction scheduling on 64-bit Intel architecture:
Measure the performance penalty associated with bank conflicts on floating-point loads. If you are dealing with a looping algorithm and have unrolled the loops (or if the compiler has done this for you), then more than the minimal latency can be absorbed by the scheduling of the instructions. Even so, removing the bank conflicts will reduce the OzQ activity and can improve the throughput of L2 access.
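As a rough way to reason about which loads can collide, the sketch below maps addresses to banks. The geometry used here (16 banks, each 16 bytes wide, selected by low-order address bits) is an assumption for illustration and should be checked against the reference manual for the specific processor; the function names and addresses are hypothetical.

```python
BANK_WIDTH_BYTES = 16  # assumed bank width; verify for your processor
NUM_BANKS = 16         # assumed number of banks; verify for your processor


def bank_of(address):
    """Return the bank index an address maps to under the assumed geometry."""
    return (address // BANK_WIDTH_BYTES) % NUM_BANKS


def same_bank(addr_a, addr_b):
    """True if two addresses map to the same bank and could conflict."""
    return bank_of(addr_a) == bank_of(addr_b)


# Two hypothetical arrays whose bases are exactly 256 bytes apart:
# element i of each array maps to the same bank.
a_base = 0
b_base = 256
print(same_bank(a_base, b_base))         # True under the assumed geometry

# Padding the second array by one bank width shifts it to the next bank.
b_padded = b_base + BANK_WIDTH_BYTES
print(same_bank(a_base, b_padded))       # False under the assumed geometry
```

A model like this only suggests where to look. The actual penalty still has to be measured, as described above, because loop unrolling and scheduling can hide some or all of it.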