Instruction Latencies in Assembly Code for 64-Bit Intel® Architecture


Optimize assembly-language code for the Itanium® processor family in terms of instruction latencies. The latency of an instruction is the length of time that has elapsed from when the instruction is issued until the time that its results can be used. For most simple integer math operations, like "add r32=r33,r34", the latency is a single cycle, so it is possible to use the results of many operations in the very next set of parallel instructions. This is generally not true for floating-point operations or loads from memory.
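The effect of latency on scheduling can be illustrated with a small model. The sketch below is not Itanium® code and not an Intel tool: it assumes an idealized in-order machine with unlimited issue width, and the latency table (1 cycle for an integer add, 4 for a floating-point multiply-add, 2 for a load) is an illustrative assumption rather than an official figure. The helper name `earliest_issue_cycles` is hypothetical.

```python
def earliest_issue_cycles(instructions, latencies):
    """Return the earliest cycle at which each instruction can issue.

    instructions: list of (opcode, dest_register, [source_registers]).
    latencies:    opcode -> cycles from issue until the result is usable.
    Assumes unlimited issue width, so only data dependencies delay issue.
    """
    ready = {}   # register -> first cycle in which its value can be consumed
    cycles = []
    for opcode, dest, sources in instructions:
        # An instruction must wait until all of its inputs are available.
        issue = max((ready.get(src, 0) for src in sources), default=0)
        cycles.append(issue)
        ready[dest] = issue + latencies[opcode]
    return cycles


# Illustrative latencies only (assumed, not taken from a processor manual).
ASSUMED_LATENCIES = {"add": 1, "fma": 4, "ld8": 2}

program = [
    ("add", "r32", ["r33", "r34"]),        # result usable one cycle later
    ("add", "r35", ["r32", "r36"]),        # depends on r32
    ("fma", "f8",  ["f9", "f10", "f11"]),  # independent floating-point op
    ("fma", "f12", ["f8", "f13", "f14"]),  # depends on f8
]

print(earliest_issue_cycles(program, ASSUMED_LATENCIES))
```

Under these assumptions the dependent integer add can issue in the very next cycle, while the dependent floating-point operation has to wait several cycles. Independent work is normally scheduled into that gap.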

  • performance
  • Intel® Itanium® Processors
  • Mobile and Netbook optimization blogs posted on the Atom Developer site

    A few weeks ago I posted several blogs to the Atom Developer site with useful information about optimizing for small mobile form-factor devices. I want to mention them briefly here so that the broader audience knows they exist, and to give a heads-up about the Atom Developer focused site. (New blogs are now auto-posted as necessary, but these were posted before that system was in place, hence this notice.)

    Putting -lm Before User Objects/Libraries on Link Line Can Impact Performance

    Recommended linking model: icc/icpc/ifort [user objs] [user libs] [sys libs]. Placing -lm (the GNU math library) before user-created objects or libraries causes the GNU libm to be used instead of the Intel math library, which can impact performance.
  • Linux*
  • C/C++
  • Fortran
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • math
  • library
  • performance
  • static
  • libm
  • libimf
  • GNU
  • -lm
  • Optimization
  • IBM DB2 9.7 on Intel Xeon Processor 5500 Series

    Are you spending too much on your database?

    • PROGRAMMING: Automated tuning features in DB2 9.7 outperformed expert IBM engineers.
    • PERFORMANCE: Intel Xeon processor 5500 series delivers 9x the performance compared to installed single-core servers.
    • STORAGE: Compression enhancements to indexes and temporary tables increased available space by 60%.
    • SETUP: Right out of the box, DB2 9.7 running on Intel Xeon processor 5500 series based servers delivered 78% better performance than on previous-generation Intel processors.
  • Server
  • Xeon
  • performance
  • database
  • GPA 2.1 Feature Highlight: Configurable X and Y Axes in the Bar Chart

    In 2.1, GPA allows you to configure both the X and Y axis to any available metric within the bar chart. This allows you to visually see the relationship between multiple per-draw call metrics at the same time. For example, you can select vertex shader duration in the X-axis and pixel shader duration in the Y-axis.

    With the bar chart configured this way, a wider bar indicates a draw call that is more vertex-shader heavy, and a taller bar indicates one that is more pixel-shader heavy.

    See the screenshots below for a view of this feature in action...

    GPA 2.1 Feature Highlight: Buffer Histograms

    GPA 2.1 includes a feature that gives you a better view of render targets and textures. Suppose you have a texture with a very narrow dynamic range, where all values are nearly white. When you select this texture for viewing, by default it looks white. The GPA histogram feature lets you not only see where the data falls in any buffer, but also increase the dynamic range of the buffer for viewing purposes.
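The idea behind this kind of view can be sketched in a few lines. The code below is not how GPA implements the feature and uses no GPA API. It is a minimal illustration that assumes single-channel texel values in the range 0.0 to 1.0, and the function names `histogram` and `stretch_range` are hypothetical.

```python
def histogram(values, bins=8, lo=0.0, hi=1.0):
    """Count how many values fall into each of `bins` equal-width buckets."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi lands in the last bucket.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def stretch_range(values):
    """Linearly remap values so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat buffer: nothing to stretch
    return [(v - lo) / (hi - lo) for v in values]


# A "nearly white" texture: every texel sits in a narrow band near 1.0.
texels = [0.96, 0.97, 0.98, 0.99, 1.0]

print(histogram(texels))                 # all samples fall in the last bucket
print(histogram(stretch_range(texels)))  # samples spread across the range
```

The first histogram shows why the texture appears uniformly white. After the range is stretched for viewing, the same data spans the full set of buckets and its detail becomes visible.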

    GPA 2.1 Feature Highlight: Pixel History based on Overdraw Visualization

    Pixel history has been the most requested feature from the gaming community since the launch of Intel GPA. The great news is that we have added this feature to our currently available 2.1 release. The cool thing about how Intel GPA Frame Analyzer implements pixel history is that you can take a history from not only a normal render target, but also from a render target that has a visualization enabled!
