Monitoring Integrated Memory Controller Requests in the 2nd, 3rd and 4th generation Intel® Core™ processors

Authors: Roman Dementiev and Angela D. Schmid

Dear Software Tuning, Performance Optimization & Platform Monitoring community,

The recent and upcoming 2nd, 3rd and 4th generation Intel® Core™ processors (previously codenamed Sandy Bridge, Ivy Bridge and Haswell) expose model-specific counters that allow monitoring of requests to DRAM.

  • Developers
  • Partners
  • Professors
  • Students
  • Advanced
  • Perfmon
  • Performance Counters
  • monitoring
  • - New timeline view

    We have been working full time on getting all the major bugs out of MeshCentral, our remote monitoring and control web site. As with any big project, there are lots of problems all over the place, but getting the basics working is the top priority right now. By the way, thank you to everyone submitting feedback; it's very appreciated.

    Dissecting STREAM benchmark with Intel® Performance Counter Monitor

    Intel® Performance Counter Monitor (Intel® PCM) is an API and a set of tools that should help developers to understand how their applications utilize the underlying compute platform. In this blog I will explain how to instrument the well-known STREAM benchmark with library functions of Intel® PCM reading statistics directly from integrated memory controllers available on the latest Intel® Xeon® 5500, 5600, 7500 and Core™ processor series.
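
    The measurement pattern behind this kind of instrumentation can be sketched as follows: snapshot the memory controller counters, run the kernel, snapshot again, and convert the difference to bytes. This is only an illustration in Python, not the Intel® PCM API. The `FakeCounters` class is a hypothetical stand-in for a real counter source, and the 64-byte request size and 32-bit counter width are assumptions that would need to be checked against the processor documentation.

```python
import time

# Assumption: each counted memory controller request moves one 64-byte cache line.
CACHE_LINE_BYTES = 64


def counter_delta(before, after, width_bits=32):
    """Difference between two snapshots of a free-running counter.

    The modulo handles a single wraparound of a counter that is
    width_bits wide (32 bits is an assumption, not a documented value).
    """
    return (after - before) % (1 << width_bits)


def measure(read_counters, kernel):
    """Snapshot counters around kernel() and return (bytes_read, bytes_written, seconds)."""
    reads0, writes0 = read_counters()
    start = time.perf_counter()
    kernel()
    seconds = time.perf_counter() - start
    reads1, writes1 = read_counters()
    bytes_read = counter_delta(reads0, reads1) * CACHE_LINE_BYTES
    bytes_written = counter_delta(writes0, writes1) * CACHE_LINE_BYTES
    return bytes_read, bytes_written, seconds


class FakeCounters:
    """Hypothetical counter source used only to exercise measure()."""

    def __init__(self, reads=0xFFFFFFF0, writes=0):
        self.reads = reads
        self.writes = writes

    def snapshot(self):
        return self.reads, self.writes

    def advance(self, reads, writes):
        # Simulate hardware counters that wrap at 32 bits.
        self.reads = (self.reads + reads) % (1 << 32)
        self.writes = (self.writes + writes) % (1 << 32)


if __name__ == "__main__":
    fake = FakeCounters()
    # The "kernel" here just advances the fake counters by 32 reads and 16 writes.
    bytes_read, bytes_written, seconds = measure(
        fake.snapshot, lambda: fake.advance(32, 16)
    )
    print(bytes_read, bytes_written)
```

    Dividing the byte counts by the elapsed time gives the achieved read and write bandwidth, which can then be compared with the figures the benchmark itself reports.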

    Avoid short functions on Atom

    One recurring theme we have seen in several software stacks running on Atom is that the architecture can take a significant performance hit from “short” functions.  By “short” functions I mean functions with very few instructions (~10 instructions) separating the call and the matching return.  The Atom architects came back with a nice explanation of this phenomenon.  The lesson has been to aggressively avoid “short” functions whenever possible on the Atom architecture.  The best methods to avoid “short” functions are listed below:

    Utilizing Performance Monitoring Events to find Problematic Loads Due to Latency in the Memory Hierarchy

    The most common bottleneck found across applications is stalls on loads due to latencies in the memory hierarchy.  Admittedly, this is also one of the most difficult issues to fix.  I plan to use this blog to help users identify the issue, and will follow it with another on methodologies to alleviate the issue.  I promise not to recommend generic fixes such as cache blocking, which can only be applied to a very small percentage of applications i