Software Tuning, Performance Optimization & Platform Monitoring

Is there any way to detect whether Intel RST is running or whether an mSATA drive is present?


In one case, a customer's system could not detect the mSATA drive in Windows 7, which caused our product to stop working.

As far as I know, Intel provides its own driver (iastora.sys), which enables some features not found natively. These features are mostly exposed in the Rapid Storage Technology UI, which allows RAID volumes to be managed and monitored from within the OS itself.

As the title says, my question is: is there any way (interface/SDK) to detect whether an mSATA drive is present or whether Intel RST is running?
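One possible approach, sketched under assumptions: ask the Windows service control manager (via the built-in sc.exe) for the state of the driver service behind iastora.sys. The service name "iaStorA" and the helper names below are assumptions for illustration; check the actual service name on the target system. This only reports whether the RST driver is running; detecting the mSATA device itself would be a separate query not shown here.

```python
import subprocess
from typing import Optional

# Assumption: the RST driver (iastora.sys) is registered as a service named "iaStorA".
RST_SERVICE = "iaStorA"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Return the state name (e.g. 'RUNNING') from `sc query` output, or None."""
    for line in sc_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("STATE") and ":" in stripped:
            # Expected shape after the colon: "4  RUNNING" (numeric code, then name).
            fields = stripped.split(":", 1)[1].split()
            if len(fields) >= 2:
                return fields[1]
    return None  # no STATE line: service missing or query failed


def rst_driver_state(service: str = RST_SERVICE) -> Optional[str]:
    """Query the service state with sc.exe (Windows only)."""
    try:
        result = subprocess.run(
            ["sc", "query", service], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # sc.exe not available, e.g. not running on Windows
    return parse_sc_state(result.stdout)
```

On a Windows machine, `rst_driver_state() == "RUNNING"` would indicate the driver service is active; a result of None means the service was not found or could not be queried.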



Could you explain the difference between these two events, UOPS_RETIRED.ALL_PS and UOPS_RETIRED.RETIRE_SLOTS_PS, on Sandy Bridge?

I would expect these events to give approximately the same numbers, since the number of used slots should agree with the number of retired uops over a period of time. The data below shows that the number of used retirement slots is about 20% lower than the number of uops retired.

Is it possible for uops to retire without using a slot?

UOPS_RETIRED.ALL_PS - This event counts the number of micro-ops retired.
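To compare the two counts directly, both events can be counted together with Linux perf using raw encodings. A sketch, assuming the Sandy Bridge encodings listed in the Intel SDM (event 0xC2 with umask 0x01 for UOPS_RETIRED.ALL and umask 0x02 for UOPS_RETIRED.RETIRE_SLOTS); the _PS variants are the precise (PEBS) forms of the same events. The helper names are hypothetical, and the parser assumes perf echoes the raw event names as given.

```python
from typing import Dict, Iterable

# Assumed Sandy Bridge encodings (Intel SDM): event 0xC2, umask 0x01 / 0x02.
UOPS_RETIRED_EVENT = 0xC2
UMASK_ALL = 0x01
UMASK_RETIRE_SLOTS = 0x02


def raw_event(event: int, umask: int) -> str:
    """Build a perf raw event name: config = (umask << 8) | event."""
    return "r{:x}".format((umask << 8) | event)


def parse_perf_csv(text: str, names: Iterable[str]) -> Dict[str, int]:
    """Parse `perf stat -x,` output into {event_name: count} for the given names.

    The first field is the count; the event name is searched for among the
    remaining fields because the column layout differs between perf versions.
    """
    wanted = list(names)
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if not fields or not fields[0].isdigit():
            continue  # blank lines, comments, "<not counted>" entries
        for name in wanted:
            if name in fields[1:]:
                counts[name] = int(fields[0])
    return counts


def slot_deficit(uops_all: int, retire_slots: int) -> float:
    """Fraction by which RETIRE_SLOTS falls short of ALL (0.2 means 20%)."""
    return (uops_all - retire_slots) / uops_all
```

A possible workflow: run perf stat -x, -e r1c2,r2c2 ./your_program 2> perf.csv (perf stat writes its counts to stderr), then pass the file contents and the two names from raw_event() to parse_perf_csv() and compute slot_deficit().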

Unable to generate 'GPA' data with my Intel HD Graphics 4000

I'm trying to profile the execution of an OpenCL kernel on Intel HD Graphics 4000. I've installed the 30-day trial of both VTune and INDE.

If I right-click my project in VS2013 and choose Intel VTune Amplifier XE 2015 > New Analysis, the window that opens shows this message:

Some GPU metrics are currently disabled by the BIOS. See the product Release Notes for details.

Intel Memory Latency Checker with Windows support released

We just released v2.3 of Intel Memory Latency Checker. This adds support for Windows, while previous versions already supported Linux. In addition, single-socket Xeon processors (E3) are now supported.

Intel Memory Latency Checker can be used to measure latencies and bandwidth on Intel Xeon processors.


Theoretical SP and DP Peak Performance of an Intel Core i7 950 3.06 GHz LGA 1366

We are running some benchmarks, and to determine the efficiency of some codes we need to know the theoretical peak performance of your Core i7 950 3.06 GHz LGA 1366.
I am not able to locate such data on any of your datasheets. I need to know the theoretical peak performance of the Intel Core i7 950 3.06 GHz LGA 1366 in single- and double-precision floating-point instructions per second.

Please let me know where I can find this data or, better, send me the peak performance figures and explain how you calculated them.
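The usual back-of-the-envelope method multiplies cores × clock × floating-point operations per cycle per core. A sketch, assuming the Core i7 950 (a 4-core Nehalem/Bloomfield part) can issue one 128-bit SSE add and one 128-bit SSE multiply per cycle per core, and using the 3.06 GHz base clock (Turbo Boost would raise the result). Peak figures are normally quoted in floating-point operations per second rather than instructions, because one packed SSE instruction performs 2 (DP) or 4 (SP) operations. This is an illustration of the arithmetic, not an Intel-published figure.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores * clock (GHz) * FLOPs per cycle per core."""
    return cores * ghz * flops_per_cycle


CORES = 4
BASE_GHZ = 3.06
# Assumption: one packed add + one packed multiply per cycle on 128-bit SSE registers.
DP_FLOPS_PER_CYCLE = 2 * 2  # 2 doubles per register x (add + mul)
SP_FLOPS_PER_CYCLE = 4 * 2  # 4 floats per register x (add + mul)

dp_peak = peak_gflops(CORES, BASE_GHZ, DP_FLOPS_PER_CYCLE)
sp_peak = peak_gflops(CORES, BASE_GHZ, SP_FLOPS_PER_CYCLE)
print(f"DP peak: {dp_peak:.2f} GFLOPS, SP peak: {sp_peak:.2f} GFLOPS")
```

Under these assumptions the script prints about 48.96 GFLOPS (DP) and 97.92 GFLOPS (SP); measured efficiency is then the benchmark's achieved FLOP rate divided by these values.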

Benchmark performance increases with deeper C-states (Linux)


I have been running the LINPACK and netperf benchmarks on Ubuntu 12.04. My machine has a Sandy Bridge processor with 2 physical cores (2 logical). I ran them under different frequency and C-state configurations and found that benchmark performance increased when dma_latency was not set to 0 (that is, when deeper C-states were allowed). How can this be?
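For context, assuming "dma_latency" here refers to the kernel's /dev/cpu_dma_latency PM QoS interface, a latency request is made by writing a 32-bit value in microseconds to that device and keeping the file open; writing 0 asks the kernel to avoid C-states with any nonzero exit latency. A minimal sketch (the function names are hypothetical; opening the device needs root):

```python
import os
import struct

DMA_LATENCY_DEV = "/dev/cpu_dma_latency"


def encode_latency(usec: int) -> bytes:
    """Encode a latency request as a native 32-bit signed integer."""
    return struct.pack("=i", usec)


def request_dma_latency(usec: int) -> int:
    """Write a PM QoS latency request and return the open file descriptor.

    The request stays in effect only while the descriptor is open;
    closing it drops the constraint.
    """
    fd = os.open(DMA_LATENCY_DEV, os.O_WRONLY)
    os.write(fd, encode_latency(usec))
    return fd
```

Usage: fd = request_dma_latency(0), run the benchmark, then os.close(fd) to allow deeper C-states again.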

Here are the details:

Linux vs. Windows: package C-state residency on Haswell


I’m using a Dell E7440 Haswell platform with dual boot: Windows 8.1 and Ubuntu (kernel 3.16). When measuring package C-state residency on Windows, I get ~90% PC7 during idle periods. However, on Ubuntu I get no deeper than PC3, with nearly 0% residency, while the cores are 99% in C7. I investigated the issue and noticed that MSR 0xE2 (MSR_PKG_CST_CONFIG_CONTROL) has the Package C-State Limit set to 0 (with CFG lock set).
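To check the same register from Linux, MSR 0xE2 can be read through the msr driver (/dev/cpu/N/msr, root required, after modprobe msr) and split into its fields. A sketch, assuming the field layout described in the Intel SDM for Haswell client parts: Package C-State Limit in bits 3:0 and CFG lock in bit 15. The helper names are hypothetical, and the raw limit value is returned without mapping it to a C-state name.

```python
import os
import struct

MSR_PKG_CST_CONFIG_CONTROL = 0xE2


def read_msr(cpu: int, reg: int) -> int:
    """Read a 64-bit MSR via the Linux msr driver (the register number is the file offset)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def decode_pkg_cst_config(value: int) -> dict:
    """Extract the package C-state limit (bits 3:0, assumed) and CFG lock (bit 15)."""
    return {
        "pkg_cstate_limit": value & 0xF,
        "cfg_lock": bool((value >> 15) & 1),
    }
```

Usage: decode_pkg_cst_config(read_msr(0, MSR_PKG_CST_CONFIG_CONTROL)). A result of {"pkg_cstate_limit": 0, "cfg_lock": True} would match the situation described above.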



The question might sound strange, as the SDM explicitly says that those events are not available on SNB. But some posts on the forum assume that they are, so I concluded it is only a matter of documentation.




PCM can't find second socket on system


I've run into a problem with PCM: it doesn't recognize the second CPU on the board and attributes all cores to a single one.

Below is the output from PCM, followed by the output from wmic.

In short, the system actually has two E5-2680 CPUs with SMT disabled, so 8 cores each and 16 in total. PCM, however, found a single CPU with 16 cores.

What could be a source of the problem?


