Software Tuning, Performance Optimization & Platform Monitoring

Write Combine Performance and Out of Order


I am new to write combining (WC). I am measuring burst I/O write performance via mmap on 64-bit Linux and trying to understand how WC affects I/O memory writes. I have several basic questions about using WC for this purpose.

The following is my example test setup for a burst write with Write Combine mode enabled:

1. The device driver maps the I/O memory region using ioremap_wc (MTRR). This I/O memory is a non-prefetchable region. The PAT can be set with the write-combining or non-cached flag.

LBR Empty records

I'm using LBR to trace a program. Each time before I start tracing, I overwrite the MSR_LASTBRANCH_(N-1)_FROM_IP/TO values with 0, execute a few instructions, and read the values back. Sometimes I get empty (or skipped) records with values of zero. Why would this ever happen?


Feasibility of an application on Atom processor


My project is fog removal from video. Right now I'm using an Intel Atom processor, but I want to know whether it is feasible to get 30 frames per second on the Atom, or whether it is necessary to move to a dual-core, i3, or i5 processor. Could you please tell me how machine cycles vary among these processors?
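
One rough way to frame the 30 frames per second question is to work out the cycle budget per frame and per pixel from the clock rate and frame size. The sketch below is only arithmetic. The 1.6 GHz clock and the 640x480 frame size are assumed for illustration and do not come from the post, and the function name is hypothetical.

```python
def cycle_budget(clock_hz, fps, width, height):
    """Return (cycles per frame, cycles per pixel) available on one core."""
    cycles_per_frame = clock_hz / fps
    cycles_per_pixel = cycles_per_frame / (width * height)
    return cycles_per_frame, cycles_per_pixel


# Assumed values, for illustration only (not from the original post).
CLOCK_HZ = 1.6e9
FPS = 30
WIDTH, HEIGHT = 640, 480

per_frame, per_pixel = cycle_budget(CLOCK_HZ, FPS, WIDTH, HEIGHT)
print(f"{per_frame:.0f} cycles per frame, {per_pixel:.1f} cycles per pixel")
```

If the fog-removal algorithm needs more cycles per pixel than this budget allows on one core, a faster or multi-core part, or a smaller frame, would be needed. This estimate ignores memory bandwidth and SIMD use, both of which can change the answer considerably.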

Intel PCM Syntax for Linux

I have been trying to use Intel PCM on Linux for quite a few days.

When I run pcm.x, the usage is printed as:

 Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100 ID=db05e43)

 Copyright (c) 2009-2013 Intel Corporation

 Usage: pcm <delay>|"external_program parameters"|--help <other options>

Performance counter interrupts and virtualization

I'm trying to write an extension to KVM that stops execution after a fixed number of branch instructions (for example, 1000).
I've programmed PERFEVTSEL0, set PMC0 (MSR 0xc1) to -1000, and written an ISR for the PMC interrupt.
The hardware raises an interrupt, which causes a VM exit, but when I read the PMC0 register the value is greater than 0. Why is that?
Are the performance counters not precise?
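
To make the counter arithmetic in this question concrete, here is a minimal sketch of how a preload of -1000 maps onto a fixed-width counter, and how a value read after the overflow can be interpreted as the number of events counted past the wrap. It assumes a 48-bit counter, which is common but should be confirmed from CPUID leaf 0AH. The function names and the simulated overshoot of 7 events are hypothetical.

```python
COUNTER_WIDTH = 48  # assumed; the real width is reported by CPUID leaf 0AH


def preload_for(events, width=COUNTER_WIDTH):
    """Raw counter value that overflows after `events` more increments."""
    return (-events) & ((1 << width) - 1)


def events_past_overflow(raw_value, width=COUNTER_WIDTH):
    """Interpret a post-overflow counter read as events counted since the wrap."""
    return raw_value & ((1 << width) - 1)


preload = preload_for(1000)
print(hex(preload))

# Simulate 1000 branches, plus 7 more that are counted before the
# interrupt is actually taken (the 7 is a made-up number for illustration).
after = (preload + 1000 + 7) & ((1 << COUNTER_WIDTH) - 1)
print(events_past_overflow(after))
```

A small positive value read in the handler is consistent with this kind of overshoot: events that occur between the overflow and the point where the interrupt is delivered still increment the counter.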

Writeback in DRAM


I am looking for hardware performance counters to measure the number of writebacks to DRAM (dirty lines evicted from the LLC). I haven't found anything for Sandy Bridge/Ivy Bridge systems. OFFCORE_RESPONSE_0:WB measures writebacks to the LLC (dirty lines evicted from L2).

Is there any chance it might be possible to measure this on other architectures (Atom, etc.)? I would really appreciate any help. Thanks.

PCU event FREQ_TRANS_CYCLES not changing

Hi all,
I am trying to determine how many cycles are spent on the transition from turbo boost frequency to nominal frequency (event FREQ_TRANS_CYCLES, page 85).
In turbostat I can see that the frequency is changing, but when I read FREQ_TRANS_CYCLES, its value does not change when the frequency changes.
Is this normal?
