Can't use PCM

Hello, I'm trying to use Intel Performance Counter Monitor to measure the L2 and L3 cache hit ratios in custom code. I'm following the example snippet on the website, but I get the following error when I try to initialize the processor counters:

Probando Intel PCM
Num (logical) cores: 4

Num sockets: 1

Threads per core: 1

Core PMU (perfmon) version: 3

Number of core PMU generic (programmable) counters: 4

Width of generic (programmable) counters: 48 bits

Number of core PMU fixed counters: 3

Width of fixed counters: 48 bits

Nominal core frequency: 2799999993 Hz

LLEGA a crear instancia

WARNING: Core 1 IA32_PERFEVTSEL0_ADDR are not zeroed 1114660


And my code:
#include "cpucounters.h"

#define F 10000

#define C 10000
using namespace std;


	cout<<"Probando Intel PCM\n";
	PCM *m = PCM::getInstance();
	cout<<"LLEGA a crear instancia"<<endl;
	if (m->program() != PCM::Success) return -1;

	cout<<"Llega a programar contadores"<<endl;

The problematic line is:

if (m->program() != PCM::Success) return -1;

The rest of the executables provided in the PCM download (like pcm.x and pcm-sensor.x) work seamlessly. Any advice? My code is almost a copy of the code on this page. Thanks in advance!

PS: I can't find any documentation for PCM. Is it available anywhere?

PS2: My system is an i7 860 with Ubuntu 10.04 x86_64.


Hi Korso, what is the value returned by m->program()? It contains an error code, one of the following:

    enum ErrorCode {

        Success = 0,

        MSRAccessDenied = 1,

        PMUBusy = 2,


Best regards,
Roman

Hi Roman,

You gave me the key. I got error code 2, so I rebooted the computer (it's a workstation, so it's normally on 24/7) and now the same code works. Thank you.

Nevertheless, when I execute the program I get a "Number of PCM instances" value that increases every time I run the code. I suppose there should be a method or something to destroy the instance. In fact, I'm guessing that this problem could have been the cause of my earlier issues. Can you help me with this?

Thank you again.

PS: Is there no documentation available for Intel PCM? It seems a pretty useful library, but without a README or docs it is difficult to use properly.

Hi Korso,

Good point. We did not have the cleanup call in the article example. Call m->cleanup(); on program exit to destroy the PCM instance properly.

We are trying to put more of our time into documenting PCM better. Your feedback is very helpful for finding out what we need to improve.

We have programmer documentation for PCM in browsable doxygen HTML format (the doxygen project file is included in the package) that documents PCM methods including program(...), cleanup(), and others.

Thanks,
Roman
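One way to make sure the cleanup call is not forgotten on any exit path is a small scope guard. The sketch below is not part of PCM: CleanupGuard and FakeMonitor are hypothetical names introduced here for illustration, and the guard only assumes that the wrapped object has a cleanup() method, as described in the reply above.

```cpp
// Hypothetical scope guard (not part of PCM): calls cleanup() on the wrapped
// object when the guard goes out of scope, including on early returns.
template <typename T>
class CleanupGuard {
public:
    explicit CleanupGuard(T* target) : target_(target) {}
    ~CleanupGuard() {
        if (target_) target_->cleanup();
    }

    // Non-copyable, so cleanup() runs exactly once per guard.
    CleanupGuard(const CleanupGuard&) = delete;
    CleanupGuard& operator=(const CleanupGuard&) = delete;

private:
    T* target_;
};

// Stand-in for a monitor object, used only to demonstrate the guard.
struct FakeMonitor {
    int cleanupCalls = 0;
    void cleanup() { ++cleanupCalls; }
};
```

With PCM, the guard would presumably be declared as CleanupGuard<PCM> guard(m); right after a successful m->program() call, so that cleanup() also runs when the function returns early.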

Works perfectly, thank you Roman. I've generated doc files, so I'll study them next week.

Hi Roman,
Can you please share the link to programmer doxygen documentation?


We do not host the doxygen documentation on a website. You can easily generate it locally by installing the doxygen tool. After installing it, run "doxygen" without parameters in the main PCM directory (it contains the "Doxyfile" project file). It will generate the HTML documentation, which you can open in your browser.

Best regards,

Hey Korso and Roman, I am new to PCM. I am also trying to use Intel Performance Counter Monitor to measure the L2 and L3 cache hit ratios in my custom code. I use exactly the same code, and I compile it with the following command:

g++ -O mycode.cpp -o mycode -I ./PCM/ -L ./PCM/cpucounters.o ./PCM/msr.o -lpthread

However, the linker keeps reporting:

undefined reference to `PCM::getInstance()'
undefined reference to `PCM::program(PCM::ProgramMode, void*)'
undefined reference to `getSystemCounterState()'

I tried passing the other .o files with -L. It still doesn't work. Could you give me some hints on how to link the PCM library into my custom code? I didn't find any detailed user manual on how to use PCM in custom code except this page. Thanks a lot!

Hello Bingyi,

I'm guessing that you are calling the undefined-reference functions with arguments that don't agree with the declaration.

The declaration for getInstance() is:

static PCM * getInstance();

Is that how you are calling it?

What happens when you compile it like:

g++ -O mycode.cpp ./PCM/cpucounters.cpp ./PCM/msr.cpp -o mycode -I ./PCM/ -lpthread

The problem might also be the ordering of the object files... if g++ uses a single pass linker... you might need to repeat cpucounters.o like:

g++ -O mycode.cpp -o mycode -I ./PCM/ -L ./PCM/cpucounters.o ./PCM/msr.o ./PCM/cpucounters.o -lpthread

but I doubt this is the issue (since msr.cpp doesn't use getInstance()).

There is also the possibility that c++ name mangling is the issue... perhaps you are enabling mangling in some cases and not in others...

But to me, this error is almost always a c++ compiling/linking issue, not a problem with PCM.


Hey Pat,

Thanks for the fast reply. It works!!! 

If I use the first solution you mentioned, the command line is as follows:

g++ -O mycode.cpp ./PCM/cpucounters.cpp ./PCM/msr.cpp ./PCM/pci.cpp ./PCM/client_bw.cpp -o 8-4 -I ./PCM -lpthread

It compiles, with the following warnings:

./IntelPerformanceCounterMonitorV2.5.1/cpucounters.cpp: In function ‘void print_mcfg(const char*)’:
./IntelPerformanceCounterMonitorV2.5.1/cpucounters.cpp:2606:61: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
./IntelPerformanceCounterMonitorV2.5.1/cpucounters.cpp:2615:65: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
./IntelPerformanceCounterMonitorV2.5.1/pci.cpp: In constructor ‘PciHandleMM::PciHandleMM(uint32, uint32, uint32, uint32)’:
./IntelPerformanceCounterMonitorV2.5.1/pci.cpp:610:65: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
./IntelPerformanceCounterMonitorV2.5.1/pci.cpp:615:69: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]

Afterwards, I ran the code. The results are below:

Instructions per clock:-1
L3 cache hit ratio:0.479371
Bytes read:461888

I wonder if you can find where PciHandleMM() is defined....

The output doesn't seem very reasonable:

Instructions per clock:-1
L3 Cache Misses2049425
L2 Cache Misses3376068
L3 cache hit ratio:0.392955
L2 cache hit ratio0


Attachment: 8-4.cpp (text/x-c++src, 0 bytes)

Is there anything wrong with my code?

Probably. IPC should not be -1. I would start by looking at why IPC is -1. Or, if there are error messages before that, fix those first.

I assume PCM compiled correctly doesn't report IPC = -1. So I would look for what you are doing differently than what 'unchanged PCM' does. I'm sorry but I don't have time to debug your code.


Bingyi, I'm unable to download your code snippet. Can you post it again?

Sure, Rolf, the code is as follows:

#include "/local/homes/bingyiloc/IntelPerformanceCounterMonitorV2.5.1/cpucounters.h"

#define F 200
#define C 200
using namespace std;

int main() {
    PCM *m = PCM::getInstance();
    SystemCounterState before_sstate = getSystemCounterState();
    // Begin of custom code
    cout<<"bingyi's code is working"<<endl;
    double matrix[F][C];
    for(int i=0;i<100;i++){
        for(int j=0;j<100;j++){
            matrix[F][C] = 1.0;
        }
    }
    cout<<"bingyi's code is finished!!"<<endl;
    // End of custom code
    SystemCounterState after_sstate = getSystemCounterState();
    cout << "Instructions per clock:" << getIPC(before_sstate,after_sstate)<<endl;
    cout <<"L3 Cache Misses"<< getL3CacheMisses(before_sstate,after_sstate)<<endl;
    cout << "L2 Cache Misses"<<getL2CacheMisses(before_sstate,after_sstate)<<endl;
    cout << "L3 cache hit ratio:" << getL3CacheHitRatio(before_sstate,after_sstate)<<endl;
    cout << "L2 cache hit ratio"<<getL2CacheHitRatio(before_sstate, after_sstate)<<endl;
    return 0;
}


Please read Roman's reply to Korso above and follow the already given instructions.



it seems you are missing a call to the program method. May I suggest that you add:



PCM* m = PCM::getInstance ();

I got the following result on my machine:

bingyi's code is working
bingyi's code is finished!!
Instructions per clock:0.584452
L3 Cache Misses: 6893
L2 Cache Misses: 11591
L3 cache hit ratio: 0.405314
L2 cache hit ratio: 0.409767
Cleaning up

Hope this helps,

just wrote a post that got queued for review for some reason;
to add to that post, it seems that your matrix access is out of bounds (using F and C, instead of i and j)

I'll repost my previous comment if it gets lost.
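For illustration, here is a minimal sketch of the loop with the indices corrected. The helper name fillBlock is hypothetical and the PCM calls are left out; the dimensions F and C are taken from the posted code.

```cpp
#include <cstddef>

constexpr std::size_t F = 200;  // rows, as in the posted code
constexpr std::size_t C = 200;  // columns, as in the posted code

static double matrix[F][C];

// Writes 1.0 into the top-left rows x cols block using the loop indices
// (not the array dimensions), then returns the sum of that block as a
// quick sanity check. Requests larger than the array are clamped.
double fillBlock(std::size_t rows, std::size_t cols) {
    if (rows > F) rows = F;
    if (cols > C) cols = C;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            matrix[i][j] = 1.0;  // valid: i < F and j < C
        }
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            sum += matrix[i][j];
        }
    }
    return sum;
}
```

Writing to matrix[F][C] touches an element one past the end in both dimensions, which is undefined behavior, and it also means the loop never walks through the array, so the cache counters would not reflect the intended access pattern.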


Hey Rolf, 

Thanks so much for your reply!! I appreciate it very much!!

I added the code


However the output turns out to be:

Num logical cores: 24
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2200000000 Hz
Package thermal spec power: 95 Watt; Package minimum power: 46 Watt; Package maximum power: 145 Watt;

WARNING: Core 0 IA32_PERFEVTSEL0_ADDR are not zeroed 5439548
bingyi's code is working
bingyi's code finished!

Instructions per clock:-1

L3 Cache Misses4301948
L2 Cache Misses4301948
L3 cache hit ratio:0
L2 cache hit ratio0
Cleaning up

It doesn't seem right to me. Is that because of the warning? Should I set IA32_PERFEVTSEL0_ADDR to zero?

So... Roman told Korso to check the error code. Are you checking the error code?

Also, I would like to get the cache miss rate from PCM for each query in the MonetDB system. There are two ways for me to do it. The first way is to insert the following code:

PCM *m = PCM::getInstance();
SystemCounterState before_sstate = getSystemCounterState();

into the MonetDB system. However, MonetDB is written in C, not C++, so I would have to change the PCM source code a lot. So I decided to build an executable and call it from my Python script. I added some functions to the cpucounters.h file so that I can get the values of a few counters each time, and I plan to use them to calculate the cache miss/hit rate. The executable is generated from the following source file:

#include "/local/homes/bingyiloc/IntelPerformanceCounterMonitorV2.5.1/cpucounters.h"

using namespace std;

PCM *m = PCM::getInstance();
SystemCounterState before_sstate = getSystemCounterState();
cout << "get L2Miss"<<getL2Miss(before_sstate)<<endl;
cout << "get L2Re"<<getL2Ref(before_sstate)<<endl;
cout << "get L3Miss"<<getL3Miss(before_sstate)<<endl;
cout << "get L3UnsharedHit"<<getL3UnsharedHit(before_sstate)<<endl;
cout << "get L2HitM"<<getL2HitM(before_sstate)<<endl;
cout << "get L2Hit"<<getL2Hit(before_sstate)<<endl;


I would like to get the values of these counters and then do the same calculation as getL2CacheMisses and the other functions. Is that reasonable? Or is creating a new PCM instance every time a bad way to do the calculation?
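As a rough sketch of the arithmetic involved, assuming the usual definition hit ratio = hits / (hits + misses): the helper names hitRatio and delta are hypothetical, and whether this matches PCM's own formula for each cache level is not confirmed here.

```cpp
#include <cstdint>

// Hypothetical helper: fraction of accesses that hit, given raw hit and
// miss counts. Returns 0.0 when there were no accesses at all.
double hitRatio(std::uint64_t hits, std::uint64_t misses) {
    const std::uint64_t total = hits + misses;
    if (total == 0) return 0.0;
    return static_cast<double>(hits) / static_cast<double>(total);
}

// Assumption: a single snapshot holds running totals, so a per-query value
// is the difference between a snapshot taken after and one taken before.
std::uint64_t delta(std::uint64_t before, std::uint64_t after) {
    return after - before;
}
```

For example, with hits going from 10 to 40 and misses from 5 to 15 across a query, hitRatio(delta(10, 40), delta(5, 15)) gives 0.75. A ratio computed from a single snapshot would describe everything counted since the counters were programmed, not just the query.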

Thanks Patrick. I did check it before and it was normal. When I check it now, I always get error code 2. It doesn't go away even when I reboot the machine.

That would be a good start.

It is possible that something else is running which is using the PMU registers.


have a look at:

Roman suggests that this is a feature, and I think he is referring to the fact that someone may indeed be using the PMU registers. If you know you are alone on the machine, you can always try running pcm.x "sleep 1", which will tell you that the PMU is busy and ask whether you want to reset it. Answer "y" and then re-run your program. I also suggest that you add an error printout and exit if the m->program() call returns an error.
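A minimal sketch of such an error printout is shown below. The enum is a local mirror of the three codes quoted earlier in this thread, and describeProgramResult is a hypothetical helper; in real code the values would come from PCM::ErrorCode in cpucounters.h.

```cpp
#include <string>

// Local mirror of the error codes quoted earlier in this thread.
// In real code, use PCM::ErrorCode from cpucounters.h instead.
enum ProgramResult {
    Success = 0,
    MSRAccessDenied = 1,
    PMUBusy = 2
};

// Hypothetical helper: turns the value returned by program() into a
// readable message for an error printout.
std::string describeProgramResult(int code) {
    switch (code) {
        case Success:
            return "success";
        case MSRAccessDenied:
            return "MSR access denied";
        case PMUBusy:
            return "PMU busy";
        default:
            return "unknown error code " + std::to_string(code);
    }
}
```

The idea is to store the return value of m->program(), print describeProgramResult(status) to stderr when it is not Success, and exit before taking any counter snapshots, since measurements taken after a failed program() call are not meaningful.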


Thank you guys for the help. I solved the problem with the L2 and L3 hit ratios. I have another question: does PCM provide support for getting the L1 cache hit/miss ratio? I googled and didn't find much information. In the cpucounters.h and cpucounters.cpp files, I didn't see any performance counter for L1 cache hits/misses. Thanks again!


I was able to run a couple of examples that use the PCM library in my custom code. However, I have a concern that I hope can be addressed. My understanding is that the PCM library is used inside custom C/C++ code to measure things like L2 cache misses, bytes read from or written to the memory controller, and so on, for that code. Is this correct? I am asking because I have a piece of code that makes two calls to getSystemCounterState() with no code in between. When I read statistics such as getBytesReadFromMC(), I get a non-zero value where I expect zero, because there is no activity to measure between the calls. What does the non-zero value indicate?


Maybe between those two calls your thread was swapped out and another thread was scheduled to run.


The memory controller counters are shared by many cores in the system. These cores issue memory operations while your thread is not doing anything. Also, PCM API calls (like getSystemCounterState) perform a small number of memory operations themselves (measurement overhead).
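One rough way to account for this is to measure the delta over an empty interval first and subtract it from later measurements. The sketch below only shows that arithmetic; subtractBaseline is a hypothetical helper, and the correction is approximate because traffic from other cores is not constant from one interval to the next.

```cpp
#include <cstdint>

// Hypothetical helper: subtracts a baseline (the delta observed over an
// empty interval) from a measured delta. Clamps at zero, because the
// background traffic during the measured interval may have been lower
// than during the baseline interval.
std::uint64_t subtractBaseline(std::uint64_t measured, std::uint64_t baseline) {
    return measured > baseline ? measured - baseline : 0;
}
```

Averaging the baseline over several empty intervals would likely give a steadier estimate than a single pair of calls.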


