I am comparing the output of PCM and PAPI for a software router tool (Click).
I see a non-trivial difference between the two tools for simple metrics like total cycles and total L3 misses. Each tool is self-consistent across runs (modulo some stochastic noise), but they disagree with each other:
With PAPI: PAPI_TOT_CYC = 560548883, PAPI_L3_TCM = 993702
With PCM: getCycles = 1288193707, getL3CacheMisses = 746465
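To put numbers on the gap, here is a quick sketch in Python. The dictionary keys and the helper name `pcm_to_papi_ratio` are just my own labels for illustration; the values are the ones quoted above.

```python
# Values copied from the measurements quoted above.
papi = {"cycles": 560548883, "l3_misses": 993702}   # PAPI_TOT_CYC, PAPI_L3_TCM
pcm = {"cycles": 1288193707, "l3_misses": 746465}   # getCycles, getL3CacheMisses


def pcm_to_papi_ratio(metric):
    """Return the PCM reading divided by the PAPI reading for one metric."""
    return pcm[metric] / papi[metric]


# Print one ratio per metric, rounded to two decimals.
for metric in papi:
    print(f"{metric}: PCM/PAPI = {pcm_to_papi_ratio(metric):.2f}")
```

So PCM reports roughly 2.3x the cycles of PAPI but only about 0.75x the L3 misses; the two metrics disagree in opposite directions.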
If I understand the semantics correctly, both tools read the same
hardware counters, so for the same workload the values should be in the
same ballpark. A discrepancy this large is really puzzling.
Are there known issues that result in different outputs with different performance tools?