QPI counters aren't available on Dell r620. Q_P0_PCI_PMON_BOX_CTL=0x0

Hello 

I have a problem that has been discussed here many times before: PCM has no access to the QPI counters.

The difference is that in all the forum posts where the problem is discussed, the registers return 0xffffffff. According to the documentation, the value should be -1 if the application can't initialize the PMU or the PMU is unavailable.

So the question is: can a value of 0x0 indicate that the QPI PMU device is disabled in the BIOS, or should I be looking for another cause?

Regarding the Dell R620 BIOS, I couldn't find any option that comes even close to enabling the PMU, performance monitoring devices, or just devices 8 and 9. Can somebody give me any leads? What should I look for?

PCM output:

ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
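
To make the difference between the two readback values concrete, here is a minimal C++ sketch (not PCM code) of how a value read back after a control-register write could be interpreted. The helper name classifyReadback and the written value used in the checks are hypothetical. The only hard fact it relies on is that PCI configuration reads which no device answers return all ones. A readback of 0x0 only shows that something answered and the write did not stick; it does not say why.

```cpp
#include <cstdint>

// Possible interpretations of a register value read back after a write.
enum class Readback { NoDevice, WriteIgnored, Ok };

// Hypothetical helper, not part of PCM: classify what came back after
// writing `written` to a PCI configuration register.
constexpr Readback classifyReadback(std::uint32_t written, std::uint32_t readBack)
{
    return readBack == 0xFFFFFFFFu ? Readback::NoDevice      // nobody answered the read
         : readBack != written     ? Readback::WriteIgnored  // answered, value did not stick
                                   : Readback::Ok;
}

// The written value 0x1 is an arbitrary placeholder, not the value PCM programs.
static_assert(classifyReadback(0x1u, 0xFFFFFFFFu) == Readback::NoDevice, "all ones");
static_assert(classifyReadback(0x1u, 0x0u) == Readback::WriteIgnored, "zero readback");
static_assert(classifyReadback(0x1u, 0x1u) == Readback::Ok, "value stuck");
```

In these terms, the older forum posts fall into the NoDevice case, while the output above falls into the WriteIgnored case.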

Thanks,

Alexander

Hi Alexander,

The BIOS probably locks access to the QPI devices (PCI address space) in SMM mode. AFAIK there is no way to access SMM space from ring 0, so the theoretical possibilities are reverse-engineering the BIOS and installing your own SMM handler, or waiting for a new BIOS revision that enables sampling of the QPI performance counters.

Hello

Could you elaborate on what "SMM mode", "SMM space" and "SMM handler" are?

What procedure are you suggesting with "reversing the BIOS and installing own SMM handler"?

I wouldn't rely on a BIOS update, as I have already installed the latest version and it gives me no access to these options. The next one probably will not be released soon.

Thanks,

Alexander

Hi,

sorry for not being informative :)

SMM stands for System Management Mode, which is a special operating mode of the CPU accessed from the BIOS.

>>>What procedure are you suggesting with "reversing the BIOS and installing own SMM handler"?>>>

I'm not suggesting such a procedure. It has been published on a few sites, and it is a complex task that requires broad knowledge of assembly and the IDA disassembler, and comes down to locating the routine(s) that access PCI address space and enable performance measurement on devices 8 and 9. By the way, the Intel documentation does not specify exactly which bus and offset are used to access the QPI control registers for devices 8 and 9.

>>>The next one probably will not be released soon.>>>

As I was told by one of the Intel engineers, the BIOS vendor is not obligated to provide such an implementation in its next revision.

It seems I found the real cause of the error, but I can't find a reason for this behavior.

Despite valid parameters for the PCI configuration access, the data cannot be set properly.

size = HalSetBusDataByOffset(PCIConfiguration, input_pcicfg_req->bus, slot.u.AsULONG,
&(input_pcicfg_req->write_value), input_pcicfg_req->reg, input_pcicfg_req->bytes);

This call returns zero, and size == 0 should be considered an error.

Input data for the call:

msr.sys: input_pcicfg_req->bus = 63
msr.sys: slot.u.AsULONG = 72
msr.sys: slot.u.bits.DeviceNumber = 8
msr.sys: slot.u.bits.FunctionNumber = 2
msr.sys: &(input_pcicfg_req->write_value) = FFFFFA8037C5D898
msr.sys: input_pcicfg_req->reg = 244
msr.sys: input_pcicfg_req->bytes = 4
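
As a cross-check of the numbers above, the following C++ sketch decodes the slot value by hand. It assumes the WDK PCI_SLOT_NUMBER layout (device number in bits 0-4, function number in bits 5-7); the names SlotFields and decodeSlot are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Fields packed into a slot number, assuming the WDK PCI_SLOT_NUMBER layout:
// bits 0-4 = device number, bits 5-7 = function number.
struct SlotFields
{
    std::uint32_t device;
    std::uint32_t function;
};

// Hypothetical helper: split a raw slot value into device and function.
constexpr SlotFields decodeSlot(std::uint32_t asUlong)
{
    return SlotFields{ asUlong & 0x1Fu, (asUlong >> 5) & 0x7u };
}

// 72 = 8 + (2 << 5), which matches the device 8, function 2 lines in the log.
static_assert(decodeSlot(72).device == 8, "device number");
static_assert(decodeSlot(72).function == 2, "function number");

// The decimal values in the log, rewritten in hex for comparison with the docs.
static_assert(63 == 0x3F, "bus");
static_assert(244 == 0xF4, "register offset");
```

So the request targets bus 0x3F, device 8, function 2, register offset 0xF4, with a 4-byte write. This is consistent with the device and function named in the PCM error message.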

I see that I was wrong: a kernel-mode driver can access PCI configuration space and write to devices 8 and 9. Does the code snippet in your post belong to the msr.sys driver?

Quote:

iliyapolak wrote:

I see that I was wrong: a kernel-mode driver can access PCI configuration space and write to devices 8 and 9. Does the code snippet in your post belong to the msr.sys driver?

Yes. msrmain.c around line 200

Thanks Alexander!

I have some doubts about the bus value used to read and write the QPI LL PMU registers. xeon-e5-2600-uncore-guide.pdf says nothing about a bus, only device and function. It doesn't even mention a procedure for finding the right bus.

So in the current execution the following value is used:

msr.sys: input_pcicfg_req->bus = 63

As far as I can see, this value is taken from the procedure below. The procedure is simple, but I couldn't find any documentation about the CPU bus location.

Why does the procedure start from bus zero, and why are device 5, function 0 used? Which spec defines this?

int getBusFromSocket(const uint32 socket)
{
    int cur_bus = 0;
    uint32 cur_socket = 0;
    // std::cout << "socket: "<< socket << std::endl;
    while(cur_socket <= socket)
    {
        // std::cout << "reading from bus 0x"<< std::hex << cur_bus << std::dec << " ";
        PciHandleM h(0, cur_bus, 5, 0);
        uint32 cpubusno = 0;
        h.read32(0x108, &cpubusno); // CPUBUSNO register
        cur_bus = (cpubusno >> 8)& 0x0ff;
        // std::cout << "socket: "<< cur_socket<< std::hex << " cpubusno: 0x"<< std::hex << cpubusno << " "<<cur_bus<< std::dec << std::endl;
        if(socket == cur_socket)
            return cur_bus;
        ++cur_socket;
        ++cur_bus;
        if(cur_bus > 0x0ff)
           return -1;
    }
    return -1;
}
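
To show how the walk in getBusFromSocket plays out, here is a self-contained C++ sketch (C++14 or later) that replaces the real PciHandleM read with a mock. The function names mockReadCpuBusNo and busFromSocketSketch are hypothetical, and the register values are made up for illustration; they are only chosen so that socket 0 resolves to bus 63 as in the log above. The sketch takes bits 15:8 of the value read from register 0x108, as the original code does.

```cpp
#include <cstdint>

// Hypothetical stand-in for reading register 0x108 on device 5, function 0
// of the given bus. The values are invented for this example.
constexpr std::uint32_t mockReadCpuBusNo(int bus)
{
    if (bus == 0x00) return 0x00003F00u; // first socket:  bits 15:8 = 0x3F
    if (bus == 0x40) return 0x00007F40u; // second socket: bits 15:8 = 0x7F
    return 0xFFFFFFFFu;                  // nothing answers on other buses
}

// Same walk as getBusFromSocket, but against the mock above.
constexpr int busFromSocketSketch(std::uint32_t socket)
{
    int curBus = 0;
    for (std::uint32_t curSocket = 0; curSocket <= socket; ++curSocket)
    {
        if (curBus > 0xFF)
            return -1;                                      // ran out of buses
        const std::uint32_t cpubusno = mockReadCpuBusNo(curBus);
        curBus = static_cast<int>((cpubusno >> 8) & 0xFFu); // bits 15:8
        if (curSocket == socket)
            return curBus;
        ++curBus; // the next socket is looked up on the following bus
    }
    return -1;
}

static_assert(busFromSocketSketch(0) == 0x3F, "socket 0 -> bus 63");
static_assert(busFromSocketSketch(1) == 0x7F, "socket 1 -> bus 127");
static_assert(busFromSocketSketch(3) == -1, "walk runs past bus 0xFF");
```

With this mock data, the walk starts at bus 0, reads the bus number for socket 0, and then continues the search for the next socket on the bus immediately after it.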

Thanks,

Alexander

Regarding the bus number and offset, I mentioned in one of my posts that those values were not available in the Uncore Guide. I suppose the developers of msr.sys probably had access to this information.

WinDbg running in kernel mode can be used to scan PCI buses and address space. The !pci command should provide information about PCI configuration space; after that, commands like eb and ed can write directly to PCI registers.

Alexander,

the CPUBUSNO register location (device 5, function 0) and format are documented in https://www-ssl.intel.com/content/www/us/en/processors/xeon/xeon-e5-1600-2600-vol-2-datasheet.html

best regards,

Roman

Thanks Roman for the information.

Problem was related to BIOS settings.

Now it is solved.

Hi Alexander,

sorry for the off-topic question, which is related to your other post, but have you checked all your thread IDs with Process Explorer?

Alexander Alexeev wrote:
> Problem was related to BIOS settings.
> Now it is solved.

I'm experiencing the same problem with a Dell machine; would you be kind enough to reveal what setting should be modified to expose the QPI counters?

Thanks
Tim

Have you looked at http://software.intel.com/en-us/articles/bios-preventing-access-to-qpi-performance-counters ?

Or go to this forum and search for 'pcm qpi bios'.

Pat

Thanks for the link; a lot of valuable information can be found in those CPU datasheets.

Quote:

Tim Day wrote:

I'm experiencing the same problem with a Dell machine; would you be kind enough to reveal what setting should be modified to expose the QPI counters?

I didn't find a way to enable the counters. Dell support confirmed that the PCI config space cannot be made accessible with the current version of the BIOS. They simply recommended waiting for an update.

I switched to another HW to continue development.

Thanks for the response.  I do actually have a support request in with Dell on this now; their latest report was

“We made some experimental BIOS changes that allows us to see the hidden devices in the OS. As Intel does not release drivers for these devices there would be yellow bang in the device manager. Unfortunately even after that Intel’s tool complains about unrecognized CPUs. We are asking for Intel’s help to figure out what might be wrong. Appreciate your patience on this."

(yes I have pointed out the unknown device is an expected result) so I am hopeful the necessary BIOS fixes might appear at some point.  Meanwhile we have found PCM's QPI counters seem to work as expected on an older T7500 system, but of course I'd rather be getting some numbers on more current HW.

Just tried out an experimental BIOS supplied by Dell for my T7600 which allows these devices to be unhidden, and now I'm seeing QPI related info from the PCM lib.  Fantastic!  Kudos to Dell's support for taking the trouble to develop this... not sure whether the option will be released generally in a future BIOS update?

Quote:

Tim Day wrote:

Just tried out an experimental BIOS supplied by Dell for my T7600 which allows these devices to be unhidden, and now I'm seeing QPI related info from the PCM lib.  Fantastic!  Kudos to Dell's support for taking the trouble to develop this... not sure whether the option will be released generally in a future BIOS update?

Hello Tim,

I may have the same problem as you. We are using a Dell T7600 too. Currently our QPI readings are always zero. Can you tell me where I can get the BIOS that lets you unhide the QPI devices? Thank you very much.

Zheng Luo

Come On !!! Do the research !!!
