    rodionkarimov, February 24, 2011 2:09 AM PST
    Little profiler in MASM

    Hello.

    I don't know where to post this, so I'll post it here. I'm writing a profiler in MASM that I'll use in my games and other programs. The profiler is a two-part system: a ring 0 driver, written in MASM, which reads and writes MSRs, and a front end in Free Pascal, which launches the driver, configures it, sends commands to read the performance counters, and unloads it.

    The phase of driver errors and reboots is now behind me. The driver loads, configures the MSRs, performs the reads and writes, and unloads. But the results it returns are very strange: the LLC miss count is a very big number, of the same magnitude as "UnHalted Core Cycles", "UnHalted Reference Cycles" and "Instruction Retired", with slight differences. The attachment shows the results that the program displays.

    So I need some help; maybe I'm making some obvious mistake. Here is the MASM code of the procedure DispatchControl, which handles all DeviceIoControl calls:



    DispatchControl proc uses esi edi pDeviceObject:PDEVICE_OBJECT, pIrp:PIRP

    ; DeviceIoControl was called
    ; We are in user process context here

    local status:NTSTATUS
    local dwBytesReturned:DWORD

    and dwBytesReturned, 0

    mov esi, pIrp
    assume esi:ptr _IRP

    IoGetCurrentIrpStackLocation esi
    mov edi, eax
    assume edi:ptr IO_STACK_LOCATION

    push ebx

    mov ebx, [esi].AssociatedIrp.SystemBuffer

    .if [edi].Parameters.DeviceIoControl.OutputBufferLength >= sizeof PERFORMANCE

    .if [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_END_PC_READ

    mov ecx, 38fh ; MSR_PERF_GLOBAL_CTRL
    mov eax, 00000000h ; clear every enable bit to stop all counters
    mov edx, 0h
    wrmsr

    mov dwBytesReturned, sizeof PERFORMANCE
    mov status, STATUS_SUCCESS



    .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_READ

    ;---LLC miss---------------------
    mov ecx, 0c1h
    rdmsr

    mov dword ptr [ ebx + 48 ], eax
    mov dword ptr [ ebx + 52 ], edx

    ;---BranchMissesRetired----------
    mov ecx, 0c2h
    rdmsr

    mov dword ptr [ ebx + 64 ], eax
    mov dword ptr [ ebx + 68 ], edx

    ;---Fixed function---------------

    ;---InstrRetired.Any-------------
    mov ecx, 309h
    rdmsr

    mov dword ptr [ ebx + 32 ], eax
    mov dword ptr [ ebx + 36 ], edx

    ;---CPU_CLK_Unhalted.Core--------
    mov ecx, 30ah
    rdmsr

    mov dword ptr [ ebx + 16 ], eax
    mov dword ptr [ ebx + 20 ], edx

    ;---CPU_CLK_Unhalted.Ref---------
    mov ecx, 30bh
    rdmsr

    mov dword ptr [ ebx + 24 ], eax
    mov dword ptr [ ebx + 28 ], edx

    mov dwBytesReturned, sizeof PERFORMANCE
    mov status, STATUS_SUCCESS



    .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE



    ;---Zeroing performance counters-
    mov eax, 0
    mov edx, 0

    mov ecx, 0c1h
    wrmsr

    mov ecx, 0c2h
    wrmsr

    ;---Fixed function---------------
    mov ecx, 309h
    wrmsr

    mov ecx, 30ah
    wrmsr

    mov ecx, 30bh
    wrmsr



    ;---Thread processor affinity----
    invoke KeGetCurrentThread

    add ebx, 72
    invoke ZwSetInformationThread, eax, ThreadAffinityMask, DWORD ptr [ ebx ], sizeof KAFFINITY

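    ;---LLC miss---------------------
    mov ecx, 186h ; IA32_PERFEVTSEL0, controls the counter at 0c1h
    mov eax, 41412eh ; event 2eh, umask 41h, USR and EN bits set
    mov edx, 0

    wrmsr

    ;---BranchMissesRetired----------
    mov ecx, 187h ; IA32_PERFEVTSEL1, controls the counter at 0c2h
    mov eax, 4100c5h ; event 0c5h, umask 00h, USR and EN bits set
    mov edx, 0

    wrmsr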

    ;---Fixed function---------------

    ;---MSR_PERF_FIXED_CTR_CTRL------
    mov ecx, 38dh
    mov eax, 222h
    mov edx, 0
    wrmsr

    ;---MSR_PERF_GLOBAL_CTRL---------
    mov ecx, 38fh
    mov eax, 00000003h ; bits 0-1: enable PMC0 (0c1h) and PMC1 (0c2h)
    mov edx, 7h ; bits 32-34: enable fixed-function counters 0-2
    wrmsr

    mov dwBytesReturned, sizeof PERFORMANCE
    mov status, STATUS_SUCCESS



    .else
    mov status, STATUS_INVALID_DEVICE_REQUEST
    .endif

    .else
    mov status, STATUS_BUFFER_TOO_SMALL
    .endif



    ;---Returning from procedure---------------------
    pop ebx

    assume edi:nothing

    push status
    pop [esi].IoStatus.Status

    push dwBytesReturned
    pop [esi].IoStatus.Information

    assume esi:nothing

    fastcall IofCompleteRequest, esi, IO_NO_INCREMENT

    mov eax, status
    ret

    DispatchControl endp
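
As a cross-check on the enable masks written in the configure branch, here is a minimal Python sketch (not part of the driver) that decodes the two control registers. It assumes the architectural layout described in the Intel SDM: in the global control MSR (38fh), bit n enables general-purpose counter n and bit 32+n enables fixed-function counter n; in the fixed-counter control MSR (38dh), each counter owns a 4-bit field whose low two bits select ring 0 (1), ring 3 (2) or both (3). The function names and the default counter counts are illustrative assumptions.

```python
def global_ctrl_enabled(edx, eax, num_pmc=2, num_fixed=3):
    """Decode an EDX:EAX pair destined for MSR 38fh.

    Returns (enabled PMC indices, enabled fixed-counter indices, stray bits).
    num_pmc=2 and num_fixed=3 are assumptions for a Core 2 class CPU;
    CPUID leaf 0AH reports the real counts.
    """
    value = (edx << 32) | eax
    pmcs = [n for n in range(num_pmc) if value >> n & 1]
    fixed = [n for n in range(num_fixed) if value >> (32 + n) & 1]
    known = ((1 << num_pmc) - 1) | (((1 << num_fixed) - 1) << 32)
    stray = value & ~known  # bits that enable nothing in this model
    return pmcs, fixed, stray


def fixed_ctrl_modes(value, num_fixed=3):
    """Decode the per-counter enable field of MSR 38dh (low 2 bits of each nibble)."""
    names = {0: "off", 1: "os", 2: "usr", 3: "os+usr"}
    return [names[value >> (4 * n) & 3] for n in range(num_fixed)]


print(global_ctrl_enabled(0x7, 0x3))
print(global_ctrl_enabled(0x7, 0x11))
print(fixed_ctrl_modes(0x222))
```

In this model a low half of 3h enables both general-purpose counters, whereas a low half of 11h would enable only PMC0 and leave bit 4 set, which is reported as a stray bit. A fixed-counter control value of 222h selects user-mode counting only, so ring 0 time is not counted.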





    The structure into which the driver writes its results has this type:





    PERFORMANCE STRUCT
    tscEAX DWORD ?
    tscEDX DWORD ?

    RD_MSR_tscEAX DWORD ?
    RD_MSR_tscEDX DWORD ?

    UnHaltedCoreCycles QWORD ?
    UnHaltedReferenceCycles QWORD ?

    InstructionRetired QWORD ?

    LLCReference QWORD ?
    LLCMiss QWORD ?

    BranchInstructionRetired QWORD ?
    BranchMissesRetired QWORD ?

    ProcessorCore DWORD ?

    PERFORMANCE ENDS
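
The read branch of DispatchControl stores results at hard-coded byte offsets (16, 24, 32, 48, 64), so it may be worth checking them against the structure layout. The Python sketch below recomputes the offsets, assuming the structure is packed with no padding (MASM's 1-byte field alignment when no /Zp option or alignment argument is given; this is an assumption about the build settings). FIELDS and field_offsets are illustrative names.

```python
import struct

# (field name, struct format code): I = 32-bit DWORD, Q = 64-bit QWORD
FIELDS = [
    ("tscEAX", "I"), ("tscEDX", "I"),
    ("RD_MSR_tscEAX", "I"), ("RD_MSR_tscEDX", "I"),
    ("UnHaltedCoreCycles", "Q"), ("UnHaltedReferenceCycles", "Q"),
    ("InstructionRetired", "Q"),
    ("LLCReference", "Q"), ("LLCMiss", "Q"),
    ("BranchInstructionRetired", "Q"), ("BranchMissesRetired", "Q"),
    ("ProcessorCore", "I"),
]


def field_offsets(fields):
    """Return ({name: byte offset}, total size) for a packed layout."""
    offsets, position = {}, 0
    for name, code in fields:
        offsets[name] = position
        position += struct.calcsize("<" + code)  # "<" means no padding
    return offsets, position


OFFSETS, TOTAL_SIZE = field_offsets(FIELDS)
for name, offset in OFFSETS.items():
    print(f"{offset:3d}  {name}")
print("sizeof =", TOTAL_SIZE)
```

Under that assumption the offsets agree with the ones used in the driver, and the packed size is 76 bytes. A compiler that pads the record to a multiple of 8 would report 80 instead, which only matters where buffer sizes are compared.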




    The IOCTL constants are defined as follows:




    IOCTL_CACHE_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 801h, METHOD_BUFFERED, FILE_READ_ACCESS )
    IOCTL_BRANCH_MISSPRED_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 802h, METHOD_BUFFERED, FILE_READ_ACCESS )
    IOCTL_LLC_AND_BRANCH_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 803h, METHOD_BUFFERED, FILE_READ_ACCESS )

    IOCTL_END_PC_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 804h, METHOD_BUFFERED, FILE_READ_ACCESS )

    IOCTL_CACHE_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 805h, METHOD_BUFFERED, FILE_READ_ACCESS )
    IOCTL_BRANCH_MISSPRED_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 806h, METHOD_BUFFERED, FILE_READ_ACCESS )
    IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 807h, METHOD_BUFFERED, FILE_READ_ACCESS )
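
For reference, the CTL_CODE macro packs its four arguments into one 32-bit value, and both the driver and the front end must arrive at the same number. Here is a small Python sketch of that packing, using the standard Windows values FILE_DEVICE_UNKNOWN = 22h, METHOD_BUFFERED = 0 and FILE_READ_ACCESS = 1. The helpers ctl_code and split_ctl_code are illustrative, not Windows APIs.

```python
FILE_DEVICE_UNKNOWN = 0x22
METHOD_BUFFERED = 0
FILE_READ_ACCESS = 0x0001


def ctl_code(device_type, function, method, access):
    """Pack the fields the same way the Windows CTL_CODE macro does."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def split_ctl_code(code):
    """Inverse of ctl_code: recover (device_type, function, method, access)."""
    return code >> 16, (code >> 2) & 0xFFF, code & 3, (code >> 14) & 3


IOCTL_LLC_AND_BRANCH_MISS_READ = ctl_code(
    FILE_DEVICE_UNKNOWN, 0x803, METHOD_BUFFERED, FILE_READ_ACCESS)
IOCTL_END_PC_READ = ctl_code(
    FILE_DEVICE_UNKNOWN, 0x804, METHOD_BUFFERED, FILE_READ_ACCESS)
IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE = ctl_code(
    FILE_DEVICE_UNKNOWN, 0x807, METHOD_BUFFERED, FILE_READ_ACCESS)

for code in (IOCTL_LLC_AND_BRANCH_MISS_READ,
             IOCTL_END_PC_READ,
             IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE):
    print(hex(code), split_ctl_code(code))
```

Printing the value on both sides (driver debug output and front end) is a quick way to confirm that a request reaches the intended branch.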





    The Free Pascal code that interacts with the driver is as follows:





    ZeroMemory ( @ BBefore, SizeOf ( TPerformanceData ) );
    BBefore.ProcessorCore := CORE_TO_WORK_ON;

    ZeroMemory ( @ BAfter, SizeOf ( TPerformanceData ) );

    Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

    Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

    for i := 0 to 10000 do if TempCardinal div 3 > 100 then TempCardinal := TempCardinal shr 1
    else TempCardinal := TempCardinal + i div 5;

    Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

    Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_END_PC_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );



    //---Showing results-----------------------------------
    ShowMessage ( 'UnHalted Core Cycles - ' + IntToStr ( BAfter.UnHaltedCoreCycles - BBefore.UnHaltedCoreCycles ) + #13#13 +
    'UnHalted Reference Cycles - ' + IntToStr ( BAfter.UnHaltedReferenceCycles - BBefore.UnHaltedReferenceCycles ) + #13#13 +
    'Instruction Retired - ' + IntToStr ( BAfter.InstructionRetired - BBefore.InstructionRetired ) + #13#13 +

    'LLC miss - ' + IntToStr ( BAfter.LLCMiss - BBefore.LLCMiss ) + #13#13 +
    'Branch Misses Retired - ' + IntToStr ( BAfter.BranchMissesRetired - BBefore.BranchMissesRetired ) );
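
One detail worth checking in the calls above: they pass nil and 0 for the input buffer, while the configure request relies on the driver seeing ProcessorCore. With METHOD_BUFFERED, the I/O manager copies only the input buffer into the driver's SystemBuffer before dispatch, and copies IoStatus.Information bytes back to the output buffer afterwards; the caller's output buffer contents are not copied in. The Python sketch below is a simplified model of that behavior. The function names and the 0xCC fill pattern are illustrative; real pool memory is simply undefined.

```python
import struct

PROCESSOR_CORE_OFFSET = 72  # offset of ProcessorCore in the record
RECORD_SIZE = 76            # packed size of the record


def buffered_ioctl(input_bytes, output_len, handler):
    """Simplified model of a METHOD_BUFFERED DeviceIoControl round trip."""
    size = max(len(input_bytes), output_len)
    system_buffer = bytearray(b"\xCC" * size)       # stand-in for undefined memory
    system_buffer[:len(input_bytes)] = input_bytes  # only the INPUT is copied in
    information = handler(system_buffer)            # driver works on SystemBuffer
    return bytes(system_buffer[:min(information, output_len)])  # copied back out


seen_by_driver = []


def configure_handler(system_buffer):
    """Model of the configure branch: read ProcessorCore, report bytes returned."""
    (core,) = struct.unpack_from("<I", system_buffer, PROCESSOR_CORE_OFFSET)
    seen_by_driver.append(core)
    return RECORD_SIZE


record = bytearray(RECORD_SIZE)
struct.pack_into("<I", record, PROCESSOR_CORE_OFFSET, 0)  # ProcessorCore := 0

buffered_ioctl(b"", RECORD_SIZE, configure_handler)            # nil, 0 as input
buffered_ioctl(bytes(record), RECORD_SIZE, configure_handler)  # record as input

print([hex(value) for value in seen_by_driver])
```

In this model the handler sees the caller's value only when the record is also supplied as the input buffer (the lpInBuffer and nInBufferSize arguments of DeviceIoControl).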





    The record used to interact with the driver is as follows:





    TPerformanceData = record
    tscEAX, tscEDX : Cardinal;
    RD_MSR_tscEAX, RD_MSR_tscEDX : Cardinal; // 8

    UnHaltedCoreCycles : QWORD; // 00 3c - 16
    UnHaltedReferenceCycles : QWORD; // 01 3c - 24

    InstructionRetired : QWORD; // 00 c0 - 32

    LLCReference : QWORD; // 4f 2e - 40
    LLCMiss : QWORD; // 41 2e - 48

    BranchInstructionRetired : QWORD; // 00 c4 - 56
    BranchMissesRetired : QWORD; // 00 c5 - 64

    ProcessorCore : Cardinal; // 72

    end;
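
The comments in the record list a unit mask and an event-select byte for each architectural event (for example 41 2e for LLC misses). As a minimal sketch, assuming the IA32_PERFEVTSELx layout from the Intel SDM (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, EN in bit 22), such a pair can be turned into a full register value as follows. The helper perfevtsel is hypothetical.

```python
USR = 1 << 16  # count while running in ring 3
OS = 1 << 17   # count while running in ring 0
EN = 1 << 22   # enable the counter

# (umask, event select) pairs as listed in the record comments
EVENTS = {
    "UnHaltedCoreCycles": (0x00, 0x3C),
    "UnHaltedReferenceCycles": (0x01, 0x3C),
    "InstructionRetired": (0x00, 0xC0),
    "LLCReference": (0x4F, 0x2E),
    "LLCMiss": (0x41, 0x2E),
    "BranchInstructionRetired": (0x00, 0xC4),
    "BranchMissesRetired": (0x00, 0xC5),
}


def perfevtsel(name, flags=USR | EN):
    """Build the low 32 bits of an event-select register for a named event."""
    umask, event = EVENTS[name]
    return flags | (umask << 8) | event


for name in EVENTS:
    usr_only = perfevtsel(name)
    usr_and_os = perfevtsel(name, USR | OS | EN)
    print(f"{name:26s} usr: {usr_only:06x}h  usr+os: {usr_and_os:06x}h")
```

The USR-only values for LLCMiss and BranchMissesRetired match the constants 41412eh and 4100c5h that the driver writes to 186h and 187h, which means only ring 3 activity is counted unless the OS bit is added.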





    The constants used to create the IOCTL codes are as follows:





    FUNCTION_CACHE_MISS_READ = $801;
    FUNCTION_BRANCH_MISSPRED_READ = $802;
    FUNCTION_LLC_AND_BRANCH_MISS_READ = $803;

    FUNCTION_END_PC_READ = $804;

    FUNCTION_CACHE_MISS_CONFIGURE = $805;
    FUNCTION_BRANCH_MISSPRED_CONFIGURE = $806;
    FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE = $807;





    I install and run the driver as a service, and everything works: I have checked that all calls to DeviceIoControl are properly handled by the driver. At the start I pin the front end and the driver to the same core, using the constant CORE_TO_WORK_ON, which equals 0. The driver reads this value from the ProcessorCore field of the TPerformanceData record.
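
Note that a thread affinity mask is a bit mask rather than a core index: core n corresponds to bit n, so core 0 is mask 1, and a mask of 0 selects no processor at all. A minimal Python sketch of the conversion follows; the helper names are illustrative, and the two-core default is an assumption for a dual-core CPU.

```python
def affinity_mask(core_index, core_count=2):
    """Convert a zero-based core index into a single-bit affinity mask."""
    if not 0 <= core_index < core_count:
        raise ValueError(f"core index {core_index} outside 0..{core_count - 1}")
    return 1 << core_index


def cores_in_mask(mask):
    """List the core indices selected by an affinity mask."""
    return [n for n in range(mask.bit_length()) if mask >> n & 1]


CORE_TO_WORK_ON = 0
print(affinity_mask(CORE_TO_WORK_ON))  # mask for core 0
print(cores_in_mask(0))                # a zero mask selects no core
```

If the index 0 were passed unchanged as the mask, it would name no processor, so the thread could still migrate between cores, and the MSR reads and writes could land on different cores.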



    I work in Windows 7 and my processor is an E5200. If somebody has experience with this or sees any errors, please help me. I'll be very grateful.



    Steve Hughes (Intel), February 25, 2011 5:20 AM PST

    Hi rodionkarimov



    Writing a profiler is quite an undertaking. I'm trying to find an expert who can help you out.

    In the meantime, have you considered looking at any of the profilers we have? AmplifierXE springs to mind, or maybe look around whatif.intel.com: Performance Tuning Utility (PTU) might be right up your street.

    If you used one of our profilers, you would be free to write the games you wanted to profile.

    Regards

    Steve

    rodionkarimov, February 25, 2011 7:08 AM PST

    > Writing a profiler is quite an undertaking.

    I'm writing a simple profiler. It targets only my CPU with its Architectural Performance Counters and, at the beginning, it will measure only simple parameters, like LLC misses and branch mispredictions.

    The profilers that you offer cost money, and I don't have much to spend on them right now. Besides, I have a feeling that my profiler is almost complete. My own profiler is also more flexible: I can use it in any way and in any place. And it is a very good learning exercise.



    Steve Hughes (Intel), March 2, 2011 3:13 AM PST

    Hi Rodionkarimov

    You will find quite a lot of useful information about setting up counters here: http://software.intel.com/en-us/articles/intel-performance-counter-monitor/

    There is also example driver source code and lots of other useful information.

    I hope this helps.

    Regards

    Steve

    rodionkarimov, March 2, 2011 6:19 AM PST

    Hello, Steve Hughes.

    Thank you. I had already read this article and hoped that somebody more experienced in MSR programming could point out my errors. I'll now study this driver more thoroughly.


