Little profiler in MASM
Hello.
I don't know where to post, so I'll post here. I'm writing a profiler in MASM that I'll use in my games and other programs. It is a two-part system: one part is a ring 0 driver, written in MASM, which reads and writes MSRs. The other is a front end in Free Pascal, which loads the driver, configures it, sends commands to read the performance counters, and unloads it.
The phase of driver errors and reboots is now behind me. The driver loads, configures the MSR reads and writes, performs them, and unloads. But the results it returns are very strange: the LLC miss count is a very large number, of the same magnitude as "UnHalted Core Cycles", "UnHalted Reference Cycles" and "Instruction Retired", with only slight differences. The attachment shows the results that the program displays.
So I need some help; maybe I'm making some obvious mistake. Here is the MASM code of the DispatchControl procedure, which handles all DeviceIoControl calls:
DispatchControl proc uses esi edi pDeviceObject:PDEVICE_OBJECT, pIrp:PIRP

    ; DeviceIoControl was called
    ; We are in user process context here

    local status:NTSTATUS
    local dwBytesReturned:DWORD

    and dwBytesReturned, 0

    mov esi, pIrp
    assume esi:ptr _IRP

    IoGetCurrentIrpStackLocation esi
    mov edi, eax
    assume edi:ptr IO_STACK_LOCATION

    push ebx
    mov ebx, [esi].AssociatedIrp.SystemBuffer

    .if [edi].Parameters.DeviceIoControl.OutputBufferLength >= sizeof PERFORMANCE

        .if [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_END_PC_READ

            mov ecx, 38fh
            mov eax, 00000000h
            mov edx, 0h
            wrmsr

            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS

        .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_READ

            ;---LLC miss---------------------
            mov ecx, 0c1h
            rdmsr
            mov dword ptr [ ebx + 48 ], eax
            mov dword ptr [ ebx + 52 ], edx

            ;---BranchMissesRetired----------
            mov ecx, 0c2h
            rdmsr
            mov dword ptr [ ebx + 64 ], eax
            mov dword ptr [ ebx + 68 ], edx

            ;---Fixed function---------------
            ;---InstrRetired.Any-------------
            mov ecx, 309h
            rdmsr
            mov dword ptr [ ebx + 32 ], eax
            mov dword ptr [ ebx + 36 ], edx

            ;---CPU_CLK_Unhalted.Core--------
            mov ecx, 30ah
            rdmsr
            mov dword ptr [ ebx + 16 ], eax
            mov dword ptr [ ebx + 20 ], edx

            ;---CPU_CLK_Unhalted.Ref---------
            mov ecx, 30bh
            rdmsr
            mov dword ptr [ ebx + 24 ], eax
            mov dword ptr [ ebx + 28 ], edx

            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS

        .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE

            ;---Zeroing performance counters-
            mov eax, 0
            mov edx, 0
            mov ecx, 0c1h
            wrmsr
            mov ecx, 0c2h
            wrmsr

            ;---Fixed function---------------
            mov ecx, 309h
            wrmsr
            mov ecx, 30ah
            wrmsr
            mov ecx, 30bh
            wrmsr

            ;---Thread processor affinity----
            invoke KeGetCurrentThread
            add ebx, 72
            invoke ZwSetInformationThread, eax, ThreadAffinityMask, DWORD ptr [ ebx ], sizeof KAFFINITY

            ;---LLC miss---------------------
            mov ecx, 186h
            mov eax, 41412eh
            mov edx, 0
            wrmsr

            ;---BranchMissesRetired----------
            mov ecx, 187h
            mov eax, 4100c5h
            mov edx, 0
            wrmsr

            ;---Fixed function---------------
            ;---MSR_PERF_FIXED_CTR_CTRL------
            mov ecx, 38dh
            mov eax, 222h
            mov edx, 0
            wrmsr

            ;---MSR_PERF_GLOBAL_CTRL---------
            mov ecx, 38fh
            mov eax, 00000011h
            mov edx, 7h
            wrmsr

            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS

        .else
            mov status, STATUS_INVALID_DEVICE_REQUEST
        .endif

    .else
        mov status, STATUS_BUFFER_TOO_SMALL
    .endif

    ;---Returning from procedure---------------------
    pop ebx
    assume edi:nothing

    push status
    pop [esi].IoStatus.Status

    push dwBytesReturned
    pop [esi].IoStatus.Information

    assume esi:nothing

    fastcall IofCompleteRequest, esi, IO_NO_INCREMENT

    mov eax, status
    ret

DispatchControl endp
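As a sanity check on the control values written above, here is a minimal Free Pascal sketch that decodes them. It assumes the architectural layout described in the Intel SDM (IA32_PERFEVTSELx: event select in bits 0..7, unit mask in bits 8..15, USR in bit 16, OS in bit 17, EN in bit 22; IA32_PERF_GLOBAL_CTRL: bit n enables general-purpose counter n, bit 32+n enables fixed counter n). Whether the E5200 follows exactly this layout is an assumption, and the names TEventSelect, DecodeEventSelect and BitSet are made up for the illustration.

```pascal
program DecodePerfCtl;
{$mode objfpc}

type
  // Selected fields of an IA32_PERFEVTSELx value (layout assumed from the Intel SDM)
  TEventSelect = record
    Event, UMask: Byte;
    Usr, Os, Enabled: Boolean;
  end;

function DecodeEventSelect(Value: Cardinal): TEventSelect;
begin
  Result.Event   := Value and $FF;                 // bits 0..7: event select
  Result.UMask   := (Value shr 8) and $FF;         // bits 8..15: unit mask
  Result.Usr     := ((Value shr 16) and 1) <> 0;   // bit 16: count in user mode
  Result.Os      := ((Value shr 17) and 1) <> 0;   // bit 17: count in kernel mode
  Result.Enabled := ((Value shr 22) and 1) <> 0;   // bit 22: counter enable
end;

// True if bit Index of Value is set
function BitSet(Value: Cardinal; Index: Integer): Boolean;
begin
  Result := ((Value shr Index) and 1) <> 0;
end;

procedure PrintEventSelect(const Lbl: string; Value: Cardinal);
var
  E: TEventSelect;
begin
  E := DecodeEventSelect(Value);
  WriteLn(Lbl, ': event=', HexStr(E.Event, 2), 'h umask=', HexStr(E.UMask, 2),
          'h usr=', E.Usr, ' os=', E.Os, ' en=', E.Enabled);
end;

var
  I: Integer;
begin
  // Values the driver writes to MSR 186h and 187h
  PrintEventSelect('MSR 186h', $41412E);
  PrintEventSelect('MSR 187h', $4100C5);

  // EAX written to MSR 38Fh: bit n enables general-purpose counter n
  for I := 0 to 4 do
    WriteLn('EAX bit ', I, ' (PMC', I, '): ', BitSet($11, I));

  // EDX written to MSR 38Fh: bit n (MSR bit 32+n) enables fixed counter n
  for I := 0 to 2 do
    WriteLn('EDX bit ', I, ' (fixed ', I, '): ', BitSet($7, I));
end.
```

If the layout assumption holds, both event selects decode to user-mode counting with the enable bit set, while the 11h written to the low half of MSR 38Fh has bits 0 and 4 set rather than bits 0 and 1. It may be worth checking whether binary 11 (3h) was intended there.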
The structure into which the driver writes its results has this type:
PERFORMANCE STRUCT
    tscEAX                      DWORD ?
    tscEDX                      DWORD ?
    RD_MSR_tscEAX               DWORD ?
    RD_MSR_tscEDX               DWORD ?
    UnHaltedCoreCycles          QWORD ?
    UnHaltedReferenceCycles     QWORD ?
    InstructionRetired          QWORD ?
    LLCReference                QWORD ?
    LLCMiss                     QWORD ?
    BranchInstructionRetired    QWORD ?
    BranchMissesRetired         QWORD ?
    ProcessorCore               DWORD ?
PERFORMANCE ENDS
And the IOCTL constants are:
IOCTL_CACHE_MISS_READ                 equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 801h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_READ            equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 802h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_READ        equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 803h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_END_PC_READ                     equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 804h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_CACHE_MISS_CONFIGURE            equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 805h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_CONFIGURE       equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 806h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE   equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 807h, METHOD_BUFFERED, FILE_READ_ACCESS )
The Free Pascal code that interacts with the driver is:
ZeroMemory ( @ BBefore, SizeOf ( TPerformanceData ) );
BBefore.ProcessorCore := CORE_TO_WORK_ON;
ZeroMemory ( @ BAfter, SizeOf ( TPerformanceData ) );

Status := DeviceIoControl ( hDevice,
                            CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE, METHOD_BUFFERED, FILE_READ_ACCESS ),
                            nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

Status := DeviceIoControl ( hDevice,
                            CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ),
                            nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

for i := 0 to 10000 do
  if TempCardinal div 3 > 100 then
    TempCardinal := TempCardinal shr 1
  else
    TempCardinal := TempCardinal + i div 5;

Status := DeviceIoControl ( hDevice,
                            CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ),
                            nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

Status := DeviceIoControl ( hDevice,
                            CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_END_PC_READ, METHOD_BUFFERED, FILE_READ_ACCESS ),
                            nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

//---Showing results-----------------------------------
ShowMessage ( 'UnHalted Core Cycles - ' + IntToStr ( BAfter.UnHaltedCoreCycles - BBefore.UnHaltedCoreCycles ) + #13#13 +
              'UnHalted Reference Cycles - ' + IntToStr ( BAfter.UnHaltedReferenceCycles - BBefore.UnHaltedReferenceCycles ) + #13#13 +
              'Instruction Retired - ' + IntToStr ( BAfter.InstructionRetired - BBefore.InstructionRetired ) + #13#13 +
              'LLC miss - ' + IntToStr ( BAfter.LLCMiss - BBefore.LLCMiss ) + #13#13 +
              'Branch Misses Retired - ' + IntToStr ( BAfter.BranchMissesRetired - BBefore.BranchMissesRetired ) );
The structure used to interact with the driver is:
TPerformanceData = record
  tscEAX, tscEDX : Cardinal;
  RD_MSR_tscEAX, RD_MSR_tscEDX : Cardinal;  // 8
  UnHaltedCoreCycles : QWORD;               // 00 3c - 16
  UnHaltedReferenceCycles : QWORD;          // 01 3c - 24
  InstructionRetired : QWORD;               // 00 c0 - 32
  LLCReference : QWORD;                     // 4f 2e - 40
  LLCMiss : QWORD;                          // 41 2e - 48
  BranchInstructionRetired : QWORD;         // 00 c4 - 56
  BranchMissesRetired : QWORD;              // 00 c5 - 64
  ProcessorCore : Cardinal;                 // 72
end;
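Since the driver stores its results at hard-coded byte offsets (16, 24, 32, 48, 64, 72), one way to rule out a layout mismatch is to print the offsets the Free Pascal compiler actually assigns to the record fields. The following is only a sketch: OffsetOf is a hypothetical helper, and the record is repeated here so that the program compiles on its own.

```pascal
program CheckOffsets;
{$mode objfpc}

type
  TPerformanceData = record
    tscEAX, tscEDX : Cardinal;
    RD_MSR_tscEAX, RD_MSR_tscEDX : Cardinal;
    UnHaltedCoreCycles : QWord;
    UnHaltedReferenceCycles : QWord;
    InstructionRetired : QWord;
    LLCReference : QWord;
    LLCMiss : QWord;
    BranchInstructionRetired : QWord;
    BranchMissesRetired : QWord;
    ProcessorCore : Cardinal;
  end;

// Byte distance from the start of Base to Field.
// Untyped const parameters are passed by reference, so @ yields the real addresses.
function OffsetOf(const Base; const Field): PtrUInt;
begin
  Result := PtrUInt(@Field) - PtrUInt(@Base);
end;

var
  R: TPerformanceData;
begin
  WriteLn('UnHaltedCoreCycles      ', OffsetOf(R, R.UnHaltedCoreCycles));
  WriteLn('UnHaltedReferenceCycles ', OffsetOf(R, R.UnHaltedReferenceCycles));
  WriteLn('InstructionRetired      ', OffsetOf(R, R.InstructionRetired));
  WriteLn('LLCMiss                 ', OffsetOf(R, R.LLCMiss));
  WriteLn('BranchMissesRetired     ', OffsetOf(R, R.BranchMissesRetired));
  WriteLn('ProcessorCore           ', OffsetOf(R, R.ProcessorCore));
  WriteLn('SizeOf(TPerformanceData) ', SizeOf(TPerformanceData));
end.
```

The printed offsets can be compared with the constants in the MASM code. The total size may differ from the MASM `sizeof PERFORMANCE` depending on record alignment settings, which matters for the OutputBufferLength check in the driver.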
The constants used to create the IOCTL codes are:
FUNCTION_CACHE_MISS_READ                = $801;
FUNCTION_BRANCH_MISSPRED_READ           = $802;
FUNCTION_LLC_AND_BRANCH_MISS_READ       = $803;
FUNCTION_END_PC_READ                    = $804;
FUNCTION_CACHE_MISS_CONFIGURE           = $805;
FUNCTION_BRANCH_MISSPRED_CONFIGURE      = $806;
FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE  = $807;
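To confirm that both sides build the same control codes, the values can be computed by hand. This sketch follows the bit layout of the Windows CTL_CODE macro (device type in bits 16..31, access in bits 14..15, function in bits 2..13, method in bits 0..1). MakeCtlCode is a hypothetical stand-in for the CtlCode function used above, whose source is not shown, and the three numeric constants are the usual Windows DDK values.

```pascal
program CtlCodes;
{$mode objfpc}

const
  FILE_DEVICE_UNKNOWN = $22;  // value from the Windows DDK headers
  METHOD_BUFFERED     = 0;
  FILE_READ_ACCESS    = 1;

  FUNCTION_LLC_AND_BRANCH_MISS_READ      = $803;
  FUNCTION_END_PC_READ                   = $804;
  FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE = $807;

// Same bit packing as the CTL_CODE macro
function MakeCtlCode(DeviceType, Func, Method, Access: Cardinal): Cardinal;
begin
  Result := (DeviceType shl 16) or (Access shl 14) or (Func shl 2) or Method;
end;

begin
  WriteLn('READ      ', HexStr(MakeCtlCode(FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ,
                                           METHOD_BUFFERED, FILE_READ_ACCESS), 8));
  WriteLn('END       ', HexStr(MakeCtlCode(FILE_DEVICE_UNKNOWN, FUNCTION_END_PC_READ,
                                           METHOD_BUFFERED, FILE_READ_ACCESS), 8));
  WriteLn('CONFIGURE ', HexStr(MakeCtlCode(FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE,
                                           METHOD_BUFFERED, FILE_READ_ACCESS), 8));
end.
```

The printed values can be compared with what the driver sees in IoControlCode, for example through a debug print.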
I install and run the driver as a service. Everything works, and I have checked that all calls to DeviceIoControl are properly handled by the driver. At the start I bind the front end and the driver to the same core using the constant CORE_TO_WORK_ON, which equals 0. The driver reads this value from the ProcessorCore field of the TPerformanceData record.
I work in Windows 7 and my processor is an E5200. If somebody has experience with this or sees any errors, please help me. I'll be very grateful.
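One detail that may be relevant here: Windows affinity values (KAFFINITY in the kernel, the argument of SetThreadAffinityMask in user mode) are bit masks with one bit per logical processor, not core indices. The sketch below shows the conversion. CoreToAffinityMask is a hypothetical helper, and whether the front end or driver already performs this step is not visible in the code above.

```pascal
program AffinityMask;
{$mode objfpc}

// Convert a zero-based core index to an affinity bit mask.
// Core 0 maps to mask 1; a mask of 0 would select no processor at all.
function CoreToAffinityMask(Core: Cardinal): PtrUInt;
begin
  Result := PtrUInt(1) shl Core;
end;

begin
  WriteLn('core 0 -> mask ', CoreToAffinityMask(0));
  WriteLn('core 1 -> mask ', CoreToAffinityMask(1));
end.
```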