hppiSAD and hppWait timeouts

Hello,

I have an issue with hppWait and hppiSAD. My target configuration is:

Intel® Atom™ CPU E3845 @ 1.91 GHz (4 cores)

Windows 8 Enterprise (32-bit)

Intel Graphics Driver 10.18.10.3496 (3/11/2014)

IPP Preview 2014 February (2014.feb).

SSE4.2 instruction set / Ivy Bridge

 

The issue is that, for a specific template matrix size (100x100) with GPU acceleration, the hppWait function never returns. The issue was observed only on the target specified above; no problems occurred with the AVX2 instruction set / Haswell GT2 configuration.

 

Workarounds attempted:

  1. Changed HPP_TIME_OUT_INFINITE to a specific timeout value, so that the error code from hppWait could be retrieved and analyzed. This was not successful: hppWait never returned, even with a finite timeout value.
  2. Changed the template matrix size: 8x8, 16x16, and 32x32 all worked in lieu of 100x100.

 

Other symptoms of this issue were as follows:

  1. When using the 100x100 template and the GPU accelerator, an error window was displayed on the Windows 8 desktop saying “Display Device Driver Stopped Working and has recovered.” This always occurred at the point in the execution sequence where hppiSAD was invoked.
  2. When using the 16x16 or 32x32 template sizes, the error messages below were observed:

[ERROR]   : CM EnqueueCopyGPUToCPUStride Failed: errCode=-56.

[ERROR]   : VAL ReadSurface error

[ERROR]   : m_pVAManager->ReadSurface Failed: erroCode=-156.

 

The associated hppWait call returned in this case, so the above error messages may or may not be related to the issue with the 100x100 template.

 

The code base executed on the two targets is the same, yet the hppWait failure to return was observed only on Ivy Bridge, and it is unclear why the template size matters. This is a near-term blocker, because it prevents execution of the tests sequenced after the hppiSAD invocation.

 

In trying to troubleshoot this issue, the hppiSAD function was invoked with two different scale factors. I have noted an earlier forum entry regarding negative hppiSAD scale factors resulting in erroneous hppWait return values, but I wanted to ask further about scale factor values 0 and 1. According to my testing, both are legal, but from an internal processing perspective, is there a fundamental difference between the two? How do the results differ when these values are used?


Hi Brian,

Can you specify the image size that you use in the hppiSAD call? We are trying to reproduce the problem on our side, and the image size matters: for sufficiently large images, the GPU execution time of SAD can exceed the maximum time Windows allows for one GFX task on your system, in which case Windows restarts the GFX driver. You can raise this limit by updating the registry:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:000004b0
"TdrDdiDelay"=dword:000004b0

This is the likely reason, because 1) the failure depends on template size (for larger templates, the execution time of SAD grows significantly), and 2) the test passes on the faster system (HSW).

Regarding the scale value: the scale factor is commonly used in IPP functions that operate on integer data types. Scale value 0 means the result is output exactly as it was calculated; scale value 1 means the result is divided by 2^1, i.e. 2. This is explained in more detail in the manual. Negative scale values are meaningless for SAD and are not allowed.

Thanks, Dmitry, for the timely response. The base source image size for this test is 1920x1080. The suggestion you offered regarding the registry configuration makes sense; I'll make that change and let you know the result. Also, thanks for the clarification regarding the scale factor setting.

I tried the Windows registry changes, but was unable to avoid Windows detecting issues with the driver. Initially, I added TdrDelay and TdrDdiDelay as you suggested; Regedit tells me the values are 1200 secs (0x000004b0). After switching the test software back to the 100x100 template size, the original issue resurfaced, accompanied by the desktop message "driver stopped responding and has recovered". I then also added TdrLevel with a value of zero (0x00000000) and retested, but the result was the same.

I have other workarounds I can instrument to get around this issue, and we will be using faster hardware in the future. I was hoping the Windows configuration change would suffice, since it involves only a one-time registry update.

We reproduced the original problem on an Atom-based system with image size 1920x1080 and template size 100x100. The problem is due to the long execution time of the GPU kernel that calculates SAD for these parameters: Windows detects that the GFX driver is not responding and restarts the driver.

The problem is fixed with the above-mentioned registry changes using longer timeout values:

"TdrDelay"=dword:00004b00
"TdrDdiDelay"=dword:00004b00

(the values are in milliseconds, so 0x4b00 = 19200 ms, about 19 seconds).

Thanks for getting back to me on this issue. I will try the registry settings as you suggested. I agree with your new findings, and I want to verify via testing that TDR can be effectively disabled via the Delay values in milliseconds.
