Incorrect values returned for IPP SAD computation using ippiSAD8x8_16u32s_C1R

Incorrect values are often returned when using the IPP function ippiSAD8x8_16u32s_C1R() to compute an 8x8 SAD for 16-bit video. Video that is 15-bit or less appears to work correctly. The maximum possible 8x8 SAD value for 15-bit video is (2^15 - 1) * 8^2 = 2,097,088. Incorrect values are returned once the SAD value exceeds this 15-bit maximum. A list of example values is attached, and the source code used to generate them is listed below.
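The 15-bit limit follows from simple arithmetic. An 8x8 block has 64 pixels, and at a bit depth of b the largest per-pixel absolute difference is 2^b - 1. A minimal sketch of that calculation (maxSad8x8 is a hypothetical helper, not an IPP function):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of IPP): the largest possible 8x8 SAD for
// video of the given bit depth. Each of the 64 pixels can differ from its
// reference pixel by at most 2^bits - 1.
std::int64_t maxSad8x8(int bits) {
    assert(bits >= 1 && bits <= 16);
    const std::int64_t maxDiff = (std::int64_t{1} << bits) - 1;
    return maxDiff * 8 * 8;
}
```

maxSad8x8(15) evaluates to 2,097,088 and maxSad8x8(16) to 4,194,240, so full 16-bit input can produce SAD values roughly twice the 15-bit maximum. For 10-bit video the limit is 65,472.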

In the source code that follows, edit the values of mainVal and addVal to test different SAD sizes. The final SAD value should equal (mainVal * 8^2) + addVal.
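The expected value can be written as a small helper. This is a minimal sketch; expectedSad is a hypothetical name, not part of IPP. It also flags the one case where the formula breaks: pCur[0] is an Ipp16u, so pCur[0] += addVal wraps around modulo 65536 when mainVal + addVal exceeds 65535.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the test setup: 64 pixels of mainVal against
// an all-zero reference, with addVal added to the first pixel only.
// Returns -1 when mainVal + addVal does not fit in 16 bits, because the
// 16-bit pixel would wrap and (mainVal * 8^2) + addVal would no longer hold.
std::int64_t expectedSad(std::uint16_t mainVal, std::uint16_t addVal) {
    if (std::uint32_t{mainVal} + addVal > 0xFFFFu) {
        return -1;
    }
    return std::int64_t{mainVal} * 64 + addVal;
}
```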

The equivalent function for 4x4 SAD appears to have the same problem. Also, I am using IPP version 7.0.

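#include <ippvc.h>        // IPP 7.x video coding domain: ippiSAD8x8_16u32s_C1R, IPPVC_MC_APX_FF
#include <gtest/gtest.h>  // ASSERT_TRUE
#include <cstdio>         // printf
#include <cstdlib>        // abs

typedef Ipp32s I32;       // assumed: the poster's own 32-bit integer type

void ippiSAD_test()
{
    Ipp16u cur[64], ref[64];
    Ipp16u *pCur = cur;
    Ipp16u *pRef = ref;
    I32 curStep = 8;      // row stride in pixels
    I32 refStep = 8;
    Ipp32s isad = 0;      // SAD returned by IPP
    I32 csad = 0;         // SAD computed by hand

    // SAD = (mainVal * 8 * 8) + addVal
    // Keep mainVal + addVal <= 65535, otherwise pCur[0] wraps around.
    Ipp16u mainVal = 65535;
    Ipp16u addVal = 0;

    // set image pixel values
    for (I32 i = 0; i < 8; ++i) {
        for (I32 j = 0; j < 8; ++j) {
            pCur[i*8+j] = mainVal;
            pRef[i*8+j] = 0;
        }
    }
    pCur[0] += addVal;

    // IPP takes the steps in bytes: 2 bytes per Ipp16u pixel
    IppStatus stat = ippiSAD8x8_16u32s_C1R(pCur, curStep*2,
                                           pRef, refStep*2,
                                           &isad, IPPVC_MC_APX_FF);
    ASSERT_TRUE(ippStsNoErr == stat);

    // reference SAD computed directly
    for (I32 j = 0; j < 8; ++j) {
        const Ipp16u *p1 = &pCur[j * curStep];
        const Ipp16u *p2 = &pRef[j * refStep];
        for (I32 k = 0; k < 8; ++k) {
            csad += abs(p1[k] - p2[k]);
        }
    }

    printf("IPP SAD:      %d\n", isad);
    printf("COMPUTED SAD: %d\n", csad);
    ASSERT_TRUE(isad == csad);
}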



Attachments: table.png (5.81 KB), ippisad-test.cpp (1.04 KB)

Gregory, thanks for the report. I see the same results with version 7.1.1. We will investigate the cause of this problem.

Gregory, I don't see the problem with the latest 8.0 version: 

Ipp16u mainVal = 50000;

ippIP PX (px) 8.0.0 (r40040)
IPP SAD: 3200000
Press any key to continue . . .

Ipp16u mainVal = 65535;

ippIP PX (px) 8.0.0 (r40040)
IPP SAD: 4194240
Press any key to continue . . .

Gregory, actually the code you provided works correctly only when the static library is initialized for SSE code:

ippStaticInit(); IppCpuType cputype = ippCpuSSE; // PASSED

For all other cases, the problem still persists with the latest 8.0 version. The problem has been escalated, and we will let you know as soon as it is fixed.


Gennady, thanks for the quick response. I don't think the static solution will work for my application. I have come up with a workaround for now, but will switch back to the IPP call once the problem is fixed.
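The workaround itself is not shown in the thread. One option that sidesteps the problem, sketched here under the assumption that a scalar loop is fast enough for the affected full 16-bit case, is a plain C++ 8x8 SAD with 32-bit accumulation. The name sad8x8_16u32s and its signature are hypothetical; this is not an IPP function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical scalar fallback (not part of IPP): 8x8 SAD of 16-bit unsigned
// pixels with a 32-bit result. Steps are in bytes, matching the convention of
// the IPP call in the test, so a packed 8x8 block of uint16_t uses a step of 16.
std::int32_t sad8x8_16u32s(const std::uint16_t* pSrc, int srcStepBytes,
                           const std::uint16_t* pRef, int refStepBytes) {
    assert(pSrc != nullptr && pRef != nullptr);
    std::int32_t sad = 0;
    for (int y = 0; y < 8; ++y) {
        // Advance each row pointer by a byte step.
        const std::uint16_t* s = reinterpret_cast<const std::uint16_t*>(
            reinterpret_cast<const unsigned char*>(pSrc) + y * srcStepBytes);
        const std::uint16_t* r = reinterpret_cast<const std::uint16_t*>(
            reinterpret_cast<const unsigned char*>(pRef) + y * refStepBytes);
        for (int x = 0; x < 8; ++x) {
            // Subtract as int so differences up to 65535 are exact.
            sad += std::abs(static_cast<int>(s[x]) - static_cast<int>(r[x]));
        }
    }
    return sad;  // at most 65535 * 64 = 4,194,240, well within int32
}
```

The difference is taken in int rather than in a 16-bit type, so every per-pixel difference in the full 0..65535 range is represented exactly before it is summed.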

Thanks for your help. 

Hi Greg,

Could you please explain briefly, in general terms, why the static solution won't fit your needs? We would like to know what needs to be improved in the static libraries.



The solution Gennady provided will not work because forcing the SSE code path rules out the more optimized instruction sets, which are required to meet specific performance criteria.


Just as a general note, this bug is no longer an issue for me. The problem only occurs when processing full 16-bit images, and the application I am using it for currently only requires support for 10-bit. I'd imagine processing full 16-bit images is a rare case, which is probably why this bug has gone unnoticed.

