Motion Estimation Library (Part III, in Five Lines of Code)

In my earlier posts we explored the scope of Motion Estimation algorithms and the math that underpins modern video codecs. We’ll come back to Motion Estimation techniques soon. Today we’re going to look at a software implementation.

The desire to write video processing code from scratch dwindles dramatically as soon as one starts dealing with reference frames, Sum of Absolute Differences, motion vectors and other unscientific things. Fortunately, we have various libraries that already implement all these tricky things. Perhaps all we need is five lines of code and we’re done. So, shall we try it?

Humph… Let’s first repeat what exactly we are going to do:

  • Initialize a motion estimation object.

  • Allocate memory.

  • Put input data into memory.

  • Prepare structures for output data.

  • Finally, start the motion estimation algorithm.

I hope you are not as naive as I was when I chose this job ;). I have to slightly disappoint those who really believe in five lines of code. On the other hand, the situation is not that bad. Let’s get down to it.

Step 1: Initialization

Let’s assume that an application gives us raw video in YCbCr format with a frame size of width x height.

// the main hardware me structure

hw_me_handle_t hwhandle = new _HW_ME_HANDLE;

// parameters of hardware me


// auxiliary variables

Ipp32u sts, numMb, imageStep, imageSize;

// initialize hardware me parameters

hwPar.codecStandard = HW_ME_H264;

hwPar.meType = HW_ME_WHOLE_PIXEL;

hwPar.algType = HW_ME_RESIDUAL;

hwPar.strictWithinFrame = ippFalse;

hwPar.minSubblockSize = HW_ME_DIV_16X16;

hwPar.imageSize.width = width;

hwPar.imageSize.height = height;

pMeInfo.pMbInfo = NULL;

// number of reference frames for motion estimation

hwPar.maxNumRef = 2;

// initialize me handle by me parameters 

InitializeHWME(&hwPar, &hwhandle);

We initialized a motion estimation object configured for the H.264 standard in whole-pixel mode (without interpolation), based on a Full Search algorithm with a minimum split of 16x16 (that is to say, we aren’t even going to try splitting). The maximum number of reference frames is two.
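To make "whole-pixel Full Search on 16x16 blocks" a bit more concrete, here is a minimal plain C++ sketch of the idea. It is not the library's implementation: the names MotionVector, sad16x16 and fullSearch16x16 are hypothetical, and the sketch assumes a single 8-bit luma plane with the same row pitch for both frames.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical result type for this sketch (not part of the library above).
struct MotionVector {
    int dx;
    int dy;
    std::uint32_t sad;
};

// Sum of Absolute Differences between two 16x16 blocks.
// Each pointer addresses the top-left pixel of a block; step is the row pitch in bytes.
std::uint32_t sad16x16(const std::uint8_t *src, int srcStep,
                       const std::uint8_t *ref, int refStep)
{
    std::uint32_t sum = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x)
            sum += std::abs(int(src[y * srcStep + x]) - int(ref[y * refStep + x]));
    return sum;
}

// Exhaustive whole-pixel search in a +/-range window around block (bx, by).
// Candidates that would leave the reference frame are skipped.
MotionVector fullSearch16x16(const std::uint8_t *src, const std::uint8_t *ref,
                             int step, int width, int height,
                             int bx, int by, int range)
{
    MotionVector best = {0, 0, std::numeric_limits<std::uint32_t>::max()};
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + 16 > width || ry + 16 > height)
                continue;
            std::uint32_t sad = sad16x16(src + by * step + bx, step,
                                         ref + ry * step + rx, step);
            if (sad < best.sad)
                best = {dx, dy, sad};
        }
    }
    return best;
}
```

Calling fullSearch16x16(src, ref, step, width, height, 16, 16, 4) would test all 81 whole-pixel offsets around the block at (16, 16) and return the one with the smallest SAD. The cost grows with the square of the search range, which is one reason to leave this work to an optimized library.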

Step 2: Memory allocation

While allocating memory for the source and reference frames, we also declare a progressive frame structure.

// reference and source images

HW_ME_IMAGE hwRefImage;

HW_ME_IMAGE hwSrcImage;

// set images properties

hwRefImage.imageStructure = hwSrcImage.imageStructure = HW_ME_FRAME;

imageStep = (Ipp32u) align_value<size_t> (width, ALIGN_VALUE) | ALIGN_VALUE;

imageSize = (Ipp32u) align_value<size_t> ((imageStep * height * 3) / 2);

hwRefImage.imageStep = hwSrcImage.imageStep = imageStep;

// allocate memory

Ipp8u *pAllocatedReferenceMemory = new Ipp8u [imageSize + ALIGN_VALUE];

Ipp8u *pRefPlane = align_pointer<Ipp8u *> (pAllocatedReferenceMemory);

hwRefImage.pPlanes[0] = pRefPlane;

hwRefImage.pPlanes[1] = pRefPlane + imageStep * height;

hwRefImage.pPlanes[2] = pRefPlane + (imageStep * height * 5) / 4;

Ipp8u *pAllocatedSourceMemory = new Ipp8u [imageSize + ALIGN_VALUE];

Ipp8u *pSrcPlane = align_pointer<Ipp8u *> (pAllocatedSourceMemory);

hwSrcImage.pPlanes[0] = pSrcPlane;

hwSrcImage.pPlanes[1] = pSrcPlane + imageStep * height;

hwSrcImage.pPlanes[2] = pSrcPlane + (imageStep * height * 5) / 4;
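The plane arithmetic above can be checked in isolation. The following is a rough sketch under stated assumptions: alignUp, alignPtr, PlaneLayout, layout420 and the alignment of 64 bytes are all hypothetical stand-ins (the post does not show ALIGN_VALUE, align_value or align_pointer), and the step here is a plain round-up. It assumes 4:2:0 sampling with an even height, where chroma rows use half the luma step.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed alignment for this sketch; the ALIGN_VALUE used above is not shown in the post.
const std::size_t kAlign = 64;

// Round value up to the next multiple of alignment (alignment must be a power of two).
inline std::size_t alignUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Round a pointer up to the next aligned address.
inline std::uint8_t *alignPtr(std::uint8_t *p, std::size_t alignment)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = std::uintptr_t(alignment) - 1;
    return reinterpret_cast<std::uint8_t *>((addr + mask) & ~mask);
}

// Byte offsets of the Y, Cb and Cr planes inside one contiguous 4:2:0 buffer.
struct PlaneLayout {
    std::size_t step;      // luma row pitch; chroma rows use step / 2
    std::size_t offset[3]; // start of Y, Cb, Cr
    std::size_t size;      // total bytes for all three planes
};

inline PlaneLayout layout420(std::size_t width, std::size_t height)
{
    PlaneLayout l;
    l.step = alignUp(width, kAlign);
    l.offset[0] = 0;                        // Y: step * height bytes
    l.offset[1] = l.step * height;          // Cb: (step / 2) * (height / 2) bytes
    l.offset[2] = l.step * height * 5 / 4;  // Cr: same size as Cb
    l.size = l.step * height * 3 / 2;
    return l;
}
```

For a 352 x 288 frame this gives a step of 384 bytes, chroma planes starting at offsets 110592 and 138240, and 165888 bytes in total. Allocating kAlign extra bytes, as the code above does with ALIGN_VALUE, leaves room to move the start pointer forward to an aligned address.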

Step 3: Data loading

The next portion of code fills the allocated memory with frame data and hands it to the motion estimation object.

// YUV reader

CYUVReader reader;

// initialize YUV reader

sts = reader.Init(filename, width, height);


// read the first frame

sts = reader.ReadYUVData();


// copy data from yuv reader to source image memory

IppiSize iSize;

iSize.height = height;

iSize.width = width;

CopyPlane(reader.m_pYUVData[0], width, hwSrcImage.pPlanes[0], imageStep, iSize);

iSize.width = iSize.width / 2;

iSize.height = iSize.height / 2;

CopyPlane(reader.m_pYUVData[1], width / 2, hwSrcImage.pPlanes[1], imageStep / 2, iSize);

CopyPlane(reader.m_pYUVData[2], width / 2, hwSrcImage.pPlanes[2], imageStep / 2, iSize);

sts = CopyHWMESource(hwhandle, &hwSrcImage);
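The plane copies above take two different steps because the reader's buffer is tightly packed (a row is width bytes) while our image rows are padded to imageStep. A minimal sketch of such a strided copy, assuming 8-bit samples, could look like this. The function copyPlaneSketch is a hypothetical stand-in and not the CopyPlane helper itself.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a strided plane copy.
// Copies a width x height region of bytes row by row; each buffer has its own row pitch.
inline void copyPlaneSketch(const std::uint8_t *src, int srcStep,
                            std::uint8_t *dst, int dstStep,
                            int width, int height)
{
    for (int y = 0; y < height; ++y)
        std::memcpy(dst + y * dstStep, src + y * srcStep, width);
}
```

Only width bytes per row are written, so the padding bytes at the end of each destination row keep whatever they contained before.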


// prepare the reference frames

for (Ipp32u i = 0; i < hwPar.maxNumRef; i += 1)

{

    // read the next frame

    sts = reader.ReadYUVData();

    // copy data from YUV reader to reference image memory

    iSize.height = height;

    iSize.width = width;

    CopyPlane(reader.m_pYUVData[0], width, hwRefImage.pPlanes[0], imageStep, iSize);

    iSize.width = iSize.width / 2;

    iSize.height = iSize.height / 2;

    CopyPlane(reader.m_pYUVData[1], width / 2, hwRefImage.pPlanes[1], imageStep / 2, iSize);

    CopyPlane(reader.m_pYUVData[2], width / 2, hwRefImage.pPlanes[2], imageStep / 2, iSize);

    // add reference image to the handle

    sts = CopyHWMEReference(hwhandle, i, &hwRefImage);

}

Step 4: Launch

And finally, we prepare output structures and launch the algorithm:

// results storage of me


pMeInfo.numFrameRefs[1] = hwPar.maxNumRef;

// calculate the number of macro blocks

numMb = (width * height) / (16 * 16);

// allocate mb info 

pMeInfo.pMbInfo = new HW_ME_MB_INFO [numMb];


sts = DoHWME(hwhandle, &pMeInfo);
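One detail worth a second look is the macroblock count. The expression (width * height) / (16 * 16) is exact only when both dimensions are multiples of 16. H.264 codes a frame in whole macroblocks, so a common way to count them is to round each dimension up separately. The sketch below shows that arithmetic. Whether this library expects the rounded-up count is an assumption on my part, and macroblockCount is a hypothetical helper.

```cpp
#include <cstdint>

// Number of 16x16 macroblocks needed to cover a frame, rounding partial blocks up.
inline std::uint32_t macroblockCount(std::uint32_t width, std::uint32_t height)
{
    const std::uint32_t mbSize = 16;
    std::uint32_t mbWide = (width + mbSize - 1) / mbSize;
    std::uint32_t mbHigh = (height + mbSize - 1) / mbSize;
    return mbWide * mbHigh;
}
```

For a 352 x 288 frame both formulas give 396 macroblocks. For 1920 x 1080 the rounded-up count is 120 * 68 = 8160, while the plain division yields 8100, which would leave the last partial row of blocks without storage.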


The code above splits video frames into blocks and calculates each block’s motion vectors against the supplied reference frames. Well, it took more than just five lines of code. But on a positive note… we got a highly optimized, parallel implementation of the Motion Estimation algorithm without extra headaches. The implementation details are hidden inside the library functions.