H.264 encoder in IPP 7.1 slower than in 7.0

Hi there,

I ported my H.264 encoder to IPP v7.1 using the samples, which I built as dynamic multithreaded libraries, and now H.264 encoding performance has dropped compared with v7.0 using the same settings.

Are you aware of any issues with the H.264 encoder in 7.1?

I set m_iThreads to 0, initialized the encoder correctly, and called ippInit() to select the best IPP libraries for my CPU. All my cores are at 100% while encoding, but encoding is around 50-60% slower than with the IPP v7.0 encoder. I am using the separate threaded libraries (downloaded from the site). Note that with v7.0, core usage is only around 40%, yet it still performs much faster than v7.1.

I also tried the v7.1 single-threaded UMC libraries, but they are much slower than the multithreaded ones.

Any help is really appreciated.

Thanks.

anyone? Intel?

Hi,

Could you try linking your application against the single-threaded IPP libraries? Thread oversubscription may be occurring.

Regards,
Sergey 

Hi,

I tried building single-threaded libraries and using them in the application. It works, but it is much slower than with the multithreaded ones. We tested with various video formats and resolutions. For every video, regardless of size, we see a difference of around 20 seconds between v7.0 and v7.1. Did you change something in the H.264 encoder itself that could cause this behavior?

Thanks

The last H.264 issue that was resolved was excessive CPU load during playback; we haven't seen any encoding performance issues. Could you provide the specifics of your encoding parameters: resolution, profile, bitrate and, of course, CPU model? We'll run experiments locally. Also, since this is your own application, can you reproduce the low performance with "umc_video_enc_con"?

Regards,
Sergey 

I am using IPP's static threaded libraries (not the dynamic ones), which I downloaded from the website. As far as I can see, umc_video_enc_con uses the dynamic libraries. Maybe that's the problem? umc_video_enc_con drives all four cores (the CPU is an i5-3570) to 100%, but it encodes twice as fast as our application. I suspect something is wrong with the libraries somewhere along the line.

Do you have any advice on what I should try?

Oh, and the parameters we're using are:

res: 480p
profile: UMC::H264_PROFILE_MAIN
bitrate: 3000kbps
rate control method: UMC::H264_RCM_CBR

Thanks...

 

First of all, you need to make sure that you use the properly optimized library. With static linking, this is done by calling the ippInit() function at the beginning of the application. With dynamic linking it does not matter, because ippInit() is called by the DllMain function. Second, as far as I can see, umc_video_enc_con sets the number of internal IPP threads to 1 with ippSetNumThreads. Some video encoding functions (as well as video pre/post-processing functions) still use internal threading (via OpenMP), which does no good in an externally threaded application such as the H.264 encoder. So, ippSetNumThreads(1) is used to limit internal threading to a single thread.

So, try calling ippInit() and ippSetNumThreads(1) during application initialization.

Regards,
Sergey

By the way, you can build umc_video_enc_con with any type of IPP library by using the options in the IPP samples build script.

Regards,
Sergey

I am calling ippInit() as well as ippSetNumThreads(1)... it's still the same.

Also, I don't understand the difference between the dynamic and static builds of the samples. When I build both, I get .lib files in both cases, even though for dynamic I expected to see DLLs, as in the previous version of IPP. So what exactly do "dynamic" and "static" mean in the build options for the IPP samples? I tried both, and they give the same results and the same encoding time. The only difference I managed to get was with the single-threaded versions, which performed much slower than the multithreaded ones.

So, any more suggestions?

The terms "dynamic" and "static" in the samples refer to which IPP libraries are used at link time: dynamic (DLLs or .so) or static (.lib or .a). They don't relate to the intermediate sample libraries generated during the sample application build. Thus, in both cases you get static libraries (codecs, muxers, and so on).

Next, in the IPP 7.1 UMC samples, the H.264 encoder can run in parallel. Its parallelization is done with OpenMP. When you select the "mt" libraries during the sample build, the script does two things: it defines the USE_OPENMP macro (which guards the OpenMP constructs in the codec via #ifdef USE_OPENMP, etc.), and it puts the multithreaded IPP libraries (the *_t ones) on the linker command line. So, basically, there are two levels of parallelization: codec-level and function-level. Function-level parallelization has been observed to bring no additional performance benefit when external (upper-level) parallelization is active. You can manually change the linker inputs from the *_t libs to the *_l (lowercase L) libs and you will see no difference in performance. Thus, your goal should be to enable codec-level parallelization and disable function-level parallelization (set the number of threads to 1).
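To make the two-level reasoning concrete, here is a small sketch of the thread policy described above. The helper names (choose_ipp_threads, worst_case_threads, detected_cores) are hypothetical, not IPP or UMC APIs, and treating a codec thread count of 0 as "automatic, therefore parallel" is an assumption based on the m_iThreads = 0 setting mentioned earlier. The value it returns is what you would pass to ippSetNumThreads.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical policy helper (not an IPP or UMC API): how many function-level
// (internal IPP) threads to allow, given the codec-level thread count.
// Assumption: codec_threads == 0 means "automatic", i.e. the codec is parallel.
int choose_ipp_threads(int codec_threads, int hw_threads) {
    assert(codec_threads >= 0);
    if (hw_threads < 1) hw_threads = 1;  // guard against a bogus core count
    if (codec_threads != 1) return 1;    // codec-level threading active: no nesting
    return hw_threads;                   // single codec thread: functions may use all cores
}

// Rough upper bound on runnable threads if every codec thread calls a
// function that starts its own team of ipp_threads threads.
int worst_case_threads(int codec_threads, int ipp_threads) {
    return std::max(1, codec_threads) * std::max(1, ipp_threads);
}

// Detected core count, never less than 1 (hardware_concurrency() may return 0).
int detected_cores() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<int>(n);
}
```

For example, on a four-core i5-3570, four encoder threads that each drive four IPP threads could put up to 16 threads on 4 cores; capping the inner level at 1 keeps the bound at 4.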

The umc_video_enc_con command line has options for both the codec-level (-t <num>) and the function-level (--ipp_threads <num>) number of threads, so you can simulate your multithreaded encoding environment with this sample.
The command line should look like this:
umc_video_enc_con -c h264 -i <source>.yuv  -o <dest>.h264 -b 3000000 -r 720 480 -t <num_external> --ipp_threads <num_internal>
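If you want to script this comparison from your own test harness, the following sketch assembles exactly the arguments shown above from the parameters reported earlier (3000 kbps, 720x480). build_enc_args and join_args are hypothetical helpers, and the assumption that -b takes bits per second is inferred from the example value 3000000 for 3000 kbps; only the flags quoted in the post are used.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the umc_video_enc_con argument list quoted above.
std::vector<std::string> build_enc_args(const std::string& src, const std::string& dst,
                                        int bitrate_kbps, int width, int height,
                                        int codec_threads, int ipp_threads) {
    assert(bitrate_kbps > 0 && width > 0 && height > 0);
    return {
        "umc_video_enc_con", "-c", "h264",
        "-i", src, "-o", dst,
        "-b", std::to_string(bitrate_kbps * 1000),    // assumed to be bits per second
        "-r", std::to_string(width), std::to_string(height),
        "-t", std::to_string(codec_threads),          // codec-level threads
        "--ipp_threads", std::to_string(ipp_threads)  // function-level threads
    };
}

// Joins the arguments with single spaces (no quoting; paths must not contain spaces).
std::string join_args(const std::vector<std::string>& args) {
    std::string out;
    for (const auto& a : args) {
        if (!out.empty()) out += ' ';
        out += a;
    }
    return out;
}
```

Running it with 4 codec threads and 1 IPP thread, and then with other combinations, reproduces the "external vs. internal threads" matrix without touching your application.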

I see extra CPU load even during single-threaded encoding. This needs to be investigated.

Regards,
Sergey 

So, please let me know whether a fix is coming soon, or whether I should revert to IPP v7.0.

Hi,

To lower the CPU load, add the following lines to the file umc_h264_core_enc.cpp at around line 2186:

    if (core_enc->m_params.num_slices > nMB)
        core_enc->m_params.num_slices = (Ipp16s)IPP_MIN(nMB, 0x7FFF);
    if (core_enc->m_params.num_slices < core_enc->m_params.m_iThreads)
        core_enc->m_params.m_iThreads = core_enc->m_params.num_slices;
// These lines should be added
#ifdef USE_OPENMP
    omp_set_num_threads(core_enc->m_params.m_iThreads);
#endif
//
    switch (core_enc->m_params.level_idc)
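The clamping logic in this patch can be exercised on its own. The sketch below is a standalone approximation, not the UMC source: EncParams is a hypothetical stand-in for core_enc->m_params, it uses the compiler-defined _OPENMP macro instead of the samples' USE_OPENMP, and it skips omp_set_num_threads for a non-positive count because that function expects a positive value.

```cpp
#include <algorithm>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the relevant fields of core_enc->m_params.
struct EncParams {
    int num_slices;
    int m_iThreads;
};

// Mirrors the patch: at most one slice per macroblock (capped at 0x7FFF, the
// Ipp16s limit), no more threads than slices, then size the OpenMP team.
void clamp_threads(EncParams& p, int nMB) {
    assert(nMB > 0);
    if (p.num_slices > nMB)
        p.num_slices = std::min(nMB, 0x7FFF);
    if (p.num_slices < p.m_iThreads)
        p.m_iThreads = p.num_slices;
#ifdef _OPENMP
    if (p.m_iThreads > 0)
        omp_set_num_threads(p.m_iThreads);  // later parallel regions use this team size
#endif
}
```

For 720x480 input there are 45 x 30 = 1350 macroblocks, so a slice count above 1350 is reduced to 1350, and with 2 slices a request for 4 threads drops to 2.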

Regards,
Sergey 

I tried this, but it's still the same. Even with m_iThreads set to 1 in the encoder, the load ranges from 80 to 100% on all four cores.

With m_iThreads set to 0, the load is between 70 and 90%, but the time needed to encode the file is the same as before adding the lines you suggested. I am testing with the dynamic_mt libraries, which are the ones I built with the modified source.

Any more suggestions?

Sergey, should I revert?

I noticed that with the code reverted to v7.0 AND the libiomp5md.dll FROM v7.1, it is equally slow!! BUT with the libiomp5md.dll from v7.0, it works as expected! Do you have any ideas about this? Maybe there's a bug in libiomp5md, or I am not using it correctly (in v7.1).

As far as I remember, there was a problem with this (though quite long ago), but the "Intel compiler" forum would know better; this is their area.
Still, it deserves a small test with #pragma omp. Thank you for finding this. We will check.
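A small #pragma omp test of the kind mentioned here might look like the sketch below. It is a hypothetical reproducer, not Intel's: it runs many short parallel regions with a trivial body, loosely imitating an encoder that opens a region per frame or slice, so that OpenMP runtime overhead (thread wake-up, barriers) dominates the timing. The function names are illustrative.

```cpp
#include <cassert>
#include <chrono>
#ifdef _OPENMP
#include <omp.h>
#endif

// Many short parallel regions with a trivial body, so the cost measured is
// mostly the OpenMP runtime's per-region overhead.
long long run_regions(int regions, int work_per_region) {
    assert(regions >= 0 && work_per_region >= 0);
    long long total = 0;
    for (int r = 0; r < regions; ++r) {
        long long sum = 0;
#pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < work_per_region; ++i)
            sum += i;
        total += sum;
    }
    return total;
}

// Wall-clock time of run_regions in milliseconds; the checksum is returned
// through out_checksum so the result is actually used.
double time_regions_ms(int regions, int work_per_region, long long* out_checksum) {
    auto t0 = std::chrono::steady_clock::now();
    *out_checksum = run_regions(regions, work_per_region);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Team size the runtime would use for the next parallel region (1 without OpenMP).
int runtime_max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Built once with the Intel compiler's /Qopenmp (so that it links against libiomp5md.dll) and run with the "good" and the "bad" DLL placed next to the executable in turn, a call such as time_regions_ms(100000, 10000, &checksum) should show whether the runtime alone accounts for the slowdown.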

Regards,
Sergey 

The funny thing is that libiomp5md from v7.0 works with v7.1 libs as well :D Crazy...

Please report back once you fix it!

Any updates on this?

Hi,

I have just spoken with the OpenMP support team. They know nothing about this problem. If we can create a small reproducer for the problem, we can get them moving :). Meanwhile, could you provide the version numbers of the "good" and "bad" libiomp5md.dll files? Just right-click the file in Explorer and look at the Properties/Details tab. On my computer, for example, I see file version 5.0.2012.1207.

Regards,
Sergey 

Hi, thanks for the update.

The one that works well has this version: 5.0.2011.606.

The one that doesn't work well has this version: 5.0.2012.914

Does it ring any bells?
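When several libiomp5md.dll copies are floating around (5.0.2011.606, 5.0.2012.914 and 5.0.2012.1207 in this thread), comparing their file versions as plain strings is misleading: "5.0.2012.914" sorts after "5.0.2012.1207" lexicographically, although 914 < 1207. A hypothetical helper that compares the four numeric fields instead might look like this; obtaining the version from the DLL itself (for example via the Windows version-info APIs) is left out.

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// Parses a four-part file version such as "5.0.2011.606"; missing fields stay 0.
std::array<int, 4> parse_version(const std::string& s) {
    std::array<int, 4> v{0, 0, 0, 0};
    std::istringstream in(s);
    std::string part;
    for (int i = 0; i < 4 && std::getline(in, part, '.'); ++i)
        v[i] = std::stoi(part);  // throws on non-numeric fields
    return v;
}

// Returns -1, 0 or 1, comparing the fields numerically from left to right.
int compare_versions(const std::string& a, const std::string& b) {
    const auto va = parse_version(a);
    const auto vb = parse_version(b);
    for (int i = 0; i < 4; ++i)
        if (va[i] != vb[i]) return va[i] < vb[i] ? -1 : 1;
    return 0;
}
```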

It doesn't ring a bell yet, but we now know which version is good. I will continue working on a reproducer. Please use the good OpenMP version for now.

Regards,
Sergey 

We are also seeing a 30% speed drop between the 7.0 and 7.1 implementations of H.264 *decoding*. I suspect it is caused by the same libiomp5md.dll.

Have you reproduced this problem yet on the Intel side, and what is the ETA for a fix?

-Eliot

Hi Eliot,

Do you know where your problematic libiomp5md.dll came from? From which Intel product?

I would like to reproduce the issue locally, but so far everything is still OK, and I wonder whether it is OK because the DLL is good or because the situation is not modeled correctly.

Regards,
Sergey 

Yes. The previous versions of the libraries we used were:

w_ipp_7.0.4.196_intel64.exe and w_ipp-samples_p_7.0.4.054.zip.

The current versions are

 w_ipp_7.1.1.119_intel64.exe and w_ipp-samples_p_7.1.1.013.zip

With the new version, we now link statically.

-Eliot

Hi Sergey,

Were you able to replicate the H.264 decoding slowdown with the two library/IPP samples versions I mentioned?

-Eliot
