Exploring AVS Video Decoder Parallelism on Intel® Manycore Testing Lab

January 18, 2011 11:00 PM PST




Authors:

  • Konstantinos Krommydas, Department of Computer Science, Virginia Polytechnic Institute and State University
  • Dr. Christos D. Antonopoulos and Dr. Nikolaos Bellas, Department of Computer & Communications Engineering, University of Thessaly
  • Dr. Wu-chun Feng, Department of Computer Science, Virginia Polytechnic Institute and State University

ABSTRACT

Last year, we published a paper entitled “Mapping and Optimization of the AVS Video Decoder on a High-performance Chip Multiprocessor” [1]. As the title indicates, its main purpose was to optimize a decoder for the Chinese “Audio Video Standard” (AVS) [2] on an Intel® Core™ i7 processor. As part of that work, we evaluated the performance of different code versions on a varying number of cores, with and without the Hyper-Threading and Turbo Boost features.

Unfortunately, the largest number of cores we had access to at that time was four. Figure 1 depicts the results of our experiments on three different full high definition (1920x1080) videos at various bitrates. Although real-time Full HD video decoding was achieved, we observed that performance deteriorated once Hyper-Threading was enabled (i.e., execution with more than four threads on four SMT cores). Intel® VTune™ Performance Tools helped us identify the data and instruction load units of the cores as the main culprit. However, because we could only test the code on a quad-core machine, we could not be sure whether this was the only cause, or how much better the code would perform with more physical cores.

By granting us access to the Intel® Manycore Testing Lab (MTL), Intel gave us the opportunity to better explore the parallel nature of the AVS video decoder, our code’s scalability, and video decoding in general.

Results

Initial experiments on the Intel Manycore Testing Lab (Figure 2) showed that performance still deteriorates, even though the application now runs on more than four physical cores. With the previous limitation removed, we had expected performance to improve as physical cores were added. This was an interesting finding: it showed that our code did not scale as well as we had expected, and it led us to re-examine our overall parallelization strategy, and in particular some of the decisions we had made to try to make it more efficient. Resolutions higher than 1080p are expected to become prevalent in the very near future. Although larger frames inherently offer more parallelism, more work is needed to cope with the additional computational workload.

The initial changes we made after these measurements yielded results similar to the previous ones for up to four cores, but this time we observed a small speedup for up to eight cores and a plateau in performance beyond that. We attribute the plateau to the single shared task queue. Parallelizing code always introduces new challenges as we move to more processors. More work is therefore needed to exploit the parallelism that is currently available, and to apply known techniques, or find new ones, to expose more [3].
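One way to see why a single shared task queue can cap scalability is a simple, idealized bound. The sketch below is not taken from the decoder. The function name `queue_bound_speedup` and the cost values are hypothetical, chosen only to show the shape of the curve, and the model assumes that every task performs some fully parallel work plus one queue access that only one thread can perform at a time. Real behavior also depends on lock contention, cache effects, and load imbalance, none of which are modeled here.

```python
def queue_bound_speedup(workers, work_time, queue_time):
    """Idealized speedup when every task needs one exclusive queue access.

    Assumptions (illustrative, not measured on the AVS decoder):
      - each task costs `work_time` of perfectly parallel work, plus
      - `queue_time` inside a critical section on the single shared queue.
    """
    if workers < 1 or work_time < 0 or queue_time <= 0:
        raise ValueError("need workers >= 1, work_time >= 0, queue_time > 0")

    time_per_task_serial = work_time + queue_time

    # Tasks per unit time: limited either by the number of workers or by
    # the queue, which can hand out at most one task every `queue_time`.
    throughput = min(workers / time_per_task_serial, 1.0 / queue_time)

    # Speedup relative to one worker, whose throughput is 1 / time_per_task_serial.
    return throughput * time_per_task_serial


if __name__ == "__main__":
    # Hypothetical costs: 7 units of parallel work per 1 unit of queue access.
    for n in (1, 2, 4, 8, 16, 32):
        print(n, queue_bound_speedup(n, work_time=7.0, queue_time=1.0))
```

Under these assumed costs the bound stops growing once the queue itself is saturated, no matter how many more workers are added. In this model, shortening the serialized queue access or spreading tasks over several queues raises the ceiling. Whether either change helps the actual decoder would have to be measured.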

Figure 1. Sensitivity analysis.

Figure 2. Test runs on the MTL for Rush Hour, 20Mbps.

With this technical brief, we acknowledge and thank Intel for giving us the opportunity to experiment on the Intel Manycore Testing Lab. Given the results and observations above, we plan to re-examine our code and proceed with the next steps in optimizing the AVS video decoder, with the goal of achieving greater thread scalability. Another interesting area of research would be to explore the application’s behavior on varying combinations of cores, taking into account the NUMA architecture of the MTL. Because most video standards are based on the same principles, our final results and findings on AVS may generalize to other video standards.

Sources
[1] Krommydas, Konstantinos; Tsoublekas, George; Antonopoulos, Christos D.; Bellas, Nikolaos, "Mapping and optimization of the AVS video decoder on a high performance chip multiprocessor," 2010 IEEE International Conference on Multimedia and Expo (ICME), pp. 896-901, 19-23 July 2010. doi: 10.1109/ICME.2010.5582558
[2] AVS Workgroup website: www.avs.org.cn/en
[3] See the references in [1] for relevant papers.