CPU Usage Increase in Multitasking Player

chang-li's picture

I run my video player on Windows 8 Pro on an Ultrabook with a Core i7. CPU usage is 7% with one player running and 14% with two, but 50% with three and 80% with four. It is odd that CPU usage grows by more than 3x going from two to three players. Does this mean that Ivy Bridge and its driver only allow hardware-accelerated video decoding in two separate tasks at a time?

Petter Larsson (Intel)'s picture

Hi,

I cannot really provide any feedback without knowing more about the workload you are running.

What is the video player you are referring to? Is it one of the video decoding samples part of the Media SDK?

From a HW and Media SDK API perspective there is no specific upper limit on the number of concurrent decode or encode operations. Concurrent workloads share the underlying HW resources, which naturally means that each individual workload will run slower if other workloads execute simultaneously.

Regards,
Petter

chang-li's picture

The workload is simple. I double-click the player on the desktop, which opens the default 1080p .mov video, and read the CPU usage. Then I double-click the player again to launch a 2nd instance, read the CPU usage, then a 3rd, and so on. In my extreme testing I launched 25 players on Windows 8 Pro on Intel's Ivy Bridge Core i7 Ultrabook. After the 4th, CPU usage is near 100%. My player is DirectShow-compatible and uses Microsoft's DXVA2 decoder, which may use Intel's HW acceleration. The problem is not that playback runs slowly but that CPU usage grows non-linearly. 100% CPU usage is the limit for launching a new task. With linear growth from 7% per player, I estimate that I could run 10 to 14 tasks simultaneously. With the non-linear growth I can only run 5 tasks. That is a huge 2x difference.
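The estimate above can be checked with a few lines of arithmetic. This is only a sketch in Python: the readings are the Task Manager figures quoted in this thread, and the function names `linear_capacity` and `marginal_costs` are made up for illustration.

```python
# Total CPU % read from Task Manager, keyed by number of running players
# (figures quoted earlier in this thread).
observed = {1: 7, 2: 14, 3: 50, 4: 80}


def linear_capacity(per_player_pct, limit_pct=100.0):
    """Players that fit under limit_pct if every player costs per_player_pct."""
    return int(limit_pct // per_player_pct)


def marginal_costs(readings):
    """CPU % added by each successive player."""
    costs = {}
    previous_total = 0
    for count in sorted(readings):
        costs[count] = readings[count] - previous_total
        previous_total = readings[count]
    return costs


print(linear_capacity(observed[1]))  # 14
print(marginal_costs(observed))      # {1: 7, 2: 7, 3: 36, 4: 30}
```

The first two players each add 7%, while the third and fourth add 36% and 30%, so the jump is in the per-player cost rather than in the total alone.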

iliyapolak's picture

Quote:

chang-li wrote:

I run my video player on Windows 8 Pro on an Ultrabook with a Core i7. CPU usage is 7% with one player running and 14% with two, but 50% with three and 80% with four. It is odd that CPU usage grows by more than 3x going from two to three players. Does this mean that Ivy Bridge and its driver only allow hardware-accelerated video decoding in two separate tasks at a time?

This is a complicated question. You need to take many contributing factors into account. Every thread has a priority, and multimedia (MM) related threads usually get their priority level boosted in order to maintain, for example, smooth playback of the video stream. At any moment, though, there can be a more privileged event, for example a DIRQL interrupt or a critical system-level thread such as those that run scheduler and dispatcher code. In such a situation your video rendering thread will be swapped out and put in the "DispatcherReadyList" queue.

Regarding the load on the CPU: this can depend directly on the complexity of the frame(s). There is also the question of which resources are available at any moment. When you start many players, each of them spawning a number of threads, you cannot expect scaling that is linear in the number of logical cores. I do not know how much floating-point data the rendering threads handle, but bear in mind that each physical core can run two threads simultaneously, and if one thread has instruction interdependencies (I suppose there won't be many in the case of pixel data), you have to wait for their completion.

To get a clearer picture of the load distribution, please run a profiling tool on your rendering thread(s). I would recommend the Xperf tool.

iliyapolak's picture

>>>The workload is simple. I double click player on desktop with 1080p .mov default video and read the CPU usage>>>

With the help of which tool do you read CPU usage?

chang-li's picture

I read CPU usage with the standard Windows Task Manager. I keep it running in the background on the taskbar and read the average value.

I guess that DXVA2 hardware acceleration is assigned as a device belonging to one task, so other tasks have to switch to a software decoder.

Attachment: multitasking-corei7-2.png (48.08 KB)

iliyapolak's picture

>>>I guess that DXVA2 hardware acceleration is assigned as a device belonging to one task, so other tasks have to switch to a software decoder>>>

Even if part of the rendering is performed by hardware acceleration, you need to take into account some overhead, mostly related to process creation, memory allocation, and thread creation, swapping and termination. Sending a large amount of data to the GPU over memory-mapped I/O also takes time; moreover, the CPU must wait for completion and must prepare the next burst of data to be sent for rendering. All of this is expensive in time, and as more processes are created, each with its own threads, the complexity increases as well.

iliyapolak's picture

>>>I read CPU usage with the standard Windows Task Manager. I keep it running in the background on the taskbar and read the average value.>>>

Thanks for the picture.

To get fine-grained time measurements, I would recommend using the Xperf tool.
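To illustrate why finer-grained measurement matters, here is a minimal sketch in Python. It is not Xperf's interface or output format; the sample values, the function name `cpu_percent_per_interval` and the count of 4 logical CPUs are assumptions chosen only for the example. It turns cumulative CPU-time samples into a per-interval percentage, which an averaged reading would smooth over.

```python
def cpu_percent_per_interval(samples, logical_cpus):
    """Convert cumulative (wall_seconds, cpu_seconds) samples to CPU % per interval.

    100% means every logical CPU was busy for the whole interval.
    """
    percentages = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        percentages.append(100.0 * (c1 - c0) / (elapsed * logical_cpus))
    return percentages


# Hypothetical samples: (wall-clock seconds, CPU seconds consumed so far).
samples = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.75), (3.0, 2.75)]
per_interval = cpu_percent_per_interval(samples, logical_cpus=4)
print(per_interval)                           # [12.5, 6.25, 50.0]
print(sum(per_interval) / len(per_interval))  # average hides the 50% spike
```

With these made-up numbers the last interval sits at 50% while the average over all three is under 23%, which is the kind of detail a single averaged reading would not show.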

Petter Larsson (Intel)'s picture

Hi,

From your descriptions it sounds like the video player you are using is not using Media SDK? Is that correct?

If that is the case, then this is not the correct forum for the concerns you're having. Also, based on the information you shared, it looks to me (due to the high CPU load) as if the video decode is not HW accelerated. Whether decode can be HW accelerated also depends on the content you're decoding. For instance, the platform you refer to does support HW decode of MPEG2, VC1 and H.264 (AVC).

Regards,
Petter

 

iliyapolak's picture

@chang-li

As Petter said, I would check whether your GPU is capable of hardware-accelerating the video content.
