Multithreading

Solovyov Yuriy

Hello, I have some problems with Havok's multithreaded mode. The continuous simulation mode is much faster than the same scene in multithreaded mode. My processor is a Core Quad 2.4 GHz. Even if I set the number of threads to 1 in multithreaded mode, it is still slower than continuous :(

My code is:

// From the create_world() function:
info.m_simulationType = hkpWorldCinfo::SIMULATION_TYPE_MULTITHREADED;
hkpMultithreadingUtilCinfo ci;
ci.m_world = m_world;
ci.m_numThreads = 1;
m_multithreadingUtil = new hkpMultithreadingUtil(ci);

// From the step() function:
if (m_multithreadingUtil != HK_NULL && m_multithreadingUtil->m_state.m_world == m_world)
{
    m_world->resetThreadTokens();
    m_multithreadingUtil->stepWorld(dt, false);
}
else
{
    m_world->stepDeltaTime(dt);
}

That's all I've done (I took this from the demos sample solution). Maybe I forgot something and that's why it is so slow. Please help.

havokdaniel

Hi Yuriy,

How is the performance when m_numThreads is 3 or 4? Is this your own application, or do you also see this in the demo framework? What do you mean by "slower": is the framerate lower, or do the Havok timings get worse?

Thanks,
Daniel

Solovyov Yuriy

Thanks for your answer. I'm building our own engine now. As far as I know, it will use Threading Building Blocks (TBB) and a fixed number of threads from the very beginning, so our physics code does not need Havok's own multithreading implementation. Still, I'm trying to understand how it works.

I've written my own OpenGL renderer for testing the app. It starts in main(). I've also written my own PhysicsManager, with functions such as create_world, destroy_world, step_physics, etc. :) In RenderCallback() I call manager->step_physics() and then update the graphics. This is the main thread. The manager code is what I posted above.

To measure how long the physics step takes, I use a time-stamp counter function:
#include <intrin.h> // MSVC intrinsic for the RDTSC instruction

// Read the CPU time-stamp counter (a count of clock cycles).
__int64 GetTics()
{
    return (__int64) __rdtsc();
}

Here is how I time the step:

__int64 StartTime;
__int64 EndTime;
unsigned long duration;

StartTime = GetTics();

if (m_multithreadingUtil != HK_NULL && m_multithreadingUtil->m_state.m_world == m_world)
{
    m_world->resetThreadTokens();
    m_multithreadingUtil->stepWorld(dt, false);
    m_multithreadingUtil->waitForStepWorldFinished();
}
else
{
    m_world->stepDeltaTime(dt);
}

EndTime = GetTics();
duration = (unsigned long) (EndTime - StartTime); // cycles
// Duration in milliseconds (2.4 GHz = 2,400,000 cycles per ms)
printf("Physics duration: %f ms\n", duration / 2400000.0f);

My processor has 4 cores. One core is already used by my main loop and renderer, so 3 cores are free for physics threads. Varying m_numThreads, I got the following results with the same world:

m_numThreads = 1; duration = 11 ms
m_numThreads = 2; duration = 28 ms
m_numThreads = 3; duration = 70 ms

With the continuous simulation type and the same world, I get:

duration = 5 ms

I also usually see a lower framerate when I use 2-3 threads :(

I'll check the demo framework example tomorrow.

When I initialize the stack area, I set its size to 5 MB, so Havok doesn't warn that there is not enough memory and that it will take memory from the system. The demo samples do print that warning.

Solovyov Yuriy

One more question: how can I edit or delete my previous post? :)) I made a typo.

Solovyov Yuriy

OK, I've tested the demo framework example PhysicsInSeparateThreadDemo. I changed only three things: I added my time-stamp function, set the number of ragdolls to 20, and set the maximum number of threads to 3.

Results are:

1) "20 ragdolls, single threaded", 20, 0, 0, detail: --- delay = 3 ms

2) "20 ragdolls, separate physics thread", 20, 1, 1, detail: --- delay = 5.5 ms

3) "20 ragdolls, separate multiple physics threads", 20, 3, 1, detail: --- delay = 6 ms

4) "20 ragdolls, separate multiple physics threads -- all threads running initially", 20, 3, 3, detail: -- delay = 23 ms

I also added my own modes:

5) "20 ragdolls, separate multiple physics threads 2 ", 20, 2, 1, detail: --- delay = 5 ms

6) "20 ragdolls, separate multiple physics threads 2 ", 20, 2, 2, detail: --- in this mode the delay varies randomly from 4.8 ms to 14 ms from step to step.

Solovyov Yuriy

I found the problem. I had compiled the demo in the Debug Multithreaded configuration. After I switched to Release Multithreaded, everything looks all right.

But why is the last example

"50 ragdolls, separate multiple physics threads -- all threads running initially", 50, 6, 6, detail

extremely slow in Debug mode, and why does it use much more memory than in Release mode? And what should I do in my engine? I can't work without a debug build.

jason.turbin

Hi Yuriy,

Good call moving to a release build configuration. Havok is much slower in debug than in release because of additional debug-only multithreading access checks. There is some information on these in the documentation here:

Havok Physics Multithreading Synchronization Issues

and in the forums here:

http://software.intel.com/en-us/forums//topic/59801

There is more information in the forum post than in the docs for this issue.

As for the slowdown, debug is so much slower because the ::markForRead and ::markForWrite calls use the hkMultiThreadCheck class internally. This class uses a critical section, which makes things significantly slower.

Does this help explain things?

Thanks,
Jason

zbychs

The timing problems may be caused by using the rdtsc instruction. It is not a reliable way to get time information because of multiple cores in your CPU, power management, etc. (see http://en.wikipedia.org/wiki/RDTSC).
On Windows you could use QueryPerformanceCounter() instead.

Solovyov Yuriy

Thanks Jason. :)
