We are implementing an SDK with both C++ and Python interfaces.
For performance reasons we had to link them against tbb_malloc_proxy.lib.
For the C++ interface, performance increased by 5x,
while for the Python interface it made no difference. On further investigation with Intel VTune Amplifier, we found that Python still uses ntdll.dll, not tbb_malloc_proxy.dll.
When I built the Python interpreter itself against tbb_malloc_proxy.lib, it ran as fast as the C++ interface.
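For anyone who wants to reproduce the check without a profiler, whether the proxy DLL is present in a given interpreter can be tested from inside Python by asking Windows for its module handle. This is only a minimal sketch: the helper name `is_dll_loaded` is made up for illustration, and the DLL name is passed as a parameter because the file name on disk may differ from the one written above. It only shows that the DLL is mapped into the process, not that allocations are actually redirected to it.

```python
import ctypes
import sys


def is_dll_loaded(dll_name):
    """Return True/False on Windows if dll_name is loaded in this process.

    Returns None on other platforms, where the check does not apply.
    `is_dll_loaded` is a hypothetical helper, not part of any library.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
    # With c_void_p as restype, a NULL handle comes back as None.
    kernel32.GetModuleHandleW.restype = ctypes.c_void_p
    return kernel32.GetModuleHandleW(dll_name) is not None


if __name__ == "__main__":
    # The name below is taken from the question; adjust it if the file
    # is named differently on your machine.
    print(is_dll_loaded("tbb_malloc_proxy.dll"))
```

Running this in the stock interpreter and in the custom-built one should make the difference between the two visible.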
Can I inject tbb_malloc_proxy.dll into the standard Python interpreter on Windows 10?
I tried AppInit_DLLs, but it didn't work for me; maybe I did something wrong:
1. I copied the TBB DLLs to system32.
2. I set the registry value HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs to the names of the DLLs (space separated).
3. I set the registry value HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\LoadAppInit_DLLs to 1.
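To rule out a typo in steps 2 and 3, the two values can be read back with the standard `winreg` module. The sketch below rests on a few assumptions: the helper names `split_dll_list` and `read_appinit_settings` are made up for illustration, the split treats spaces and commas as separators and so does not handle paths that contain spaces, and `winreg` opens the registry view that matches the bitness of the interpreter running the script, so a 32-bit Python may not see the same key as a 64-bit one.

```python
import re
import sys

# Key path from steps 2 and 3 above, relative to HKEY_LOCAL_MACHINE.
APPINIT_KEY = r"Software\Microsoft\Windows NT\CurrentVersion\Windows"


def split_dll_list(value):
    """Split an AppInit_DLLs string on spaces or commas, dropping empties."""
    return [name for name in re.split(r"[ ,]+", value.strip()) if name]


def read_appinit_settings():
    """Return the two AppInit values as a dict, or None off Windows.

    A value that does not exist in the registry is reported as None.
    """
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, APPINIT_KEY) as key:
        for name in ("AppInit_DLLs", "LoadAppInit_DLLs"):
            try:
                settings[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                settings[name] = None
    return settings


if __name__ == "__main__":
    settings = read_appinit_settings()
    if settings is None:
        print("Not running on Windows; nothing to check.")
    else:
        print("LoadAppInit_DLLs =", settings["LoadAppInit_DLLs"])
        print("AppInit_DLLs =", split_dll_list(settings["AppInit_DLLs"] or ""))
```

If the output matches what was entered in the registry editor, the settings themselves are in place and the problem lies elsewhere.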
Any other suggestions would be appreciated.