I would like to try replacing the memory allocator with TBB's scalable memory allocator as detailed here:
I would like to do this for allocations in offload regions. This is on Windows. What I've tried:
to the link line. This clearly only affects the host allocations.
to the offload linker options. I also had to copy the .so files to the MIC and put them in /usr/lib64.
Memory allocation still seems slow, although I'm not sure how to tell definitively whether I'm actually using the TBB allocator. Is there anything else I need to do?