The recent advances in machine learning and artificial intelligence are amazing! It seems like we see something groundbreaking every day, from self-driving cars to AIs learning complex games. Yet to deliver real value within a company, data scientists must be able to get their models off their laptops and deployed within the company's data pipelines and infrastructure.
We are using the Intel® TBB concurrent containers concurrent_unordered_map (as the outer level) and concurrent_vector (as the inner level) to create a hash map that allows concurrent fetching and growth. However, when using a large amount of memory (>500 GB; the Linux machine has ~1 TB of RAM), the free operation causes a segfault, as follows:
We are using TBB 2017 Update 5 and would like to know the binary compatibility of the versions provided for gcc4.1, gcc4.4 and gcc4.7, and which glibc versions they should be used with. Are they specific to a particular Linux OS?
I have an application in which my application thread spawns a std::thread at the beginning of the program. I define two task_arena objects and two task_group objects that are shared by the two master threads of my application. I want the first thread to use the first arena and first group, and the second thread to use the second arena and second group.
At the moment my code looks like this:
I have a set of data blocks that I process using a parallel_for loop. These data blocks are held in a pool that may be compressed. The first thread to access a block in the compressed pool triggers an uncompress routine. I have a mutex that ensures the uncompress routine is executed by only one task thread, but the uncompress routine uses its own parallel_for loop to speed up the decompression. When the inner parallel_for loop ends, control doesn't return to the parent task that started the uncompress routine.
I started investigating Intel TBB recently and was wondering whether an application specified as a Synchronous DataFlow Graph could be implemented using function and queue nodes. It seems to me doable in a straightforward manner. Could someone confirm? Any thoughts?
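For context, what distinguishes a Synchronous DataFlow graph is that every edge has fixed token production and consumption rates, so the number of firings per actor in one schedule period follows from the balance equations q[src] * produce = q[dst] * consume. A minimal sketch of solving them in plain C++ (no TBB involved) is below; Edge, gcd_long and repetition_vector are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One SDF edge: each firing of `src` produces `produce` tokens on the edge,
// and each firing of `dst` consumes `consume` tokens from it.
struct Edge {
    std::size_t src, dst;
    long produce, consume;
};

static long gcd_long(long a, long b) {
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

// Returns the smallest positive firing counts q that satisfy
// q[src] * produce == q[dst] * consume on every edge. Returns an empty
// vector if the rates are inconsistent or the graph is not connected.
std::vector<long> repetition_vector(std::size_t n_actors,
                                    const std::vector<Edge>& edges) {
    if (n_actors == 0) return {};
    // Firing rate of each actor as a reduced fraction num/den;
    // den == 0 marks an actor whose rate has not been set yet.
    std::vector<long> num(n_actors, 0), den(n_actors, 0);
    num[0] = 1;
    den[0] = 1;

    // Propagate rates across edges until nothing changes (fine for small graphs).
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Edge& e : edges) {
            assert(e.produce > 0 && e.consume > 0);
            if (den[e.src] != 0 && den[e.dst] == 0) {
                // q[dst] = q[src] * produce / consume
                long n = num[e.src] * e.produce, d = den[e.src] * e.consume;
                long g = gcd_long(n, d);
                num[e.dst] = n / g;
                den[e.dst] = d / g;
                changed = true;
            } else if (den[e.dst] != 0 && den[e.src] == 0) {
                // q[src] = q[dst] * consume / produce
                long n = num[e.dst] * e.consume, d = den[e.dst] * e.produce;
                long g = gcd_long(n, d);
                num[e.src] = n / g;
                den[e.src] = d / g;
                changed = true;
            }
        }
    }

    // Scale every fraction by the least common multiple of the denominators.
    long scale = 1;
    for (std::size_t i = 0; i < n_actors; ++i) {
        if (den[i] == 0) return {};  // actor never reached: graph not connected
        scale = scale / gcd_long(scale, den[i]) * den[i];
    }
    std::vector<long> q(n_actors);
    for (std::size_t i = 0; i < n_actors; ++i) q[i] = num[i] * (scale / den[i]);

    // Check every balance equation; a mismatch means the rates are inconsistent.
    for (const Edge& e : edges)
        if (q[e.src] * e.produce != q[e.dst] * e.consume) return {};
    return q;
}
```

For a chain A -> B -> C in which A produces 2 tokens per firing, B consumes 3 and produces 1, and C consumes 2, the result is {3, 2, 1}. Whether each actor then maps cleanly onto a function node with queue nodes on the edges, as the post suggests, is not something this sketch settles.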
I have looked through the forums and other TBB resources. Based on VTune, I can see that my program is spending a lot of time spinning, but I have not yet found out where it is spinning.
I have Parallel Studio and would appreciate any advice on how to find out where the program is spinning so I can fix it. Overall, it seems my parallelization is not very well balanced, and I am trying to figure out where the problems are.
I remember reading somewhere that if you link TBBMalloc, or potentially use the Scalable Allocator, TBB pre-allocates some amount of memory per thread to avoid implicit synchronization. But I can't find this any more. I thought I had found it in the TBB book, but it looks like I hadn't.
Does any per-thread preallocation happen in the Scalable Allocator or in TBBMalloc?