TBB version: 2018 initial release
I am using the OpenCV library, compiled with the TBB threading framework, through its Python interface. After updating the Kubuntu 16.04 kernel from version 4.10 to 4.13, my code stopped working: it segfaults right at the "import cv2" line. I recompiled OpenCV with TBB under the new kernel and ended up with the same problem. After recompiling with OpenMP instead of TBB, the problem disappears. It seems that 2018 Update 2 has the same problem.
I am working on a socket application. My app receives packets at a very high rate, and I have to process and replay each packet within a specific time (say 100 milliseconds). I add every packet to a queue; a thread picks up a packet and executes a tbb::task to process it. I have 16 cores and am not able to process all packets in the given time.
My question is: can I change the task priority to high, or kill a task that has not started within 50 ms and execute a new task?
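One way to get the "skip it if it has not started within 50 ms" behaviour without depending on any scheduler feature is to timestamp each packet on arrival and check its age when a worker picks it up. The following is a minimal sketch in plain standard C++, not TBB code; `Packet`, `is_stale`, and `take_fresh` are hypothetical names, and only the 50 ms threshold comes from the question.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical packet record: payload plus the time it was received.
struct Packet {
    std::string payload;
    Clock::time_point received;
};

// True if the packet has already waited longer than max_wait.
inline bool is_stale(const Packet& p, Clock::time_point now,
                     std::chrono::milliseconds max_wait) {
    return now - p.received > max_wait;
}

// Drains the queue and returns only the packets still worth processing.
// Stale packets are counted in `dropped` instead of being returned.
inline std::vector<Packet> take_fresh(std::deque<Packet>& queue,
                                      Clock::time_point now,
                                      std::chrono::milliseconds max_wait,
                                      std::size_t& dropped) {
    std::vector<Packet> fresh;
    while (!queue.empty()) {
        Packet p = std::move(queue.front());
        queue.pop_front();
        if (is_stale(p, now, max_wait)) {
            ++dropped;  // too old: skip instead of processing
        } else {
            fresh.push_back(std::move(p));
        }
    }
    return fresh;
}
```

The same `is_stale` check could also be the first statement of the task body, so that a task which starts too late returns immediately and frees its worker for newer packets.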
What I am doing in my queue processing thread is:
The source archive at https://github.com/01org/tbb/archive/2017_U8.tar.gz is wrong. It contains tbb2017_20161128oss, and I confirmed that its tbb_stddef.h has TBB_INTERFACE_VERSION 9103, whereas for 2017 Update 8 it should be 9108.
I was wondering if anybody had suggestions on how to implement a summed area table with Intel TBB. The general idea of the algorithm:
1. Given an input, do an independent (inclusive) prefix scan on every row. Call this Intermediate.
2. Transpose Intermediate, call this IntermediateTranspose.
3. Repeat step (1) on IntermediateTranspose: do an inclusive prefix scan on every row. Call this OutputTranspose.
4. Transpose OutputTranspose from step (3) to get Output.
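As a reference point, the four steps above can be sketched sequentially in standard C++. This is only a baseline to check a parallel version against, not a TBB implementation; the names `Matrix`, `scan_rows`, `transpose`, and `summed_area_table` are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Steps 1 and 3: inclusive prefix scan on every row. Rows are independent.
inline Matrix scan_rows(const Matrix& in) {
    Matrix out = in;
    for (auto& row : out) {
        std::partial_sum(row.begin(), row.end(), row.begin());
    }
    return out;
}

// Steps 2 and 4: transpose a rectangular matrix.
inline Matrix transpose(const Matrix& in) {
    if (in.empty()) return {};
    Matrix out(in[0].size(), std::vector<long long>(in.size()));
    for (std::size_t r = 0; r < in.size(); ++r) {
        for (std::size_t c = 0; c < in[r].size(); ++c) {
            out[c][r] = in[r][c];
        }
    }
    return out;
}

// Chains the four steps using the names from the description.
inline Matrix summed_area_table(const Matrix& input) {
    Matrix intermediate = scan_rows(input);
    Matrix intermediate_transpose = transpose(intermediate);
    Matrix output_transpose = scan_rows(intermediate_transpose);
    return transpose(output_transpose);
}
```

Because each row is scanned independently, the loop in `scan_rows` is the natural candidate for something like tbb::parallel_for over rows; the transposes are then what make the second pass row-wise again.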
Here is my attempt to benchmark the performance of the Intel TBB flow graph. Here is the setup:
- One broadcast node sending continue_msg to N successor nodes (broadcast_node<continue_msg>)
- Each successor node performs a computation that takes t seconds.
- The total computation time when performed serially is Tserial = N * t
- The ideal computation time if all cores are used is Tpar(ideal) = N * t / C, where C is the number of cores.
- The speedup is defined as Tpar(actual) / Tserial
- I tested the code with gcc5 on a 16 core PC.
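The quantities in this setup can be written down as a few small helpers, which makes it easy to see what the best possible value of the metric is. This is a sketch only; the function names are hypothetical, and the ratio follows the definition given above (Tpar(actual) / Tserial), which is the reciprocal of the more common Tserial / Tpar convention, so smaller values are better here.

```cpp
// Tserial = N * t: total time when the N computations run one after another.
inline double serial_time(int n, double t_seconds) {
    return n * t_seconds;
}

// Tpar(ideal) = N * t / C: time if the work divides perfectly over C cores.
inline double ideal_parallel_time(int n, double t_seconds, int cores) {
    return serial_time(n, t_seconds) / cores;
}

// The metric as defined in the setup: Tpar(actual) / Tserial.
// With this definition the best achievable value is 1 / C.
inline double time_ratio(double t_par_actual, double t_serial) {
    return t_par_actual / t_serial;
}
```

For example, with N = 64 nodes, t = 0.5 s, and C = 16 cores, Tserial is 32 s, Tpar(ideal) is 2 s, and the best possible ratio is 2 / 32 = 0.0625, that is 1/16.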
I am using tbb::concurrent_unordered_map to replace std::map in my program like this:
Before:
class KvSubTable;
typedef std::weak_ptr<KvSubTable> KvSubTableId;
std::map<KvSubTableId, int, std::owner_less<KvSubTableId> > mEntryMap;
Now I use tbb::concurrent_unordered_map in place of std::map, but I get some compile errors:
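A likely cause, stated here as an assumption, is that std::map only needs an ordering (std::owner_less), while a hash container needs a hash function and an equality predicate, and the standard library provides no std::hash specialization for std::weak_ptr. The sketch below shows one possible workaround using std::unordered_map so that it stays self-contained; to my understanding tbb::concurrent_unordered_map accepts a hasher and a key-equality type in the same template positions. `KvSubTableKey`, `KvSubTableKeyHash`, and `KvSubTableKeyEqual` are hypothetical names, and the empty `KvSubTable` is a stand-in for the real class.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>

class KvSubTable {};  // stand-in for the real class

// Hypothetical key wrapper: keeps the weak_ptr and caches the raw address
// seen at construction, so the key stays hashable after the object expires.
struct KvSubTableKey {
    std::weak_ptr<KvSubTable> ref;
    const KvSubTable* raw;

    explicit KvSubTableKey(const std::shared_ptr<KvSubTable>& p)
        : ref(p), raw(p.get()) {}
};

// Hash on the cached address.
struct KvSubTableKeyHash {
    std::size_t operator()(const KvSubTableKey& k) const noexcept {
        return std::hash<const KvSubTable*>()(k.raw);
    }
};

// Equal only if both the cached address and the owner match. The owner
// check keeps a reused address from matching a stale key.
struct KvSubTableKeyEqual {
    bool operator()(const KvSubTableKey& a,
                    const KvSubTableKey& b) const noexcept {
        return a.raw == b.raw &&
               !a.ref.owner_before(b.ref) &&
               !b.ref.owner_before(a.ref);
    }
};

using EntryMap = std::unordered_map<KvSubTableKey, int,
                                    KvSubTableKeyHash, KvSubTableKeyEqual>;
```

Keys that compare equal always share the same cached address, so they also share the same hash, which is the consistency a hash container requires.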
Recently I found a microbenchmark comparing the performance of different concurrent hash map implementations at https://le.qun.ch/en/blog/sharding/, and the test results are repeatable on my machine. I was wondering why a simple std::unordered_map guarded by std::shared_mutex outperforms tbb::concurrent_hash_map. Is there any pitfall in that KVIntelTBB implementation?
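For readers unfamiliar with the baseline being compared, the following is a minimal sketch of a std::unordered_map protected by a reader/writer lock, assuming C++17 and assuming that std::shared_mutex is the lock meant in the question. It is not the benchmark's code; `SharedLockedMap` and its members are hypothetical names.

```cpp
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// One map, one reader/writer lock: many readers may hold the lock at once,
// while a writer holds it exclusively.
class SharedLockedMap {
public:
    void put(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive
        map_[key] = value;
    }

    std::optional<int> get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, int> map_;
};
```

In a read-heavy workload most calls take only the shared lock, which is one reason such a simple design can look good in a microbenchmark; how it behaves under many writers is a separate question.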
Intel has already provided the source code for this, but I assume there is some issue with the code in the file named "test.cpp" at line 276, where the compiler reports: Error: name followed by "::" must be a class or namespace name
Here is the link to get the source code
Can anyone help me fix this issue?