Intel® Threading Building Blocks

tbb::task priority and kill task


I am working on a socket application. My app receives packets at a very high rate, and I have to process and replay each packet within a specific time (say 100 milliseconds). I add every packet to a queue; a thread picks a packet and executes a tbb::task to process it. I have 16 cores and am not able to process all packets in the given time.

My question is: can I raise a task's priority, or kill a task that has not started within 50 ms and execute a new task instead?

What I am doing in my queue processing thread is:


packet* p=q.pop();
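
One way to approach the "not started in 50 ms" part of the question is to stamp each packet when it is queued and discard stale ones at dequeue time, before any task is created for them. The following is only a minimal sketch under that assumption: it uses the C++17 standard library rather than TBB, and `Packet`, `DeadlineQueue`, and `pop_fresh` are hypothetical names that do not appear in the original post.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>

using Clock = std::chrono::steady_clock;

// Hypothetical packet type; the original post does not show its definition.
struct Packet {
    int id;
    Clock::time_point arrived;  // stamped when the packet is enqueued
};

// Hypothetical queue that drops packets which waited longer than max_wait.
class DeadlineQueue {
public:
    explicit DeadlineQueue(std::chrono::milliseconds max_wait)
        : max_wait_(max_wait) {}

    void push(int id, Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push_back(Packet{id, now});
    }

    // Returns the oldest packet that is still within its deadline and
    // discards stale packets on the way. Returns std::nullopt if no
    // usable packet remains.
    std::optional<Packet> pop_fresh(Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(m_);
        while (!q_.empty()) {
            Packet p = q_.front();
            q_.pop_front();
            if (now - p.arrived <= max_wait_) return p;
            ++dropped_;  // waited too long: skip it instead of processing
        }
        return std::nullopt;
    }

    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(m_);
        return dropped_;
    }

private:
    std::chrono::milliseconds max_wait_;
    mutable std::mutex m_;
    std::deque<Packet> q_;
    std::size_t dropped_ = 0;
};
```

With a queue like this, the processing thread would call `pop_fresh()` in place of `q.pop()` and only hand fresh packets to the task scheduler. This only covers packets that have not started processing; it does not interrupt work already running.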

2D prefix scan (summed area table)



I was wondering if anybody had suggestions on how to implement a summed area table with Intel TBB.  The general idea of the algorithm:


1. Given an input, do an independent (inclusive) prefix scan on every row.  Call this Intermediate.

2. Transpose Intermediate, call this IntermediateTranspose.

3. Do step (1) again, only do an inclusive prefix scan on every row of IntermediateTranspose.  Call this OutputTranspose.

4. Transpose from (3) OutputTranspose -> Output.
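
The four steps above can be sketched serially as follows. This is a plain standard-library version, not a TBB implementation; the names `Matrix`, `scan_rows`, `transpose`, and `summed_area_table` are chosen here for illustration. Each row scan is independent of the others, so the loop in `scan_rows` is where a parallel loop over rows could be substituted.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Steps 1 and 3: independent inclusive prefix scan on every row.
// Rows do not depend on each other, so this loop is the parallelizable part.
Matrix scan_rows(Matrix m) {
    for (auto& row : m)
        std::partial_sum(row.begin(), row.end(), row.begin());
    return m;
}

// Steps 2 and 4: plain out-of-place transpose of a rectangular matrix.
Matrix transpose(const Matrix& m) {
    if (m.empty()) return {};
    const std::size_t rows = m.size();
    const std::size_t cols = m[0].size();
    Matrix t(cols, std::vector<long long>(rows));
    for (std::size_t i = 0; i < rows; ++i) {
        assert(m[i].size() == cols);  // all rows must have the same width
        for (std::size_t j = 0; j < cols; ++j)
            t[j][i] = m[i][j];
    }
    return t;
}

// Input -> Intermediate -> IntermediateTranspose -> OutputTranspose -> Output
Matrix summed_area_table(const Matrix& input) {
    Matrix intermediate = scan_rows(input);
    Matrix intermediate_transpose = transpose(intermediate);
    Matrix output_transpose = scan_rows(intermediate_transpose);
    return transpose(output_transpose);
}
```

For the input rows `{1, 2, 3}` and `{4, 5, 6}`, the row scans give `{1, 3, 6}` and `{4, 9, 15}`, and the final table is `{1, 3, 6}` and `{5, 12, 21}`.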


Intel tbb flowgraph speedup

Here is my attempt to benchmark the performance of intel tbb flow graph. Here is the setup:

- One broadcast node sending continue_msg to N successor nodes (broadcast_node<continue_msg>)

- Each successor node performs a computation that takes t seconds.

- The total computation time when performed serially is Tserial = N* t

- The ideal computation time if all cores are used is Tpar(ideal) = N * t / C, where C is the number of cores.

- The speedup is defined as Tserial / Tpar(actual)

- I tested the code with gcc5 on a 16 core PC.
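
To make the quantities in this setup concrete, here is a small sketch of the arithmetic, taking speedup in its conventional sense of Tserial / Tpar(actual). The names `Timing`, `ideal_timing`, and `speedup` are hypothetical helpers, and any measured time passed in is a placeholder, not a result from the original benchmark.

```cpp
// Timing model from the setup: N nodes, t seconds each, C cores.
struct Timing {
    double t_serial;     // Tserial = N * t
    double t_par_ideal;  // Tpar(ideal) = N * t / C
};

Timing ideal_timing(int n_nodes, double t_seconds, int cores) {
    Timing r;
    r.t_serial = n_nodes * t_seconds;
    r.t_par_ideal = r.t_serial / cores;
    return r;
}

// Conventional speedup: how many times faster the parallel run is.
// The ideal value is C, reached when t_par_actual equals Tpar(ideal).
double speedup(double t_serial, double t_par_actual) {
    return t_serial / t_par_actual;
}
```

For example, with N = 32, t = 0.5 s and C = 16, Tserial is 16 s and Tpar(ideal) is 1 s; a hypothetical measured parallel time of 2 s would then correspond to a speedup of 8 out of an ideal 16. Note that N * t / C is only reachable when N is a multiple of C; otherwise the best case for equal, indivisible tasks is ceil(N / C) * t.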

How to use weak_ptr as key in tbb::concurrent_unordered_map?

I am using tbb::concurrent_unordered_map to replace std::map in my program like this:


class KvSubTable;
typedef std::weak_ptr<KvSubTable> KvSubTableId;
std::map<KvSubTableId, int, std::owner_less<KvSubTableId> > mEntryMap;

Now I use tbb::concurrent_unordered_map in place of std::map, but I get some compile errors:
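
The errors themselves are not shown here, so the following rests on an assumption about their cause: a hashed container needs a hash function and an equality predicate for its key, std::weak_ptr provides neither, and std::owner_less only supplies an ordering. A minimal sketch of one workaround is to wrap the weak_ptr in a key type that remembers the object's address at construction. It is demonstrated with std::unordered_map so that it compiles without TBB; to the best of my knowledge tbb::concurrent_unordered_map accepts hasher and key-equality template arguments in the same positions. `KvSubTableKey`, `KvSubTableKeyHash`, `KvSubTableKeyEqual`, and `EntryMap` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>

class KvSubTable {};  // placeholder body; the original only forward-declares it

// Hypothetical key: the weak_ptr plus the address captured while the
// object was alive, so the hash stays stable after the object expires.
struct KvSubTableKey {
    std::weak_ptr<KvSubTable> ref;
    const KvSubTable* addr;

    explicit KvSubTableKey(const std::shared_ptr<KvSubTable>& sp)
        : ref(sp), addr(sp.get()) {}
};

struct KvSubTableKeyHash {
    std::size_t operator()(const KvSubTableKey& k) const noexcept {
        return std::hash<const KvSubTable*>()(k.addr);
    }
};

// Equal only if both the address and the owning control block match.
// The owner check keeps a new object that reuses an old address distinct.
struct KvSubTableKeyEqual {
    bool operator()(const KvSubTableKey& a, const KvSubTableKey& b) const noexcept {
        return a.addr == b.addr &&
               !a.ref.owner_before(b.ref) && !b.ref.owner_before(a.ref);
    }
};

using EntryMap =
    std::unordered_map<KvSubTableKey, int, KvSubTableKeyHash, KvSubTableKeyEqual>;
```

Keys are built from a live `std::shared_ptr`, for example `m[KvSubTableKey(sp)] = 1;`. Entries for expired objects remain in the map until they are erased explicitly.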

concurrent_hash_map: Bad performance compared to std::unordered_map with shared_lock


Recently I found a microbenchmark comparing the performance of different concurrent hash map implementations at https://le.qun.ch/en/blog/sharding/, and the test results are repeatable on my machine. I was wondering why a simple std::unordered_map guarded by std::shared_lock outperforms tbb::concurrent_hash_map. Is there any pitfall in that KVIntelTBB implementation?
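
For readers unfamiliar with the baseline being compared, this is roughly what "std::unordered_map with shared_lock" means. It is a minimal C++17 sketch, not the benchmark's actual code: `SharedLockMap` and its `put`/`get` methods are hypothetical, and the key and value types are chosen arbitrarily.

```cpp
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical wrapper: one std::unordered_map behind one std::shared_mutex.
class SharedLockMap {
public:
    // Writers take the mutex exclusively.
    void put(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(m_);
        map_[key] = value;
    }

    // Readers take the mutex in shared mode, so lookups can run concurrently
    // with each other but not with a writer.
    std::optional<int> get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(m_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::shared_mutex m_;
    std::unordered_map<std::string, int> map_;
};
```

How such a wrapper compares with a concurrent container depends heavily on the read/write mix and on how each map is accessed in the benchmark, so this sketch does not by itself explain the measured difference.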


Having issues compiling code of A Parallel Stable Sort Using C++11 TBB

Intel has already provided the source code for this, but I assume there is some issue with the code in the file named "test.cpp" at line 276, where the compiler reports: Error: name followed by "::" must be a class or namespace name

Here is the link to get the source code


Can anyone help me fix this issue?


opencl_node (or streaming_node) with more outputs than inputs


Is there a way to make an opencl_node (or a custom streaming_node) that has more output ports than input ports?

I have tried, but I cannot seem to get the graph to execute, as it wants me to call try_put() on the output ports as well before executing.

I have this example, which doesn't work:
