Is this even possible using the pipeline class? It's not immediately obvious to me, how to extend the sample code provided in the documentation for the thread_bound_filter to work with multiple filters.
I have a very simple use case: a pipeline with three filters -- a source filter (serial), a transformation filter (parallel) and a sink filter (serial, but can be out of order). In my case I run the pipeline on an 8-core machine with the number of tokens in flight set usually to something between 64 and 128. I'd like to to restrict the execution of source and sink filters to the 'main' thread.
If I implement the sink as a thread_bound_filter, following the example in the docs, it all seems to work fine, with the exception of the source filter being executed on other threads (but still serialised). Looking at the flame chart that represents the pipeline run, I can see work being nicely distributed among child threads, with high CPU utilisation, just as I would expect.
Now, if I try to turn the source filter into another thread_bound_filter (and instead of testing the result of sink.process_item() in a loop, testing results of source.try_process_item() and sink.try_process_item()), I can see that both source and sink do their work on the main thread (as desired), but the transform filter does all its work on a single child thread (it seems to completely kill the parallelism).
Is there a way to do it properly using the pipeline? Am I missing something obvious? Or should I try something else?