opencl_node (or streaming_node) with more outputs than inputs

Hi

Is there a way to make an opencl_node (or a custom streaming_node) that has more output ports than input ports?

I have tried, but I cannot get the graph to execute: the node seems to want a try_put() on the ports that correspond to the outputs as well before it runs.

I have this example, which doesn't work:

    graph g;

    gpu_device_selector gpu_selector;

    opencl_program<> program("myclprogram.cl");

    opencl_node< tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>> > myopenclnode(g, program.get_kernel("clCopy2"), gpu_selector);

    join_node < tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>>> join_node(g);

    function_node< tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>> > myOutputWriter(g, unlimited, [](const tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>>& input) {
        opencl_buffer<cl_uchar> buffer1 = std::get<0>(input);
        opencl_buffer<cl_uchar> buffer2 = std::get<1>(input);

        printf("'%s' '%s'\r\n", buffer1.data(), buffer2.data());
    });

    make_edge(output_port<1>(myopenclnode), input_port<0>(join_node));
    make_edge(output_port<2>(myopenclnode), input_port<1>(join_node));

    make_edge(join_node, myOutputWriter);

    const char str[] = "Hello world";
    opencl_buffer<cl_uchar> a(sizeof(str));
    std::copy_n(str, sizeof(str), a.begin());

    opencl_buffer<cl_uchar> b(sizeof(str));
    opencl_buffer<cl_uchar> c(sizeof(str));

    myopenclnode.set_range(std::deque<int>{sizeof(str)});
    myopenclnode.set_args(port_ref<0>(), b, c);

    input_port<0>(myopenclnode).try_put(a);

    g.wait_for_all();

The kernel just copies argument 1 to arguments 2 and 3.

However, the kernel is never executed in this example.

If I also call try_put() on input_port<1> and input_port<2>, it works fine.

Hi Nikolaj,

The opencl_node implementation waits for input on every input port before it starts executing the kernel.

Let's look at the use case in a bit more detail. Since the kernel copies the first argument to the second and the third, the memory for the last two arguments must also be provided somehow, right? Otherwise, how would the node know where to copy the data coming from the first parameter? Calling try_put() on all of its ports is the way to tell the node about all the memory it needs to execute its encapsulated kernel.
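To make the "waits for every port" behaviour concrete, here is a toy model in plain C++17. It is only a sketch of the idea and not how opencl_node is implemented: the class ThreePortNode, its methods, and the use of std::string in place of opencl_buffer are all made up for illustration. A put on a single port only stores the message; the stand-in "kernel" (copy port 0 to ports 1 and 2) runs once all three ports hold a message.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a node with three input ports.
// This is NOT TBB code; it only mimics "run when every port has a message".
class ThreePortNode {
public:
    // Stores a message on the given port (0, 1 or 2).
    // Returns true only if this put completed the set and the "kernel" ran.
    bool try_put(std::size_t port, std::string value) {
        slots_.at(port) = std::move(value);

        // The node does nothing until every port holds a message.
        for (const auto& slot : slots_) {
            if (!slot.has_value()) {
                return false;
            }
        }

        // All ports are filled: run the stand-in kernel,
        // which copies argument 0 into arguments 1 and 2.
        outputs_[0] = *slots_[0];
        outputs_[1] = *slots_[0];
        outputs_[2] = *slots_[0];
        ++runs_;

        // Consume the inputs so the next run needs a fresh set of messages.
        for (auto& slot : slots_) {
            slot.reset();
        }
        return true;
    }

    // Number of times the stand-in kernel has run.
    int runs() const { return runs_; }

    // Results of the most recent run, one entry per port.
    const std::array<std::string, 3>& outputs() const { return outputs_; }

private:
    std::array<std::optional<std::string>, 3> slots_;
    std::array<std::string, 3> outputs_;
    int runs_ = 0;
};
```

With this model, putting "Hello world" on port 0 alone leaves runs() at 0, which mirrors the kernel never being executed in the original example. Only after ports 1 and 2 have also received a message (here, the empty destination strings) does the copy happen.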

If you have a use case where this logic does not apply, please share the details so we can better understand and discuss it.

Regards, Aleksei
