Pipelining offloads

I have a basic question. Suppose I offload the following three items asynchronously to the same MIC device from the same host thread.

1. Offload a bunch of data tied to a pointer v at the host (in clause)

2. Offload a function call one of whose arguments is the pointer v

3. Offload a data output from the mic device to the pointer v (out clause)

Is it correct to assume that the mic device does not start running the function in #2 until the data input to it in #1 is complete? Is it correct to assume that the mic device does not do the data output in #3 until the function call in #2 is complete?


When you offload asynchronously, you should not depend on a previous offload completing before the current one starts. You need to use signal/wait to control the order.
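A minimal sketch of how the three steps from the question might be ordered explicitly with signal/wait. It assumes the Intel compiler's offload pragmas. The names scale, pipeline, and tag are made up for illustration. The quoted manual text describes the signal tag as an integer expression, while C examples commonly pass an address, which is what this sketch assumes. As far as I understand the wait clause, the host does not initiate an offload until the listed tags have been signaled. A compiler without offload support ignores the pragmas and simply runs everything on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Compile the kernel for the coprocessor too (Intel compiler only). */
#pragma offload_attribute(push, target(mic))
void scale(double *v, int n, double factor)
{
    for (int i = 0; i < n; ++i)
        v[i] *= factor;
}
#pragma offload_attribute(pop)

/* Hypothetical helper: upload v, run scale() on it, download the result,
   with each stage ordered behind the previous one by signal/wait. */
void pipeline(double *v, int n, double factor)
{
    assert(v != NULL && n > 0);

    /* The addresses of these bytes serve as signal tags (an assumption,
       see the lead-in). */
    char tag[3];
    (void)tag; /* avoids an unused-variable warning if pragmas are ignored */

    /* 1. Start the upload; keep the device buffer allocated afterwards. */
    #pragma offload_transfer target(mic:0) \
            in(v : length(n) alloc_if(1) free_if(0)) signal(&tag[0])

    /* 2. Start the kernel only after the upload has been signaled.
          nocopy reuses the device buffer without transferring it again. */
    #pragma offload target(mic:0) \
            nocopy(v : length(n) alloc_if(0) free_if(0)) \
            wait(&tag[0]) signal(&tag[1])
    scale(v, n, factor);

    /* 3. Start the download only after the kernel has been signaled;
          free the device buffer once the data is back. */
    #pragma offload_transfer target(mic:0) \
            out(v : length(n) alloc_if(0) free_if(1)) \
            wait(&tag[1]) signal(&tag[2])

    /* Block the host until the download has landed in v. */
    #pragma offload_wait target(mic:0) wait(&tag[2])
}
```

Calling pipeline(v, 4, 2.0) on {1, 2, 3, 4} should leave {2, 4, 6, 8} in v, whether the work runs on the coprocessor or falls back to the host.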

Thank you --- that is what I expected.

The Intel compiler manual says the following about signal (under #pragma offload signal as well as #pragma offload_transfer signal):


An optional integer expression that serves as a handle on an asynchronous data transfer or computational activity. The computation performed by the offload clause and any results returned from the offload using out clauses occurs concurrently with CPU execution of the code after the pragma. If this clause is not used, then the entire offload and associated data transfer are executed synchronously. The CPU will not continue past the pragma until it has completed.

This clause refers to a specific target device so you must specify a target-number in the target clause that is greater than or equal to zero.


Why does the documentation mention only out clauses and not in clauses? Can we assume that in clauses, which input data from host to device, also occur concurrently?

I assume that the async transfers are done using DMA. If so, memory on the host needs to be pinned or registered to prevent memory from being paged out. At what point is the memory pinned and then unpinned?


Hi divakar,

Signal can be used for transferring data asynchronously in both directions, from the host to the device as well as from the device to the host. Please refer to "About Asynchronous Data Transfer" (http://software.intel.com/en-us/node/459120). In this section, the code sample demonstrates how to transfer data to and from the coprocessor asynchronously. Thank you.
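For illustration, a rough sketch (not the sample from the linked page) of an asynchronous transfer in each direction, assuming the Intel compiler's offload_transfer and offload_wait pragmas. The function name round_trip and the placeholder host loops are hypothetical. The data pointer itself is used as the signal tag here, which is an assumption about how tags are passed. Without offload support the pragmas are ignored and only the host loops run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sends buf to the coprocessor and brings it back,
   doing unrelated host work while each transfer is in flight.
   Returns the result of the host-side work. */
long round_trip(double *buf, int n)
{
    assert(buf != NULL && n > 0);
    long host_work = 0;

    /* Host to device: with signal(), the host continues past the pragma
       without waiting for the transfer to finish. */
    #pragma offload_transfer target(mic:0) \
            in(buf : length(n) alloc_if(1) free_if(0)) signal(buf)

    for (int i = 0; i < n; ++i)   /* host work; must not modify buf */
        host_work += i;

    /* Block until the upload tagged buf has completed. */
    #pragma offload_wait target(mic:0) wait(buf)

    /* Device to host: asynchronous as well. */
    #pragma offload_transfer target(mic:0) \
            out(buf : length(n) alloc_if(0) free_if(1)) signal(buf)

    for (int i = 0; i < n; ++i)   /* host work; must not read buf yet */
        host_work += i;

    /* Only after this wait is it safe to read buf on the host. */
    #pragma offload_wait target(mic:0) wait(buf)

    return host_work;
}
```

The design point is that the host must not touch the buffer between a signaled transfer and the matching offload_wait.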
