Intel® ArBB is built on the Intel® ArBB Virtual Machine (VM), which is exposed through a C89 application programming interface (API). The API is small, self-contained, and complete. The VM semantics let users express parallel operations that are automatically mapped onto the underlying hardware mechanisms: vector instructions, multiple cores, and the prefetching and streaming of data that are common in throughput computing. The VM generates code dynamically using a Just-In-Time (JIT) compiler. The VM API can also be used to implement a new front-end language. For example, the Intel® ArBB C++ API is such a front end, implemented on top of the Intel® ArBB VM.
To exercise the VM API, two simple code samples were written against it directly. They illustrate the basic usage of the Intel® ArBB VM API. The source code of these examples can be downloaded (ZIP file) from the attachment of this article. It contains detailed inline comments to help in understanding each step. A high-level description of each example is given below.
In the first sample we create a function that takes two scalar variables of type arbb_f32 as input arguments, adds them, and returns the sum as an output argument.
- First, we define the data types for the parameters of the function. In this case, there are 2 input arguments and 1 output argument. All arguments are of the type arbb_f32.
- Then we define the function type, which describes the function signature. Two separate arrays collect the types of the input arguments and the type of the output argument.
- The actual arguments are then referred to by their index positions in the argument lists.
- Then the operation that this function performs (arbb_op_add) is issued.
- Once the function is defined, we compile it by invoking arbb_compile(), which directs the JIT to compile and optimize the function for the underlying hardware.
- Before executing the function, input data is prepared by creating two arbb_f32 scalar variables. Their values are set using the function arbb_write_scalar().
- An arbb_f32 scalar is also prepared for the output. Then, this function can be executed by invoking arbb_execute().
- The value calculated is extracted from the output using arbb_read_scalar().
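The order of these steps can be sketched with a small self-contained toy model in C++. This is only an illustration of the flow (define types, define the signature, refer to arguments by index, record an operation, compile, write inputs, execute, read the output). Every name starting with `Toy` or `toy_`, as well as `run_scalar_add`, is hypothetical and is not part of the ArBB VM API. The "compile" step here simply wraps the recorded operations in a callable instead of generating machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-ins only: none of these names belong to the ArBB VM API.
enum class ToyType { f32 };
enum class ToyOp { add };

struct ToyFunctionType {
    std::vector<ToyType> inputs;   // types of the input arguments
    std::vector<ToyType> outputs;  // types of the output arguments
};

struct ToyInstr {
    ToyOp op;
    std::size_t out;       // index into the output argument list
    std::size_t in0, in1;  // indices into the input argument list
};

struct ToyFunction {
    ToyFunctionType type;
    std::vector<ToyInstr> body;  // operations recorded for this function
};

using ToyCompiled =
    std::function<void(const std::vector<float>&, std::vector<float>&)>;

// Stand-in for the JIT step: turns the recorded operations into a callable.
ToyCompiled toy_compile(const ToyFunction& fn) {
    return [fn](const std::vector<float>& in, std::vector<float>& out) {
        assert(in.size() == fn.type.inputs.size());
        assert(out.size() == fn.type.outputs.size());
        for (const ToyInstr& i : fn.body) {
            if (i.op == ToyOp::add) out[i.out] = in[i.in0] + in[i.in1];
        }
    };
}

float run_scalar_add(float a, float b) {
    // Argument types and function signature: two f32 inputs, one f32 output.
    ToyFunction fn;
    fn.type.inputs = {ToyType::f32, ToyType::f32};
    fn.type.outputs = {ToyType::f32};

    // Arguments are referred to by index; record output[0] = input[0] + input[1].
    fn.body.push_back({ToyOp::add, 0, 0, 1});

    // "Compile" the function before it is executed.
    ToyCompiled compiled = toy_compile(fn);

    // Prepare the scalar inputs and a slot for the output, then execute.
    std::vector<float> in = {a, b};
    std::vector<float> out(1, 0.0f);
    compiled(in, out);

    // Read the computed value back from the output.
    return out[0];
}
```

Calling `run_scalar_add(1.5f, 2.25f)` walks through all of the steps once. The real samples in the ZIP file follow the same order using the arbb_* functions named above.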
In the second sample we create a function that takes two 1D dense containers as inputs and adds them element-wise, returning the result as a 1D dense container. The flow is similar to the previous sample, except for the argument types, which are 1D dense containers of arbb_f32 scalars. Two ways of managing and preparing the input and output data are shown. These variants correspond to the bind interface and the memory mapping interface. The latter is also known as the range interface.
To use the bind interface, the input and the output containers are bound to existing memory buffers, which in this case are C++ STL vectors. The flow is as follows:
- Create a binding object for each input/output container using arbb_create_dense_binding(). A binding object must contain information such as the address of the underlying C++ vector, the dimension and the size of the container.
- Create a 1D dense container variable for each input/output container argument. Then associate this variable with the corresponding binding object.
- Initialize the input containers by writing values to the underlying memory buffer.
- Execute the function using arbb_execute().
- The result is extracted by reading the memory buffer bound to the output 1D dense container.
- The binding objects created in the first step can now be freed and returned to the system.
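The idea behind the bind interface is that the container does not own its storage: it refers to a buffer the application already has. The following toy model in C++ sketches that relationship under stated assumptions. `ToyBinding`, `ToyDense1D`, `toy_bind`, `toy_execute_add`, and `run_bound_add` are hypothetical names for illustration and do not reproduce the signature of arbb_create_dense_binding() or any other ArBB call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins only: none of these names belong to the ArBB VM API.
struct ToyBinding {
    float* data;       // address of the user-owned buffer
    std::size_t dims;  // number of dimensions (1 for a 1D container)
    std::size_t size;  // number of elements
};

struct ToyDense1D {
    const ToyBinding* binding;  // the container's storage is the bound buffer
};

// Describe an existing std::vector so that a container can refer to it.
ToyBinding toy_bind(std::vector<float>& v) {
    return ToyBinding{v.data(), 1, v.size()};
}

// Element-wise add that reads and writes the bound buffers directly.
void toy_execute_add(const ToyDense1D& a, const ToyDense1D& b,
                     const ToyDense1D& out) {
    assert(a.binding && b.binding && out.binding);
    assert(a.binding->size == b.binding->size);
    assert(a.binding->size == out.binding->size);
    for (std::size_t i = 0; i < out.binding->size; ++i) {
        out.binding->data[i] = a.binding->data[i] + b.binding->data[i];
    }
}

std::vector<float> run_bound_add(std::size_t n) {
    std::vector<float> a(n), b(n), c(n);

    // Create one binding per input/output buffer.
    ToyBinding ba = toy_bind(a), bb = toy_bind(b), bc = toy_bind(c);

    // Create the container variables and associate them with the bindings.
    ToyDense1D va{&ba}, vb{&bb}, vc{&bc};

    // Initialize the inputs by writing to the underlying buffers.
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 10.0f * static_cast<float>(i);
    }

    // Execute, then read the result from the buffer bound to the output.
    toy_execute_add(va, vb, vc);

    // The toy bindings are plain stack objects and simply go out of scope
    // here; in the flow above this is the point where bindings are freed.
    return c;
}
```

Because the output container is bound to `c`, no copy-out step is needed in this sketch: the result is already in the application's vector after execution.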
To use the memory mapping interface, the input and the output containers are allocated and managed by the virtual machine:
- Create a 1D dense container variable for each input/output container argument using a null binding object.
- Allocate each input/output container with a proper size using VM operations (arbb_op_alloc).
- Map each properly sized and allocated input/output container into host address space using arbb_map_to_host(), and get back a pointer to the mapped buffer, which is valid in the application space.
- Initialize the input containers by writing to the mapped buffer.
- Execute the function using arbb_execute().
- Access the output container by reading from the mapped buffer.
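In contrast to the bind interface, here the storage belongs to the VM and the application only receives a pointer into it. A minimal C++ toy model of that ownership pattern is sketched below. `ToyVmContainer`, `toy_alloc`, `toy_map_to_host`, `toy_execute_add_vm`, and `run_mapped_add` are hypothetical names; they do not mirror the parameters of arbb_op_alloc or arbb_map_to_host(), and the toy says nothing about how long a real mapping stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins only: none of these names belong to the ArBB VM API.
struct ToyVmContainer {
    std::vector<float> storage;  // owned by the toy "VM", not the application
};

// Give the container its size (stand-in for the allocation operation).
void toy_alloc(ToyVmContainer& c, std::size_t n) {
    c.storage.assign(n, 0.0f);
}

// Hand the application a pointer into the VM-owned storage.
float* toy_map_to_host(ToyVmContainer& c) {
    return c.storage.empty() ? nullptr : c.storage.data();
}

// Element-wise add over VM-owned containers.
void toy_execute_add_vm(const ToyVmContainer& a, const ToyVmContainer& b,
                        ToyVmContainer& out) {
    assert(a.storage.size() == b.storage.size());
    assert(a.storage.size() == out.storage.size());
    for (std::size_t i = 0; i < out.storage.size(); ++i) {
        out.storage[i] = a.storage[i] + b.storage[i];
    }
}

float run_mapped_add(std::size_t n, std::size_t probe) {
    assert(n > 0 && probe < n);

    // Containers are created without any binding to application memory.
    ToyVmContainer a, b, c;

    // Allocate each container with the proper size.
    toy_alloc(a, n);
    toy_alloc(b, n);
    toy_alloc(c, n);

    // Map each container and obtain pointers usable by the application.
    float* pa = toy_map_to_host(a);
    float* pb = toy_map_to_host(b);
    float* pc = toy_map_to_host(c);
    assert(pa && pb && pc);

    // Initialize the inputs by writing through the mapped pointers.
    for (std::size_t i = 0; i < n; ++i) {
        pa[i] = static_cast<float>(i);
        pb[i] = 2.0f * static_cast<float>(i);
    }

    // Execute, then read the output through its mapped pointer.
    toy_execute_add_vm(a, b, c);
    return pc[probe];
}
```

The design difference from the bind variant is only who owns the memory: in this sketch the application never allocates a buffer itself, it asks the container for one and works through the returned pointer.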
These two samples are a good starting point for understanding the ArBB VM API. For a complete implementation of a C++ front end, see the Intel® ArBB C++ API, a header-only library built entirely on the VM C89 API. Also, see this article for more introductory materials.