- What are the benefits of Intel® Array Building Blocks (Intel® ArBB)?
Intel ArBB's data-parallel capabilities provide an array of benefits to developers:
- Forward-scaling: Intel ArBB lets applications span multi-core and many-core processors without requiring developers to rewrite programs over and over. The benefits of only writing and debugging code once are substantial.
- Safety: Intel ArBB helps prevent parallel programming bugs such as data races and deadlocks. It lets developers express their algorithms at a high level, abstracting away low-level optimization details, and then optimizes the application across all available cores while guarding against these parallelism errors.
- Ease-of-use: Intel ArBB extends C++ and is compatible with standard C++ compilers.
- Can you do a "pre-JIT" (a priori Just-in-Time Compilation) for a specific platform to alleviate any overhead that may occur? How should one handle the case if they are only going to deploy on one (known) architecture every time?
Currently, Intel ArBB only supports dynamic compilation at runtime. However, unlike a traditional Just-In-Time (JIT) compiler, Intel ArBB lets the application control when compilation occurs, so that it can be placed outside of main application loops. In fact, JITted code can be managed explicitly through objects called closures. If an Intel ArBB function will be invoked multiple times, capture a closure early in the program; the JIT overhead is then amortized over the multiple invocations.
We do not currently support storing a compiled closure to disk or some other form of storage and reloading it later. We are investigating ways to provide this functionality in future updates.
- What makes a language a good candidate for an Intel ArBB language frontend?
Any tool or library that needs high-performance code generation and execution on the devices supported by Intel ArBB is a good candidate for using the Intel ArBB Virtual Machine (VM). The programming model exposed by the virtual machine is the same as that exposed by the C++ frontend, so the same kinds of algorithms that can leverage Intel ArBB through the C++ frontend can be mapped easily to the VM. Because the VM uses a plain C API, it can be targeted from any language that supports calls to plain C functions, which includes most current scripting languages.
The key point of the Virtual Machine API is that any language is a "good candidate", including domain-specific languages. Rather than creating language frontends for specific customers, we are doing something much more powerful: enabling anyone to create their own frontend for Intel ArBB by writing directly to our runtime. We need collaboration from language implementers to determine which aspects of their favorite APIs would benefit from high-level abstractions for parallelization and vectorization, and we are seeking to work with all language implementers to create new frontends for Intel ArBB. Please let us know if you are interested in talking with our architects about this.
Intel is currently focusing on C++ support for Intel ArBB. Other languages may be supported in the future depending on customer demand and other factors. The Intel ArBB team strongly encourages you to experiment with your own frontends using the VM.
- When will GPU support be available? MIC support?
An alpha version of MIC support should be available by the end of 2010. We are planning to support other hardware platforms, including co-processors such as GPUs, in the future.
- Can I use Intel ArBB on AMD* CPUs?
Yes. Intel ArBB will run code on AMD* processors using a variety of techniques, which may include the Intel® Streaming SIMD Extensions (Intel® SSE), Intel® SSE2, and Intel® SSE3 instruction sets and other architecture features that are compatible with Intel processors.
- Is there any Intel ArBB Virtual Machine (VM) specification available now?
You can download the VM specification here.
- Does it work in virtualized environments such as VMware* or Xen*?
We are not currently running specific tests to ensure Intel® Array Building Blocks operates in virtualized environments, but there is no technical reason to believe it should not work as well as any other application in these environments. Intel ArBB will target whatever processor is presented to the system through the virtualization layer, so performance will depend heavily on the virtualization layer in use.
- What happens if you have heterogeneous cores/current multi-core CPU + MIC cards, how does the VM distribute it?
There will be different stages of heterogeneous support in Intel ArBB for MIC. In the first release, you will be able to code as if you were programming on regular multicore hardware but target the MIC card exclusively through an environment variable.
The virtual machine does not currently distribute a single VM function (C++ frontend closure) invocation across multiple separate devices. We are focused on making a good choice at a per-function granularity and adding user controls to allow applications to choose where particular computations are executed. We may in the future pursue more automated ways of balancing loads across multiple devices, but this is still an open (and difficult) research topic.
- Does the runtime or VM need to be installed to run a built Intel ArBB binary on a client machine?
The VM is part of the Intel ArBB runtime, which is a dynamic library loaded at run time. This dynamic library must be installed on the client machine in a location where the loader can find it.
- Intel ArBB looks similar to Brook+. How do you compare? Why should I use Intel ArBB over Brook+?
Brook is an academic project originally from Stanford University. It was an early project that explored high-level ways to program devices like GPUs. Brook+ is an AMD* project that has now been open-sourced and switched to being a "community driven" project [http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=117001]. Brook and Brook+ both use source-to-source translation rather than dynamic compilation, and have some programming model differences as well. AMD is now supporting the OpenCL standard, which is also supported by Intel. OpenCL provides a low-level programming interface to accelerators and multi-core CPUs.
- What are the similarities and differences between Intel ArBB, OpenCL, and CUDA*?
OpenCL and CUDA are two separate languages that require separate compilers. OpenCL provides a low-level programming interface to accelerators and multi-core CPUs. Unlike Intel® Array Building Blocks, lower-level solutions such as CUDA and OpenCL do not provide safety guarantees such as determinism, and they require developers to tune for specific hardware platforms to achieve performance.
- Will I have to learn a new language?
No. Intel ArBB is not a new language. Intel ArBB is a programming model that introduces implicitly parallel operators on new aggregate data types. Intel ArBB is implemented in standard C++ and is backed by a runtime library that generates code taking advantage of both SIMD and thread parallelism simultaneously.
- So one has to convert portions of their code to ArBB in order for it to be optimized at runtime. How does ArBB compare with other C++ libraries?
Using and learning Intel ArBB is no different from using and learning other C++ libraries. Developers need to become familiar with the types and functions provided by the library. Intel ArBB provides a small set of types to learn together with a simple programming model that is no more difficult to explain than the behavior of a typical library function.
Converting existing C++ code to use Intel ArBB can involve some level of mechanical translation. Depending on the application, there may be other changes involved in switching to any high-level parallel model, such as choosing more appropriate algorithms. These higher-level changes are likely to take up a more significant portion of development time than the mechanical ones. Regardless, the Intel ArBB team is looking for ways to make the translation more efficient. You may also be interested in Intel's other compiler-based offerings that require fewer such mechanical changes, such as Intel® Cilk™ Plus, included in Intel® Parallel Composer 2011.
- Will I have to rewrite my whole application to use Intel ArBB?
No. Developers target only the computationally intensive kernels in their application with Intel ArBB; the rest of the application remains unchanged.
- Will I need to rewrite portions of my application for each target platform?
No. Intel ArBB provides forward scaling via its dynamic, platform-aware runtime, allowing the application to run on a variety of machines with different core counts, cache sizes, memory models, and even different SIMD widths in a heterogeneous environment.
- How can I ensure that my application thread pool and the Intel ArBB thread pool are not competing for resources?
Developers can specify the maximum number of threads that Intel ArBB uses. Unlike other parallelism solutions, Intel ArBB dynamically adapts to changes in the availability of underlying thread resources. Intel ArBB uses the Intel TBB runtime as its underlying threading runtime, ensuring Intel TBB and Intel Cilk Plus interoperability. The Intel TBB scheduler ensures that resources are not oversubscribed.
- Will Intel ArBB have Mac OS X* support?
We are not disclosing future plans for Mac OS at this time.
- Will Intel ArBB be open-sourced like Intel® Threading Building Blocks (Intel® TBB)?
We are exploring open source options but have no specific plans to announce at this time. We welcome customer input and are committed to providing an open solution in some form.