Once you get reproducible performance numbers, you need to choose what to optimize first.
First, make sure your general application logic is sane. Refer to the Application-Level Optimizations chapter of this document.
OpenCL™ Code Builder offers a powerful set of Microsoft Visual Studio* and Eclipse* plug-ins with “Build/Debug/Profile” capabilities. Its most important features are:
- OpenCL debugging at the API level, so you can inspect a trace of your application for redundant copies, errors returned by OpenCL APIs, excessive synchronization, and so on.
- Rich features for kernel development in the OpenCL language, such as offline compilation with cross-hardware support and a Low Level Virtual Machine (LLVM) and assembly language viewer.
- OpenCL kernel debugging and performance experiments that run kernels on a specific device without requiring you to write host code.
Intel® Graphics Performance Analyzers (Intel® GPA) is a set of tools that enables you to analyze and optimize OpenCL execution (by inspecting hardware queues, DMA packet flow, and basic hardware counters) as well as the rendering pipelines in your applications.
The second step is to optimize the most time-consuming OpenCL kernels. You can perform a simple static analysis yourself: for example, inspect the kernel code with a focus on intensive use of heavy math built-ins, loops, and other potentially expensive constructs.
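The manual inspection described above can be partly automated. The following is a minimal sketch, not a feature of any Intel tool: it counts calls to a few math built-ins and loop statements in kernel source text. The names `HEAVY_BUILTINS`, `scan_kernel`, and the sample kernel are hypothetical, and the built-in list is an assumed, non-exhaustive selection.

```python
import re

# Assumption: a small, non-exhaustive set of OpenCL C math built-ins
# that are often worth a closer look. Adjust it for your own kernels.
HEAVY_BUILTINS = ("sin", "cos", "tan", "exp", "log", "pow", "sqrt", "rsqrt")
LOOP_KEYWORDS = ("for", "while")


def strip_comments(source):
    """Remove /* ... */ and // ... comments so they are not counted."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)


def scan_kernel(source):
    """Return {name: count} for built-in calls and loops found in the source."""
    code = strip_comments(source)
    counts = {}
    for name in HEAVY_BUILTINS + LOOP_KEYWORDS:
        # \b stops 'sin' from matching inside 'asin' or 'native_sin'.
        hits = len(re.findall(r"\b" + name + r"\s*\(", code))
        if hits:
            counts[name] = hits
    return counts


# Hypothetical kernel used only to exercise the scanner.
KERNEL = """
__kernel void scale(__global float* data, int n) {
    int i = get_global_id(0);
    float acc = 0.0f;
    // pow(x, 2) in a comment is ignored
    for (int k = 0; k < n; ++k) {
        acc += sin(data[i]) * native_sin(data[i]) + pow(data[i], 2.0f);
    }
    data[i] = sqrt(acc);
}
"""

report = scan_kernel(KERNEL)
print(report)
```

A scan like this only points at candidate hot spots. A built-in inside a loop matters far more than one called once per work-item, so the counts should be read alongside the code and confirmed with measurements.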
For tools-assisted analysis, Intel® VTune™ Amplifier XE is the most powerful tool for OpenCL optimization. It enables you to fine-tune your code for optimal performance on OpenCL CPU and Intel Graphics devices, ensuring that hardware capabilities are fully utilized.