Does OpenCL support cache prefetch?
I have a kernel that calculates motion vectors using full search and MSE. I am seeing odd performance issues with the following loop:
LAMMPS is an open-source software package that simulates classical molecular dynamics. Its support for many energy models and simulation options has made it a popular choice. It was first developed at Sandia National Laboratories to take advantage of large-scale parallel computation. Because multi-core processors are now far more common than when LAMMPS was first developed 20 years ago, it is a good candidate for optimization.
Hi, looking through the slides of Intel's IWOCL presentations, I see three interesting ones:
genFFT, an OpenCL FFT library for Intel GPUs that is up to 2x faster than AMD's clFFT: is there any info on a release date, whether it will be open source, etc.?
GPU daemon, about using a persistent kernel on the iGPU for faster kernel launches: some code showing how to implement it is on the slides, but a source code sample or tutorial released in an Intel blog post would be better.
I would like to port GCN-optimized OpenCL 1.2 kernels to run on the latest Intel GPUs.
Are there any general guidelines I should be following?
For example, on GCN, if the work-group size is smaller than 64 (the wavefront size), it is possible to dispense with
memory barriers, since the work-items will never be executed on more than one Compute Unit. Does this apply to Intel GPUs?
Any other things to keep in mind?
Also, in terms of installed base, which version of the HD Graphics GPU is most common in the field?
And which CPUs have these GPUs?
I downloaded the Intel OpenCL GEMM sample for Linux (from the bottom of https://software.intel.com/en-us/articles/sgemm-for-intel-processor-graphics) and built it successfully, but it fails when I run it. The error message is:
In our project we mix Microsoft .NET code with native code, and we're trying to speed up areas using OpenCL. Here is a block of code I'm working on:
Here's a pie-in-the-sky request for enhancement... :)
It would be useful if we could create OpenCL/SPIR-* kernels that map one work-item to a single EU hardware thread and expose that thread's entire 4 KB GRF.
Add to this a "Gen Native" OpenCL extension that exposed register-regionable explicit SIMD operations.
I have run into a problem: processing a 7360 x 4912 pixel image with the Intel OpenCL SDK leads to "GPU stop response!", but the same code runs successfully on AMD and NVIDIA GPUs. Why?
The following picture shows my computer's specifications. Could it be that my GPU has too little memory?