Are there *any* circumstances that will implicitly allocate shared local memory?

I'm curious whether any circumstances can result in an implicit increase in a kernel work-group's shared local memory (SLM) requirements.

For example, do the work-group (or subgroup) functions like scan or reduce quietly "reserve" SLM?

If there are any circumstances where this might happen on SB, IVB, HSW, or BDW, could you list them?


Decode-OpenCL-Encode pipeline

I am working on a Decode-OpenCL-Encode pipeline on an Intel processor, starting from the media-interop sample code provided by Intel (attached).

I am integrating the encoder into the same sample.

If we look at the DecodeOneFrame() function below:

mfxStatus CDecodingPipeline::DecodeOneFrame(int Width, int Height, IDirect3DSurface9 *pDstSurface, IDirect3DDevice9* pd3dDevice)
{
    mfxU16 nOCLSurfIndex = 0;
    mfxStatus stsOut = MFX_ERR_NONE;

    if (m_Tasks[m_TaskIndex].m_DecodeSync || m_Tasks[m_TaskIndex].m_OCLSync || m_Tasks[m_TaskIndex].m_EncodeSync)
    // ...

Introducing Batch GEMM Operations

The general matrix-matrix multiplication (GEMM) is a fundamental operation in most scientific, engineering, and data applications. There is an everlasting desire to make this operation run faster. Optimized numerical libraries like Intel® Math Kernel Library (Intel® MKL) typically offer parallel high-performing GEMM implementations to leverage the concurrent threads supported by modern multi-core architectures. This strategy works well when multiplying large matrices because all cores are used efficiently.

Need help: I get unexpected results using OpenCL 2.0 atomics on HD5500


    I am trying OpenCL 2.0 atomics on HD5500, following the

    But the atomic operations do not produce the expected results. A simplified version of the test is:

    kernel void atomics_test(global int *output, volatile global atomic_int *atomicBuffer, uint iterations, uint offset)
    {
        for (int j = 0; j < MY_INNER_LOOP; j++)
        // ...
