Device Fission Segfault

I'm using the latest Intel Linux (Ubuntu) 64-bit SDK version 1.5 on my Q9500. I also have the AMD SDK installed.

When I initialise my environment I run the following code in order to fission my device:

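    if(bFission)
    {
        fn_clCreateSubDevicesEXT = 
            (clCreateSubDevicesEXT_fn)clGetExtensionFunctionAddress("clCreateSubDevicesEXT");
        
        if(NULL == fn_clCreateSubDevicesEXT)
        {
            cerr << "Fission not supported!" << endl;
        }
        else  // only call through the function pointer if it was found
        {
            cl_uint iOut = 4;  // on return: number of sub-devices actually created

            // using m_cldDevices we need to fission it using clCreateSubDevicesEXT
            m_cldFissionDevices = (cl_device_id*)malloc(sizeof(cl_device_id)*5);
            // one sub-device with 1 compute unit; the counts list and the
            // property list each need their own terminator
            const cl_device_partition_property_ext properties[] = {
                CL_DEVICE_PARTITION_BY_COUNTS_EXT, 1,
                CL_PARTITION_BY_COUNTS_LIST_END_EXT,
                CL_PROPERTIES_LIST_END_EXT };

            // create an array of sub-devices
            ciErrNum = fn_clCreateSubDevicesEXT(m_cldDevices[0], 
                                                properties,
                                                4,
                                                m_cldFissionDevices,
                                                &iOut);
            errCheck(ciErrNum, "Fissioning\n");

            // cl_device_id is a handle (a pointer), so copy it by assignment;
            // memcpy on the handles would overwrite the device object itself
            m_cldDevices[0] = m_cldFissionDevices[0];
        }
    }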

I copy my fissioned device into my regular devices array so that I don't have to change any other part of the code. Is this allowed? It seems to work fine :s
The whole program works fine until I reach the end of execution, where I run the following:

    clFinish(m_commandQueue);
    if(m_kernel != NULL)
        clReleaseKernel(m_kernel);
    if(m_commandQueue != NULL)
        clReleaseCommandQueue(m_commandQueue);

    if(m_cpProgram != NULL)
        clReleaseProgram(m_cpProgram);
    if(m_ctx != NULL)
        clReleaseContext(m_ctx);

It segfaults after this code has run (not during it). This happens only with the Intel SDK, not with the AMD SDK, and it doesn't happen if I switch device fission off and use the regular functionality.

I've had a look with valgrind, which reports conditional jumps or moves that depend on uninitialised values (this is general, not specific to the fission extension). It also reports some overlapping memory copies in clCopyMemoryRegion. Is this my error or an SDK error? Do you test the SDK under valgrind?

Valgrind was unable to find the location of my segfault, though I have a feeling it has to do with the overlapping memcpy.

Cheers,

Jam

UPDATE: I've just noticed the clReleaseDeviceEXT function in the documentation. Calling it on the sub-device prevents the segfault in most cases, but I still get intermittent segfaults.

Hello,

Your code is fine (including copying cl_device_ids). The issue is known and will be fixed in a future release (i.e. the code you showed will stop segfaulting on exit). In the meantime, you could try adding an artificial call to usleep before releasing the device; the root cause is a data race between your main thread's shutdown and the worker thread executing on the sub-device.

Doron Singer

Doron,

Thanks for your response, I'll give it a go.

Cheers.

Inicie sesión para dejar un comentario.