Fast ISPC Texture Compressor - Update

This article and the sample code project were written by Marc Fauconneau Dufresne at Intel Corp.

Updated 4/12/2016
This update adds support for RGBA ASTC compression.

Updated 8/26/2015
This update adds high-quality ETC1 and ASTC compression to the fast ISPC texture compression sample. For ASTC compression we only support RGB 2D LDR inputs for now. Block sizes up to 8x8 are supported (10x5 and 10x6 are not). SIMD instruction sets are exploited using the Intel SPMD Compiler. The following graph compares the performance/quality tradeoff against astcenc on the Kodak dataset with 6x6 blocks. The sample reaches quality similar to astcenc's "fast" preset while running 44 times faster.

Figure 1 - Performance and quality vs. astcenc. Note the x-axis is using a logarithmic scale.
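
For context on the block sizes mentioned above: ASTC stores every block in 128 bits regardless of its footprint, so the block size alone sets the bitrate. The arithmetic can be sketched as follows; the function name is hypothetical and not part of the sample.

```cpp
#include <cassert>

// Bits per pixel for a 2D ASTC block footprint.
// Every ASTC block occupies 128 bits, whatever its width and height.
double astc_bits_per_pixel(int block_width, int block_height) {
    assert(block_width > 0 && block_height > 0);
    return 128.0 / (block_width * block_height);
}
```

With this, 4x4 blocks come out at 8 bits per pixel, 6x6 blocks at roughly 3.56, and 8x8 blocks at 2.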


Updated 5/13/2014
This sample extends our state-of-the-art BC7 texture compressor with BC6H (DX11 HDR texture format) support. We use a similar approach to quick partition filtering, selection, and refinement, along with an effective search strategy to take advantage of the BC6H endpoint transform. SIMD instruction sets are exploited using the Intel SPMD Compiler. Various quality/performance trade-offs are offered.

Figure 2 - ISPC Texture Compressor Update - 5/13/14

Original (11/5/2013)
This sample demonstrates a state-of-the-art BC7 (DX11) texture compressor. BC7 partitioning decisions are narrowed down in multiple stages. Final candidates are optimized using iterative endpoint refinement. All BC7 modes are supported. SIMD instruction sets are exploited using the Intel SPMD Compiler. Various quality/performance trade-offs are offered.

Figure 3 - Original ISPC Texture Compressor - 11/5/13


Christopher M.'s picture

Hi Marc-

Thanks very much for creating this, it's been very helpful to us.

We have discovered that the compression results sometimes vary based on which SIMD instruction set is available on the processor, specifically that AVX2-compatible processors can generate different results than pre-AVX2 processors.  Is that expected?


Marcin Piaskiewicz's picture

Is there a possibility to add BC4 and BC5 formats as well?

Vatra, BogDan's picture


Thanks a lot for sharing with us this tool!

Are you considering creating a git(hub) repo where people would be able to contribute? E.g., I'd like to add Linux support.

MARC F. (Intel)'s picture

Steve, thanks for the report.

The non-determinism was actually due to a bug down in ep_quant1. It'll be fixed in the next update (coming in a few days).

Steve M.'s picture

I was able to fix the non-determinism by clearing the ep[] array to zero here:

float bc7_enc_mode01237_part_fast(int qep[24], uint32 qblock[2], float block[64], int part_id, uniform int mode)
{
    uint32 pattern = get_pattern(part_id);
    uniform int bits = 2;  if (mode == 0 || mode == 1) bits = 3;
    uniform int pairs = 2; if (mode == 0 || mode == 2) pairs = 3;
    uniform int channels = 3; if (mode == 7) channels = 4;

    // Zero the endpoint array so slots that are never written hold a defined value.
    float ep[24];
    for ( uniform int i = 0; i < 24; i++ )
        ep[i] = 0.0f;

    // ... (rest of the function unchanged)

I found it by narrowing down the settings until I knew that mode 1 was nondeterministic. My clue from there was that block_segment_core() skips over values in the ep[] array, and that didn't look very safe, without digging too much into the rest of the algo:

    for (uniform int i=0; i<2; i++)
    for (uniform int p=0; p<channels; p++)
        ep[4*i+p] = ext[i]*axis[p]+dc[p];
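
To make the gap visible, here is a small C++ sketch (not the ISPC source) of the same strided write. It assumes the loop is the only writer of ep[]. The NaN sentinel, the helper names, and the input values are made up for illustration. With three channels, the fourth slot of each endpoint is never written.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Mirrors the strided write quoted above: 2 endpoints, 4 slots each,
// but only `channels` of the 4 slots are written per endpoint.
void write_segment_endpoints(float ep[8], const float ext[2],
                             const float axis[4], const float dc[4],
                             int channels) {
    for (int i = 0; i < 2; ++i)
        for (int p = 0; p < channels; ++p)
            ep[4 * i + p] = ext[i] * axis[p] + dc[p];
}

// Pre-fills ep[] with NaN as a sentinel, runs the write, and counts
// how many slots still hold the sentinel (i.e. were never written).
int count_untouched_slots(int channels) {
    assert(channels >= 1 && channels <= 4);
    float ep[8];
    for (float& v : ep) v = std::numeric_limits<float>::quiet_NaN();

    // Arbitrary finite inputs, made up for this illustration.
    const float ext[2]  = {-1.0f, 1.0f};
    const float axis[4] = {0.5f, 0.5f, 0.5f, 0.5f};
    const float dc[4]   = {0.25f, 0.25f, 0.25f, 0.25f};
    write_segment_endpoints(ep, ext, axis, dc, channels);

    int untouched = 0;
    for (float v : ep)
        if (std::isnan(v)) ++untouched;
    return untouched;
}
```

Here count_untouched_slots(3) reports two untouched slots (indices 3 and 7), while count_untouched_slots(4) reports none. Any later code that reads those slots sees whatever was on the stack unless the array is cleared first.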

I've tried to exercise the different modes with various test textures, and I haven't found any more nondeterminism.

The way I tested for nondeterminism was by allocating a 256k array off the stack, filling it with random numbers, deallocating it, then compressing the texture single-threaded, and comparing to the previous compressed results.
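
A rough C++ sketch of that test procedure follows, in case it is useful to others. The Compressor callback is a hypothetical stand-in for whatever single-threaded compression call you use; it is not the ispc_texcomp API. The 256 KB buffer also assumes the thread's stack is large enough to hold it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
// Hypothetical stand-in for a single-threaded compression call.
using Compressor = std::function<Bytes(const Bytes&)>;

// Scribble random bytes over ~256 KB of stack, then return so that the
// next call reuses the same (now dirty) stack memory.
void dirty_stack(std::uint32_t seed) {
    volatile std::uint8_t junk[256 * 1024];
    std::mt19937 rng(seed);
    for (std::size_t i = 0; i < sizeof(junk); ++i)
        junk[i] = static_cast<std::uint8_t>(rng());
}

// Compress the same input `runs` times, dirtying the stack differently
// before each run, and report whether every output matched the first.
bool is_deterministic(const Compressor& compress, const Bytes& input, int runs) {
    assert(runs >= 1);
    dirty_stack(0);
    const Bytes reference = compress(input);
    for (int i = 1; i < runs; ++i) {
        dirty_stack(static_cast<std::uint32_t>(i));
        if (compress(input) != reference) return false;
    }
    return true;
}
```

A mismatch proves the output depends on something other than the input. A clean pass only means no dependence was observed for that texture and those settings.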

Hope this helps someone avoid wasting as much time as I did ;)

Steve M.'s picture

Hi, Marc! 

First of all, thanks for putting out such a useful piece of software!

I've noticed an issue with the results of the compression not being deterministic, and it's the state of the stack that affects the results. It seems like an uninitialized stack variable is being used somewhere, but I can't see where. Let me know if you'd be willing to discuss. 



Dimi C.'s picture


What DXSDK do I need to compile this?


Raja B. (Intel)'s picture

For a comparison with other compressors, check out 

chkone's picture


Personally, I use the ispc_texcomp DLL directly, without the rest.

Maybe something is wrong with the archive. On line 510 of processing.cpp I have:

    // Create a shader resource view for the error texture.
    errorSRVDesc.Format = errorTexDesc.Format;
    errorSRVDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
    errorSRVDesc.Texture2D.MipLevels = errorTexDesc.MipLevels;
    errorSRVDesc.Texture2D.MostDetailedMip = errorTexDesc.MipLevels - 1; // <==== LINE 510
    V_RETURN(device->CreateShaderResourceView(errorTex, &errorSRVDesc, errorSRV));

When I open the solution, several breakpoints are already set. Is it only on my side?


