It would be hard to name an area of science that doesn't apply a Fourier transform somewhere. Fourier's insight gave us a unique instrument linking the continuous and discrete worlds, and its effectiveness has been proven both in theory and in practice. A Fourier transform lets us see a problem from a new, clearer perspective.
Unfortunately, such a lyrical approach does little to describe the substance of the matter. So let us come down to earth and try to understand the advantages of applying the discrete cosine transform, a special case of the Fourier transform, to our problem.
Let’s recall the transform we generated in Part II by means of heuristic reasoning:
It satisfies all the requirements and works just fine. However, it would be hard to call it efficient, if only because we never intended to achieve efficiency.
Therefore, let's draw on the experience of our predecessors, who established decades ago that the discrete cosine transform is one of the most efficient orthogonal transforms.
So, how is the discrete cosine transform related to the de-correlation problem?
Recall that in generating the above-mentioned transform we imposed special requirements on the basis vectors. The first basis vector had to consist exclusively of positive values and concentrate the energy, while the others had to alternate in sign so as to produce nearly zero energy. All transform coefficients had an absolute value of one for a simple reason: the transform had to be independent of the input data, since by picking the best coefficients for one group of data we would at the same time impair the result for a different group.
Now let's look at the basis vectors of the discrete cosine transform. Here they are, the cosines:
Doesn't this remind you of anything? The pattern of each basis vector repeats the pattern of the vectors from our simple transform: the first vector is a positive constant, and the others alternate in sign periodically. We just need to sample eight points on each curve and generate the target 8x8 matrix from their values.
Let us determine the abscissas for the given cosine curves:
Follow the waves and calculate the values:
Why are these particular abscissa values selected? The answer is related to one of the requirements on the nature of the transformation.
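The requirement in question is presumably invertibility: with the right sample points, the basis vectors become mutually orthogonal, so the matrix is trivially inverted by its transpose. A minimal sketch, assuming the half-integer abscissas x_j = (2j+1)π/16 used by the standard orthonormal DCT-II (an assumption, since the original figure is not reproduced here):

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row k samples cos(k*x) at the
    half-integer abscissas x_j = (2j + 1) * pi / (2n)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

A = dct_matrix()

# Orthonormality check: A times its transpose is the identity matrix,
# so the inverse transform is just multiplication by the transpose.
for i in range(N):
    for j in range(N):
        dot = sum(A[i][m] * A[j][m] for m in range(N))
        assert abs(dot - (1.0 if i == j else 0.0)) < 1e-12
```

With any other choice of equally weighted sample points, some pairs of rows stop being orthogonal and the cheap transpose-as-inverse property is lost.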
Below you can find an example for better understanding. Let’s take a look at a one-dimensional transformation for the following source vector of luminance sample values:
56, 57, 56, 53, 60, 53, 56, 58
Applying the transformation, we get the vector of coefficients:
449.0, 2.36396103, 4.0, -10.36396103, 7.0, -10.36396103, 4.0, 2.36396103
Next we quantize it severely, thus irretrievably losing part of the information:
449, 0, 0, -10, 7, -10, 0, 0
Apply the inverse transform:
54.5, 57.01776695, 57.0, 53.48223305, 59.5, 53.48223305, 57.0, 57.01776695
The inverse transform has restored the sequence with sufficient precision, considering the rough quantization and rounding.
This blog post completes the discussion of mathematical transformations and their role in modern video encoding standards. We’ve already discussed their intended use and generation rules. Now it’s time to take another step forward and touch upon the topic of entropy encoding.
A bit of formulas
Mathematically, for an n x n block with elements Sij, the direct DCT is specified by the following formulas:
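In the standard notation of the 2-D DCT-II, with normalization factors C(u) and C(v), this reads:

```latex
S'_{uv} = C(u)\,C(v)\sum_{i=0}^{n-1}\sum_{j=0}^{n-1}
          S_{ij}\,\cos\frac{(2i+1)\,u\pi}{2n}\,\cos\frac{(2j+1)\,v\pi}{2n},
\qquad
C(k) = \begin{cases}\sqrt{1/n}, & k = 0,\\ \sqrt{2/n}, & k > 0.\end{cases}
```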
The inverse transformation is as follows:
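In the same notation, the standard inverse (IDCT) is:

```latex
S_{ij} = \sum_{u=0}^{n-1}\sum_{v=0}^{n-1}
         C(u)\,C(v)\,S'_{uv}\,\cos\frac{(2i+1)\,u\pi}{2n}\,\cos\frac{(2j+1)\,v\pi}{2n}
```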
The formulas simply spell out matrix operations. In other words, the direct transform may be written as a matrix product:
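In the conventional matrix form this product is:

```latex
S' = A\,S\,A^{T}
```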
where S is the sample matrix of the source block, S' is the coefficients matrix, and A is the transformation matrix, with elements given as:
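For the orthonormal DCT these elements are conventionally written as:

```latex
A_{ij} = C(i)\,\cos\frac{(2j+1)\,i\pi}{2n},
\qquad
C(i) = \begin{cases}\sqrt{1/n}, & i = 0,\\ \sqrt{2/n}, & i > 0.\end{cases}
```

Since A is orthogonal, the inverse transform is simply S = A^T S' A.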