Here’s another quick performance tip for LAME mp3 encoding. This nested loop in the function ‘L3psycho_anal_ns’ is a hotspot for constant bit-rate encoding: for (i = 0; i < 576; i++) { FLOAT sum1, sum2; sum1 = firbuf[i + 10]; sum2 = 0.0; for (j = 0; j < ((NSFIRLEN - 1) / 2) - 1; j += [...]
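The excerpt cuts off mid-loop, so the exact inner-loop body isn't shown. As a rough illustration of this kind of hotspot, here is a hypothetical symmetric FIR filter that produces 576 outputs per call and uses two partial sums, mirroring the `sum1`/`sum2` pattern visible in the fragment. `symmetric_fir`, `TAPS`, `HALF`, and the coefficient layout are assumptions for illustration, not LAME's actual code.

```c
#define N_OUT 576             /* outputs per call, as in the excerpt's loop bound */
#define TAPS 21               /* hypothetical odd tap count; center tap at index 10 */
#define HALF ((TAPS - 1) / 2) /* 10 symmetric coefficient pairs */

/* Hypothetical symmetric FIR: coef[k] weights the pair (in[i + k], in[i + TAPS - 1 - k]),
   and the center tap has an implicit weight of 1.0.
   `in` must hold at least N_OUT + TAPS - 1 samples. */
void symmetric_fir(const float *in, const float coef[HALF], float *out)
{
    for (int i = 0; i < N_OUT; i++) {
        float sum1 = in[i + HALF]; /* center tap */
        float sum2 = 0.0f;
        /* Two independent accumulators: even pairs feed sum1, odd pairs feed sum2,
           so consecutive multiply-adds do not all wait on one running sum. */
        for (int k = 0; k < HALF; k += 2) {
            sum1 += coef[k] * (in[i + k] + in[i + TAPS - 1 - k]);
            sum2 += coef[k + 1] * (in[i + k + 1] + in[i + TAPS - 2 - k]);
        }
        out[i] = sum1 + sum2;
    }
}
```

Adding each symmetric pair of samples before multiplying halves the number of multiplies, and the two accumulators shorten the dependency chain; both properties make a loop like this a reasonable candidate for compiler vectorization.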
In this post I’d like to promote the new SSE 4.1 instruction set extension as it relates to the quantization loop I wrote about a few months ago. As you may recall, the modified code from ‘quantize_xrpow_lines’ looked like this: for(i=0; i < l; i++) { float x0 = xr[i] * istep; int [...]
In this entry I want to highlight a loop in the ‘count_bits’ function which yielded a 1.15x app-level gain when we coaxed it to vectorize with the Intel Compiler. After disabling Takehiro’s float-to-int hack, this was the top hotspot in our constant bit-rate encoding workload: for (l = -width; l < 0; l++) if (xr[j [...]
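The loop body is truncated here, so this is not the `count_bits` code. It illustrates one common way to help a compiler vectorize a conditional update of the form `if (...) ix[...] = ...;`: store to every element unconditionally and select the value, so the compiler can emit a compare plus blend instead of a branch per element. `zero_below_threshold`, `ix`, `end`, and `thresh` are hypothetical names; only `xr` and the `l = -width; l < 0` indexing idiom come from the excerpt.

```c
/* Hypothetical: over the band [end - width, end), zero ix[k] wherever
   xr[k] is below thresh, using the excerpt's negative-offset loop idiom. */
void zero_below_threshold(const float *xr, int *ix, int end, int width, float thresh)
{
    for (int l = -width; l < 0; l++) {
        /* Unconditional store of a selected value: this can be if-converted
           into a vector compare + blend rather than a conditional store. */
        ix[end + l] = (xr[end + l] < thresh) ? 0 : ix[end + l];
    }
}
```

The rewrite writes every element in the band even when the value is unchanged, which is only safe if no other thread writes `ix` at the same time.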
One of the nice things about working on open source code is that any interesting findings can be freely discussed, such as in this blog. With that in mind I recently took up a project to optimize performance of the popular LAME mp3 encoder. Over the years I had seen LAME used in several other [...]
Continuing with the GaussianRand example, a 1.5x gain is nice, but were there additional opportunities for performance gains? Of course there were! (That was a rhetorical question…) Since floating-point divides are among the higher-latency operations, we should look at the two that are coded into the do/while loop to normalize the random [...]
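Assuming the loop normalizes `rand()` output by dividing by `RAND_MAX` (the excerpt does not show the exact expressions), a common fix is to fold the divide into a constant and multiply inside the loop. This minimal sketch of the polar-method rejection loop does that; `polar_pair` and the `rand()`-based source are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical: draw (x1, x2) uniformly in [-1, 1] until the point lies
   strictly inside the unit circle; returns w = x1^2 + x2^2 in (0, 1). */
double polar_pair(double *x1, double *x2)
{
    /* Constant expression, so no divide happens inside the loop. */
    const double scale = 2.0 / RAND_MAX;
    double w;
    do {
        /* A divide-based form would be: 2.0 * rand() / RAND_MAX - 1.0 */
        *x1 = rand() * scale - 1.0;
        *x2 = rand() * scale - 1.0;
        w = *x1 * *x1 + *x2 * *x2;
    } while (w >= 1.0 || w == 0.0);
    return w;
}
```

Multiplying by a precomputed reciprocal is not always bit-identical to the original divide, so results can differ in the last bit; under strict IEEE settings a compiler will not make this substitution on its own unless relaxed floating-point options (such as GCC's `-freciprocal-math`) permit it.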
Now we’ll take a look at optimizing something more interesting and complex. Since we can’t show much of the customer source we work on, we’ll look at some public domain code from the internet, specifically this Box-Muller random number transformation from http://www.taygeta.com/random/gaussian.html: for (int i = 0; i < LENGTH; i++) { double w, [...]
Here at Intel we have spent much of the last year assessing the rising buzz about GPGPUs and other accelerator cards in the financial services community. These technologies promise tremendous computing capability, but we often see performance claims that are exaggerated by comparing the best possible accelerator implementation to a poorly optimized version of the [...]