Author's Blogs

Restructuring loops for LAME mp3 high-pass filter
By Michael Stoner (Intel), posted 09/10/2009
Here’s another quick performance tip for LAME mp3 encoding. This nested loop in the function ‘L3psycho_anal_ns’ is a hotspot for constant bit-rate encoding: for (i = 0; i < 576; i++) { FLOAT sum1, sum2; sum1 = firbuf[i + 10]; sum2 = 0.0; ...
Using SSE4.1 for mp3 encoding quantization
By Michael Stoner (Intel), posted 01/07/2009
In this post I'd like to promote the new SSE 4.1 instruction set extension as it relates to the quantization loop I wrote about a few months ago. As you may recall, the modified code from ‘quantize_xrpow_lines’ looked like this: for (i = 0; i < l; i++) { float x0 = xr[i] * istep; ...
Another tip for faster mp3 encoding
By Michael Stoner (Intel), posted 10/31/2008
In this entry I want to highlight a loop in the ‘count_bits’ function which yielded a 1.15x app-level gain when we coaxed it to vectorize with the Intel Compiler.  After disabling Takehiro’s float-to-int hack, this was the top hotspot in our constant bit-rate encoding workload: for (l = -width; l...
Open source project - LAME mp3 encoder optimization
By Michael Stoner (Intel), posted 10/06/2008
One of the nice things about working on open source code is that any interesting findings can be freely discussed, such as in this blog.  With that in mind I recently took up a project to optimize performance of the popular LAME mp3 encoder.  Over the years I had seen LAME used in several other s...
Assessing the accelerator buzz: Another tip for faster Monte Carlo computing
By Michael Stoner (Intel), posted 07/30/2008
Continuing with the GaussianRand example, a 1.5x gain is nice but were there additional opportunities for performance gains?  Of course there were! (That was a rhetorical question…)  Seeing as floating point divides are among the longer latency operations, we should look at the two that are coded...
Assessing the accelerator buzz: Vectorization of Monte Carlo algorithms
By Michael Stoner (Intel), posted 07/15/2008
Now we’ll take a look at optimizing something more interesting and complex.  Since we can’t show much of the customer source we work on, we’ll look at some public domain code from the internet, specifically this Box Muller random number transformation from http://www.taygeta.com/random/gaussian.h...
Assessing the accelerator buzz: Tips and Tricks for Intel® Compiler vectorization
By Michael Stoner (Intel), posted 06/26/2008
Here at Intel we have spent much of the last year assessing the rising buzz about GPGPUs and other accelerator cards in the financial services community. These technologies promise tremendous computing capability, but often we see performance claims that are exaggerated by comparing the best po...