Thanks for your answer, you're right: that size would have been rather small. I actually mixed things up a bit. I have a grid of 40x40 points with 3 DOFs each, which gives 1600x3 = 4800 unknowns, so the matrix of the linear equation system is actually 4800x4800. Although this is still much smaller than the problem size you mentioned, I think I should get a speed-up at that size. Maybe the Level 2 Sparse BLAS routines aren't threaded at all?
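Just to double-check my numbers, here is a quick sketch of the size calculation. The function and variable names are made up for illustration and have nothing to do with the library itself.

```python
# Grid and DOF counts taken from the description above;
# all names here are hypothetical and purely illustrative.
def problem_size(points_per_side, dofs_per_point):
    """Number of unknowns for a square grid with a fixed DOF count per point."""
    n_points = points_per_side * points_per_side
    return n_points * dofs_per_point

n = problem_size(40, 3)
print(n)      # number of unknowns: 4800
print(n * n)  # entries of the full n x n matrix: 23040000
```

Only a small fraction of those entries should be nonzero, which is why I'm using the sparse routines in the first place.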