icc -O1 -fp-model source is roughly equivalent to gcc -O3 -ffast-math (in your version of gcc, which doesn't auto-vectorize unless you ask for it). If you are satisfied with the speed of that, it's hard for me to get excited about an 8% code size increase. I don't see how you could have got such a code size increase from icc without vectorization, unless something unusual is going on with -ip, which you might suppress with -fno-inline-functions or reduced inlining limits.
If adding auto-vectorization to gcc gives you a significant advantage without the code size increase, I might understand your complaint; icc -O2 and -O3 already imply auto-vectorization. I just filed an issue about an extra dead vector code version. Where your loops use exclusively aligned data, you can reduce the vector code expansion with #pragma vector aligned.
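As a rough sketch of the aligned-data case (the names saxpy_aligned and is_vec_aligned and the 16-byte alignment are my own assumptions for illustration, not taken from your code): the pragma tells icc to treat every array reference in the loop as aligned, so it should not need to generate the extra alignment-handling code around the vector loop. If that promise is false the loop may fault, which is why this sketch asserts on the pointers first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment for this sketch: 16 bytes (one SSE vector). */
#define VEC_ALIGN 16

/* Returns nonzero if p is aligned to VEC_ALIGN bytes. */
static int is_vec_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_ALIGN) == 0;
}

/* Computes y[i] += a * x[i]; the caller promises x and y are aligned. */
void saxpy_aligned(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Guard the promise made by the pragma below. */
    assert(is_vec_aligned(x) && is_vec_aligned(y));

    /* Intel-specific pragma; other compilers ignore it, possibly with a warning. */
#pragma vector aligned
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Comparing the object size of a file like this built with and without the pragma should show whether the extra loop versions are what is costing you space.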
icpc normally inlines templates in cases where g++ economizes by using a single version invoked by multiple functions; I'm not certain whether inlining limits would control that, beyond what you did with -O1.