Besides the Intel Press publications on this subject, many historic and more recent papers have been available online. Mine have been taken down (for lack of interest?).
As you appear to be interested in the combination of auto-vectorization (instruction-level parallelism) and SMP (threaded parallelism), you might look at my post on the subject:
http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/61470/
As I show, auto-vectorization in C often depends on C99 restrict, and frequently on pragmas.
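To illustrate the point about restrict and pragmas, here is a minimal sketch, not taken from the linked post. The function name saxpy and the choice of ivdep as the pragma are my assumptions for illustration; other compilers spell their hints differently or ignore them.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i].
 * The C99 restrict qualifiers promise the compiler that x and y do not
 * overlap, which removes the aliasing doubt that often blocks vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* The pragma additionally asserts that the loop carries no dependence.
     * It is guarded so that compilers which do not know it never see it. */
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

With restrict the pragma is probably redundant for a loop this simple; it tends to matter more when the compiler cannot prove independence on its own. Whether the loop was actually vectorized is best checked in the compiler's vectorization report (for gcc, -O3 -fopt-info-vec should show it).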
If you don't like the Intel or GNU compilers, you may be more interested in vector intrinsics, but then maybe you shouldn't be on this forum. I show just one case where the intrinsics accomplish an optimization that current compilers do not otherwise support, but it is effective only on Barcelona and Core i7 processors.
The value of auto-vectorization has been verified repeatedly, since long before Intel began to support it. Support for it was already a significant minority view when C89 was defined, although that standard declined to accommodate it to the desired extent. By the time C99 came along, there was sufficient interest to revive the restrict proposal and to begin implementing the typed aliasing features of C89 (icc -ansi_alias). I suppose you could take the widespread reluctance to follow the language rules that support vectorization, to implement C99, or to standardize equivalent features in C++ as a significant vote against auto-vectorization.