As far as I can see from the preliminary documents, most of the extended instructions either operate only on the lower half (integer arithmetic, for example) or do the same thing on the two halves separately. To me it seems that what we are going to get is not double throughput (as the jump from MMX to SSE/SSE2 meant) but eight additional xmm registers, plus the trouble of managing them because they are tied to their lower halves. There are hardly any instructions that cross the boundary between the two halves. Shuffle the components? Not possible in one step. Want to add two sets of integers together? Not even possible in two steps! These operations could be carried out much more easily if the upper register parts were simply discrete xmm registers. The new instruction encoding, 3+ operands, and some of the floating-point instructions that take ymm arguments are nice, but I can't see the advantage of having 256-bit registers over twice the number of 128-bit registers when the instruction set is not extended well enough to support them.
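To make the integer-add complaint concrete, here is a rough sketch of what adding two sets of eight 32-bit integers looks like when only the 128-bit integer add is available. It assumes the instruction set as described above (no 256-bit integer add) and a GCC- or Clang-style compiler; the function name `add8_epi32_avx` and the `target("avx")` attribute are my own choices for illustration, not something taken from the documents.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for eight 32-bit integers,
   using 256-bit loads/stores but only 128-bit integer adds.
   target("avx") is a GCC/Clang extension (assumption about the toolchain). */
__attribute__((target("avx")))
void add8_epi32_avx(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Split each ymm register into its two xmm halves. */
    __m128i a_lo = _mm256_castsi256_si128(va);
    __m128i a_hi = _mm256_extractf128_si256(va, 1);
    __m128i b_lo = _mm256_castsi256_si128(vb);
    __m128i b_hi = _mm256_extractf128_si256(vb, 1);

    /* Two separate 128-bit integer adds, one per half. */
    __m128i s_lo = _mm_add_epi32(a_lo, b_lo);
    __m128i s_hi = _mm_add_epi32(a_hi, b_hi);

    /* Reassemble the two halves into one ymm register and store it. */
    __m256i sum = _mm256_insertf128_si256(_mm256_castsi128_si256(s_lo), s_hi, 1);
    _mm256_storeu_si256((__m256i *)out, sum);
}
```

That is two extracts, two adds and an insert for what would be two plain adds if the upper halves were ordinary xmm registers.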