I know it's a little late, but maybe this will help someone in this session or another session of this contest.
I found a website that explains the SIMD instructions (SSE1 and SSE2).
It's in French, but you can find examples there.
Keep in mind that you can put 16 chars in one __m128i register.
Also have a look at the STTNI (String and Text New Instructions) of SSE4.2 for comparing these registers.
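
To make the "16 chars in one register" idea concrete, here is a minimal sketch. It only uses SSE2 intrinsics from <emmintrin.h>, not the SSE4.2 STTNI ones, and the helper names equal16 and first_diff16 are my own (hypothetical, not from the website). It assumes both pointers have at least 16 readable bytes.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_cmpeq_epi8, ... */

/* Hypothetical helper: returns 1 if the 16 bytes at a and b are identical. */
int equal16(const char *a, const char *b)
{
    /* Unaligned loads: each register now holds 16 chars. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Byte-wise compare: each byte becomes 0xFF if equal, 0x00 otherwise. */
    __m128i eq = _mm_cmpeq_epi8(va, vb);

    /* Collect the top bit of each byte into a 16-bit mask. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Hypothetical helper: index of the first differing byte, or 16 if none. */
int first_diff16(const char *a, const char *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);

    /* Invert the mask so that a set bit marks a mismatching byte. */
    unsigned diff = (unsigned)_mm_movemask_epi8(eq) ^ 0xFFFFu;

    int i = 0;
    while (i < 16 && !((diff >> i) & 1u))
        i++;
    return i;
}
```

For example, equal16("0123456789abcdef", "0123456789abcdef") compares all 16 chars with a single compare instruction. The SSE4.2 STTNI intrinsics (_mm_cmpistri, _mm_cmpestri and friends in <nmmintrin.h>) do this kind of string comparison in one instruction, but they need SSE4.2 support on the CPU and a compiler flag such as -msse4.2 with GCC or Clang.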
I hope it helps someone :)