MMX memcpy

I'm new to MMX technology and would like to speed up my application
by simply replacing the C++ memcpy function with a faster MMX version.

AMD provides a ready-to-use C++ and assembler version of memcpy
optimized for AMD processors.

I couldn't find an equivalent on the Intel site or elsewhere on the web.
I found many documents about MMX, and I downloaded IPP and MKL, but none
of them offers a ready-to-use memcpy function.

Please help.

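Here is what the Intel C++ Compiler generates for a plain memcpy call.
It does not use MMX: it copies c / 4 doublewords with rep movsd, then
the remaining c % 4 bytes with rep movsb: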

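;;; memcpy(a, b, c);

mov  edi, a      ; destination pointer
mov  esi, b      ; source pointer
mov  ecx, c      ; byte count
push ecx         ; save the byte count
shr  ecx, 2      ; ecx = c / 4 (number of doublewords)
rep  movsd       ; copy 4 bytes per iteration
pop  ecx         ; restore the byte count
and  ecx, 3      ; ecx = c % 4 (leftover bytes)
rep  movsb       ; copy the remaining bytes one at a time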
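For comparison, the same two-phase strategy can be written in portable C++. This is only a sketch that mirrors the listing above; it is not Intel's library code and not an MMX routine, and the name dword_then_byte_copy is hypothetical. The std::memcpy calls on a 4-byte local are the usual way to express an unaligned 32-bit load or store without undefined behavior, and optimizing compilers normally lower each one to a single mov.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: mirrors the compiler listing above.
// Phase 1 copies n / 4 four-byte words (like "shr ecx, 2" + "rep movsd"),
// phase 2 copies the leftover n % 4 bytes (like "and ecx, 3" + "rep movsb").
// Like memcpy, the buffers must not overlap.
void* dword_then_byte_copy(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

    std::size_t words = n >> 2;           // number of 4-byte words
    for (std::size_t i = 0; i < words; ++i) {
        std::uint32_t w;
        std::memcpy(&w, s, sizeof w);     // 4-byte load, safe for any alignment
        std::memcpy(d, &w, sizeof w);     // 4-byte store
        s += sizeof w;
        d += sizeof w;
    }

    std::size_t tail = n & 3;             // 0 to 3 leftover bytes
    for (std::size_t i = 0; i < tail; ++i)
        *d++ = *s++;                      // byte-by-byte copy of the remainder

    return dst;                           // memcpy returns its destination
}
```

Copying forward from low to high addresses matches rep movs with the direction flag cleared, which the usual calling conventions guarantee on entry to a function.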

