4x4 matrix transpose using SSE2 intrinsics

I want to transpose a 4x4 matrix using SSE2 intrinsics. How do I go about it?

x00 x01 x02 x03 // I0
x10 x11 x12 x13 // I1
x20 x21 x22 x23 // I2
x30 x31 x32 x33 // I3

You can use the macro that ships with Visual Studio (declared in xmmintrin.h). It transposes four __m128 rows in place:
_MM_TRANSPOSE4_PS(row0, row1, row2, row3)
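A minimal sketch of how the macro might be used on a row-major float matrix. The function name transpose4x4_ps and the flat 16-element layout are assumptions made for illustration; unaligned loads and stores are used so the arrays need no special alignment.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS

// Hypothetical helper: transposes a row-major 4x4 float matrix from src into dst.
void transpose4x4_ps(const float src[16], float dst[16]) {
    // Load the four rows (unaligned loads, so src needs no 16-byte alignment).
    __m128 row0 = _mm_loadu_ps(src + 0);
    __m128 row1 = _mm_loadu_ps(src + 4);
    __m128 row2 = _mm_loadu_ps(src + 8);
    __m128 row3 = _mm_loadu_ps(src + 12);

    // The macro rewrites its four arguments in place, so they must be variables.
    _MM_TRANSPOSE4_PS(row0, row1, row2, row3);

    // After the macro, row0..row3 hold the columns of the original matrix.
    _mm_storeu_ps(dst + 0, row0);
    _mm_storeu_ps(dst + 4, row1);
    _mm_storeu_ps(dst + 8, row2);
    _mm_storeu_ps(dst + 12, row3);
}
```

Calling transpose4x4_ps(m, t) on a matrix filled with 0..15 should leave t[r*4 + c] equal to m[c*4 + r].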

tim18, the initial post on matrix transpose was for SSE and is best suited to float matrices. My current question is about an integer matrix. A brief search suggests that for integer matrices we need a combination of punpckl* and punpckh* (the SSE2 unpack intrinsics) to get a better transpose. Please advise how I can use these SSE2 intrinsics.
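One way the unpack intrinsics could be combined for 32-bit integers is sketched below. It assumes a row-major int matrix stored as 16 contiguous values; the function name transpose4x4_epi32 is made up for this example. The first stage interleaves 32-bit elements of adjacent rows (punpckldq / punpckhdq), and the second stage interleaves the resulting 64-bit halves (punpcklqdq / punpckhqdq).

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_unpacklo_epi32, _mm_unpackhi_epi64, ...

// Hypothetical helper: transposes a row-major 4x4 matrix of 32-bit ints.
void transpose4x4_epi32(const int src[16], int dst[16]) {
    // Load the four rows; unaligned loads avoid any alignment requirement.
    __m128i r0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 0));
    __m128i r1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 4));
    __m128i r2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 8));
    __m128i r3 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 12));

    // Stage 1: interleave 32-bit elements of row pairs.
    __m128i t0 = _mm_unpacklo_epi32(r0, r1);  // x00 x10 x01 x11
    __m128i t1 = _mm_unpackhi_epi32(r0, r1);  // x02 x12 x03 x13
    __m128i t2 = _mm_unpacklo_epi32(r2, r3);  // x20 x30 x21 x31
    __m128i t3 = _mm_unpackhi_epi32(r2, r3);  // x22 x32 x23 x33

    // Stage 2: interleave 64-bit halves to assemble full columns.
    __m128i c0 = _mm_unpacklo_epi64(t0, t2);  // x00 x10 x20 x30
    __m128i c1 = _mm_unpackhi_epi64(t0, t2);  // x01 x11 x21 x31
    __m128i c2 = _mm_unpacklo_epi64(t1, t3);  // x02 x12 x22 x32
    __m128i c3 = _mm_unpackhi_epi64(t1, t3);  // x03 x13 x23 x33

    // Store the columns of the input as the rows of the output.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 0), c0);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 4), c1);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 8), c2);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 12), c3);
}
```

This takes eight unpack operations in total and stays entirely in the integer domain.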

Cast your __m128i variables to __m128 variables (using _mm_castsi128_ps), use the macro _MM_TRANSPOSE4_PS, then cast back using _mm_castps_si128.
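The cast approach might look roughly like the sketch below. The function name transpose4x4_via_cast is an assumption for illustration. The casts only reinterpret the register contents and the macro only moves 32-bit lanes around, so the integer bit patterns should come through unchanged.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _MM_TRANSPOSE4_PS
#include <emmintrin.h>  // SSE2: __m128i, _mm_castsi128_ps, _mm_castps_si128

// Hypothetical helper: transposes four integer rows in place by
// reinterpreting them as float rows for the duration of the macro.
void transpose4x4_via_cast(__m128i& r0, __m128i& r1, __m128i& r2, __m128i& r3) {
    // Reinterpret the bits as floats; no conversion takes place.
    __m128 f0 = _mm_castsi128_ps(r0);
    __m128 f1 = _mm_castsi128_ps(r1);
    __m128 f2 = _mm_castsi128_ps(r2);
    __m128 f3 = _mm_castsi128_ps(r3);

    // Shuffle-only transpose of the four rows, done in place.
    _MM_TRANSPOSE4_PS(f0, f1, f2, f3);

    // Reinterpret the bits back as integers.
    r0 = _mm_castps_si128(f0);
    r1 = _mm_castps_si128(f1);
    r2 = _mm_castps_si128(f2);
    r3 = _mm_castps_si128(f3);
}
```

Load the rows with _mm_loadu_si128, call the helper, and store them back with _mm_storeu_si128.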

The code compiles, but the B2[4][4] output contains garbage numbers. Where am I going wrong?

#include "stdafx.h"
#include "emmintrin.h"
#include <iostream>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    int B1[4][4]={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}; // matrix to be transposed
    int B2[4][4]; // transposed matrix
    int n=0;
    for (int i=0;i<4;i++)
        for (int j=0;j<4;j++)
        {
            B1[i][j]=n; n++;
        }
    __asm{
        movq mm1, B1
        movq mm2, B1+8
        movq mm3, B1+12
        movq mm4, B1+16
        // step one
        punpcklwd mm1, mm2
        punpcklwd mm3, mm4
        movq mm5, mm1 // copy mm1 into mm5
        punpckldq mm1, mm3
        punpckhdq mm5, mm3
        // move result to B2
        movq B2, mm1
        movq B2+8, mm0
        // step two
        punpckhwd mm1, mm2
        punpckhwd mm3, mm4
        movq mm5, mm1 // copy mm1 into mm5
        punpckldq mm1, mm3
        punpckhdq mm5, mm3
        // move result to B2
        movq B2+12, mm1
        movq B2+16, mm0
        emms
    }
    for(int i = 0; i<4; i++){
        for(int j = 0; j<4; j++) cout << B2[i][j] << " ";
        cout << endl;
    }
    return 0;
}
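A few things in the inline assembly above look like likely causes of the garbage output. The mm registers are 64-bit MMX registers, so each movq holds only two ints rather than a full row of four; a whole row needs an xmm register. The rows of B1 are 16 bytes apart, so B1+8 and B1+12 point into the middle of the first row rather than at the next rows. punpcklwd and punpckhwd interleave 16-bit words, while the elements here are 32-bit ints, which calls for punpckldq and punpckhdq. Finally, mm0 is stored to B2 but never written, and mm5 is computed but never stored. The sketch below illustrates only the word-versus-doubleword point; the two function names are made up for the example.

```cpp
#include <emmintrin.h>  // SSE2: _mm_unpacklo_epi16, _mm_unpacklo_epi32

// Hypothetical helper: interleaves the low halves of a and b as 16-bit
// words, which is what punpcklwd does on xmm registers.
void interleave_lo_words(const int a[4], const int b[4], int out[4]) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_unpacklo_epi16(va, vb));
}

// Hypothetical helper: interleaves the low halves of a and b as 32-bit
// doublewords, which is what punpckldq does on xmm registers.
void interleave_lo_dwords(const int a[4], const int b[4], int out[4]) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_unpacklo_epi32(va, vb));
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, the doubleword version yields {1, 5, 2, 6}, while the word version splices the low 16 bits of one int onto those of another and yields {327681, 0, 393218, 0}.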
