SSE4.1 optimization problem

Hi,

We have a problem with optimization. When it is enabled, or when -O3 is set, the encoder after some time generates output that cannot be decoded on some devices (not all devices stop decoding). With -msse3 everything works fine at any optimization level (-O3, -O2, -O1). With -msse4.2 and -O2 it works better, but the same problem still happens from time to time. We are using icc version 12.1.0 (gcc version 3.4.3 compatibility) on RHEL4 32-bit, Intel Xeon CPU E5645 @ 2.40GHz. As I understand it, is the problem in the MPSADBW instruction, or is it something more?

Best Regards,
Alexey.

Hi Alexey,

A 'Sum of Absolute Differences' instruction, 'MPSADBW', was introduced with SSE4.1. Since your post
describes an optimization problem, it would be nice if you could provide a test case.

I tested the instruction about two years ago; please let me know if you need my test case (a VS 2005 project).

Best regards,
Sergey

Here is a small example that shows how the instruction was used:

Note: 'NOP' instructions are used because in 2010 I was using the Intel emulator of SSE4 instructions...

...

;//////////////////////////////////////////////////////////////////////////////
; AsmTestLib.asm
include SSE4inst.inc
.686
.XMM
;//////////////////////////////////////////////////////////////////////////////
_TEXT	SEGMENT
_argSrc$		=  8								; size = 4
_argDst$		= 12								; size = 4
_spValueSize4	=  4								; size of a Single-Precision value
_dpValueSize8	=  8								; size of a Double-Precision value
_TEXT	ENDS
;//////////////////////////////////////////////////////////////////////////////
.CODE
;//////////////////////////////////////////////////////////////////////////////
; C/C++ Declaration:
; RTint SSE4CalcSAD( RTubyte *pchSrc, RTubyte *pchDst );
SSE4CalcSAD		PROC NEAR
	MOV			eax, DWORD PTR _argSrc$[esp-4]	; eax = pchSrc
	MOVDQU		xmm1, [eax]						; Load Source array of bytes into xmm1
	MOV			ecx, DWORD PTR _argDst$[esp-4]	; ecx = pchDst
	MOVDQU		xmm2, [ecx]						; Load Destination array of bytes into xmm2
	MPSADBW		xmm2, xmm1, 0					; Calculate SAD
	NOP											; NOP instructions placed between SSE4 instructions
	PHMINPOSUW	xmm1, xmm2						; Identify minimum SAD
	NOP
	PEXTRW		ecx, xmm1, 0					; Extract minimum SAD Value
	NOP
	PEXTRW		eax, xmm1, 1					; Extract minimum SAD Index (returned in eax)
	NOP
	RET
SSE4CalcSAD		ENDP

...
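A scalar model is a convenient reference when cross-checking a routine like the one above against an SSE4.1 build. The following is a minimal sketch in C, not code from this thread or from IPP: the names ref_mpsadbw and ref_phminposuw are hypothetical, and the model follows the documented behavior of the 128-bit forms of MPSADBW (imm8 bits 1:0 select the 4-byte source block, bit 2 selects the destination offset) and PHMINPOSUW (lowest index wins on ties).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of MPSADBW dst, src, imm8 (128-bit form).
 * Produces eight 16-bit SADs: a fixed 4-byte block of src is compared
 * against eight overlapping 4-byte windows of dst. */
static void ref_mpsadbw(const uint8_t dst[16], const uint8_t src[16],
                        int imm8, uint16_t out[8])
{
    int src_off = (imm8 & 3) * 4;        /* imm8[1:0]: source block, in 4-byte steps */
    int dst_off = ((imm8 >> 2) & 1) * 4; /* imm8[2]: destination start, 0 or 4 */

    for (int i = 0; i < 8; i++) {
        unsigned sum = 0;
        for (int k = 0; k < 4; k++)
            sum += (unsigned)abs((int)dst[dst_off + i + k] - (int)src[src_off + k]);
        out[i] = (uint16_t)sum;          /* at most 4 * 255, fits in 16 bits */
    }
}

/* Hypothetical scalar model of PHMINPOSUW: minimum unsigned word and its
 * index; the lowest index is reported when several words are equal. */
static void ref_phminposuw(const uint16_t in[8], uint16_t *min_val, unsigned *min_idx)
{
    *min_val = in[0];
    *min_idx = 0;
    for (unsigned i = 1; i < 8; i++) {
        if (in[i] < *min_val) {
            *min_val = in[i];
            *min_idx = i;
        }
    }
}
```

Calling ref_mpsadbw(dst, src, 0, sad) followed by ref_phminposuw(sad, &value, &index) mirrors the instruction sequence in SSE4CalcSAD, where the index is what ends up in eax.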

Thanks, Sergey. I think I gave too little information. I did not implement that instruction myself. I took the ready-made IPP samples and built an H.264 encoder based on those sources. The H.264 profile is Baseline.

Quoting Livikin Alexey: ...set -O3 optimization, after some time the encoder generates output that cannot be decoded on some devices (not all devices stop decoding)...

[SergeyK] Could you provide more technical details? What devices are you using?

...as I understand it, is the problem in the MPSADBW instruction, or is it something more?

[SergeyK] It is hard to believe that there is an internal problem with the 'MPSADBW' instruction. I would
consider:

- an optimization problem with the C++ compiler
- some problem(s) with the library

Yes, I think the same: the problem is not in the instruction....

Hi Alexey,
Could you provide some details on which IPP function was used? What about a simple test case?
Best regards,
Sergey

Hi Alexey,

How do you link IPP in your encoder application? With the general linking models, such as dynamic or static linking as
described in the article http://software.intel.com/en-us/articles/introduction-to-linking-with-intel-ipp-70-library/,
external compiler options such as -O2, -O3 or -msse3, -msse4.1 etc. do not influence the IPP functions internally.

-msse* means:

- No CPU ID check
- Intel and non-Intel processors
- Illegal instruction error if run on an unsupported processor
- At least Pentium 4 required (SSE2)
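Because a -msse* build performs no CPU ID check of its own, an application that may run on older processors can test for the instruction set before entering SSE4.1 code. The following is a minimal sketch, assuming a GCC-compatible compiler that ships <cpuid.h>; the helper name cpu_has_sse41 is hypothetical and is not part of IPP or of the compiler. It relies on CPUID leaf 1 reporting SSE4.1 in ECX bit 19.

```c
#include <assert.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

/* Hypothetical helper: returns 1 if the CPU reports SSE4.1
 * (CPUID leaf 1, ECX bit 19), otherwise 0. */
static int cpu_has_sse41(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 19) & 1u);
#else
    /* Assumption: on non-x86 or non-GCC-compatible builds, report "not available". */
    return 0;
#endif
}
```

A caller would branch on cpu_has_sse41() once at startup and fall back to an SSE3 or plain C path when it returns 0, instead of hitting an illegal instruction error.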

You mentioned the code crashed on some devices. What kind of devices (processors) are they?

It seems we may need to move the topic to the Intel Compiler Forum if you have a test case, to see whether the compiler experts can give some hints.

Best Regards,
Ying

Hi Ying,

I link both statically and dynamically. The "devices" in my post are any Polycom videoconference hardware or software endpoints. I use the H.264 encoder from the IPP samples with slice size -1300. With -msse4.1, endpoints stop decoding after scenes with heavy motion, and after that I see I-frame requests from the endpoints (I tried generating I-frames, but to no avail; playback is not restored). With -msse3, H.264 works fine as far as I can tell. But the same problem exists in the H.263 encoder (with -msse3); it appears very rarely, and maybe it is a problem with something else in my code. The H.263 encoder is also from the IPP samples.

Best Regards,
Alexey.

After some tests I can add the following:

1. The problem exists with any optimization parameters if the H.264 encoder is configured with mv_search_method = 0,1,2; me_search_x = 4; me_search_y = 4. That is true for CIF resolution (352x288); at 4CIF (704x576) there is no problem...
2. If I set mv_search_method = 0,1,2; me_search_x = 0; me_search_y = 0, then the problem exists with -msse4.1, and with -msse3 it goes away.

So it looks like the problem is not only in optimization.