ippsTanh_32f_A11 at x64

An interesting observation: here are two functions that calculate tanh with almost equal accuracy. Why is the performance of the approximated function twice as high on x64?

void	function_1( float* lq, size_t lq_size )
{
	#pragma ivdep
	#pragma vector always
	for (size_t i = 0; i < lq_size; i++)
		lq[i]	/= 2.0f;
	ippsTanh_32f_A11( lq, lq, lq_size );
	return;
}

and another

float	tanh_approximared( float x )			// excellent
{
	float		xa		= fabsf( x );		// needs <math.h>; do not optimize this line away
	float		x2		= xa * xa;
	float		x3		= xa * x2;
	float		x4		= x2 * x2;
	float		x7		= x3 * x4;
	float		res		= (1.0f - 1.0f / (1.0f + xa + x2 + 0.58576695f * x3 + 0.55442112f * x4 + 0.057481508f * x7));
	return		(x > 0.0f ? res : -res);
}
void	function_2( float* lq, size_t lq_size )
{
	#pragma ivdep
	#pragma vector always
	for (size_t i = 0; i < lq_size; i++)
		lq[i]	= tanh_approximared( lq[i] / 2.0f );
	return;
}

Hi Muved,

Could you please give more details about the test, such as the OS, the problem size, and how you link IPP?

Or a complete test case would be helpful.

Thanks
Ying

Two improvements could be made to the approximated tanh function:

- don't use the local variables x2, x3, x4 and x7
- factor the polynomial in order to reduce the number of multiplications, for example, x^2 + x^4 = x^2 * ( 1 + x^2 )

I use these improvements in my high-performance sin, cos, tan, etc functions.
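A minimal sketch of the second suggestion, assuming the same coefficients as in the original post. The name tanh_approx_factored is only illustrative, and the nested (Horner-style) form shown here is one possible factoring, not necessarily the fastest one for a given compiler:

```c
/* Same rational approximation as in the original post, with the
   denominator 1 + xa + xa^2 + c3*xa^3 + c4*xa^4 + c7*xa^7
   rewritten in nested form so that fewer multiplications are needed. */
float tanh_approx_factored(float x)
{
    const float c3 = 0.58576695f;
    const float c4 = 0.55442112f;
    const float c7 = 0.057481508f;

    float xa = x < 0.0f ? -x : x;          /* |x| */
    float x3 = xa * xa * xa;

    /* 1 + xa*(1 + xa*(1 + xa*(c3 + xa*(c4 + c7*xa^3)))) */
    float denom = 1.0f + xa * (1.0f + xa * (1.0f + xa * (c3 + xa * (c4 + c7 * x3))));

    float res = 1.0f - 1.0f / denom;
    return x > 0.0f ? res : -res;
}
```

The result can differ from the original expression in the last bits because the rounding order changes, so it is worth measuring both speed and accuracy before switching.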

Best regards,
Sergey

Hi Muved,

Your approximation of tanhf is a simple one built on a degree-7 polynomial, without any range reduction. It cannot be accurate over the whole input range, and it doesn't satisfy the accuracy requirements for IPP A11 functions (at least 11 correct mantissa bits, which corresponds to ~4096 ulp).

Here are a couple of arguments with large errors, for example:

Input: 0.248947113752365 [0x3e7eebfe]
Output: 0.243623077869415 [0x3e797854]
Reference: 0.24392868578434 [0x3e79c871]
Error: 20508.53 ulp

Input: 4.333872457e-019 [0x20ffd3a1]
Output: 0.0000000 [0x00000000]
Reference: 4.333872457e-019 [0x20ffd3a1]
Error: -1.68e+007 ulp

That's why the IPP implementation is slower.
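For anyone who wants to reproduce error figures like these, here is a rough sketch of an ulp measurement. The helper names (float_from_bits, ulp_of, ulp_error) are made up for illustration and are not part of IPP. It takes a float value as the reference, so it gives whole-ulp results rather than the fractional values obtained from a higher-precision reference, and it reports the magnitude of the error only:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, and back. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t bits_from_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Size of one ulp at the magnitude of ref.
   Assumes ref is finite and smaller in magnitude than FLT_MAX. */
static double ulp_of(float ref)
{
    uint32_t b = bits_from_float(ref) & 0x7fffffffu;   /* drop the sign bit */
    return (double)float_from_bits(b + 1u) - (double)float_from_bits(b);
}

/* Absolute error of output, expressed in ulps of reference. */
double ulp_error(float output, float reference)
{
    double diff = (double)output - (double)reference;
    if (diff < 0.0)
        diff = -diff;
    return diff / ulp_of(reference);
}
```

Feeding it the bit patterns from the first example, ulp_error(float_from_bits(0x3e797854), float_from_bits(0x3e79c871)), gives 20509, which agrees with the 20508.53 ulp above once rounding of the reference to float is taken into account.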

Regards,
Andrey K.

Thanks all.
