Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public


_mm256_dp_ps

Calculates a conditional dot product of float32 vectors, independently within each 128-bit half of the operands. The corresponding Intel® AVX instruction is VDPPS.

Syntax

extern __m256 _mm256_dp_ps(__m256 m1, __m256 m2, const int mask);

Arguments

m1

float32 vector used for the operation

m2

float32 vector also used for the operation

mask

an integer constant known at compile time, of which the low eight bits are used: the high four bits (bits 7:4) select which of the four element products in each 128-bit half are included in the sum, and the low four bits (bits 3:0) select which destination elements in each half receive the sum (elements that are not selected are set to zero)
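
As an illustration of the mask layout, the two 4-bit fields can be packed with a small macro. This is only a sketch: `DP_PS_MASK` and the enumerator names are hypothetical and are not defined by any Intel header.

```c
#include <assert.h>

/* Hypothetical helper (not an Intel API): packs the two 4-bit fields of
   the _mm256_dp_ps mask into a single immediate value.
   sum_sel: bit i set => the product of element i is included in the sum.
   dst_sel: bit i set => destination element i receives the sum.        */
#define DP_PS_MASK(sum_sel, dst_sel) \
    ((((sum_sel) & 0xF) << 4) | ((dst_sel) & 0xF))

enum {
    /* Sum all four products, write the sum to element 0 only. */
    DP_MASK_ALL_TO_ELEM0 = DP_PS_MASK(0xF, 0x1),
    /* Sum the first three products, broadcast the sum to all four elements. */
    DP_MASK_XYZ_TO_ALL   = DP_PS_MASK(0x7, 0xF)
};

/* Compile-time checks of the resulting immediates (requires C11). */
static_assert(DP_MASK_ALL_TO_ELEM0 == 0xF1, "all products, element 0");
static_assert(DP_MASK_XYZ_TO_ALL   == 0x7F, "three products, broadcast");
```

Because the macro expands to a constant expression, its result can be passed where a compile-time constant mask is required.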

Description

First multiplies the lower four packed single-precision floating-point (float32) elements of the first source vector m1 by the corresponding elements of the second source vector m2.

Each of the four resulting products is included in the sum only if the corresponding bit among the high four bits of the mask parameter is 1; a product whose bit is 0 contributes zero.

The resulting sum is written to each of the lower four positions of the destination vector whose corresponding low mask bit is 1. Each position whose low mask bit is 0 is set to zero.

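The same process, using the same mask, is then applied independently to the upper four elements of the source vectors to produce the upper four elements of the destination vector. The sums of the two halves are never combined.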
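
The steps above can be modeled in plain scalar C. The following is a sketch for reasoning about mask values, not Intel's implementation: `dp_ps_ref` is a hypothetical name, the vectors are represented as arrays of eight floats, and the pairwise order of the additions is an assumption that may matter for rounding.

```c
#include <assert.h>

/* Scalar model of the described behavior (hypothetical helper, not an
   Intel API). a, b and dst each hold 8 floats: elements 0..3 are the
   lower half, elements 4..7 the upper half.                           */
static void dp_ps_ref(const float a[8], const float b[8], int mask,
                      float dst[8])
{
    assert((mask & ~0xFF) == 0);   /* only the low 8 bits are meaningful */

    for (int half = 0; half < 2; ++half) {
        const float *pa = a + 4 * half;
        const float *pb = b + 4 * half;
        float prod[4];

        /* High four bits: select which products take part in the sum. */
        for (int i = 0; i < 4; ++i)
            prod[i] = (mask & (0x10 << i)) ? pa[i] * pb[i] : 0.0f;

        /* Assumed summation order: two pairs, then the pair sums. */
        float sum = (prod[0] + prod[1]) + (prod[2] + prod[3]);

        /* Low four bits: select which destination elements get the sum. */
        for (int i = 0; i < 4; ++i)
            dst[4 * half + i] = (mask & (1 << i)) ? sum : 0.0f;
    }
}
```

For example, with a mask of 0xF1 the model places the full four-element dot product of each half in elements 0 and 4 and zeroes the remaining elements.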

Returns

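A 256-bit vector that holds, in each 128-bit half, the conditional dot product in the positions selected by the low four bits of mask and zero in all other positions.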
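
A minimal usage sketch follows. It computes two independent four-element dot products with one call. The helper name `dot4x2` is hypothetical, and the `target("avx")` attribute assumes a GCC-compatible compiler; with other compilers, AVX code generation would instead be enabled through the appropriate compiler option. Running it requires a processor that supports Intel® AVX.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper (not an Intel API): computes the dot product of
   a[0..3] with b[0..3] and of a[4..7] with b[4..7] in one call.
   The attribute below is an assumption for GCC-compatible compilers. */
#if defined(__GNUC__)
__attribute__((target("avx")))
#endif
static void dot4x2(const float *a, const float *b,
                   float *dot_lo, float *dot_hi)
{
    assert(a && b && dot_lo && dot_hi);

    __m256 va = _mm256_loadu_ps(a);   /* unaligned loads of 8 floats */
    __m256 vb = _mm256_loadu_ps(b);

    /* 0xF1: include all four products of each half in the sum and
       write the sum to element 0 of each half; other elements are 0. */
    __m256 r = _mm256_dp_ps(va, vb, 0xF1);

    float out[8];
    _mm256_storeu_ps(out, r);
    *dot_lo = out[0];                 /* sum for the lower half */
    *dot_hi = out[4];                 /* sum for the upper half */
}
```

The mask is written as a literal because it must be a compile-time constant. To obtain a single eight-element dot product, the two returned values would still have to be added together.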