Query on Loop Vectorization in std::reduce
Abhishek Kaushik
abhishek.kaushik@intel.com
Thu Jan 30 20:28:27 GMT 2025
Hello libstdc++ maintainers,
I am encountering an issue with the following piece of code in std::reduce:
    while ((__last - __first) >= 4)
      {
        _Tp __v1 = __binary_op(__first[0], __first[1]);
        _Tp __v2 = __binary_op(__first[2], __first[3]);
        _Tp __v3 = __binary_op(__v1, __v2);
        __init = __binary_op(__init, __v3);
        __first += 4;
      }
The Intel compiler is not able to vectorize this loop. However, if I rewrite it as a for loop whose header follows the OpenMP canonical loop form:
    for (; __first <= __last - 4; __first += 4)
      {
        _Tp __v1 = __binary_op(__first[0], __first[1]);
        _Tp __v2 = __binary_op(__first[2], __first[3]);
        _Tp __v3 = __binary_op(__v1, __v2);
        __init = __binary_op(__init, __v3);
      }
the compiler can vectorize it, which results in improved performance.
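A minimal standalone sketch for comparing the two loop headers outside the library is given below. It is not the libstdc++ implementation: the function names reduce_while_form and reduce_for_form, the raw-pointer interface, and the scalar tail loop are assumptions made for illustration. The sketch also adds a length guard before the for loop, because with fewer than four elements the expression last - 4 may point before the start of the underlying array, which is undefined behavior for pointers. Whether a given compiler vectorizes either form would still need to be checked with its optimization report.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper mirroring the existing while-loop header.
template <typename T, typename Op>
T reduce_while_form(const T* first, const T* last, T init, Op op)
{
  while ((last - first) >= 4)
    {
      T v1 = op(first[0], first[1]);
      T v2 = op(first[2], first[3]);
      T v3 = op(v1, v2);
      init = op(init, v3);
      first += 4;
    }
  // Assumed scalar tail for the remaining 0 to 3 elements.
  for (; first != last; ++first)
    init = op(init, *first);
  return init;
}

// Hypothetical helper mirroring the proposed for-loop header.
template <typename T, typename Op>
T reduce_for_form(const T* first, const T* last, T init, Op op)
{
  assert(first <= last); // precondition: a valid, non-reversed range
  // Guard (an addition in this sketch): only form last - 4 when the
  // range holds at least four elements.
  if ((last - first) >= 4)
    {
      for (; first <= last - 4; first += 4)
        {
          T v1 = op(first[0], first[1]);
          T v2 = op(first[2], first[3]);
          T v3 = op(v1, v2);
          init = op(init, v3);
        }
    }
  // Assumed scalar tail for the remaining 0 to 3 elements.
  for (; first != last; ++first)
    init = op(init, *first);
  return init;
}
```

Both helpers can be called as reduce_for_form(v.data(), v.data() + v.size(), 0, std::plus<int>{}) for a std::vector<int> v, and they apply the binary operation in the same order, so they should agree for any operation and input.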
If I submit a patch that changes the loop header in this way, would it be accepted?
Thank you for your assistance!
Best regards,
Abhishek Kaushik