Query on Loop Vectorization in std::reduce

Abhishek Kaushik abhishek.kaushik@intel.com
Thu Jan 30 20:28:27 GMT 2025


Hello libstdc++ maintainers,
I am seeing a vectorization problem with the following loop in std::reduce:

while ((__last - __first) >= 4)
{
  _Tp __v1 = __binary_op(__first[0], __first[1]);
  _Tp __v2 = __binary_op(__first[2], __first[3]);
  _Tp __v3 = __binary_op(__v1, __v2);
  __init = __binary_op(__init, __v3);
  __first += 4;
}


The Intel compiler does not vectorize this loop. However, if I rewrite it as a for loop like this:

for (; __first <= __last - 4; __first += 4)
{
  _Tp __v1 = __binary_op(__first[0], __first[1]);
  _Tp __v2 = __binary_op(__first[2], __first[3]);
  _Tp __v3 = __binary_op(__v1, __v2);
  __init = __binary_op(__init, __v3);
}


so that it follows the OpenMP canonical loop form, the compiler does vectorize it, which improves performance.
If I submit a patch to change the loop header, will it be accepted?
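
For reference, here is a minimal standalone sketch of the two loop shapes. It is not the libstdc++ source: the names reduce_while, reduce_for and check_loop_shapes are hypothetical, the iterators are assumed to be plain pointers, and the scalar tail loop is added only so that the sketch handles ranges of any length. The guard before computing last - 4 is also an assumption of the sketch, added so that the pointer is never moved before the start of a short range. Whether either form is vectorized depends on the compiler and the flags used.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: the while-loop shape, where the trip condition is
// recomputed from (last - first) on every iteration.
template <typename T, typename BinaryOp>
T reduce_while(const T* first, const T* last, T init, BinaryOp op)
{
  while ((last - first) >= 4)
  {
    T v1 = op(first[0], first[1]);
    T v2 = op(first[2], first[3]);
    T v3 = op(v1, v2);
    init = op(init, v3);
    first += 4;
  }
  // Scalar tail for the remaining 0..3 elements (added for this sketch).
  for (; first != last; ++first)
    init = op(init, *first);
  return init;
}

// Hypothetical helper: the for-loop shape, where the induction variable is
// compared against a loop-invariant bound and advanced by a constant step.
template <typename T, typename BinaryOp>
T reduce_for(const T* first, const T* last, T init, BinaryOp op)
{
  // Assumption of this sketch: only form (last - 4) when the range holds
  // at least four elements, so the bound stays inside the range.
  if ((last - first) >= 4)
  {
    const T* bound = last - 4;
    for (; first <= bound; first += 4)
    {
      T v1 = op(first[0], first[1]);
      T v2 = op(first[2], first[3]);
      T v3 = op(v1, v2);
      init = op(init, v3);
    }
  }
  // Scalar tail for the remaining 0..3 elements (added for this sketch).
  for (; first != last; ++first)
    init = op(init, *first);
  return init;
}

// Checks that both shapes agree on a full range and on a short range.
inline void check_loop_shapes()
{
  const int data[11] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
  std::plus<int> add;
  assert(reduce_while(data, data + 11, 0, add) ==
         reduce_for(data, data + 11, 0, add));
  assert(reduce_while(data, data + 3, 0, add) ==
         reduce_for(data, data + 3, 0, add));
}
```

Both helpers walk the range in the same order and apply the operation in the same grouping, so the change is confined to the loop header, which is what the patch would touch.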
Thank you for your assistance!
Best regards,
Abhishek Kaushik
