GCC 11.3 vectorizes the following code. GCC 12.2 fails to vectorize it.

#include <algorithm>
#include <array>
#include <ranges>

std::array<int, 16> foo(std::array<int, 16> u, std::array<int, 16> const &v)
{
  std::ranges::transform(u, v, u.begin(), std::plus<int>());
  return u;
}

https://godbolt.org/z/KnhdPs6G3
-std=c++20 -O3
In GCC 12, before the vectorizer, we have:

  <bb 2> [local count: 114863530]:
  _4 = v_2(D) + 64;
  _5 = &v_2(D)->_M_elems;
  if (_4 != _5)
    goto <bb 5>; [89.30%]
  else
    goto <bb 4>; [10.70%]

  <bb 5> [local count: 102576004]:

  <bb 3> [local count: 958878296]:
  # __first1_24 = PHI <_11(6), &u._M_elems(5)>
  # __first2_25 = PHI <_12(6), _5(5)>
  _7 = MEM[(const int &)__first1_24];
  _9 = *__first2_25;
  _10 = _7 + _9;
  *__first1_24 = _10;
  _11 = __first1_24 + 4;
  _12 = __first2_25 + 4;
  _15 = _4 != _12;
  _18 = &MEM <struct array> [(void *)&u + 64B] != _11;
  _16 = _15 & _18;
  if (_16 != 0)
    goto <bb 6>; [89.30%]
  else
    goto <bb 4>; [10.70%]

  <bb 6> [local count: 856302294]:
  goto <bb 3>; [100.00%]

But with GCC 11 we had:

  <bb 2> [local count: 114863530]:
  _2 = &MEM <const int[16]> [(void *)v_3(D) + 64B];
  _5 = &v_3(D)->_M_elems;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 114863532]:
  <retval> = u;
  return <retval>;

  <bb 6> [local count: 899822495]:

  <bb 5> [local count: 1014686026]:
  # __first1_22 = PHI <_11(6), &u._M_elems(2)>
  # __first2_23 = PHI <_12(6), _5(2)>
  # ivtmp_24 = PHI <ivtmp_13(6), 16(2)>
  _7 = MEM[(const int &)__first1_22];
  _9 = *__first2_23;
  _10 = _7 + _9;
  *__first1_22 = _10;
  _11 = __first1_22 + 4;
  _12 = __first2_23 + 4;
  ivtmp_13 = ivtmp_24 - 1;
  if (ivtmp_13 != 0)
    goto <bb 6>; [93.84%]
  else
    goto <bb 4>; [6.16%]

An optimization that used to happen before the vectorizer is now missing. In GCC 11 the loop is controlled by a single down-counting induction variable (ivtmp, starting at 16), while in GCC 12 it exits on either of two pointer comparisons (_15 & _18), so the vectorizer cannot determine how many iterations the loop runs. I tried to track down the pass in which the IR changes for the worse, but I was not able to pin it down.
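For readers less used to GIMPLE, the two loop shapes in the dumps above can be sketched in C++ roughly as follows. These functions are hand-written illustrations of the dumps (the names add_two_exits and add_counted are hypothetical), not code GCC or libstdc++ actually generates; the point is that the first shape only has a known trip count if the compiler proves both end pointers are reached on the same iteration, while the second has a constant trip count of 16.

```cpp
// Shape of the GCC 12 loop: it exits when EITHER range is exhausted,
// i.e. two pointer comparisons combined with '&' (hand-written sketch).
int *add_two_exits(int *first1, int *last1, const int *first2, const int *last2)
{
    for (; first1 != last1 && first2 != last2; ++first1, ++first2)
        *first1 = *first1 + *first2;
    return first1;
}

// Shape of the GCC 11 loop: a single down-counting induction variable
// (ivtmp in the dump, starting at 16), so the trip count is a constant.
void add_counted(int *first1, const int *first2)
{
    for (int n = 16; n != 0; --n, ++first1, ++first2)
        *first1 = *first1 + *first2;
}
```

Both compute the same result for two 16-element arrays; only the loop-exit structure differs, which is what the vectorizer's iteration-count analysis depends on.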
Started with r12-3903-g0288527f47cec669.
GCC 12.3 is being released, retargeting bugs to GCC 12.4.