Bug 107690 - [12/13/14 Regression] vectorization fails for std::ranges::transform due to IR changes since r12-3903-g0288527f47cec669
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 13.0
Importance: P2 normal
Target Milestone: 12.4
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2022-11-14 19:53 UTC by Mark Bourgeault
Modified: 2023-05-08 12:25 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-11-14 00:00:00


Description Mark Bourgeault 2022-11-14 19:53:48 UTC
GCC 11.3 vectorizes the following code.  GCC 12.2 fails to vectorize.

#include <algorithm>
#include <array>
#include <functional>  // std::plus
#include <ranges>

std::array<int, 16> foo(std::array<int, 16> u, std::array<int, 16> const &v)
{
    std::ranges::transform(u, v, u.begin(), std::plus<int>());
    return u;
}

https://godbolt.org/z/KnhdPs6G3
Comment 1 Andrew Pinski 2022-11-14 19:59:30 UTC
-std=c++20 -O3
Comment 2 Andrew Pinski 2022-11-14 20:08:57 UTC
In GCC 12 before the vectorizer we have:
  <bb 2> [local count: 114863530]:
  _4 = v_2(D) + 64;
  _5 = &v_2(D)->_M_elems;
  if (_4 != _5)
    goto <bb 5>; [89.30%]
  else
    goto <bb 4>; [10.70%]

  <bb 5> [local count: 102576004]:

  <bb 3> [local count: 958878296]:
  # __first1_24 = PHI <_11(6), &u._M_elems(5)>
  # __first2_25 = PHI <_12(6), _5(5)>
  _7 = MEM[(const int &)__first1_24];
  _9 = *__first2_25;
  _10 = _7 + _9;
  *__first1_24 = _10;
  _11 = __first1_24 + 4;
  _12 = __first2_25 + 4;
  _15 = _4 != _12;
  _18 = &MEM <struct array> [(void *)&u + 64B] != _11;
  _16 = _15 & _18;
  if (_16 != 0)
    goto <bb 6>; [89.30%]
  else
    goto <bb 4>; [10.70%]

  <bb 6> [local count: 856302294]:
  goto <bb 3>; [100.00%]


But with GCC 11 we had:

  <bb 2> [local count: 114863530]:
  _2 = &MEM <const int[16]> [(void *)v_3(D) + 64B];
  _5 = &v_3(D)->_M_elems;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 114863532]:
  <retval> = u;
  return <retval>;

  <bb 6> [local count: 899822495]:

  <bb 5> [local count: 1014686026]:
  # __first1_22 = PHI <_11(6), &u._M_elems(2)>
  # __first2_23 = PHI <_12(6), _5(2)>
  # ivtmp_24 = PHI <ivtmp_13(6), 16(2)>
  _7 = MEM[(const int &)__first1_22];
  _9 = *__first2_23;
  _10 = _7 + _9;
  *__first1_22 = _10;
  _11 = __first1_22 + 4;
  _12 = __first2_23 + 4;
  ivtmp_13 = ivtmp_24 - 1;
  if (ivtmp_13 != 0)
    goto <bb 6>; [93.84%]
  else
    goto <bb 4>; [6.16%]

There is a missing optimization before the vectorizer, which leaves the vectorizer unable to tell how many iterations the loop runs.

I tried tracking down which pass changes the IR to make things worse, but I didn't do a good job of it.
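
For readers less used to GIMPLE, the difference between the two dumps can be roughly sketched at source level. This is only an illustration under assumptions: the helper names (add_two_exits, add_counted, both_forms_agree) are hypothetical, this is not the libstdc++ implementation, and whether either form actually vectorizes depends on the compiler version and flags. The first function has the shape of the GCC 12 loop, where the exit test combines two pointer comparisons; the second has the shape of the GCC 11 loop, where a single counter starts at 16 and counts down.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper with the shape of the GCC 12 loop: the exit condition
// combines two pointer comparisons, so the trip count is not stated directly.
void add_two_exits(int *first1, int *last1, const int *first2, const int *last2)
{
    while (first1 != last1 && first2 != last2) {
        *first1 += *first2;
        ++first1;
        ++first2;
    }
}

// Hypothetical helper with the shape of the GCC 11 loop: one counter starts
// at the known length and counts down to zero.
void add_counted(int *first1, const int *first2, std::size_t n)
{
    for (; n != 0; --n) {
        *first1 += *first2;
        ++first1;
        ++first2;
    }
}

// Runs both forms on the same 16-element inputs and reports whether the
// results match.
bool both_forms_agree()
{
    std::array<int, 16> u1{}, v{};
    for (std::size_t i = 0; i < u1.size(); ++i) {
        u1[i] = static_cast<int>(i);
        v[i] = static_cast<int>(100 * i);
    }
    std::array<int, 16> u2 = u1;

    add_two_exits(u1.data(), u1.data() + u1.size(), v.data(), v.data() + v.size());
    add_counted(u2.data(), v.data(), u2.size());
    return u1 == u2;
}
```

Both arrays have 16 elements, so the two comparisons in add_two_exits always become false on the same iteration and the loop is equivalent to the counted form. That equivalence is presumably what the missing optimization would need to establish before the vectorizer runs.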
Comment 3 Martin Liška 2022-11-21 10:05:46 UTC
Started with r12-3903-g0288527f47cec669.
Comment 4 Richard Biener 2023-05-08 12:25:59 UTC
GCC 12.3 is being released, retargeting bugs to GCC 12.4.