91954 – [11/12/13/14 Regression] gcc.dg/vect/pr66142.c should not need early inlining to be vectorized since r10-3311-gff6686d2e5f797d6

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	10.0

Importance:	P2 normal
Target Milestone:	11.5
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2019-10-01 17:01 UTC by Jan Hubicka
Modified:	2023-07-07 10:36 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Known to work:	9.2.0
Known to fail:	10.0
Last reconfirmed:	2021-11-21 00:00:00

Description Jan Hubicka 2019-10-01 17:01:34 UTC

Currently we need early inlining to vectorize the testcase, even though late inlining does all the necessary work. This seems like a pass-ordering problem.
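
For readers without the testsuite at hand, the shape of the testcase can be sketched in C roughly as follows. This is reconstructed from the dumps quoted in the comments below, not copied from gcc.dg/vect/pr66142.c. The body of bar, the explicit local array r (standing in for D.2220), and the parameter names are assumptions; the real testcase may obtain that array differently, for example through compiler-privatized loop variables.

```c
/* Sketch only: reconstructed from the dumps in this report, not the
   actual gcc.dg/vect/pr66142.c.  */
struct A { float x, y; };
struct B { struct A t, w; };

/* Assumed body: a dot product of the two halves of *p.  */
static inline float
bar (const struct B *p)
{
  return p->t.x * p->w.x + p->t.y * p->w.y;
}

void
foo (float *a, const float *b, const float *c)
{
  struct B r[32];   /* stands in for D.2220 in the dumps */
  float z = 0.0f;
  float u = *a;

  for (int i = 0; i < 32; i++)
    {
      r[i].t.x = 1.0f;
      r[i].t.y = u;
      r[i].w.x = b[i];
      r[i].w.y = c[i];
      /* The call has to be inlined, and the stores above forwarded to
         the loads inside bar, before the loop can be vectorized.  */
      z += bar (&r[i]);
    }
  *a = z;
}
```

The point of the report is that it should not matter whether bar is inlined by the early inliner or by the IPA inliner; the loop body ought to end up in the same form either way.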

Comment 1 Richard Biener 2019-10-02 08:07:34 UTC

Hmm.

> make check-gcc RUNTESTFLAGS="--target_board=unix/-fno-early-inlining vect.exp=pr66142.c"
...
                === gcc Summary ===

# of expected passes            4

so it works? (ok, this is on the gcc 9 branch)

Comment 2 Martin Liška 2020-01-29 13:16:08 UTC

It's not working since the addition of IPA SRA in r10-3311-gff6686d2e5f797d6.

Comment 3 Richard Biener 2020-02-07 11:22:15 UTC

IPA SRA splits the struct B parameter:

Evaluating analysis results for bar/0
  Will split parameter 0
    - component at byte offset 0, size 4
    - component at byte offset 4, size 4
    - component at byte offset 8, size 4
    - component at byte offset 12, size 4

which is passed as

  D.2220[_14].t.x = 1.0e+0;
  D.2220[_14].t.y = u_11;
  D.2220[_14].w.x = x_16;
  D.2220[_14].w.y = y_18;
  _23 = &D.2220[_14];
  _24 = bar (_23);

producing

bar.isra (const float ISRA.5, const float ISRA.6, const float ISRA.7, const float ISRA.8)

the inlining of this then ends up as

  _23 = &D.2220[_14];
  _27 = MEM[(float *)_23];
  _28 = MEM[(float *)_23 + 4B];
  _29 = MEM[(float *)_23 + 8B];
  _30 = MEM[(float *)_23 + 12B];

where formerly forwprop ended up combining the &D.2220[_14] into the
four dereferences with the nice access paths, which it cannot do from
the above:

  D.2220[_20].t.x = 1.0e+0;
  D.2220[_20].t.y = u_13;
  D.2220[_20].w.x = x_22;
  D.2220[_20].w.y = y_24;
  _30 = MEM <struct B[32]> [(const struct B *)&D.2220][_20].t.x;
  _34 = MEM <struct B[32]> [(const struct B *)&D.2220][_20].t.y;
  _35 = MEM <struct B[32]> [(const struct B *)&D.2220][_20].w.x;
  _36 = _30 * _35;
  _37 = MEM <struct B[32]> [(const struct B *)&D.2220][_20].w.y;

so instead we get

  D.2220[_14].t.x = 1.0e+0;
  D.2220[_14].t.y = u_11;
  D.2220[_14].w.x = x_16;
  D.2220[_14].w.y = y_18;
  _23 = &D.2220[_14];
  _27 = MEM[(float *)_23];
  _28 = MEM[(float *)_23 + 4B];
  _29 = MEM[(float *)_23 + 8B];
  _30 = MEM[(float *)_23 + 12B];

which we are not able to elide (which is of course kind-of lame).

IIRC we have a duplicate report for this somewhere assigned to me (for the VN
side).  VN can already do some tricks, but it doesn't handle MEM_REFs with an
offset there (vn_reference_maybe_forwprop_address), and somehow it doesn't
work for the one without an offset either.
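
The effect of the split can be approximated at the source level. The following is a hand-written sketch, not compiler output: bar_isra, load_float and call_bar_isra are hypothetical names, the body of bar is assumed from the dumps above, and the byte offsets 0, 4, 8 and 12 are the ones IPA SRA reported for parameter 0.

```c
#include <string.h>

struct A { float x, y; };
struct B { struct A t, w; };

_Static_assert (sizeof (struct B) == 16,
                "sketch assumes four packed 4-byte floats");

/* Original callee: reads its operands through field access paths.  */
float
bar (const struct B *p)
{
  return p->t.x * p->w.x + p->t.y * p->w.y;
}

/* Hypothetical stand-in for the bar.isra clone: the pointer parameter
   has been replaced by the four scalars it was used to load.  */
static float
bar_isra (float tx, float ty, float wx, float wy)
{
  return tx * wx + ty * wy;
}

/* Models MEM[(float *)base + offset]: a load with a byte offset and no
   field access path.  */
static float
load_float (const void *base, size_t offset)
{
  float v;
  memcpy (&v, (const char *) base + offset, sizeof v);
  return v;
}

/* What the call site looks like after the split: the caller performs
   the loads itself and passes the values.  */
float
call_bar_isra (const struct B *p)
{
  return bar_isra (load_float (p, 0), load_float (p, 4),
                   load_float (p, 8), load_float (p, 12));
}
```

Both paths compute the same value. The difference is only in how the loads are expressed: the stores into D.2220 still use .t.x style paths, while the loads are now plain offsets from a pointer, which is the mismatch described above.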

Comment 4 Richard Biener 2020-02-07 12:01:01 UTC

GIMPLE testcase:

struct A { float x, y; };
struct B { struct A t; };

float __GIMPLE (ssa,startwith("fre"))
foo (float a, int i)
{
  struct B D_2220[32];
  float *_23;
  float _27;
  float _28;
  float _31;

  __BB(2):
  D_2220[i_14(D)].t.x = 1.0e+0f;
  D_2220[i_14(D)].t.y = a_11(D);
  _23 = &D_2220[i_14(D)];
  _27 = __MEM <const float> ((float *)_23);
  _28 = __MEM <const float> ((float *)_23 + _Literal (float *) 4);
  _31 = _27 + _28;
  return _31;
}

Note the issue isn't only ref matching but also alias disambiguation of
the second store against the first load.  For the first load we don't
have an access path (VN has one, but its representation is not the same
as the one the alias oracle uses...).  Maybe we need to embrace a canonical
decomposed form in ao_ref.
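
As a rough plain-C rendering of the GIMPLE testcase above (a sketch, with foo_folded added as a hypothetical name for the result one would like value numbering to reach), the two loads read back exactly the two values just stored:

```c
#include <string.h>

struct A { float x, y; };
struct B { struct A t; };

_Static_assert (sizeof (float) == 4, "sketch assumes 4-byte float");

/* Mirrors the GIMPLE: two stores through access paths, then two loads
   through a pointer with byte offsets 0 and 4.  Requires 0 <= i < 32.  */
float
foo (float a, int i)
{
  struct B d[32];
  float v0, v1;

  d[i].t.x = 1.0f;                  /* first store */
  d[i].t.y = a;                     /* second store */

  const char *p = (const char *) &d[i];
  memcpy (&v0, p, sizeof v0);       /* first load, offset 0 */
  memcpy (&v1, p + 4, sizeof v1);   /* second load, offset 4 */
  return v0 + v1;
}

/* Hypothetical target form: both loads replaced by the stored values.  */
float
foo_folded (float a)
{
  return 1.0f + a;
}
```

To get from foo to foo_folded, the first load has to be looked up past the second store, which requires proving that the store to .t.y does not overlap the 4 bytes at offset 0. That is the disambiguation step mentioned above.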

Comment 5 Richard Biener 2020-04-01 08:23:53 UTC

The VN issue is old and has dups.  So is this issue about IPA SRA making a mess of the IL and/or not playing well with IPA inline?  Because it worked with
the old IPA SRA?

I wonder why we use the IPA SRAed clone for inlining rather than the original
function.

Comment 6 Jakub Jelinek 2020-05-07 11:56:12 UTC

GCC 10.1 has been released.

Comment 7 Richard Biener 2020-07-23 06:52:04 UTC

GCC 10.2 is released, adjusting target milestone.

Comment 8 Richard Biener 2021-04-08 12:02:18 UTC

GCC 10.3 is being released, retargeting bugs to GCC 10.4.

Comment 9 Jakub Jelinek 2022-06-28 10:38:38 UTC

GCC 10.4 is being released, retargeting bugs to GCC 10.5.

Comment 10 Richard Biener 2023-07-07 10:36:04 UTC

GCC 10 branch is being closed.