Bug 111798 - [14/15 Regression] Recent change causing testsuite regression and poor code on mcore-elf
Summary: [14/15 Regression] Recent change causing testsuite regression and poor code on mcore-elf
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 14.0
Importance: P2 normal
Target Milestone: 14.3
Assignee: Richard Biener
URL:
Keywords: missed-optimization, wrong-code
Depends on:
Blocks:
 
Reported: 2023-10-13 15:30 UTC by Jeffrey A. Law
Modified: 2025-02-28 11:02 UTC
CC: 4 users

See Also:
Host:
Target: mcore-elf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2025-02-28 00:00:00


Description Jeffrey A. Law 2023-10-13 15:30:29 UTC
This change:

commit 6decda1a35be5764101987c210b5693a0d914e58
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Oct 12 11:34:57 2023 +0200

    tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA
    
    The following handles byte-aligned, power-of-two and byte-multiple
    sized BIT_FIELD_REF reads in SRA.  In particular this should cover
    BIT_FIELD_REFs created by optimize_bit_field_compare.
    
    For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
    appearing there leading to more DSE, fully eliding the aggregates.
    
    This results in the same false positive -Wuninitialized as the
    older attempt to remove the folding from optimize_bit_field_compare,
    fixed by initializing part of the aggregate unconditionally.
    
            PR tree-optimization/111779
    gcc/
            * tree-sra.cc (sra_handled_bf_read_p): New function.
            (build_access_from_expr_1): Handle some BIT_FIELD_REFs.
            (sra_modify_expr): Likewise.
            (make_fancy_name_1): Skip over BIT_FIELD_REF.
    
    gcc/fortran/
            * trans-expr.cc (gfc_trans_assignment_1): Initialize
            lhs_caf_attr and rhs_caf_attr codimension flag to avoid
            false positive -Wuninitialized.
    
    gcc/testsuite/
            * gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
            * gcc.dg/vect/vect-pr111779.c: New testcase.

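For readers decoding the wording above, a rough sketch of the condition the ChangeLog describes (my own interpretation; handled_bf_read_p is a made-up name, not the actual sra_handled_bf_read_p): a BIT_FIELD_REF read is accepted when its bit offset is byte-aligned and its size is a power of two covering whole bytes.

static int
handled_bf_read_p (unsigned bit_offset, unsigned bit_size)
{
  return bit_size != 0
         && bit_offset % 8 == 0                  /* byte-aligned  */
         && bit_size % 8 == 0                    /* byte-multiple */
         && (bit_size & (bit_size - 1)) == 0;    /* power of two  */
}
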
Causes execute/20040709-2.c to fail on mcore-elf at -O2.  It also results in what appears to be significantly poorer code generation.

Note I haven't managed to get mcore-elf-gdb to work, so debugging is, umm, painful.  And I wouldn't put a lot of faith in the simulator's correctness.

I have simplified the test to this:
extern void abort (void);
extern void exit (int);

unsigned int
myrnd (void)
{
  static unsigned int s = 1388815473;
  s *= 1103515245;
  s += 12345;
  return (s / 65536) % 2048;
}

struct __attribute__((packed)) K
{
  unsigned int k:6, l:1, j:10, i:15;
};

struct K sK;

unsigned int
fn1K (unsigned int x)
{
  struct K y = sK;
  y.k += x;
  return y.k;
}

void
testK (void)
{
  int i;
  unsigned int mask, v, a, r;
  struct K x;
  char *p = (char *) &sK;
  for (i = 0; i < sizeof (sK); ++i)
    *p++ = myrnd ();
  v = myrnd ();
  a = myrnd ();
  sK.k = v;
  x = sK;
  r = fn1K (a);
  if (x.j != sK.j || x.l != sK.l)
    abort ();
}

int
main (void)
{
  testK ();
  exit (0);
}


That should at least make the poor code gen obvious.  I don't expect to have time to debug this further anytime in the near future.
Comment 1 Richard Biener 2023-10-16 08:29:29 UTC
If I configured correctly, the only change (early) SRA makes is

 unsigned int fn1K (unsigned int x)
 {
+  <unnamed-unsigned:6> y$k;
...
@@ -57,13 +53,14 @@
 
   <bb 2> :
   y = sK;
-  _1 = y.k;
+  y$k_13 = sK.k;
+  _1 = y$k_13;
   _2 = (unsigned char) _1;
   _3 = (unsigned char) x_9(D);
   _4 = _2 + _3;
   _5 = (<unnamed-unsigned:6>) _4;
-  y.k = _5;
-  _6 = y.k;
+  y$k_14 = _5;
+  _6 = y$k_14;
   _11 = (unsigned int) _6;
   y ={v} {CLOBBER(eol)};
   return _11;

and later DSE eliminates 'y' completely.  Note I can't see any big assembly
difference between -O2 and -O2 -fno-tree-sra, so I wonder whether I really
reproduced the significantly worse code generation issue.  In fact the
generated code looks better; the stack slot is elided:

> diff -u t.s.good t.s.bad 
--- t.s.good    2023-10-16 10:18:27.457651977 +0200
+++ t.s.bad     2023-10-16 10:07:49.997656268 +0200
@@ -19,13 +19,11 @@
        .export fn1K
        .type   fn1K, @function
 fn1K:
-       subi    sp,8
        lrw     r7, sK
        ld.b    r7,(r7)
        addu    r2,r7
        movi    r7,63
        and     r2,r7
-       addi    sp,8
        jmp     r15
        .size   fn1K, .-fn1K
        .align  1
Comment 2 Jeffrey A. Law 2024-03-09 06:03:17 UTC
I wouldn't compare with/without tree-sra; instead I would look at the assembly code before/after your change.  It's clearly worse: testK goes from ~37 instructions to ~70, and the abort call is no longer removed.
Comment 3 Richard Biener 2024-03-11 09:49:57 UTC
OK, so the abort () call is only elided during RTL optimization before the change.  The change itself first manifests during ESRA as

 void testK ()
 {
-  <unnamed-unsigned:10> x$j;
   unsigned int D.1835;
   static unsigned int s = 1388815473;
   unsigned int D.1834;
@@ -167,14 +157,13 @@
   _5 = (<unnamed-unsigned:6>) _51;
   sK.k = _5;
   x = sK;
-  x$j_55 = sK.j;
   y$k_36 = sK.k;
   _37 = (unsigned char) y$k_36;
   _38 = (unsigned char) _46;
   _39 = _37 + _38;
   _40 = (<unnamed-unsigned:6>) _39;
   _41 = (unsigned int) _40;
-  _6 = x$j_55;
+  _6 = x.j;
   _7 = sK.j;
   if (_6 != _7)
     goto <bb 7>; [INV]

there are other uses of 'x' below, unchanged:

  else
    goto <bb 6>; [INV]

  <bb 6> :
  _8 = BIT_FIELD_REF <x, 8, 0>;
  _9 = BIT_FIELD_REF <sK, 8, 0>;
  _10 = _8 ^ _9;
  _11 = _10 & 64;
  if (_11 != 0)
    goto <bb 7>; [INV]
  else
    goto <bb 8>; [INV]

  <bb 7> :
  abort ();

and with the change we now analyze those and run into

! Disqualifying sK - Address taken in a non-call-argument context.
! Disqualifying x - No or inhibitingly overlapping accesses.


That's because fold somehow transformed one of the bitfield compares but
not the other.

In particular we have the following two accesses

_8 = BIT_FIELD_REF <x, 8, 0>;
_6 = x.j;

where the first is offset = 0, size = 8 and the second offset = 7, size = 10

and those partially overlap.  Prior to the change, SRA ended up making an
unprofitable change, which caused it to leave 'x' around.
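
To make the partial overlap concrete, here is a small self-contained sketch (my own illustration; partial_overlap is a hypothetical helper, not code from tree-sra.cc).  The byte read covers bits [0, 8) and x.j covers bits [7, 17); they intersect at bit 7 but neither contains the other.

#include <stdio.h>

/* Hypothetical helper, not GCC code: "partial" overlap here means the
   two bit ranges intersect but neither contains the other.  */
static int
partial_overlap (unsigned off1, unsigned size1,
                 unsigned off2, unsigned size2)
{
  unsigned end1 = off1 + size1, end2 = off2 + size2;
  int intersect = off1 < end2 && off2 < end1;
  int contained = (off1 <= off2 && end2 <= end1)
                  || (off2 <= off1 && end1 <= end2);
  return intersect && !contained;
}

int
main (void)
{
  /* BIT_FIELD_REF <x, 8, 0> versus x.j (offset 7, size 10).  */
  printf ("%d\n", partial_overlap (0, 8, 7, 10));  /* prints 1 */
  return 0;
}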

The interesting difference is in FRE, though: without the change we
are able to elide the _6 != _7 compare, but with it we fail.  That's because
SRA makes the load of sK.j available - while VN can see through the
x = sK assignment, it won't derive a value from the result unless there's
a leader for it in the IL.

Swapping the compare to

  if (sK.j != x.j || x.l != sK.l)
    abort ();

makes it optimized and the abort () call vanishes as well - in fact it
now vanishes already in DOM2, well before RTL optimization.

I don't think the SRA change is to blame for doing anything wrong; we're
running into natural limitations of CSE and into premature folding inhibiting
SRA (which the SRA change itself was supposed to handle a bit better).

If there is actual wrong code on mcore now, it is a latent problem
uncovered by the change.

The missed-optimization is yet another "dead code elimination regression".

I don't see an easy way to solve the missed-optimization regression.  On the
VN side, when we just see

  x = sK;
  _3 = x.j;
  _4 = sK.j;

then visiting the x.j load, when running into x = sK, could eventually
also put sK.j into the hash table (with the same VARYING value we assign
to x.j).  This would require some extra bookkeeping, as we actually compute
this expression already but do not keep it as an alternate form to record
when we figure out there's no actual leader for it.  Testcase:

struct X { int a; int b; } p, o;
void foo ()
{
  p = o;
  if (p.a != o.a)
    __builtin_abort ();
}

Let me take this for this particular VN issue.
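
For comparison, the following variant should already be handled, given the leader explanation and the swapped-compare observation above (a sketch of my own, not taken from the PR; foo_with_leader is a made-up name): loading o.a explicitly first provides a leader, so the p.a load can be CSEd through the p = o copy and the compare folds away.

/* Sketch only: mirrors the struct X testcase above, with an explicit
   load of o.a acting as the leader VN needs.  */
struct X { int a; int b; } p, o;

void foo_with_leader (void)
{
  int t = o.a;      /* leader for o.a */
  p = o;
  if (p.a != t)     /* VN: p.a == o.a == t, so this should fold away */
    __builtin_abort ();
}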
Comment 4 Richard Biener 2024-05-07 07:42:11 UTC
GCC 14.1 is being released, retargeting bugs to GCC 14.2.
Comment 5 Jakub Jelinek 2024-08-01 09:35:20 UTC
GCC 14.2 is being released, retargeting bugs to GCC 14.3.
Comment 6 Richard Biener 2025-02-28 11:02:50 UTC
So even after the recent ifcombine work, we still produce the following during early folding:

  if (x.j != sK.j || ((BIT_FIELD_REF <x, 8, 0> ^ BIT_FIELD_REF <sK, 8, 0>) & 64) != 0)
    {
      abort ();
    }

and ifcombine does not optimize this further.

Disabling optimize_bit_field_compare optimizes this fully even before
ifcombine has a chance to perform the same optimization.
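
For reference, a source-level sketch of what that early-folded condition corresponds to (my own illustration, not compiler output; folded_condition is a made-up name): the x.l != sK.l half has become a compare of the first byte of each object under the mask 64, i.e. the bit holding 'l', while the x.j != sK.j half is still a plain bitfield compare.

#include <string.h>

struct __attribute__((packed)) K { unsigned int k:6, l:1, j:10, i:15; };

/* Sketch of the folded condition from the dump above: byte 0 of each
   object is loaded and XORed, and the 64 (0x40) mask picks out the
   single bit corresponding to the 1-bit field 'l'.  */
static int
folded_condition (const struct K *x, const struct K *sk)
{
  unsigned char bx, bs;
  memcpy (&bx, x, 1);    /* BIT_FIELD_REF <x, 8, 0>  */
  memcpy (&bs, sk, 1);   /* BIT_FIELD_REF <sK, 8, 0> */
  return x->j != sk->j || ((bx ^ bs) & 64) != 0;
}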