Bug 115883 - [15 Regression] late-combine exposing LRA problems
Summary: [15 Regression] late-combine exposing LRA problems
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 15.0
: P3 normal
Target Milestone: 15.0
Assignee: Not yet assigned to anyone
URL: https://gcc.gnu.org/pipermail/gcc-pat...
Keywords: ice-on-valid-code
Depends on: 116236
Blocks:
  Show dependency treegraph
 
Reported: 2024-07-11 22:22 UTC by Hans-Peter Nilsson
Modified: 2024-08-26 23:07 UTC (History)
4 users (show)

See Also:
Host:
Target: cris
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-07-17 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Hans-Peter Nilsson 2024-07-11 22:22:09 UTC
Since the introduction of the late-combine optimization in r15-1579-g792f97b44ffc5e, test-results for CRIS have shown regressions for the following tests, all getting an ICE:
gcc.sum gcc.target/cris/rld-legit1.c
gcc.sum gcc.target/cris/rld-legit2.c
gcc.sum gcc.target/cris/torture/pr24750-2.c

Those are all basically the same code tested from different angles.  In the code, an address, using two registers, is generated as desirable and valid.  While it's valid for hard-registers, there's pressure from an asm clobbering all hard-regs that can be part of an address, so spills are introduced.  Typically there's one spill to a special register, the other to stack.  This code used to be a coverage test for the CRIS implementation of the reload macro LEGITIMIZE_RELOAD_ADDRESS. To wit, the RTX for the spilled memory expression looked like this:
 (mem:QI (plus:SI (sign_extend:SI (mem:HI (reg/v/f:SI 32 [ a ]) [1 *a_6(D)+0 S2 A8])) (reg/v/f:SI 33 [ y ])) [0 *_3+0 S1 A8]).
Though, for brevity, let's use the "compact" representation: [sign_extend([r32:SI])+r33:SI] (sadly the inner mode, HImode is missing in the compact representation; probably just a bug).

The original reload problem is similar to what happens now (cf. PR24750):
reload (now LRA) splits the address at the mem instead of at the sign_extend, and
[sign_extend(r32:SI)+r33:SI] is *not* valid, whereas [r32:SI+r33:SI] is, in addition to the original.  The ICE now, is that the former address is generated by LRA, and later not recognized:
/src/gcc/gcc/testsuite/gcc.target/cris/rld-legit1.c: In function 'f':
/src/gcc/gcc/testsuite/gcc.target/cris/rld-legit1.c:21:1: error: insn does not satisfy its constraints:
`````
(insn 14 21 15 2 (parallel [
            (set (reg/i:SI 10 r10)
                (sign_extend:SI (mem:QI (plus:SI (sign_extend:SI (reg:HI 9 r9 [39]))
                            (reg:SI 13 r13 [37])) [0 *_3+0 S1 A8])))
            (clobber (reg:CC 19 ccr))
        ]) "/x/gcc/gcc/testsuite/gcc.target/cris/rld-legit1.c":21:1 52 {extendqisi\
2}
     (nil))
during RTL pass: reload
'''''
(The astute reader is aware that LRA identifies as reload in dump files.)

With LRA, there's no mechanism corresponding to LEGITIMIZE_RELOAD_ADDRESS: LRA seems to split-up spilled parts of the memory address.  These tests are instead now coverage for peephole2 patterns that cobbles together the split-up parts to emit the same assembly as for reload, for the intended addressing mode.  This transformation is not performance-critical, the peephole2 patterns were added because failing to do so would have constituted a regression for the reload-to-LRA transition.  These patterns are naturally brittle, matching one of the possible split-up sequences from LRA.  They have remained surprisingly stable since the LRA transition, up to late-combine.  Thankfully, these tests seem to be the only thing functionally failing with the late-combine introduction.

This PR is about the presumed LRA bug, not about the peephole2 patterns incidentally no longer matching, or of performance for CRIS with late-combine.

I've analyzed as far as the difference being due to a lack of REG_POINTER for the "base" register in the address, which seems to be due to an oversight in combine.cc, only forward-propagated by the first of the late-combine passes.  It seems this is just exposing a flaw in LRA though; the invalid address should not have been generated, as REG_POINTER is not a stable attribute (IIUC); its presence exposes optimization opportunities but its absence should not cause invalid code or ICE.
Comment 1 Eric Botcazou 2024-07-12 08:16:30 UTC
Are you sure about REG_POINTER though?  IIRC the PA port does rely on it for correctness.
Comment 2 Hans-Peter Nilsson 2024-07-12 17:04:21 UTC
(In reply to Eric Botcazou from comment #1)
> Are you sure about REG_POINTER though?  IIRC the PA port does rely on it for
> correctness.

Regarding it not being a stable atrtibute?  No; see the patch submission (linked now), CC:ed to vmakarov for clarification.
Comment 3 Hans-Peter Nilsson 2024-07-14 03:03:51 UTC
From r15-2024-ga01b40c047334c (disabling late-combine for CRIS), you'll need -flate-combine-instructions to expose the bug.
Comment 4 Hans-Peter Nilsson 2024-08-21 00:05:49 UTC
The underlying issue was fixed by the commit fixing PR116236, i.e.
commit r15-2937-g3673b7054ec268c445620b9c52d25e65bc9a7f96, so I'll close this but refresh the attribute-copying patch (adjusting the commit log).
Comment 5 Hans-Peter Nilsson 2024-08-21 00:07:48 UTC
As per "This PR is about the presumed LRA bug, not about the peephole2 patterns incidentally no longer matching, or of performance for CRIS with late-combine" I'm closing this, see my earlier comments.
Comment 6 GCC Commits 2024-08-26 23:07:43 UTC
The master branch has been updated by Hans-Peter Nilsson <hp@gcc.gnu.org>:

https://gcc.gnu.org/g:5031df5d1f4954304c618efd2de029edc6b3699f

commit r15-3204-g5031df5d1f4954304c618efd2de029edc6b3699f
Author: Hans-Peter Nilsson <hp@axis.com>
Date:   Mon Jul 8 03:55:27 2024 +0200

    combine.cc (make_more_copies): Copy attributes from the original pseudo, PR115883
    
    The first of the late-combine passes, propagates some of the copies
    made during the (in-time-)combine pass in make_more_copies into the
    users of the "original" pseudo registers and removes the "old"
    pseudos.  That effectively removes attributes such as REG_POINTER,
    which matter to LRA.  The quoted PR is for an ICE-manifesting bug that
    was exposed by the late-combine pass and went back to hiding with this
    patch until commit r15-2937-g3673b7054ec2, the fix for PR116236, when
    it was actually fixed.  To wit, this patch is only incidentally
    related to that bug.
    
    In other words, the REG_POINTER attribute should not be required for
    LRA to work correctly.  This patch merely corrects state for those
    propagated register-uses to ante late-combine.
    
    For reasons not investigated, this fixes a failing test
    "FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3"
    for x86_64-linux-gnu.
    
            PR middle-end/115883
            * combine.cc (make_more_copies): Copy attributes from the original
            pseudo to the new copy.