Summary: | RA picks the wrong register for -fipa-ra | ||
---|---|---|---|
Product: | gcc | Reporter: | H.J. Lu <hjl.tools> |
Component: | rtl-optimization | Assignee: | Tom de Vries <vries> |
Status: | REOPENED --- | ||
Severity: | normal | CC: | iains, jakub, law, vmakarov, vmakarov, vries |
Priority: | P3 | Keywords: | missed-optimization, ra |
Version: | 5.0 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2015-03-06 00:00:00 | |
Bug Depends on: | |||
Bug Blocks: | 64342 | ||
Attachments: | testsuite patch |
Description
H.J. Lu
2015-02-01 13:47:38 UTC
How could this be a regression? -fipa-ra is a new option, I believe your testcase isn't a wrong-code, but an enhancement request for a new feature. (In reply to Jakub Jelinek from comment #1) > How could this be a regression? -fipa-ra is a new option, I believe your > testcase isn't a wrong-code, but an enhancement request for a new feature. It is a regression from when -fipa-ra was implemented, probably triggered by r216154. That doesn't count, regressions for release management purposes are only regressions from released compilers. From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64342#c9 The test fails due to different and orthogonal issue: RA chooses %ebx for a temporary when -fpic is used: movl %eax, %ebx call bar addl %ebx, %eax The push and pop insns are from prologue/epilogue since call-saved reg is used (%ebx), not due to save/restore the reg around the call: (note 3 4 19 2 NOTE_INSN_FUNCTION_BEG) (insn/f:TI 19 3 20 2 (set (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [0 S4 A8]) (reg:SI 3 bx)) fuse-caller-save.c:16 66 {*pushsi2} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) (note 20 19 2 2 NOTE_INSN_PROLOGUE_END) ... (note 27 15 24 2 NOTE_INSN_EPILOGUE_BEG) (insn/f:TI 24 27 25 2 (set (reg:SI 3 bx) (mem:SI (post_inc:SI (reg/f:SI 7 sp)) [0 S4 A8])) fuse-caller-save.c:18 74 {*popsi1} (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int 4 [0x4]))) (expr_list:REG_CFA_RESTORE (reg:SI 3 bx) (nil)))) So, please open a new PR due to RA issue: since %edx was allocated for PIC register (but later removed), another call-used (%ecx) should be used here. Confirmed, still happens with r221230 even after PR64342 fix: The testcase from the Comment #0, compiled with: -m32 -fpic -O2 -fipa-ra -fomit-frame-pointer -fno-optimize-sibling-calls -mregparm=1 -fno-asynchronous-unwind-tables: compiles to: foo: pushl %ebx movl %eax, %ebx subl $8, %esp call bar addl $8, %esp addl %ebx, %eax popl %ebx ret From PR64342 comment 7: Register allocation seems to progress similarly, up until this message in reload, which seems to be directly related to the r216154 patch: ... Spill r86 after risky transformations Reassigning non-reload pseudos Assign 3 to r86 (freq=3000) ... I've looked a bit in more detail at this message. fipa-ra manages to allocate hardreg 1 (dx) to the call-crossing liverange as before. But setup_live_pseudos_and_spill_after_risky_transforms looks at the conflict regs, and finds that lra_reg_info[regno].conflict_hard_regs contains dx. Hence, it spills the liverange into an available reg, which happens to be hardreg 3 (bx). So, the risky transformations and fipa-ra have in common that they allocate registers from the conflict_hard_regs set. In the case of -fipa-ra, we don't want setup_live_pseudos_and_spill_after_risky_transforms to undo the -fipa-ra allocation, but currently there's no way to distinguish between the two. Using this patch, we make sure the conflict registers don't include dx for this testcase, and the test-cases passes again: ... diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c index 9dfffb6..0231057 100644 --- a/gcc/lra-lives.c +++ b/gcc/lra-lives.c @@ -636,8 +636,11 @@ check_pseudos_live_through_calls (int regno) if (! sparseset_bit_p (pseudos_live_through_calls, regno)) return; sparseset_clear_bit (pseudos_live_through_calls, regno); + IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs, - call_used_reg_set); + !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set) + ? lra_reg_info[regno].actual_call_used_reg_set + : call_used_reg_set); for (hr = 0; hr < FIRST_PSEUDO_REGISTER; hr++) if (HARD_REGNO_CALL_PART_CLOBBERED (hr, PSEUDO_REGNO_MODE (regno))) ... (In reply to vries from comment #6) > From PR64342 comment 7: > > Register allocation seems to progress similarly, up until this message in > reload, which seems to be directly related to the r216154 patch: > ... > Spill r86 after risky transformations > Reassigning non-reload pseudos > Assign 3 to r86 (freq=3000) > ... > > I've looked a bit in more detail at this message. > > fipa-ra manages to allocate hardreg 1 (dx) to the call-crossing liverange as > before. But setup_live_pseudos_and_spill_after_risky_transforms looks at the > conflict regs, and finds that lra_reg_info[regno].conflict_hard_regs > contains dx. Hence, it spills the liverange into an available reg, which > happens to be hardreg 3 (bx). > > So, the risky transformations and fipa-ra have in common that they allocate > registers from the conflict_hard_regs set. In the case of -fipa-ra, we don't > want setup_live_pseudos_and_spill_after_risky_transforms to undo the > -fipa-ra allocation, but currently there's no way to distinguish between the > two. > Right. Introduction of pic pseudo and possible rematerializations of memory references with the pic pseudo made checking the new conflict first necessary. Before this it was not a problem as pic hard reg was used only for pic addressing. > Using this patch, we make sure the conflict registers don't include dx for > this testcase, and the test-cases passes again: > ... > diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c > index 9dfffb6..0231057 100644 > --- a/gcc/lra-lives.c > +++ b/gcc/lra-lives.c > @@ -636,8 +636,11 @@ check_pseudos_live_through_calls (int regno) > if (! sparseset_bit_p (pseudos_live_through_calls, regno)) > return; > sparseset_clear_bit (pseudos_live_through_calls, regno); > + > IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs, > - call_used_reg_set); > + !hard_reg_set_empty_p > (lra_reg_info[regno].actual_call_used_reg_set) > + ? lra_reg_info[regno].actual_call_used_reg_set > + : call_used_reg_set); > > for (hr = 0; hr < FIRST_PSEUDO_REGISTER; hr++) > if (HARD_REGNO_CALL_PART_CLOBBERED (hr, PSEUDO_REGNO_MODE (regno))) > ... The patch looks ok to me. Tom, could you prepare the patch (check it mostly for x86-64 bootstrap and testsuite) and commit it to the trunk. I approve it. Author: vries Date: Thu Mar 12 06:59:34 2015 New Revision: 221372 URL: https://gcc.gnu.org/viewcvs?rev=221372&root=gcc&view=rev Log: Use actual_call_used_reg_set to find conflicting regs 2015-03-12 Tom de Vries <tom@codesourcery.com> * lra-lives.c (check_pseudos_live_through_calls): Use actual_call_used_reg_set instead of call_used_reg_set, if available. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-lives.c Author: vries Date: Thu Mar 12 08:01:24 2015 New Revision: 221374 URL: https://gcc.gnu.org/viewcvs?rev=221374&root=gcc&view=rev Log: Revert 'require nonpic target' for fuse-caller-save*.c 2015-03-12 Tom de Vries <tom@codesourcery.com> PR rtl-optimization/64895 * gcc.target/i386/fuse-caller-save-rec.c: Revert require nonpic target. * gcc.target/i386/fuse-caller-save-xmm.c: Ditto. * gcc.target/i386/fuse-caller-save.c: Ditto. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save.c Author: vries Date: Mon Mar 16 09:42:21 2015 New Revision: 221448 URL: https://gcc.gnu.org/viewcvs?rev=221448&root=gcc&view=rev Log: Revert 'Use actual_call_used_reg_set to find conflicting regs' 2015-03-16 Tom de Vries <tom@codesourcery.com> PR middle-end/65414 Revert: 2015-03-12 Tom de Vries <tom@codesourcery.com> PR rtl-optimization/64895 * lra-lives.c (check_pseudos_live_through_calls): Use actual_call_used_reg_set instead of call_used_reg_set, if available. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-lives.c Patch has been reverted. The same happens for aarch64.
> [hjl@gnu-tools-1 gcc]$ cat /tmp/x.c
> static int __attribute__((noinline))
> bar (int x)
> {
> if (x > 4)
> return bar (x - 3);
> return 0;
> }
>
> int __attribute__((noinline))
> foo (int y)
> {
> return y + bar (y);
> }
>
There is another problem here actually.
In this particular case, bar() is a static function which is not exported. Although -fpic option is provided, pic_offset_table_rtx is not used at all in function foo().
In this case, pic_offset_table_rtx may not be initialized at all. The target hook TARGET_INIT_PIC_REG can be improved to initialize pic register only when necessary.
On the other hand, if pic_offset_table_rtx is not used at all, lra_risky_transformations_p should not be true. Does it make sensible?
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index a78edd8..d4a950f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4221,7 +4221,8 @@ lra_constraints (bool first_p)
lra_constraint_iter);
changed_p = false;
if (pic_offset_table_rtx
- && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+ && (i = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
+ && lra_reg_info[i].nrefs > 0)
lra_risky_transformations_p = true;
else
lra_risky_transformations_p = false;
Between revisions r244915 and r244957 I got the following XPASS XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not pop XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not push XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-times addl\\t%[re]?dx, %[re]?ax 1 XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not pop XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not push XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-times addl\\t%[re]?d[ix], %[re]?ax 1 > Between revisions r244915 and r244957 I got the following XPASS
I have forgotten to say that it is on x86_64-apple-darwin16.
(IIUC the thread here) It looks to me that the codegen is now DTRT for both pic and non pic. Darwin is doing pic by default, so sees XPASSes There is no Linux pic test (so the change was not noticed there): (I will produce a patch for the tests on the basis that this is now fixed). Linux x86-64 (r271505): Running target unix/-fpic/-m32 Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/iains/gcc-trunk/src-local/gcc/testsuite/config/default.exp as tool-and-target-specific interface file. Running /home/iains/gcc-trunk/src-local/gcc/testsuite/gcc.target/i386/i386.exp ... XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not push XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not pop XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-times addl\t%[re]?dx, %[re]?ax 1 FAIL: gcc.target/i386/fuse-caller-save-xmm.c scan-assembler-times addpd\t\\.?LC0.*, %xmm0 1 XPASS: gcc.target/i386/fuse-caller-save-xmm.c scan-assembler-times addpd\t%xmm1, %xmm0 1 XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not push XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not pop XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-times addl\t%[re]?d[ix], %[re]?ax 1 === gcc Summary for unix/-fpic/-m64 === # of expected passes 18 ===== code output below - Darwin produces the same (or equivalent, for m32/pic). ===== (extraneous lines snipped for clarity) fuse-caller-save-rec.c -O2 -fipa-ra -fomit-frame-pointer -fno-optimize-sibling-calls -mregparm=1 -m32 -S {,-fpic} bar: cmpl $4, %eax jg .L9 xorl %eax, %eax ret .L9: subl $12, %esp subl $3, %eax call bar addl $12, %esp ret foo: subl $12, %esp movl %eax, %edx call bar addl $12, %esp addl %edx, %eax ret ===== fuse-caller-save.c -O2 -fipa-ra -fomit-frame-pointer -mregparm=1 -m32 -S {,-fpic} bar: addl $3, %eax ret foo: movl %eax, %edx call bar addl %edx, %eax ret ===== fuse-caller-save-xmm.c -O2 -fipa-ra -fomit-frame-pointer -msse2 -mno-avx -m32 -S bar: addpd .LC0, %xmm0 ret foo: subl $12, %esp movapd %xmm0, %xmm1 call bar addl $12, %esp addpd %xmm1, %xmm0 ret fuse-caller-save-xmm.c -O2 -fipa-ra -fomit-frame-pointer -msse2 -mno-avx -m32 -S -fpic bar: call __x86.get_pc_thunk.ax addl $_GLOBAL_OFFSET_TABLE_, %eax movapd .LC0@GOTOFF(%eax), %xmm1 addpd %xmm1, %xmm0 ret foo: subl $12, %esp movapd %xmm0, %xmm2 call bar addl $12, %esp addpd %xmm2, %xmm0 ret Created attachment 46398 [details]
testsuite patch
Will post this later, tested on x86_64-linux and x86_64-darwin.
Author: iains Date: Thu May 23 09:23:47 2019 New Revision: 271544 URL: https://gcc.gnu.org/viewcvs?rev=271544&root=gcc&view=rev Log: x86, testsuite - update fuse-caller-save tests. These tests had started to XPASS on pic targets where the codegen is now as expected. gcc/testsuite/ 2019-05-23 Iain Sandoe <iain@sandoe.co.uk> PR rtl-optimisation/64895 * gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs. * gcc.target/i386/fuse-caller-save.c: Likewise. * gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for PIC cases, remove XFAILs. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save.c Author: iains Date: Sun Aug 4 10:24:34 2019 New Revision: 274062 URL: https://gcc.gnu.org/viewcvs?rev=274062&root=gcc&view=rev Log: Darwin, X86, backport fixes for 64895. Although this is marked as rtl-optimisation, the Darwin issue is that the testcase XPASS there since the codegen is different. 2019-08-04 Iain Sandoe <iain@sandoe.co.uk> Backport from mainline. 2019-05-23 Iain Sandoe <iain@sandoe.co.uk> PR rtl-optimisation/64895 * gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs. * gcc.target/i386/fuse-caller-save.c: Likewise. * gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for PIC cases, remove XFAILs. Modified: branches/gcc-9-branch/gcc/testsuite/ChangeLog branches/gcc-9-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c branches/gcc-9-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c branches/gcc-9-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save.c testsuite fix applied for pic targets on trunk and 9.2 Author: iains Date: Fri Aug 30 19:00:44 2019 New Revision: 275213 URL: https://gcc.gnu.org/viewcvs?rev=275213&root=gcc&view=rev Log: [Darwin, testsuite] Backport fix for 64895 XPASSes. These tests don't fail on Darwin. 2019-08-30 Iain Sandoe <iain@sandoe.co.uk> Backport from mainline. 2019-05-23 Iain Sandoe <iain@sandoe.co.uk> PR rtl-optimisation/64895 * gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs. * gcc.target/i386/fuse-caller-save.c: Likewise. * gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for PIC cases, remove XFAILs. Modified: branches/gcc-8-branch/gcc/testsuite/ChangeLog branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save.c Author: iains Date: Sun Sep 8 19:41:20 2019 New Revision: 275495 URL: https://gcc.gnu.org/viewcvs?rev=275495&root=gcc&view=rev Log: [X86, testsuite] Fix PR rtl-optimisation/64895 XPASSes. These tests had started to XPASS on pic targets where the codegen is now as expected. 2019-09-08 Iain Sandoe <iain@sandoe.co.uk> Backport from mainline. 2019-05-23 Iain Sandoe <iain@sandoe.co.uk> PR rtl-optimisation/64895 * gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs. * gcc.target/i386/fuse-caller-save.c: Likewise. * gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for PIC cases, remove XFAILs. Modified: branches/gcc-7-branch/gcc/testsuite/ChangeLog branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save.c |