Bug 64895

Summary: RA picks the wrong register for -fipa-ra
Product: gcc Reporter: H.J. Lu <hjl.tools>
Component: rtl-optimizationAssignee: Tom de Vries <vries>
Status: REOPENED ---    
Severity: normal CC: iains, jakub, law, vmakarov, vmakarov, vries
Priority: P3 Keywords: missed-optimization, ra
Version: 5.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2015-03-06 00:00:00
Bug Depends on:    
Bug Blocks: 64342    
Attachments: testsuite patch

Description H.J. Lu 2015-02-01 13:47:38 UTC
'-fipa-ra'
     Use caller save registers for allocation if those registers are not
     used by any called function.  In that case it is not necessary to
     save and restore them around calls.  This is only possible if
     called functions are part of same compilation unit as current
     function and they are compiled before it.

But in this case testcase 
[hjl@gnu-tools-1 gcc]$ cat /tmp/x.c 
static int __attribute__((noinline))
bar (int x)
{
  if (x > 4)
    return bar (x - 3);
  return 0;
}

int __attribute__((noinline))
foo (int y)
{
  return y + bar (y);
}
[hjl@gnu-tools-1 gcc]$ ./xgcc -B./ -m32 -fpic -O2 -fipa-ra -fomit-frame-pointer -fno-optimize-sibling-calls -mregparm=1 /tmp/x.c -S -fno-asynchronous-unwind-tables        
[hjl@gnu-tools-1 gcc]$ cat x.s 
	.file	"x.c"
	.section	.text.unlikely,"ax",@progbits
.LCOLDB0:
	.text
.LHOTB0:
	.p2align 4,,15
	.type	bar, @function
bar:
	cmpl	$4, %eax
	jg	.L7
	xorl	%eax, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L7:
	subl	$12, %esp
	subl	$3, %eax
	call	bar
	addl	$12, %esp
	ret
	.size	bar, .-bar
	.section	.text.unlikely
.LCOLDE0:
	.text
.LHOTE0:
	.section	.text.unlikely
.LCOLDB1:
	.text
.LHOTB1:
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
	pushl	%ebx
	movl	%eax, %ebx
	subl	$8, %esp
	call	bar
	addl	$8, %esp
	addl	%ebx, %eax
	popl	%ebx
	ret
	.size	foo, .-foo
	.section	.text.unlikely
.LCOLDE1:
	.text
.LHOTE1:
	.ident	"GCC: (GNU) 5.0.0 20150201 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-tools-1 gcc]$ 

Pick EBX instead of EDX to save EAX in foo.
Comment 1 Jakub Jelinek 2015-02-01 19:09:48 UTC
How could this be a regression?  -fipa-ra is a new option, I believe your testcase isn't a wrong-code, but an enhancement request for a new feature.
Comment 2 H.J. Lu 2015-02-01 19:22:10 UTC
(In reply to Jakub Jelinek from comment #1)
> How could this be a regression?  -fipa-ra is a new option, I believe your
> testcase isn't a wrong-code, but an enhancement request for a new feature.

It is a regression from when -fipa-ra was implemented, probably triggered by
r216154.
Comment 3 Jakub Jelinek 2015-02-01 19:24:48 UTC
That doesn't count, regressions for release management purposes are only regressions from released compilers.
Comment 4 H.J. Lu 2015-02-01 19:25:53 UTC
From

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64342#c9

The test fails due to different and orthogonal issue: RA chooses %ebx for a temporary when -fpic is used:

        movl    %eax, %ebx
        call    bar
        addl    %ebx, %eax

The push and pop insns are from prologue/epilogue since call-saved reg is used (%ebx), not due to save/restore the reg around the call:

(note 3 4 19 2 NOTE_INSN_FUNCTION_BEG)
(insn/f:TI 19 3 20 2 (set (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [0  S4 A8])
        (reg:SI 3 bx)) fuse-caller-save.c:16 66 {*pushsi2}
     (expr_list:REG_DEAD (reg:SI 3 bx)
        (nil)))
(note 20 19 2 2 NOTE_INSN_PROLOGUE_END)
...
(note 27 15 24 2 NOTE_INSN_EPILOGUE_BEG)
(insn/f:TI 24 27 25 2 (set (reg:SI 3 bx)
        (mem:SI (post_inc:SI (reg/f:SI 7 sp)) [0  S4 A8])) fuse-caller-save.c:18 74 {*popsi1}
     (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 7 sp)
            (plus:SI (reg/f:SI 7 sp)
                (const_int 4 [0x4])))
        (expr_list:REG_CFA_RESTORE (reg:SI 3 bx)
            (nil))))

So, please open a new PR due to RA issue: since %edx was allocated for PIC register (but later removed), another call-used (%ecx) should be used here.
Comment 5 Uroš Bizjak 2015-03-06 08:04:34 UTC
Confirmed, still happens with r221230 even after PR64342 fix:

The testcase from the Comment #0, compiled with:

-m32 -fpic -O2 -fipa-ra -fomit-frame-pointer -fno-optimize-sibling-calls -mregparm=1 -fno-asynchronous-unwind-tables:

compiles to:

foo:
        pushl   %ebx
        movl    %eax, %ebx
        subl    $8, %esp
        call    bar
        addl    $8, %esp
        addl    %ebx, %eax
        popl    %ebx
        ret
Comment 6 Tom de Vries 2015-03-10 10:24:54 UTC
From PR64342 comment 7:

Register allocation seems to progress similarly, up until this message in reload, which seems to be directly related to the r216154 patch:
...
    Spill r86 after risky transformations
  Reassigning non-reload pseudos
           Assign 3 to r86 (freq=3000)
...

I've looked a bit in more detail at this message.

fipa-ra manages to allocate hardreg 1 (dx) to the call-crossing liverange as before. But setup_live_pseudos_and_spill_after_risky_transforms looks at the conflict regs, and finds that lra_reg_info[regno].conflict_hard_regs contains dx. Hence, it spills the liverange into an available reg, which happens to be hardreg 3 (bx).

So, the risky transformations and fipa-ra have in common that they allocate registers from the conflict_hard_regs set. In the case of -fipa-ra, we don't want setup_live_pseudos_and_spill_after_risky_transforms to undo the -fipa-ra allocation, but currently there's no way to distinguish between the two.

Using this patch, we make sure the conflict registers don't include dx for this testcase, and the test-cases passes again:
...
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 9dfffb6..0231057 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -636,8 +636,11 @@ check_pseudos_live_through_calls (int regno)
   if (! sparseset_bit_p (pseudos_live_through_calls, regno))
     return;
   sparseset_clear_bit (pseudos_live_through_calls, regno);
+
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
-                   call_used_reg_set);
+                   !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set)
+                   ? lra_reg_info[regno].actual_call_used_reg_set
+                   : call_used_reg_set);
 
   for (hr = 0; hr < FIRST_PSEUDO_REGISTER; hr++)
     if (HARD_REGNO_CALL_PART_CLOBBERED (hr, PSEUDO_REGNO_MODE (regno)))
...
Comment 7 Vladimir Makarov 2015-03-10 19:21:41 UTC
(In reply to vries from comment #6)
> From PR64342 comment 7:
> 
> Register allocation seems to progress similarly, up until this message in
> reload, which seems to be directly related to the r216154 patch:
> ...
>     Spill r86 after risky transformations
>   Reassigning non-reload pseudos
>            Assign 3 to r86 (freq=3000)
> ...
> 
> I've looked a bit in more detail at this message.
> 
> fipa-ra manages to allocate hardreg 1 (dx) to the call-crossing liverange as
> before. But setup_live_pseudos_and_spill_after_risky_transforms looks at the
> conflict regs, and finds that lra_reg_info[regno].conflict_hard_regs
> contains dx. Hence, it spills the liverange into an available reg, which
> happens to be hardreg 3 (bx).
> 
> So, the risky transformations and fipa-ra have in common that they allocate
> registers from the conflict_hard_regs set. In the case of -fipa-ra, we don't
> want setup_live_pseudos_and_spill_after_risky_transforms to undo the
> -fipa-ra allocation, but currently there's no way to distinguish between the
> two.
> 

Right.  Introduction of pic pseudo and possible rematerializations of memory references with the pic pseudo made checking the new conflict first necessary.  Before this it was not a problem as pic hard reg was used only for pic addressing.

> Using this patch, we make sure the conflict registers don't include dx for
> this testcase, and the test-cases passes again:
> ...
> diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
> index 9dfffb6..0231057 100644
> --- a/gcc/lra-lives.c
> +++ b/gcc/lra-lives.c
> @@ -636,8 +636,11 @@ check_pseudos_live_through_calls (int regno)
>    if (! sparseset_bit_p (pseudos_live_through_calls, regno))
>      return;
>    sparseset_clear_bit (pseudos_live_through_calls, regno);
> +
>    IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
> -                   call_used_reg_set);
> +                   !hard_reg_set_empty_p
> (lra_reg_info[regno].actual_call_used_reg_set)
> +                   ? lra_reg_info[regno].actual_call_used_reg_set
> +                   : call_used_reg_set);
>  
>    for (hr = 0; hr < FIRST_PSEUDO_REGISTER; hr++)
>      if (HARD_REGNO_CALL_PART_CLOBBERED (hr, PSEUDO_REGNO_MODE (regno)))
> ...

The patch looks ok to me.  Tom, could you prepare the patch (check it mostly for x86-64 bootstrap and testsuite) and commit it to the trunk.  I approve it.
Comment 8 Tom de Vries 2015-03-12 07:14:58 UTC
Author: vries
Date: Thu Mar 12 06:59:34 2015
New Revision: 221372

URL: https://gcc.gnu.org/viewcvs?rev=221372&root=gcc&view=rev
Log:
Use actual_call_used_reg_set to find conflicting regs

2015-03-12  Tom de Vries  <tom@codesourcery.com>

	* lra-lives.c (check_pseudos_live_through_calls): Use
	actual_call_used_reg_set instead of call_used_reg_set, if available.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-lives.c
Comment 9 Tom de Vries 2015-03-12 11:13:38 UTC
Author: vries
Date: Thu Mar 12 08:01:24 2015
New Revision: 221374

URL: https://gcc.gnu.org/viewcvs?rev=221374&root=gcc&view=rev
Log:
Revert 'require nonpic target' for fuse-caller-save*.c

2015-03-12  Tom de Vries  <tom@codesourcery.com>

	PR rtl-optimization/64895
	* gcc.target/i386/fuse-caller-save-rec.c: Revert require nonpic target.
	* gcc.target/i386/fuse-caller-save-xmm.c: Ditto.
	* gcc.target/i386/fuse-caller-save.c: Ditto.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c
    trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c
    trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
Comment 10 Tom de Vries 2015-03-16 09:42:53 UTC
Author: vries
Date: Mon Mar 16 09:42:21 2015
New Revision: 221448

URL: https://gcc.gnu.org/viewcvs?rev=221448&root=gcc&view=rev
Log:
Revert 'Use actual_call_used_reg_set to find conflicting regs'

2015-03-16  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/65414
	Revert:
	2015-03-12  Tom de Vries  <tom@codesourcery.com>

	PR rtl-optimization/64895
	* lra-lives.c (check_pseudos_live_through_calls): Use
	actual_call_used_reg_set instead of call_used_reg_set, if available.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-lives.c
Comment 11 Jakub Jelinek 2015-03-16 09:44:25 UTC
Patch has been reverted.
Comment 12 Renlin Li 2016-01-20 18:01:01 UTC
The same happens for aarch64.

> [hjl@gnu-tools-1 gcc]$ cat /tmp/x.c 
> static int __attribute__((noinline))
> bar (int x)
> {
>   if (x > 4)
>     return bar (x - 3);
>   return 0;
> }
> 
> int __attribute__((noinline))
> foo (int y)
> {
>   return y + bar (y);
> }
> 

There is another problem here actually.

In this particular case, bar() is a static function which is not exported. Although -fpic option is provided, pic_offset_table_rtx is not used at all in function foo().

In this case, pic_offset_table_rtx may not be initialized at all. The target hook TARGET_INIT_PIC_REG can be improved to initialize pic register only when necessary.


On the other hand, if pic_offset_table_rtx is not used at all, lra_risky_transformations_p should not be true. Does it make sensible?

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index a78edd8..d4a950f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4221,7 +4221,8 @@ lra_constraints (bool first_p)
             lra_constraint_iter);
   changed_p = false;
   if (pic_offset_table_rtx
-      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+      && (i = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
+      && lra_reg_info[i].nrefs > 0)
     lra_risky_transformations_p = true;
   else
     lra_risky_transformations_p = false;
Comment 13 Dominique d'Humieres 2017-01-28 14:46:16 UTC
Between revisions r244915 and r244957 I got the following XPASS

XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not pop
XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not push
XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-times addl\\t%[re]?dx, %[re]?ax 1
XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not pop
XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not push
XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-times addl\\t%[re]?d[ix], %[re]?ax 1
Comment 14 Dominique d'Humieres 2017-01-28 14:50:29 UTC
> Between revisions r244915 and r244957 I got the following XPASS

I have forgotten to say that it is on x86_64-apple-darwin16.
Comment 15 Iain Sandoe 2019-05-22 14:29:12 UTC
(IIUC the thread here) It looks to me that the codegen is now DTRT for both pic and non pic.
Darwin is doing pic by default, so sees XPASSes
There is no Linux pic test (so the change was not noticed there):

(I will produce a patch for the tests on the basis that this is now fixed).

Linux x86-64 (r271505):

Running target unix/-fpic/-m32
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/iains/gcc-trunk/src-local/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /home/iains/gcc-trunk/src-local/gcc/testsuite/gcc.target/i386/i386.exp ...
XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not push
XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-not pop
XPASS: gcc.target/i386/fuse-caller-save-rec.c scan-assembler-times addl\t%[re]?dx, %[re]?ax 1
FAIL: gcc.target/i386/fuse-caller-save-xmm.c scan-assembler-times addpd\t\\.?LC0.*, %xmm0 1
XPASS: gcc.target/i386/fuse-caller-save-xmm.c scan-assembler-times addpd\t%xmm1, %xmm0 1
XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not push
XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-not pop
XPASS: gcc.target/i386/fuse-caller-save.c scan-assembler-times addl\t%[re]?d[ix], %[re]?ax 1

		=== gcc Summary for unix/-fpic/-m64 ===

# of expected passes		18

=====
code output below - Darwin produces the same (or equivalent, for m32/pic).
=====
(extraneous lines snipped for clarity)

fuse-caller-save-rec.c -O2 -fipa-ra -fomit-frame-pointer -fno-optimize-sibling-calls -mregparm=1 -m32 -S {,-fpic}

bar:
	cmpl	$4, %eax
	jg	.L9
	xorl	%eax, %eax
	ret
.L9:
	subl	$12, %esp
	subl	$3, %eax
	call	bar
	addl	$12, %esp
	ret

foo:
	subl	$12, %esp
	movl	%eax, %edx
	call	bar
	addl	$12, %esp
	addl	%edx, %eax
	ret
=====
fuse-caller-save.c -O2 -fipa-ra -fomit-frame-pointer -mregparm=1 -m32 -S {,-fpic}

bar:
	addl	$3, %eax
	ret
foo:

	movl	%eax, %edx
	call	bar
	addl	%edx, %eax
	ret

===== 
fuse-caller-save-xmm.c -O2 -fipa-ra -fomit-frame-pointer -msse2 -mno-avx -m32 -S

bar:
	addpd	.LC0, %xmm0
	ret

foo:
	subl	$12, %esp
	movapd	%xmm0, %xmm1
	call	bar
	addl	$12, %esp
	addpd	%xmm1, %xmm0
	ret

fuse-caller-save-xmm.c -O2 -fipa-ra -fomit-frame-pointer -msse2 -mno-avx -m32 -S -fpic

bar:
	call	__x86.get_pc_thunk.ax
	addl	$_GLOBAL_OFFSET_TABLE_, %eax
	movapd	.LC0@GOTOFF(%eax), %xmm1
	addpd	%xmm1, %xmm0
	ret

foo:
	subl	$12, %esp
	movapd	%xmm0, %xmm2
	call	bar
	addl	$12, %esp
	addpd	%xmm2, %xmm0
	ret
Comment 16 Iain Sandoe 2019-05-22 15:48:20 UTC
Created attachment 46398 [details]
testsuite patch

Will post this later, tested on x86_64-linux and x86_64-darwin.
Comment 17 Iain Sandoe 2019-05-23 09:24:20 UTC
Author: iains
Date: Thu May 23 09:23:47 2019
New Revision: 271544

URL: https://gcc.gnu.org/viewcvs?rev=271544&root=gcc&view=rev
Log:
x86, testsuite - update fuse-caller-save tests.

These tests had started to XPASS on pic targets where
the codegen is now as expected.

gcc/testsuite/

2019-05-23  Iain Sandoe  <iain@sandoe.co.uk>

	PR rtl-optimisation/64895
	* gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs.
	* gcc.target/i386/fuse-caller-save.c: Likewise.
	* gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for
	PIC cases, remove XFAILs.


Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c
    trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c
    trunk/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
Comment 18 Iain Sandoe 2019-08-04 10:25:07 UTC
Author: iains
Date: Sun Aug  4 10:24:34 2019
New Revision: 274062

URL: https://gcc.gnu.org/viewcvs?rev=274062&root=gcc&view=rev
Log:
Darwin, X86, backport fixes for 64895.

Although this is marked as rtl-optimisation, the Darwin
issue is that the testcase XPASS there since the codegen
is different.

2019-08-04  Iain Sandoe  <iain@sandoe.co.uk>

	Backport from mainline.
	2019-05-23  Iain Sandoe  <iain@sandoe.co.uk>

	PR rtl-optimisation/64895
	* gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs.
	* gcc.target/i386/fuse-caller-save.c: Likewise.
	* gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for
	PIC cases, remove XFAILs.


Modified:
    branches/gcc-9-branch/gcc/testsuite/ChangeLog
    branches/gcc-9-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c
    branches/gcc-9-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c
    branches/gcc-9-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
Comment 19 Iain Sandoe 2019-08-04 10:27:37 UTC
testsuite fix applied for pic targets on trunk and 9.2
Comment 20 Iain Sandoe 2019-08-30 19:01:15 UTC
Author: iains
Date: Fri Aug 30 19:00:44 2019
New Revision: 275213

URL: https://gcc.gnu.org/viewcvs?rev=275213&root=gcc&view=rev
Log:
[Darwin, testsuite] Backport fix for 64895 XPASSes.

These tests don't fail on Darwin.

2019-08-30  Iain Sandoe  <iain@sandoe.co.uk>

	Backport from mainline.
	2019-05-23  Iain Sandoe  <iain@sandoe.co.uk>

	PR rtl-optimisation/64895
	* gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs.
	* gcc.target/i386/fuse-caller-save.c: Likewise.
	* gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for
	PIC cases, remove XFAILs.


Modified:
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
Comment 21 Iain Sandoe 2019-09-08 19:41:52 UTC
Author: iains
Date: Sun Sep  8 19:41:20 2019
New Revision: 275495

URL: https://gcc.gnu.org/viewcvs?rev=275495&root=gcc&view=rev
Log:
[X86, testsuite] Fix PR rtl-optimisation/64895 XPASSes.

These tests had started to XPASS on pic targets where the codegen
is now as expected.

2019-09-08  Iain Sandoe  <iain@sandoe.co.uk>

	Backport from mainline.
	2019-05-23  Iain Sandoe  <iain@sandoe.co.uk>

	PR rtl-optimisation/64895
	* gcc.target/i386/fuse-caller-save-rec.c: Remove XFAILs.
	* gcc.target/i386/fuse-caller-save.c: Likewise.
	* gcc.target/i386/fuse-caller-save-xmm.c: Adjust tests for
	PIC cases, remove XFAILs.


Modified:
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/fuse-caller-save.c