56339 – [4.8 Regression]: Suboptimal register allocation

Bug 56339 - [4.8 Regression]: Suboptimal register allocation

Summary: [4.8 Regression]: Suboptimal register allocation

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.8.0

Importance:	P2 normal
Target Milestone:	4.9.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization, ra

Depends on:
Blocks:

Reported:	2013-02-15 08:53 UTC by Uroš Bizjak
Modified:	2015-06-23 08:32 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	4.8.5
Last reconfirmed:	2013-02-15 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Uroš Bizjak 2013-02-15 08:53:07 UTC

Split out from PR55941.

The testcase from referred PR

--cut here--
typedef struct A { double a, b; } A;
double f(A x,double y){return x.a+y;}
--cut here--

compiles to (-O2):

f:
        addsd   %xmm0, %xmm2
        movapd  %xmm2, %xmm0
        ret

gcc-4.7 generates:

f:
        addsd   %xmm2, %xmm0
        ret

LRA does not take into account commutative operands here.

Comment 1 Steven Bosscher 2013-02-15 09:51:56 UTC

Confirmed.

Actually, it looks to me like IRA already makes this allocation:
Disposition:
    1:r64  l0    21    0:r67  l0    23
(21=>%xmm0, 23=>%xmm2)

=> .ira dump:

;; basic block 2
    2: r64:DF=xmm0:DF
    8: r67:DF=xmm2:DF
   12: r67:DF=r67:DF+r64:DF
   17: xmm0:DF=r67:DF
   20: use xmm0:DF

=> .reload (LRA) dump:

********** Local #1: **********
         Choosing alt 0 in insn 12:  (0) =x  (1) %0  (2) xm

;; basic block 2
    2: xmm0:DF=xmm0:DF
    8: xmm2:DF=xmm2:DF
   12: xmm2:DF=xmm2:DF+xmm0:DF
   17: xmm0:DF=xmm2:DF
   20: use xmm0:DF

So all LRA does, is follow IRA's recommended allocation.  That is IMHO
the right thing to do, too.

Comment 2 Steven Bosscher 2013-02-15 10:00:13 UTC

The "unbreakable" insns 12 "xmm2:DF=xmm2:DF+xmm0:DF" is created by regmove.

.ce3 dump:
    2: r64:DF=xmm0:DF
    8: r66:DF=xmm2:DF
   12: r67:DF=r66:DF+r64:DF
   17: xmm0:DF=r67:DF
   20: use xmm0:DF

.regmove dump:
Could fix operand 1 of insn 12 matching operand 0.
    2: r64:DF=xmm0:DF
    8: r67:DF=xmm2:DF
   12: r67:DF=r67:DF+r64:DF
   17: xmm0:DF=r67:DF
   20: use xmm0:DF

With -fno-regmove:
        addsd   %xmm2, %xmm0
        ret

Comment 3 Steven Bosscher 2013-02-15 10:17:19 UTC

Before regmove, both input operands die in insn 12:
   12: r67:DF=r66:DF+r64:DF
      REG_DEAD r66:DF
      REG_DEAD r64:DF

and because reg classes haven't been set up, r66 and r67 both have
GENERAL_REGS as their preferred register class so regmove doesn't see
that r64 and r67 share the same preferred register %xmm0:

Breakpoint 1, regmove_backward_pass () at ../../trunk/gcc/regmove.c:1088
1088                  if (dump_file)
(gdb) p reg_preferred_class (64) 
$10 = GENERAL_REGS
(gdb) p reg_preferred_class (66)
$11 = GENERAL_REGS
(gdb) p reg_preferred_class (67)
$12 = GENERAL_REGS
(gdb) p ira_set_pseudo_classes (true, dump_file)
$13 = void
(gdb) p reg_preferred_class (64)
$14 = SSE_FIRST_REG
(gdb) p reg_preferred_class (66)
$15 = SSE_REGS
(gdb) p reg_preferred_class (67)
$16 = SSE_FIRST_REG
(gdb)

Comment 4 Steven Bosscher 2013-02-15 10:20:52 UTC

Perhaps for regmove IRA classes should be set up unconditionally:

Index: regmove.c
===================================================================
--- regmove.c   (revision 196074)
+++ regmove.c   (working copy)
@@ -1234,8 +1234,9 @@ regmove_optimize (void)
   regstat_init_n_sets_and_refs ();
   regstat_compute_ri ();
 
-  if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (true, dump_file);
+  /* Set up register classes for pseudos, so that reg_preferred_class
+     returns a more useful result.  */
+  ira_set_pseudo_classes (true, dump_file);
 
   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
@@ -1256,8 +1257,7 @@ regmove_optimize (void)
     }
   regstat_free_n_sets_and_refs ();
   regstat_free_ri ();
-  if (flag_ira_loop_pressure)
-    free_reg_info ();
+  free_reg_info ();
   return 0;
 }

Comment 5 Vladimir Makarov 2013-02-15 16:48:19 UTC

(In reply to comment #4)
> Perhaps for regmove IRA classes should be set up unconditionally:
> 
> Index: regmove.c
> ===================================================================
> --- regmove.c   (revision 196074)
> +++ regmove.c   (working copy)
> @@ -1234,8 +1234,9 @@ regmove_optimize (void)
>    regstat_init_n_sets_and_refs ();
>    regstat_compute_ri ();
> 
> -  if (flag_ira_loop_pressure)
> -    ira_set_pseudo_classes (true, dump_file);
> +  /* Set up register classes for pseudos, so that reg_preferred_class
> +     returns a more useful result.  */
> +  ira_set_pseudo_classes (true, dump_file);
> 
>    regno_src_regno = XNEWVEC (int, nregs);
>    for (i = nregs; --i >= 0; )
> @@ -1256,8 +1257,7 @@ regmove_optimize (void)
>      }
>    regstat_free_n_sets_and_refs ();
>    regstat_free_ri ();
> -  if (flag_ira_loop_pressure)
> -    free_reg_info ();
> +  free_reg_info ();
>    return 0;
>  }

It can be a solution.  I see only one drawback, it is expensive.  Setting classes is expensive procedure requiring 2 passes over all insns, their alternatives,and classes for each pseudo operand.

In general, it still will not work for other cases.  We are lucky that xmm0 forms own class SSE_FIRST_REG.  Regmove for general cases should see hard regs not classes.

This is not the first PR about regmove.  I'd like to remove big part of regmove concerning matching operands as IRA/LRA can deal with this.  Unfortunately, not too well when hard regs exposed in RTL and some work should be done to improve this code.  I am going to do for gcc4.9.

Comment 6 Richard Biener 2013-03-08 15:39:21 UTC

So - what regressed this compared to 4.7?  It wasn't regmove.c changes.

Comment 7 Uroš Bizjak 2013-03-08 15:51:15 UTC

(In reply to comment #6)
> So - what regressed this compared to 4.7?  It wasn't regmove.c changes.

As said in comment 0:

gcc-4.7 generates:

f:
        addsd   %xmm2, %xmm0
        ret

gcc-4.8 generates:

f:
        addsd   %xmm0, %xmm2
        movapd  %xmm2, %xmm0
        ret

Comment 8 Steven Bosscher 2013-03-08 18:49:09 UTC

(In reply to comment #6)
> So - what regressed this compared to 4.7?  It wasn't regmove.c changes.

Probably LRA, it better respects IRA's choices (which is good).  The fix
should be found in a change to IRA or regmove.

Comment 9 Jakub Jelinek 2013-03-08 19:46:59 UTC

You don't need to guess, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55941#c1 mentions the commits that regressed it.

Comment 10 Richard Biener 2013-03-11 10:46:16 UTC

Thus CCing the offending people.

Comment 11 Uroš Bizjak 2013-03-11 11:49:03 UTC

(In reply to comment #10)
> Thus CCing the offending people.

One of the offending people is the reporter.

Comment 12 Jakub Jelinek 2013-03-11 11:57:57 UTC

I don't think this is P1.
When looking at the dumps, right after expansion the 4.7 expanded code actually looks much worse compared to the 4.8 expanded one (4.7 goes through memory, while 4.8 through subregs), in *.ud-dce it is pretty much comparable, though 4.7 has one extra move:
    2 r62:DF=xmm0:DF
      REG_DEAD: xmm0:DF
    6 r64:DF=xmm2:DF
      REG_DEAD: xmm2:DF
   21 r66:DF=r62:DF
      REG_DEAD: r62:DF
    7 NOTE_INSN_FUNCTION_BEG
   10 r65:DF=r64:DF+r66:DF
      REG_DEAD: r66:DF
      REG_DEAD: r64:DF
   15 xmm0:DF=r65:DF
      REG_DEAD: r65:DF
   18 use xmm0:DF
in 4.7 vs.
    2: r64:DF=xmm0:DF
      REG_DEAD xmm0:DF
    8: r66:DF=xmm2:DF
      REG_DEAD xmm2:DF
    9: NOTE_INSN_FUNCTION_BEG
   12: r67:DF=r66:DF+r64:DF
      REG_DEAD r66:DF
      REG_DEAD r64:DF
   17: xmm0:DF=r67:DF
      REG_DEAD r67:DF
   20: use xmm0:DF
in 4.8.  The combiner change is what matters for the later behavior of regmove, RA and reload/LRA, but unfortunately that is an ICE fix that can't be reverted,
we really must avoid to propagating hard registers too early, otherwise RA needs to just give up.
So, in 4.7 we end up with r65:DF=xmm2:DF+r62:DF after combine, while 4.8 still uses pseudos.  Which is the reason why regmove sets that into the optimal for this testcase r65:DF=xmm2:DF+r65:DF, because it has no other choice (the operand was already a hard reg), while in 4.8 it has a choice (sees two different pseudos, and chooses in this case the wrong one).  I'm afraid there is no easy fix, so IMHO this needs to be postponed for 4.9.

Comment 13 Jakub Jelinek 2013-03-22 14:43:10 UTC

GCC 4.8.0 is being released, adjusting target milestone.

Comment 14 Jakub Jelinek 2013-05-31 10:58:07 UTC

GCC 4.8.1 has been released.

Comment 15 Jakub Jelinek 2013-10-16 09:48:44 UTC

GCC 4.8.2 has been released.

Comment 16 Vladimir Makarov 2013-11-26 20:24:22 UTC

The problem was fixed by the patch removing regmove and improving hardware reg preferences in IRA.

Comment 17 Uroš Bizjak 2013-12-12 18:09:14 UTC

gcc-4.9 now generates:

f:
        addsd   %xmm2, %xmm0
        ret

The problem is fixed in 4.9, reconfirmed on 4.8 branch.

Comment 18 Richard Biener 2014-05-22 09:02:21 UTC

GCC 4.8.3 is being released, adjusting target milestone.

Comment 19 Jakub Jelinek 2014-12-19 13:32:01 UTC

GCC 4.8.4 has been released.

Comment 20 Richard Biener 2015-06-23 08:32:13 UTC

Fixed for 4.9.0.