Split out from PR55941. The testcase from referred PR --cut here-- typedef struct A { double a, b; } A; double f(A x,double y){return x.a+y;} --cut here-- compiles to (-O2): f: addsd %xmm0, %xmm2 movapd %xmm2, %xmm0 ret gcc-4.7 generates: f: addsd %xmm2, %xmm0 ret LRA does not take into account commutative operands here.
Confirmed. Actually, it looks to me like IRA already makes this allocation: Disposition: 1:r64 l0 21 0:r67 l0 23 (21=>%xmm0, 23=>%xmm2) => .ira dump: ;; basic block 2 2: r64:DF=xmm0:DF 8: r67:DF=xmm2:DF 12: r67:DF=r67:DF+r64:DF 17: xmm0:DF=r67:DF 20: use xmm0:DF => .reload (LRA) dump: ********** Local #1: ********** Choosing alt 0 in insn 12: (0) =x (1) %0 (2) xm ;; basic block 2 2: xmm0:DF=xmm0:DF 8: xmm2:DF=xmm2:DF 12: xmm2:DF=xmm2:DF+xmm0:DF 17: xmm0:DF=xmm2:DF 20: use xmm0:DF So all LRA does, is follow IRA's recommended allocation. That is IMHO the right thing to do, too.
The "unbreakable" insns 12 "xmm2:DF=xmm2:DF+xmm0:DF" is created by regmove. .ce3 dump: 2: r64:DF=xmm0:DF 8: r66:DF=xmm2:DF 12: r67:DF=r66:DF+r64:DF 17: xmm0:DF=r67:DF 20: use xmm0:DF .regmove dump: Could fix operand 1 of insn 12 matching operand 0. 2: r64:DF=xmm0:DF 8: r67:DF=xmm2:DF 12: r67:DF=r67:DF+r64:DF 17: xmm0:DF=r67:DF 20: use xmm0:DF With -fno-regmove: addsd %xmm2, %xmm0 ret
Before regmove, both input operands die in insn 12: 12: r67:DF=r66:DF+r64:DF REG_DEAD r66:DF REG_DEAD r64:DF and because reg classes haven't been set up, r66 and r67 both have GENERAL_REGS as their preferred register class so regmove doesn't see that r64 and r67 share the same preferred register %xmm0: Breakpoint 1, regmove_backward_pass () at ../../trunk/gcc/regmove.c:1088 1088 if (dump_file) (gdb) p reg_preferred_class (64) $10 = GENERAL_REGS (gdb) p reg_preferred_class (66) $11 = GENERAL_REGS (gdb) p reg_preferred_class (67) $12 = GENERAL_REGS (gdb) p ira_set_pseudo_classes (true, dump_file) $13 = void (gdb) p reg_preferred_class (64) $14 = SSE_FIRST_REG (gdb) p reg_preferred_class (66) $15 = SSE_REGS (gdb) p reg_preferred_class (67) $16 = SSE_FIRST_REG (gdb)
Perhaps for regmove IRA classes should be set up unconditionally: Index: regmove.c =================================================================== --- regmove.c (revision 196074) +++ regmove.c (working copy) @@ -1234,8 +1234,9 @@ regmove_optimize (void) regstat_init_n_sets_and_refs (); regstat_compute_ri (); - if (flag_ira_loop_pressure) - ira_set_pseudo_classes (true, dump_file); + /* Set up register classes for pseudos, so that reg_preferred_class + returns a more useful result. */ + ira_set_pseudo_classes (true, dump_file); regno_src_regno = XNEWVEC (int, nregs); for (i = nregs; --i >= 0; ) @@ -1256,8 +1257,7 @@ regmove_optimize (void) } regstat_free_n_sets_and_refs (); regstat_free_ri (); - if (flag_ira_loop_pressure) - free_reg_info (); + free_reg_info (); return 0; }
(In reply to comment #4) > Perhaps for regmove IRA classes should be set up unconditionally: > > Index: regmove.c > =================================================================== > --- regmove.c (revision 196074) > +++ regmove.c (working copy) > @@ -1234,8 +1234,9 @@ regmove_optimize (void) > regstat_init_n_sets_and_refs (); > regstat_compute_ri (); > > - if (flag_ira_loop_pressure) > - ira_set_pseudo_classes (true, dump_file); > + /* Set up register classes for pseudos, so that reg_preferred_class > + returns a more useful result. */ > + ira_set_pseudo_classes (true, dump_file); > > regno_src_regno = XNEWVEC (int, nregs); > for (i = nregs; --i >= 0; ) > @@ -1256,8 +1257,7 @@ regmove_optimize (void) > } > regstat_free_n_sets_and_refs (); > regstat_free_ri (); > - if (flag_ira_loop_pressure) > - free_reg_info (); > + free_reg_info (); > return 0; > } It can be a solution. I see only one drawback, it is expensive. Setting classes is expensive procedure requiring 2 passes over all insns, their alternatives,and classes for each pseudo operand. In general, it still will not work for other cases. We are lucky that xmm0 forms own class SSE_FIRST_REG. Regmove for general cases should see hard regs not classes. This is not the first PR about regmove. I'd like to remove big part of regmove concerning matching operands as IRA/LRA can deal with this. Unfortunately, not too well when hard regs exposed in RTL and some work should be done to improve this code. I am going to do for gcc4.9.
So - what regressed this compared to 4.7? It wasn't regmove.c changes.
(In reply to comment #6) > So - what regressed this compared to 4.7? It wasn't regmove.c changes. As said in comment 0: gcc-4.7 generates: f: addsd %xmm2, %xmm0 ret gcc-4.8 generates: f: addsd %xmm0, %xmm2 movapd %xmm2, %xmm0 ret
(In reply to comment #6) > So - what regressed this compared to 4.7? It wasn't regmove.c changes. Probably LRA, it better respects IRA's choices (which is good). The fix should be found in a change to IRA or regmove.
You don't need to guess, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55941#c1 mentions the commits that regressed it.
Thus CCing the offending people.
(In reply to comment #10) > Thus CCing the offending people. One of the offending people is the reporter.
I don't think this is P1. When looking at the dumps, right after expansion the 4.7 expanded code actually looks much worse compared to the 4.8 expanded one (4.7 goes through memory, while 4.8 through subregs), in *.ud-dce it is pretty much comparable, though 4.7 has one extra move: 2 r62:DF=xmm0:DF REG_DEAD: xmm0:DF 6 r64:DF=xmm2:DF REG_DEAD: xmm2:DF 21 r66:DF=r62:DF REG_DEAD: r62:DF 7 NOTE_INSN_FUNCTION_BEG 10 r65:DF=r64:DF+r66:DF REG_DEAD: r66:DF REG_DEAD: r64:DF 15 xmm0:DF=r65:DF REG_DEAD: r65:DF 18 use xmm0:DF in 4.7 vs. 2: r64:DF=xmm0:DF REG_DEAD xmm0:DF 8: r66:DF=xmm2:DF REG_DEAD xmm2:DF 9: NOTE_INSN_FUNCTION_BEG 12: r67:DF=r66:DF+r64:DF REG_DEAD r66:DF REG_DEAD r64:DF 17: xmm0:DF=r67:DF REG_DEAD r67:DF 20: use xmm0:DF in 4.8. The combiner change is what matters for the later behavior of regmove, RA and reload/LRA, but unfortunately that is an ICE fix that can't be reverted, we really must avoid to propagating hard registers too early, otherwise RA needs to just give up. So, in 4.7 we end up with r65:DF=xmm2:DF+r62:DF after combine, while 4.8 still uses pseudos. Which is the reason why regmove sets that into the optimal for this testcase r65:DF=xmm2:DF+r65:DF, because it has no other choice (the operand was already a hard reg), while in 4.8 it has a choice (sees two different pseudos, and chooses in this case the wrong one). I'm afraid there is no easy fix, so IMHO this needs to be postponed for 4.9.
GCC 4.8.0 is being released, adjusting target milestone.
GCC 4.8.1 has been released.
GCC 4.8.2 has been released.
The problem was fixed by the patch removing regmove and improving hardware reg preferences in IRA.
gcc-4.9 now generates: f: addsd %xmm2, %xmm0 ret The problem is fixed in 4.9, reconfirmed on 4.8 branch.
GCC 4.8.3 is being released, adjusting target milestone.
GCC 4.8.4 has been released.
Fixed for 4.9.0.