This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] Combine related fail of gcc.target/powerpc/ti_math1.c


FAIL: gcc.target/powerpc/ti_math1.c scan-assembler-times adde 1
is seen on powerpc64le-linux since somewhere between revision 218587
and 218616.  See
https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01287.html and
https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01325.html

A regression hunt fingers one of Segher's 2014-12-10 patches to the
rs6000 backend, git commit 0f1bedb4 or svn rev 218595.  Segher might
argue that generated code is better after this commit, and I'd agree
that his change is a good one in general, but even so it would be nice
to generate the ideal code.  Curiously, the ideal code is generated at
-O1, but we regress at -O2..

	before			after			ideal (-O1)
add_128:		add_128:		add_128:
	ld 10,0(3)		ld 9,0(3)		ld 9,0(3)
	ld 11,8(3)		ld 10,8(3)		ld 10,8(3)
	addc 8,4,10		addc 3,4,9		addc 3,4,9
	adde 9,5,11		addze 5,5		adde 4,5,10
	mr 3,8			add 4,5,10		blr
	mr 4,9			blr
	blr

I went looking into where the addze appeared, and found combine.

Trying 18, 9 -> 24:
Failed to match this instruction:
(set (reg:DI 4 4 [+8 ])
    (plus:DI (plus:DI (reg:DI 5 5 [ val+8 ])
            (reg:DI 76 ca))
        (reg:DI 169 [+8 ])))
Successfully matched this instruction:
(set (reg:DI 167 [ D.2366+8 ])
    (plus:DI (reg:DI 5 5 [ val+8 ])
        (reg:DI 76 ca)))
Successfully matched this instruction:
(set (reg:DI 4 4 [+8 ])
    (plus:DI (reg:DI 167 [ D.2366+8 ])
        (reg:DI 169 [+8 ])))
allowing combination of insns 18, 9 and 24
original costs 4 + 8 + 4 = 16
replacement costs 4 + 4 = 8

Here are the three insns involved, sans source line numbers and notes.

(insn 18 17 4 2 (set (reg:DI 165 [ val+8 ])
        (reg:DI 5 5 [ val+8 ])) {*movdi_internal64})
...
(insn 9 8 23 2 (parallel [
            (set (reg:DI 167 [ D.2366+8 ])
                (plus:DI (plus:DI (reg:DI 165 [ val+8 ])
                        (reg:DI 169 [+8 ]))
                    (reg:DI 76 ca)))
            (clobber (reg:DI 76 ca))
        ]) {*adddi3_carry_in_internal})
...
(insn 24 23 15 2 (set (reg:DI 4 4 [+8 ])
        (reg:DI 167 [ D.2366+8 ])) {*movdi_internal64})

So, a move copying an argument register to a pseudo, one insn from the
body of the function, and a move copying a pseudo to a result
register.  The thought I had was: It is really combine's business to
look at copies from/to ABI mandated hard registers?  Isn't removing
the copies something that register allocation can do better?  If so,
then combine is doing unnecessary work.

As a quick hack, I tried the following.

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 223431)
+++ gcc/combine.c	(working copy)
@@ -1281,6 +1281,16 @@ combine_instructions (rtx_insn *f, unsigned int nr
 	  if (!NONDEBUG_INSN_P (insn))
 	    continue;
 
+	  if (this_basic_block == EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb)
+	    {
+	      rtx set = single_set (insn);
+	      if (set
+		  && REG_P (SET_DEST (set))
+		  && HARD_REGISTER_P (SET_DEST (set))
+		  && REG_P (SET_SRC (set)))
+		continue;
+	    }
+
 	  while (last_combined_insn
 		 && last_combined_insn->deleted ())
 	    last_combined_insn = PREV_INSN (last_combined_insn);

This cures the powerpc64le testcase failure, but Segher said on irc I
was risking breaking x86 and other targets.  Perhaps that was trying
to push me to fix the underlying combine problem.  :)  In any case, I
didn't believe him, and tested the patch on powerpc64le-linux and
x86_64-linux.  No regressions in --languages=all,go and objdump -d
comparison for gcc/*.o against virgin source show no unexpected
changes.  powerpc64le-linux actually shows no changes at all apart
from combine.o while x86_64-linux shows some changes in register
allocation and cmove arg swapping with inversion of the condition.
There were no extra instructions.

So, is this worth pursuing in order to speed up combine?  I'd be
inclined to patch create_log_links instead for a proper patch.


Incidentally the underlying problem in combine (well the first one I
spotted, there might be more), is that 
      if (flag_expensive_optimizations)
	{
	  /* Pass pc_rtx so no substitutions are done, just
	     simplifications.  */
"simplifies" this i2src

(plus:DI (plus:DI (reg:DI 165 [ val+8 ])
        (reg:DI 169 [+8 ]))
    (reg:DI 76 ca))

to this

(plus:DI (plus:DI (reg:DI 76 ca)
        (reg:DI 165 [ val+8 ]))
    (reg:DI 169 [+8 ]))

and the latter has the ca register in the wrong place.  So a split is
tried and you get addze.  I'm working on this.  The reordering happens
inside simplify_plus_minus.

-- 
Alan Modra
Australia Development Lab, IBM


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]