This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Better info for combine results in worse code generated


It's really annoying when you fix a combine bug and get worse code.

The following was part of a larger patch.  This part simply avoids
throwing away useful info about non-zero bits.  The upper bits that
"we don't know anything about" are not those indicated by
last_set_mode, because nonzero_bits_mode may have been used to
calculate last_set_nonzero_bits.  See record_value_for_reg.

Index: combine.c
===================================================================
--- combine.c	(revision 223725)
+++ combine.c	(working copy)
@@ -9832,10 +9832,16 @@ reg_nonzero_bits_for_combine (const_rtx x, machine
 		   REGNO (x)))))
     {
       unsigned HOST_WIDE_INT mask = rsp->last_set_nonzero_bits;
+      machine_mode mask_mode = rsp->last_set_mode;
 
-      if (GET_MODE_PRECISION (rsp->last_set_mode) < GET_MODE_PRECISION (mode))
+      /* We possibly calculated last_set_nonzero_bits in a wider mode.  */
+      if (GET_MODE_CLASS (mask_mode) == MODE_INT
+	  && GET_MODE_PRECISION (mask_mode) < HOST_BITS_PER_WIDE_INT)
+	mask_mode = nonzero_bits_mode;
+
+      if (GET_MODE_PRECISION (mask_mode) < GET_MODE_PRECISION (mode))
 	/* We don't know anything about the upper bits.  */
-	mask |= GET_MODE_MASK (mode) ^ GET_MODE_MASK (rsp->last_set_mode);
+	mask |= GET_MODE_MASK (mode) ^ GET_MODE_MASK (mask_mode);
 
       *nonzero &= mask;
       return NULL;

The problem is that the following testcase on powerpc64le now
generates worse code.

void foo (signed char *p) { if (*p != 0) *p = 1; }

	before			after
foo:			foo:
	lbz 9,0(3)		lbz 9,0(3)
	cmpwi 7,9,0		andi. 10,9,0xff
	beqlr 7			beqlr 0
	li 9,1			li 9,1
	stb 9,0(3)		stb 9,0(3)
	blr			blr

That record-form andi. is slower on many processors, and is
restricted to setting cr0.


This is what combine sees at the start of the function.

(insn 6 3 7 2 (set (reg:QI 158 [ *p_3(D) ])
        (mem:QI (reg/v/f:DI 156 [ p ]) [0 *p_3(D)+0 S1 A8])) byte.c:1 444 {*movqi_internal}
     (nil))
(insn 7 6 8 2 (set (reg:SI 157 [ *p_3(D) ])
        (sign_extend:SI (reg:QI 158 [ *p_3(D) ]))) byte.c:1 30 {extendqisi2}
     (expr_list:REG_DEAD (reg:QI 158 [ *p_3(D) ])
        (nil)))
(insn 8 7 9 2 (set (reg:CC 159)
        (compare:CC (reg:SI 157 [ *p_3(D) ])
            (const_int 0 [0]))) byte.c:1 690 {*cmpsi_internal1}
     (expr_list:REG_DEAD (reg:SI 157 [ *p_3(D) ])
        (nil)))
(jump_insn 9 8 10 2 (set (pc)
        (if_then_else (eq (reg:CC 159)
                (const_int 0 [0]))
            (label_ref:DI 15)
            (pc))) byte.c:1 723 {*rs6000.md:11429}
     (expr_list:REG_DEAD (reg:CC 159)
        (int_list:REG_BR_PROB 3900 (nil)))
 -> 15)

And here's where things go wrong.

Trying 7 -> 8:
Successfully matched this instruction:
(set (reg:CC 159)
    (compare:CC (zero_extend:SI (reg:QI 158 [ *p_3(D) ]))
        (const_int 0 [0])))
allowing combination of insns 7 and 8
original costs 4 + 4 = 8
replacement cost 8
deferring deletion of insn with uid = 7.
modifying insn i3     8: {r159:CC=cmp(zero_extend(r158:QI),0);clobber scratch;}
      REG_DEAD r158:QI
deferring rescan insn with uid = 8.

Trying 6 -> 8:
Failed to match this instruction:
(parallel [
        (set (reg:CC 159)
            (compare:CC (subreg:SI (mem:QI (reg/v/f:DI 156 [ p ]) [0 *p_3(D)+0 S1 A8]) 0)
                (const_int 0 [0])))
        (clobber (scratch:SI))
    ])
Failed to match this instruction:
(set (reg:CC 159)
    (compare:CC (subreg:SI (mem:QI (reg/v/f:DI 156 [ p ]) [0 *p_3(D)+0 S1 A8]) 0)
        (const_int 0 [0])))


With an unpatched compiler the 7 -> 8 combination doesn't happen,
because the less accurate zero-bits info doesn't allow the sign_extend
to be removed.  Instead, you get

Trying 7 -> 8:
Failed to match this instruction:
(set (reg:CC 159)
    (compare:CC (reg:QI 158 [ *p_3(D) ])
        (const_int 0 [0])))

Trying 6, 7 -> 8:
Failed to match this instruction:
(set (reg:CC 159)
    (compare:CC (zero_extend:SI (mem:QI (reg/v/f:DI 156 [ p ]) [0 *p_3(D)+0 S1 A8]))
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:SI 157 [ *p_3(D) ])
    (zero_extend:SI (mem:QI (reg/v/f:DI 156 [ p ]) [0 *p_3(D)+0 S1 A8])))
Successfully matched this instruction:
(set (reg:CC 159)
    (compare:CC (reg:SI 157 [ *p_3(D) ])
        (const_int 0 [0])))
allowing combination of insns 6, 7 and 8
original costs 8 + 4 + 4 = 16
replacement costs 8 + 4 = 12
deferring deletion of insn with uid = 6.
modifying insn i2     7: r157:SI=zero_extend([r156:DI])
deferring rescan insn with uid = 7.
modifying insn i3     8: r159:CC=cmp(r157:SI,0)
      REG_DEAD r157:SI
deferring rescan insn with uid = 8.

So, a three-insn combine that's split back to two insns.  Improving
the non-zero-bits info loses the opportunity to try this three-insn
combination, because the 7 -> 8 combination has already reduced the
sequence to two insns.

Does anyone have any clues as to how I might fix this?  I'm not keen
on adding an insn_and_split to rs6000.md to recognize the 6 -> 8
combination, because one of the aims of the wider patch I was working
on was to remove patterns like rotlsi3_64, ashlsi3_64, lshrsi3_64 and
ashrsi3_64.  Adding patterns in order to remove others doesn't sound
like much of a win.

-- 
Alan Modra
Australia Development Lab, IBM

