User account creation filtered due to spam.

Bug 64785 - [5/6/7/8 Regression][SH] and|or|xor #imm not used
Summary: [5/6/7/8 Regression][SH] and|or|xor #imm not used
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 5.0
: P4 normal
Target Milestone: 5.5
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2015-01-25 13:11 UTC by Oleg Endo
Modified: 2016-06-03 10:05 UTC (History)
1 user (show)

See Also:
Host:
Target: sh*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-01-27 00:00:00


Attachments
Add sh_pre_ra RTL pass (3.16 KB, patch)
2015-03-07 12:23 UTC, Oleg Endo
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Oleg Endo 2015-01-25 13:11:18 UTC
It seems that for some reason loading a constant is now favored instead of using the #imm,R0 alternative.

void test000 (int* x, int xb)
{
  x[0] = xb & 128;
}

void test001 (int* x, int xb)
{
  x[0] = xb | 128;
}

void test002 (int* x, int xb)
{
  x[0] = xb ^ 128;
}

trunk:
_test000:
	mov.w	.L7,r1	! 15	*movhi/1	[length = 2]
	and	r1,r5	! 7	*andsi_compact/4	[length = 2]
	rts		! 18	*return_i	[length = 2]
	mov.l	r5,@r4	! 8	movsi_ie/9	[length = 2]

4.9:
	mov	r5,r0	! 15	movsi_ie/2	[length = 2]
	and	#128,r0	! 7	*andsi_compact/3	[length = 2]
	rts		! 18	*return_i	[length = 2]
	mov.l	r0,@r4	! 8	movsi_ie/9	[length = 2]


The RTL before RA is the same in both cases:

(insn 7 4 8 2 (set (reg:SI 163 [ D.1431 ])
        (and:SI (reg:SI 5 r5 [ xb ])
            (const_int 128 [0x80]))) sh_tmp.cpp:257 124 {*andsi_compact}
     (expr_list:REG_DEAD (reg:SI 5 r5 [ xb ])
        (nil)))
(insn 8 7 0 2 (set (mem:SI (reg:SI 4 r4 [ x ]) [1 *x_4(D)+0 S4 A32])
        (reg:SI 163 [ D.1431 ])) sh_tmp.cpp:257 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 4 r4 [ x ])
        (expr_list:REG_DEAD (reg:SI 163 [ D.1431 ])
            (nil))))

Reload on trunk says:

Reloads for insn # 7
Reload 0: reload_in (SI) = (const_int 128 [0x80])
	GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 2)
	reload_in_reg: (const_int 128 [0x80])
	reload_reg_rtx: (reg:SI 1 r1)


While reload on 4.9 says:

Reloads for insn # 7
Reload 0: reload_in (SI) = (reg:SI 5 r5 [ xb ])
	reload_out (SI) = (reg:SI 0 r0 [orig:163 D.1377 ] [163])
	R0_REGS, RELOAD_OTHER (opnum = 0)
	reload_in_reg: (reg:SI 5 r5 [ xb ])
	reload_out_reg: (reg:SI 0 r0 [orig:163 D.1377 ] [163])
	reload_reg_rtx: (reg:SI 0 r0 [orig:163 D.1377 ] [163])

Maybe this is because the function argument from hardreg r5 is propagated into the insn.  This propagation is also causing unnecessary sign/zero extensions, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53987#c9
Comment 1 Oleg Endo 2015-01-27 23:46:48 UTC
It seems that hardregs in insn operands are kind of sticky.  Combine propagates the function arg in r5 into the and insn and this doesn't get changed to r0.

Disallowing combine to do that seems to fix the problem, although I haven't checked for any other side effects of that change.  Probably some code will get worse for insns that really use hardregs (e.g. libfuncs).

Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 220144)
+++ gcc/config/sh/sh.c	(working copy)
@@ -2066,6 +2066,26 @@
       && MEM_P (XEXP (XEXP (p, 1), 0)))
       return false;
 
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (i, array, p, ALL)
+    {
+      if (*i == NULL)
+	continue;
+      int regno = -1;
+      for (const_rtx x = *i; regno == -1; )
+	{
+	  if (REG_P (x))
+	    regno = REGNO (x);
+	  else if (SUBREG_P (x))
+	    x = SUBREG_REG (x);
+	  else
+	    break;
+	}
+
+      if (regno != -1 && regno >= R0_REG && regno <= R0_REG + 15)
+	return false;
+    }
+
   return true;
 }
 

Another more radical approach could be to insert an RTL pass that pre-allocates the R0 reg for those insns that have "z" constraint alternatives, similar to what Kaz did with the mov.{b|w} patterns in prepare_move_operands for LRA.
Comment 2 Oleg Endo 2015-01-28 17:06:59 UTC
(In reply to Oleg Endo from comment #1)
> 
> Another more radical approach could be to insert an RTL pass that
> pre-allocates the R0 reg for those insns that have "z" constraint
> alternatives, similar to what Kaz did with the mov.{b|w} patterns in
> prepare_move_operands for LRA.

As a start, that pass could walk over all insns and make sure that for operands with constraints/alternatives no hardregs are used before RA.  For instance

(set (reg:SI 163) (and:SI (reg:SI 5 r5) (const_int 128)))

would be changed to

(set (reg:SI 164) (reg:SI r5)) // 164 is a new pseudo
(set (reg:SI 163) (and:SI (reg:SI 164) (const_int 128)))

I think something like this would be feasible for 5.0.
Comment 3 Oleg Endo 2015-03-07 12:23:53 UTC
Created attachment 34982 [details]
Add sh_pre_ra RTL pass
Comment 4 Oleg Endo 2015-03-07 12:31:08 UTC
(In reply to Oleg Endo from comment #3)
> Created attachment 34982 [details]
> Add sh_pre_ra RTL pass

Right now I can think of two alternatives for this PR:
- convert respective insns into insn_and_split and emit a pseudo + move in split1

- catch imm8,r0 insns in sh_legitimate_combined_insn in the same way as it's done by the sh_pre_ra pass above.


I think that an SH specific pre-RA RTL pass could be useful in the future, for example to improve R0 utilization, even with LRA.

Kaz, do you have any opinions?
Comment 5 Oleg Endo 2015-03-07 12:33:55 UTC
(In reply to Oleg Endo from comment #3)
> Created attachment 34982 [details]
> Add sh_pre_ra RTL pass

+
+  flag_live_range_shrinkage = 1;

This hunk was not supposed to be in the patch.
Comment 6 Kazumoto Kojima 2015-03-08 00:31:28 UTC
(In reply to Oleg Endo from comment #4)
> I think that an SH specific pre-RA RTL pass could be useful in the future,
> for example to improve R0 utilization, even with LRA.
> 
> Kaz, do you have any opinions?

I like your pre-RA pass even if it's a too big hammer for
this specific problem.  It should wait the next stage1, though.
Also it would be better to look for another use cases of that
pass as you suggested so as to justify the cost of scanning
all insns.
Comment 7 Oleg Endo 2015-03-08 10:24:11 UTC
(In reply to Kazumoto Kojima from comment #6)
> 
> I like your pre-RA pass even if it's a too big hammer for
> this specific problem.  It should wait the next stage1, though.

It seems that this PR's issue is not a frequent use case (no hits in CSiBE at all).  So yes, stage1 is good.


> Also it would be better to look for another use cases of that
> pass as you suggested so as to justify the cost of scanning
> all insns.

Some use cases for the pre-RA pass:
- R0 pre-allocation

- reduction of number of pseudos and reg-reg copies
  some passes leave pseudos and copies which can be removed
  to make the RA task easier.

- 2 operand / commutative operands optimization
  on SH the dest operand is always one of the source operands.
  I've seen several times that the generic RA makes not-so-good
  choices which results in more live regs and unnecessary reg-reg
  copies.  Very often output operands are put in different pseudos
  than the input operands before RA and RA has to fix this somehow.

- the last time I played with the fipr insn (PR 55295) RA had trouble
  allocating FV regs.  For example:
     void func (float a, float b, float c, float d)
  would not allocate (a,b,c,d) to FV4, although the operands are already
  in the appropriate FR regs.  It resulted in many unnecessary reg-reg
  copies.  I haven't tried this with LRA though.


There are some more things which I'd do before RA:

- Forming SH2A movu.{b|w} insns (PR 64792)
- Various constant optimizations (PR 63390, PR 51708, PR 54682, PR 65069)
- 64 bit FP load/store fusion (PR 64305)

It would be possible to write one huge pre-RA RTL pass to do all of that.  However, I'd like to avoid accidents such as reload.c and rather keep things separated as much as possible.  I don't have evidence, but I don't think that scanning all insns is too bad.  It's being done multiple times during compilation and there are other places which could be optimized.  For example, as far as I know, split4 and split5 passes are not needed on SH and could be disabled.  Or maybe the conditions in define_split such as "can_create_pseudo_p ()" should be evaluated *before* all insns are scanned/recog'ed.
Comment 8 Jakub Jelinek 2015-04-22 12:00:26 UTC
GCC 5.1 has been released.
Comment 9 Richard Biener 2015-07-16 09:12:25 UTC
GCC 5.2 is being released, adjusting target milestone to 5.3.
Comment 10 Richard Biener 2015-12-04 10:45:02 UTC
GCC 5.3 is being released, adjusting target milestone.
Comment 11 Richard Biener 2016-06-03 10:05:14 UTC
GCC 5.4 is being released, adjusting target milestone.