It seems that for some reason loading a constant is now favored instead of
using the #imm,R0 alternative.

void test000 (int* x, int xb) { x[0] = xb & 128; }
void test001 (int* x, int xb) { x[0] = xb | 128; }
void test002 (int* x, int xb) { x[0] = xb ^ 128; }

trunk:
_test000:
	mov.w	.L7,r1	! 15	*movhi/1	[length = 2]
	and	r1,r5	! 7	*andsi_compact/4	[length = 2]
	rts		! 18	*return_i	[length = 2]
	mov.l	r5,@r4	! 8	movsi_ie/9	[length = 2]

4.9:
	mov	r5,r0	! 15	movsi_ie/2	[length = 2]
	and	#128,r0	! 7	*andsi_compact/3	[length = 2]
	rts		! 18	*return_i	[length = 2]
	mov.l	r0,@r4	! 8	movsi_ie/9	[length = 2]

The RTL before RA is the same in both cases:

(insn 7 4 8 2 (set (reg:SI 163 [ D.1431 ])
        (and:SI (reg:SI 5 r5 [ xb ])
            (const_int 128 [0x80]))) sh_tmp.cpp:257 124 {*andsi_compact}
     (expr_list:REG_DEAD (reg:SI 5 r5 [ xb ])
        (nil)))
(insn 8 7 0 2 (set (mem:SI (reg:SI 4 r4 [ x ]) [1 *x_4(D)+0 S4 A32])
        (reg:SI 163 [ D.1431 ])) sh_tmp.cpp:257 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 4 r4 [ x ])
        (expr_list:REG_DEAD (reg:SI 163 [ D.1431 ])
            (nil))))

Reload on trunk says:

Reloads for insn # 7
Reload 0: reload_in (SI) = (const_int 128 [0x80])
	GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 2)
	reload_in_reg: (const_int 128 [0x80])
	reload_reg_rtx: (reg:SI 1 r1)

While reload on 4.9 says:

Reloads for insn # 7
Reload 0: reload_in (SI) = (reg:SI 5 r5 [ xb ])
	reload_out (SI) = (reg:SI 0 r0 [orig:163 D.1377 ] [163])
	R0_REGS, RELOAD_OTHER (opnum = 0)
	reload_in_reg: (reg:SI 5 r5 [ xb ])
	reload_out_reg: (reg:SI 0 r0 [orig:163 D.1377 ] [163])
	reload_reg_rtx: (reg:SI 0 r0 [orig:163 D.1377 ] [163])

Maybe this is because the function argument from hardreg r5 is propagated
into the insn.  This propagation is also causing unnecessary sign/zero
extensions, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53987#c9
It seems that hardregs in insn operands are kind of sticky.  Combine
propagates the function arg in r5 into the and insn and this doesn't get
changed to r0.  Disallowing combine to do that seems to fix the problem,
although I haven't checked for any other side effects of that change.
Probably some code will get worse for insns that really use hardregs
(e.g. libfuncs).

Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 220144)
+++ gcc/config/sh/sh.c	(working copy)
@@ -2066,6 +2066,26 @@
 	  && MEM_P (XEXP (XEXP (p, 1), 0)))
 	return false;
 
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (i, array, p, ALL)
+    {
+      if (*i == NULL)
+	continue;
+
+      int regno = -1;
+      for (const_rtx x = *i; regno == -1; )
+	{
+	  if (REG_P (x))
+	    regno = REGNO (x);
+	  else if (SUBREG_P (x))
+	    x = SUBREG_REG (x);
+	  else
+	    break;
+	}
+
+      if (regno != -1 && regno >= R0_REG && regno <= R0_REG + 15)
+	return false;
+    }
+
   return true;
 }

Another more radical approach could be to insert an RTL pass that
pre-allocates the R0 reg for those insns that have "z" constraint
alternatives, similar to what Kaz did with the mov.{b|w} patterns in
prepare_move_operands for LRA.
(In reply to Oleg Endo from comment #1)
> Another more radical approach could be to insert an RTL pass that
> pre-allocates the R0 reg for those insns that have "z" constraint
> alternatives, similar to what Kaz did with the mov.{b|w} patterns in
> prepare_move_operands for LRA.

As a start, that pass could walk over all insns and make sure that no
hardregs are used before RA in operands with constraints/alternatives.
For instance

  (set (reg:SI 163) (and:SI (reg:SI 5 r5) (const_int 128)))

would be changed to

  (set (reg:SI 164) (reg:SI 5 r5))  // 164 is a new pseudo
  (set (reg:SI 163) (and:SI (reg:SI 164) (const_int 128)))

I think something like this would be feasible for 5.0.
Created attachment 34982 [details] Add sh_pre_ra RTL pass
(In reply to Oleg Endo from comment #3)
> Created attachment 34982 [details]
> Add sh_pre_ra RTL pass

Right now I can think of two alternatives for this PR:

- convert the respective insns into insn_and_split and emit a pseudo +
  move in split1
- catch imm8,r0 insns in sh_legitimate_combined_insn in the same way as
  it's done by the sh_pre_ra pass above.

I think that an SH specific pre-RA RTL pass could be useful in the future,
for example to improve R0 utilization, even with LRA.

Kaz, do you have any opinions?
(In reply to Oleg Endo from comment #3)
> Created attachment 34982 [details]
> Add sh_pre_ra RTL pass

+
+  flag_live_range_shrinkage = 1;

This hunk was not supposed to be in the patch.
(In reply to Oleg Endo from comment #4)
> I think that an SH specific pre-RA RTL pass could be useful in the future,
> for example to improve R0 utilization, even with LRA.
>
> Kaz, do you have any opinions?

I like your pre-RA pass, even if it's too big a hammer for this specific
problem.  It should wait for the next stage1, though.  Also, it would be
better to look for other use cases of that pass, as you suggested, so as
to justify the cost of scanning all insns.
(In reply to Kazumoto Kojima from comment #6)
> I like your pre-RA pass even if it's a too big hammer for
> this specific problem.  It should wait the next stage1, though.

It seems that this PR's issue is not a frequent use case (no hits in CSiBE
at all).  So yes, stage1 is good.

> Also it would be better to look for another use cases of that
> pass as you suggested so as to justify the cost of scanning
> all insns.

Some use cases for the pre-RA pass:

- R0 pre-allocation

- Reduction of the number of pseudos and reg-reg copies.  Some passes
  leave pseudos and copies which could be removed to make the RA's task
  easier.

- 2 operand / commutative operand optimization.  On SH the dest operand
  is always one of the source operands.  I've seen several times that the
  generic RA makes not-so-good choices which result in more live regs and
  unnecessary reg-reg copies.  Very often output operands are put in
  different pseudos than the input operands before RA, and RA has to fix
  this somehow.

- The last time I played with the fipr insn (PR 55295), RA had trouble
  allocating FV regs.  For example, for
    void func (float a, float b, float c, float d)
  it would not allocate (a,b,c,d) to FV4, although the operands are
  already in the appropriate FR regs.  This resulted in many unnecessary
  reg-reg copies.  I haven't tried this with LRA though.

There are some more things which I'd do before RA:

- Forming SH2A movu.{b|w} insns (PR 64792)
- Various constant optimizations (PR 63390, PR 51708, PR 54682, PR 65069)
- 64 bit FP load/store fusion (PR 64305)

It would be possible to write one huge pre-RA RTL pass to do all of that.
However, I'd like to avoid accidents such as reload.c and would rather
keep things separated as much as possible.

I don't have evidence, but I don't think that scanning all insns is too
bad.  It's being done multiple times during compilation anyway, and there
are other places which could be optimized.  For example, as far as I know,
the split4 and split5 passes are not needed on SH and could be disabled.
Or maybe the conditions in define_split such as "can_create_pseudo_p ()" should be evaluated *before* all insns are scanned/recog'ed.
GCC 5.1 has been released.
GCC 5.2 is being released, adjusting target milestone to 5.3.
GCC 5.3 is being released, adjusting target milestone.
GCC 5.4 is being released, adjusting target milestone.