[PATCH] x86_64: PR rtl-optimization/92180: class_likely_spilled vs. cant_combine_insn.

Uros Bizjak ubizjak@gmail.com
Mon Aug 17 11:06:10 GMT 2020


On Mon, Aug 17, 2020 at 12:42 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch catches a missed optimization opportunity where GCC currently
> generates worse code than LLVM.  The issue, as nicely analyzed in bugzilla,
> boils down to the following three insns in combine:
>
> (insn 6 5 7 2 (parallel [
>             (set (reg:DI 85)
>                 (ashift:DI (reg:DI 85)
>                     (const_int 32 [0x20])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr92180.c":4:10 564 {*ashldi3_1}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (insn 7 6 14 2 (parallel [
>             (set (reg:DI 84)
>                 (ior:DI (reg:DI 84)
>                     (reg:DI 85)))
>             (clobber (reg:CC 17 flags))
>         ]) "pr92180.c":4:10 454 {*iordi_1}
>      (expr_list:REG_DEAD (reg:DI 85)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> (insn 14 7 15 2 (set (reg/i:SI 0 ax)
>         (subreg:SI (reg:DI 84) 0)) "pr92180.c":5:1 67 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:DI 84)
>         (nil)))
>
> Normally, combine/simplify-rtx would notice that insns 6 and 7
> (which update highpart bits) are unnecessary as the final insn 14
> only requires to lowpart bits.  The complication is that insn 14
> sets a hard register in targetm.class_likely_spilled_p which
> prevents combine from performing its simplifications, and removing
> the redundant instructions.
>
> At first glance a fix would appear to require changes to combine,
> potentially affecting code generation on all small register class
> targets...  An alternate (and I think clever) solution is to spot
> that this problematic situation can be avoided by the backend.
>
> At RTL expansion time, the middle-end has a clear separation between
> pseudos and hard registers, so the RTL initially contains:
>
> (insn 9 8 10 2 (set (reg:SI 86)
>         (subreg:SI (reg:DI 82 [ _1 ]) 0)) "pr92180.c":6:10 -1
>      (nil))
> (insn 10 9 14 2 (set (reg:SI 83 [ <retval> ])
>         (reg:SI 86)) "pr92180.c":6:10 -1
>      (nil))
> (insn 14 10 15 2 (set (reg/i:SI 0 ax)
>         (reg:SI 83 [ <retval> ])) "pr92180.c":7:1 -1
>      (nil))
>
> which can be optimized without problems by combine; it is only the
> intervening passes (initially fwprop1) that propagate computations
> into sets of hard registers, and disable those opportunities.
>
> The solution proposed here is to have the x86 backend/recog prevent
> early RTL passes composing instructions (that set likely_spilled hard
> registers) that they (combine) can't simplify, until after reload.
> We allow sets from pseudo registers, immediate constants and memory
> accesses, but anything more complicated is performed via a temporary
> pseudo.  Not only does this simplify things for the register allocator,
> but any remaining register-to-register moves are easily cleaned up
> by the late optimization passes after reload, such as peephole2 and
> cprop_hardreg.
>
> This patch has been tested on x86_64-pc-linux-gnu with a
> "make bootstrap" and a "make -k check" with no new failures.
> Ok for mainline?

I think that fwprop interferes with recent change to combine, where
combine won't propagate hard registers anymore. So, following that
change, there is no point for fwprop to create instructions that
combine won't be able to process. Alternatively, perhaps fwprop should
be prevented from propagating likely_spilled hard registers?

Let's ask Segher for his opinion.

Uros.

>
>
> 2020-08-17  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR rtl-optimization/92180
>         * config/i386/i386.c (ix86_hardreg_mov_ok): New function to
>         determine whether (set DST SRC) should be allowed at this point.
>         * config/i386/i386-protos.h (ix86_hardreg_mov_ok): Prototype here.
>         * config/i386/i386-expand.c (ix86_expand_move): Check whether
>         this is a complex set of a likely spilled hard register, and if
>         so place the value in a pseudo, and load the hard reg from it.
>         * config/i386/i386.md (*movdi_internal, *movsi_internal,
>         *movhi_internal, *movqi_internal): Make these instructions
>         conditional on ix86_hardreg_mov_ok.
>         (*lea<mode>): Make this define_insn_and_split conditional on
>         ix86_hardreg_mov_ok.
>
> gcc/testsuite/ChangeLog
>         PR rtl-optimization/92180
>         * gcc.target/i386/pr92180.c: New test.
>
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


More information about the Gcc-patches mailing list