[PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand
H.J. Lu
hjl.tools@gmail.com
Thu Dec 23 14:41:58 GMT 2021
On Mon, Dec 20, 2021 at 2:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote:
> > > The problem is in
> > >
> > > (define_memory_constraint "TARGET_MEM_CONSTRAINT"
> > > "Matches any valid memory."
> > > (and (match_code "mem")
> > > (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
> > > MEM_ADDR_SPACE (op))")))
> > >
> > > define_register_constraint allows LRA to convert the operand to the form
> > > '(mem (reg X))', where X is a base register. I am testing the v2 patch with
> >
> > If you mean replacing an immediate with a MEM containing that immediate,
> > isn't that often the right thing though?
> > I mean, if the register pressure is high and options are either spill some
> > register, set it to immediate, use it in one instruction and then fill the
> > spilled register (i.e. 2 memory loads), compared to a MEM use on the
> > arithmetic instruction one MEM seems cheaper to me. With -fPIC and the
> > cst needing runtime relocation slightly less so of course.
> >
>
> We will check the performance impact on SPEC CPU 2017.
> Here is the v2 patch. Liwei, can you help collect SPEC CPU 2017
> impact of the enclosed patch? Thanks.
We checked SPEC CPU 2017 performance with -O2 and -Ofast.
There is no performance regression. OK for master?
> > The code due to ivopts is trying to have something like
> > size_t a = (size_t) &tunable_list;
> > size_t b = 0xffffffffffffffa8 - a;
> > size_t c = x + b;
> > and for that cst - &symbol one needs actually 2 registers, one to hold the
> > constant and one to hold the (%rip) based address.
> > (insn 790 789 791 111 (set (reg:DI 292)
> > (const_int -88 [0xffffffffffffffa8])) "dl-tunables.c":304:15 76 {*movdi_internal}
> > (nil))
> > (insn 791 790 792 111 (set (reg:DI 293)
> > (symbol_ref:DI ("tunable_list") [flags 0x2] <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> > (nil))
> > (insn 792 791 793 111 (parallel [
> > (set (reg:DI 291)
> > (minus:DI (reg:DI 292)
> > (reg:DI 293)))
> > (clobber (reg:CC 17 flags))
> > ]) "dl-tunables.c":304:15 299 {*subdi_1}
> > (nil))
> > (insn 793 792 794 111 (parallel [
> > (set (reg:DI 294)
> > (plus:DI (reg:DI 291)
> > (reg:DI 198 [ ivtmp.176 ])))
> > (clobber (reg:CC 17 flags))
> > ]) "dl-tunables.c":304:15 226 {*adddi_1}
> > (nil))
> > It would be smarter to rewrite the above into a lea 88+tunable_list(%rip), %temp1
> > and use a subtraction instead of addition in the last insn above, or of
> > course in the particular case even consider the following 2 instructions
> > that do:
> > (insn 794 793 795 111 (set (reg:DI 296)
> > (symbol_ref:DI ("tunable_list") [flags 0x2] <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> > (nil))
> > (insn 795 794 796 111 (parallel [
> > (set (reg:DI 295 [ cur ])
> > (plus:DI (reg:DI 294)
> > (reg:DI 296)))
> > (clobber (reg:CC 17 flags))
> > ]) "dl-tunables.c":304:15 226 {*adddi_1}
> > (nil))
> > and find out that &tuneble_list - &tuneble_list is 0 and we don't need it at
> > all. Guess we don't figure that out due to the cast of one of those
> > addresses to size_t and the other one used in POINTER_PLUS_EXPR as normal
> > pointer.
> >
> > Jakub
> >
>
>
> --
> H.J.
Thanks.
--
H.J.
More information about the Gcc-patches
mailing list