[PATCH 0/5] Tweak IRA handling of tying and earlyclobbers
Richard Sandiford
richard.sandiford@arm.com
Fri Jun 21 17:43:00 GMT 2019
Richard Sandiford <richard.sandiford@arm.com> writes:
> This series of patches tweaks the IRA handling of matched constraints
> and earlyclobbers. The main explanations are in the individual patches.
>
> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
>
> I also tried building at least one target per CPU directory and
> comparing the effect of the patches on the assembly output for
> gcc.c-torture, gcc.dg and g++.dg using -O2 -ftree-vectorize. The table
> below summarises the effect on the number of lines of assembly, ignoring
> tests for which the number of lines was the same:
Forgot to say that this list excludes targets for which there were
no changes in assembly length. (Thought I'd better say that since
the list clearly doesn't have one entry per CPU directory.)
FWIW the full list was:
aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu amdgcn-amdhsa
arc-elf arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf
c6x-elf cr16-elf cris-elf csky-elf epiphany-elf fr30-elf
frv-linux-gnu ft32-elf h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu
i686-pc-linux-gnu i686-apple-darwin iq2000-elf lm32-elf m32c-elf
m32r-elf m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu
mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
nds32le-elf nios2-linux-gnu nvptx-none or1k-elf pdp11
powerpc64-linux-gnu powerpc64le-linux-gnu powerpc-ibm-aix7.0 pru-elf
riscv32-elf riscv64-elf rl78-elf rx-elf s390-linux-gnu
s390x-linux-gnu sh-linux-gnu sparc-linux-gnu sparc64-linux-gnu
sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf xstormy16-elf
v850-elf vax-netbsdelf visium-elf x86_64-darwin x86_64-linux-gnu
xtensa-elf
> Target                 Tests  Delta  Best  Worst  Median
> ======                 =====  =====  ====  =====  ======
> alpha-linux-gnu           87   -126   -96    138      -1
> arm-linux-gnueabi         38    -37   -10      4      -1
> arm-linux-gnueabihf       38    -37   -10      4      -1
> avr-elf                   19    -64   -60     14      -1
> bfin-elf                 143    -55   -21     21      -1
> c6x-elf                   38    -32    -9     16      -1
> cris-elf                 253  -1456  -192     24      -1
> csky-elf                 101   -221   -36     26      -1
> frv-linux-gnu             11    -23    -8     -1      -1
> ft32-elf                   1     -2    -2     -2      -2
> hppa64-hp-hpux11.23       66    -24   -12     12      -1
> i686-apple-darwin         22    -45   -24     11      -1
> i686-pc-linux-gnu         18    -65   -96     40      -1
> ia64-linux-gnu             1     -4    -4     -4      -4
> m68k-linux-gnu            83     31   -70     18       1
> mcore-elf                 26   -122   -38     11      -2
> mmix                      29   -110   -25      3      -1
> mn10300-elf              399    258   -70     70       1
> msp430-elf               120   1363   -13    833       2
> pdp11                     37    -90   -92     25      -1
> powerpc-ibm-aix7.0        31    -25    -4      3      -1
> powerpc64-linux-gnu       31    -26    -2      2      -1
> powerpc64le-linux-gnu     31    -26    -2      2      -1
> pru-elf                    2      8     1      7       1
> riscv32-elf                1     -2    -2     -2      -2
> riscv64-elf                1     -2    -2     -2      -2
> rl78-elf                   6    -20   -18      9      -3
> rx-elf                   123     32   -58     30      -1
> s390-linux-gnu             7     16    -6      9       1
> s390x-linux-gnu            1     -3    -3     -3      -3
> sh-linux-gnu             475  -4696  -843     42      -1
> spu-elf                  168   -296  -114     25      -2
> visium-elf               214   -936  -183     22      -1
> x86_64-darwin             30    -25    -4      2      -1
> x86_64-linux-gnu          28    -29    -4      1      -1
>
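As a rough illustration of how the columns in the table can be derived (a sketch only, not the script that produced the numbers; the function name `summarise` and the input layout are assumptions made for the example): each test contributes the change in its assembly line count, tests with no change are dropped, and the remaining differences are summed and ranked. The lower median is used here because it is consistent with the pru-elf row, where two tests with differences of 1 and 7 are listed with a median of 1.

```python
from statistics import median_low


def summarise(line_counts):
    """Summarise per-test changes in assembly line count.

    line_counts maps a test name to (lines_before, lines_after).
    This layout is hypothetical; it is not taken from the original script.
    """
    # Negative values mean the patched compiler produced fewer lines.
    deltas = [after - before for before, after in line_counts.values()]
    # Ignore tests for which the number of lines was the same.
    deltas = [d for d in deltas if d != 0]
    if not deltas:
        # The target would not appear in the table at all.
        return None
    return {
        "tests": len(deltas),
        "delta": sum(deltas),
        "best": min(deltas),
        "worst": max(deltas),
        "median": median_low(deltas),
    }


example = {"a.c": (100, 101), "b.c": (50, 57), "c.c": (30, 30)}
print(summarise(example))
```
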
> Of course, the number of lines is only a very rough guide to code size
> and code size is only a very rough guide to performance. It's just
> a way of getting a feel for how invasive the change is in practice.
>
> As often with this kind of comparison, quite a few changes in either
> direction come from things that the RA doesn't consider, such as the
> ability to merge code after RA.
>
> The msp430-elf results are especially misleading. The port has patterns
> like:
>
> ;; Alternatives 2 and 3 are to handle cases generated by reload.
> (define_insn "subqi3"
>   [(set (match_operand:QI 0 "nonimmediate_operand" "=rYs, rm, &?r, ?&r")
>         (minus:QI (match_operand:QI 1 "general_operand" "0, 0, !r, !i")
>                   (match_operand:QI 2 "general_operand" " riYs, rmi, rmi, r")))]
>   ""
>   "@
>    SUB.B\t%2, %0
>    SUB%X0.B\t%2, %0
>    MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0
>    MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0"
> )
>
> The patches make more use of the first two (cheap) alternatives
> in preference to the third, but sometimes at the cost of introducing
> moves elsewhere. Each alternative counts as one line in this comparison,
> but the third alternative is really two instructions.
>
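To make the line-count caveat concrete, here is a minimal sketch that counts instructions rather than lines, assuming (as the output templates above suggest) that `{` separates two instructions written on one line. The function name and the sample operands are hypothetical, and a real comparison would also need to skip labels and directives.

```python
def count_lines_and_insns(asm):
    """Return (non-empty lines, instructions) for a chunk of assembly text.

    Assumes '{' joins several instructions on one line, as in the
    "MOV ... { SUB ..." templates quoted above.
    """
    lines = 0
    insns = 0
    for raw in asm.splitlines():
        text = raw.strip()
        if not text:
            continue
        lines += 1
        # Each non-empty piece between '{' separators is one instruction.
        insns += sum(1 for part in text.split("{") if part.strip())
    return lines, insns


# One two-instruction line (an untied alternative) plus one tied SUB.
sample = "MOV.B\tR12, R13 { SUB.B\tR14, R13\nSUB.B\tR14, R12\n"
print(count_lines_and_insns(sample))
```

Counted this way, swapping a two-instruction alternative for a tied SUB plus a separate move leaves the instruction count unchanged even though the line count goes up by one.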
> (If the port does actually want us to prefer the third alternative
> over introducing moves, then I think the constraints need to be
> changed. Using "!" heavily disparages the alternative and so
> it's reasonable for the optimisers to try hard to avoid it.
> If the alternative is actually the preferred way of handling
> untied operands then the "?" on operand 0 should be enough.)
>
> The arm-* improvements come from patterns like:
>
> (define_insn_and_split "*negdi2_insn"
>   [(set (match_operand:DI 0 "s_register_operand" "=r,&r")
>         (neg:DI (match_operand:DI 1 "s_register_operand" "0,r")))
>    (clobber (reg:CC CC_REGNUM))]
>   "TARGET_32BIT"
>
> The patches make IRA assign a saving of one full move to ties between
> operands 0 and 1, whereas previously it would only assign a saving
> of an eighth of a move.
>
> The other big winners (e.g. cris-*, sh-* and visium-*) have similar cases.
>
> I'll post the SVE patches that rely on and test for this later.
>
> Thanks,
> Richard