Add a new combine pass
Richard Sandiford
richard.sandiford@arm.com
Mon Nov 25 22:09:00 GMT 2019
Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Mon, Nov 18, 2019 at 05:55:13PM +0000, Richard Sandiford wrote:
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > (It's 23:35 local time, so it's still just about stage 1. :-))
>>
>> Or actually, just under 1 day after end of stage 1. Oops.
>> Could have sworn stage 1 ended on the 17th :-( Only realised
>> I'd got it wrong when catching up on Saturday's email traffic.
>>
>> And inevitably, I introduced a couple of stupid mistakes while
>> trying to clean the patch up for submission by that (non-)deadline.
>> Here's a version that fixes an inverted overlapping memref check
>> and that correctly prunes the use list for combined instructions.
>> (This last one is just a compile-time saving -- the old code was
>> correct, just suboptimal.)
>
> I've build the Linux kernel with the previous version, as well as this
> one. R0 is unmodified GCC, R1 is the first patch, R2 is this one:
>
> (I've forced --param=run-combine=6 for R1 and R2):
> (Percentages are relative to R0):
>
> R0 R1 R2 R1 R2
> alpha 6107088 6101088 6101088 99.902% 99.902%
> arc 4008224 4006568 4006568 99.959% 99.959%
> arm 9206728 9200936 9201000 99.937% 99.938%
> arm64 13056174 13018174 13018194 99.709% 99.709%
> armhf 0 0 0 0 0
> c6x 2337237 2337077 2337077 99.993% 99.993%
> csky 3356602 0 0 0 0
> h8300 1166996 1166776 1166776 99.981% 99.981%
> i386 11352159 0 0 0 0
> ia64 18230640 18167000 18167000 99.651% 99.651%
> m68k 3714271 0 0 0 0
> microblaze 4982749 4979945 4979945 99.944% 99.944%
> mips 8499309 8495205 8495205 99.952% 99.952%
> mips64 7042036 7039816 7039816 99.968% 99.968%
> nds32 4486663 0 0 0 0
> nios2 3680001 3679417 3679417 99.984% 99.984%
> openrisc 4226076 4225868 4225868 99.995% 99.995%
> parisc 7681895 7680063 7680063 99.976% 99.976%
> parisc64 8677077 8676581 8676581 99.994% 99.994%
> powerpc 10687611 10682199 10682199 99.949% 99.949%
> powerpc64 17671082 17658570 17658570 99.929% 99.929%
> powerpc64le 17671082 17658570 17658570 99.929% 99.929%
> riscv32 1554938 1554758 1554758 99.988% 99.988%
> riscv64 6634342 6632788 6632788 99.977% 99.977%
> s390 13049643 13014939 13014939 99.734% 99.734%
> sh 3254743 0 0 0 0
> shnommu 1632364 1632124 1632124 99.985% 99.985%
> sparc 4404993 4399593 4399593 99.877% 99.877%
> sparc64 6796711 6797491 6797491 100.011% 100.011%
> x86_64 19713174 19712817 19712817 99.998% 99.998%
> xtensa 0 0 0 0 0
Thanks for running these.
> There are fivew new failures, with either of the combine2 patches. And
> all five are actually different (different symptoms, at least):
>
> - csky fails on libgcc build:
>
> /home/segher/src/gcc/libgcc/fp-bit.c: In function '__fixdfsi':
> /home/segher/src/gcc/libgcc/fp-bit.c:1405:1: error: unable to generate reloads for:
> 1405 | }
> | ^
> (insn 199 86 87 8 (parallel [
> (set (reg:SI 101)
> (plus:SI (reg:SI 98)
> (const_int -32 [0xffffffffffffffe0])))
> (set (reg:CC 33 c)
> (lt:CC (plus:SI (reg:SI 98)
> (const_int -32 [0xffffffffffffffe0]))
> (const_int 0 [0])))
> ]) "/home/segher/src/gcc/libgcc/fp-bit.c":1403:23 207 {*cskyv2_declt}
> (nil))
> during RTL pass: reload
>
> Target problem?
Yeah, looks like it. The pattern is:
(define_insn "*cskyv2_declt"
[(set (match_operand:SI 0 "register_operand" "=r")
(plus:SI (match_operand:SI 1 "register_operand" "r")
(match_operand:SI 2 "const_int_operand" "Uh")))
(set (reg:CC CSKY_CC_REGNUM)
(lt:CC (plus:SI (match_dup 1) (match_dup 2))
(const_int 0)))]
"CSKY_ISA_FEATURE (2E3)"
"declt\t%0, %1, %M2"
)
So the predicate accepts all const_ints but the constraint doesn't.
> - i386 goes into an infinite loop compiling, or at least an hour or so...
> Erm I forgot too record what it was compiling. I did attach a GDB... It
> is something from lra_create_live_ranges.
Hmm.
> - m68k:
>
> /home/segher/src/kernel/fs/exec.c: In function 'copy_strings':
> /home/segher/src/kernel/fs/exec.c:590:1: internal compiler error: in final_scan_insn_1, at final.c:3048
> 590 | }
> | ^
> 0x10408307 final_scan_insn_1
> /home/segher/src/gcc/gcc/final.c:3048
> 0x10408383 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
> /home/segher/src/gcc/gcc/final.c:3152
> 0x10408797 final_1
> /home/segher/src/gcc/gcc/final.c:2020
> 0x104091f7 rest_of_handle_final
> /home/segher/src/gcc/gcc/final.c:4658
> 0x104091f7 execute
> /home/segher/src/gcc/gcc/final.c:4736
>
> and that line is
> gcc_assert (prev_nonnote_insn (insn) == last_ignored_compare);
Ah, this'll be while m68k was still a cc0 target. Yeah, I should probably
just skip the whole pass for cc0.
> - nds32:
>
> /tmp/ccC8Czca.s: Assembler messages:
> /tmp/ccC8Czca.s:3144: Error: Unrecognized operand/register, lmw.bi [$fp+(-60)],[$fp],$r11,0x0.
>
> /tmp/ccl8o20c.s: Assembler messages:
> /tmp/ccl8o20c.s:2449: Error: Unrecognized operand/register, lmw.bi $r9,[$fp],[$fp+(-132)],0x0.
>
> /tmp/ccZxjwHd.s: Assembler messages:
> /tmp/ccZxjwHd.s:4776: Error: Unrecognized operand/register, lmw.bi [$fp+(-52)],[$fp],[$fp+(-56)],0x0.
>
> /tmp/cczjOS3d.s: Assembler messages:
> /tmp/cczjOS3d.s:2336: Error: Unrecognized operand/register, lmw.bi $r16,[$fp],$r7,0x0.
>
> and more. All lmw.bi... target issue?
Yeah, looks like it wasn't expecting this pattern to be generated
automatically before RA, so it doesn't have constraints (and probably
couldn't, since the registers need to be consecutive).
> - sh (that's sh4-linux):
>
> /home/segher/src/kernel/net/ipv4/af_inet.c: In function 'snmp_get_cpu_field':
> /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to find a register to spill in class 'R0_REGS'
> 1638 | }
> | ^
> /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the insn:
> (insn 18 17 19 2 (set (reg:SI 0 r0)
> (mem:SI (plus:SI (reg:SI 4 r4 [178])
> (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> (nil))))
> /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier errors, bailing out
Would have to look more at this one. Seems odd that it can't allocate
R0 when it's already the destination and when R0 can't be live before
the insn. But there again, this is reload, so my enthuasiasm for looking
is a bit limited :-)
> Looking at just binary size, which is a good stand-in for how many insns
> it combined:
>
> R2
> arm64 99.709%
> ia64 99.651%
> s390 99.734%
> sparc 99.877%
> sparc64 100.011%
>
> (These are those that are not between 99.9% and 100.0%).
>
> So only sparc64 regressed, and just a tiny bit (I can look at what that
> is, if there is interest). But 32-bit sparc improved, and s390, arm64,
> and ia64 got actual benefit.
>
> Again this is just code size, not analysing the actually changed code.
OK. Certainly not an earth-shattering improvement then, but not
entirely worthless either.
> I did look at the powerpc64le changes. It is almost completely load-
> with-update (and store-with-update) insns that make the difference, but
> there are also some dot insns. The extra mr. are usually not a good
> idea, but the extsw. are. Sometimes this causes *more* insns in the end
> (register move insns), but that is the exception.
>
> This mr. problem is there with combine already, btw. In the end it is
> caused by this just not being something good to do on pseudos, it would
> be better to do this after RA, in a peephole or similar. OTOH it isn't
> actually really important for performance either way.
>
> Btw, does the new pass use TARGET_LEGITIMATE_COMBINED_INSN? It probably
> should. (That would be the hook where we would probably want to prevent
> generating mr. insns).
No, it doesn't use that yet, but I agree it should. Will fix.
I see combine also tests cannot_copy_insn_p. I'm not sure whether that's
appropriate for the new pass or not. Arguably it's not copying the
instruction, it's just moving it to be in parallel with something else.
(But then that's largely true of the combine case too.)
Thanks,
Richard
More information about the Gcc-patches
mailing list