Since r8-5608-gd555138e648961fdc572d8afdb234b52978828f9 the following testcase ICEs at -O2 -fPIC on x86_64-linux:

long a, b;
void bar (char *, long);
void baz (char, char);
void qux (char *, char *);

void
foo (void)
{
  while (1)
    {
      char c, d, e, f;
      bar (&c, a);
      bar (&d, b);
      baz (c, d);
      qux (&e, &f);
      double g = 0;
      __asm__("" : : "norfxy" (g));
    }
}

during RTL pass: reload
dump file: rh2027386.c.301r.reload
rh2027386.c: In function ‘foo’:
rh2027386.c:19:1: internal compiler error: maximum number of generated reload insns per insn achieved (90)
   19 | }
      | ^
0x11156b7 lra_constraints(bool)
        ../../gcc/lra-constraints.c:5084
0x10fe2de lra(_IO_FILE*)
        ../../gcc/lra.c:2336
0x10a590d do_reload
        ../../gcc/ira.c:5932
0x10a5dfc execute
        ../../gcc/ira.c:6118
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Similar LRA looping on these "norfxy" constraints has been fixed with r9-9463-g49cc1253d079bbefc1, but not in this testcase.

One thing is that it would be nice to avoid the LRA looping (dunno what is at fault, whether LRA or the backend).
Another one is that I wonder whether the cheapest reload, when the insn allows memory, wouldn't be to use the literal pool memory.
E.g. on

void
foo (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "m" (d), "m" (e));
}

void
bar (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mr" (d), "mr" (e));
}

void
baz (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mrx" (d), "mrx" (e));
}

void
qux (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mrfx" (d), "mrfx" (e));
}

for foo we emit a weird load of the floating point constants from the constant pool, store those on the stack and use those stack memories as operands (this isn't the RA's fault, but an expansion fault), while for bar-qux the combiner combines the constant pool memories into the inline asm and they survive RA there.
So, after the looping is fixed, it would be nice if the RA also considered moving constant pool MEMs (they are constant, can't be clobbered by function calls etc. in between) to input operands that accept memory.

Note that systemtap recently changed the norfxy to norx for x86_64; I think both the y and f in there are too dangerous. But even with the norx constraint, if a floating point constant is used and the combiner doesn't combine it for some reason (e.g. multiple uses), it would be nice if the systemtap macros were as cheap as possible and thus avoided runtime code to compute the values when possible.
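To make the "multiple uses" case above concrete, here is a minimal sketch, assuming GCC on x86_64. The name use_constant_twice is made up for illustration and is not taken from systemtap or the testcases above. Whether the combiner actually declines to merge the constant pool reference into both asm statements depends on the GCC version and options, so this is only something to inspect in -O2 -S output, not a confirmed reproducer.

```c
#include <assert.h>

/* Hypothetical helper, not from systemtap or the bug report.
   The same double constant feeds two inline asm statements whose
   constraints accept either memory or a general register.  */
int
use_constant_twice (void)
{
  double d = 7.8;
  int uses = 0;

  /* Empty template, as in the ICE testcase.  Writing "# %0" instead
     makes the operand that was chosen visible in -S output on x86_64.  */
  __asm__ volatile ("" : : "mr" (d));
  uses++;
  __asm__ volatile ("" : : "mr" (d));
  uses++;

  return uses;
}
```

The counter exists only so the function has an observable result; the interesting part is which operand each asm ends up with after RA.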
I cannot reproduce the ICE with this week's GCC. Probably it was fixed (or switched off) by some recent RA patch.

As for the second issue (code generation for function foo), I thought for some time about how it could be fixed. It seemed that the LRA inheritance sub-pass could be extended to work on memory as well as regs. But I came to the conclusion that it would complicate the already complicated LRA (inheritance sub-pass) even more, as we would need to add sophisticated analysis (including aliasing) for memory.

I guess there is a simpler alternative solution. The problem would disappear if the double constant were in the asm insn before LRA. I think some pass before RA could do this. It could be driven by a target, for example to promote double constants for x86-64.

Also, the problem might be solved if we had a pseudo<-double insn instead of a mem<-double insn before LRA: the LRA code dealing with equiv could promote the double into the asm insn (although I am not 100% sure about this; if it is not the case, the code dealing with equiv could probably be tweaked to do this).

So my proposal is to solve the problem somehow outside RA.
#c0 doesn't ICE on the trunk since r12-5944-ga7acb6dca941db2b1c135107dac3a34a20650d5c
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.