Bug 103395 - [9/10/11/12/13 Regression] ICE on qemu in arm create_fix_barrier
Summary: [9/10/11/12/13 Regression] ICE on qemu in arm create_fix_barrier
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 12.0
: P2 normal
Target Milestone: 9.5
Assignee: Not yet assigned to anyone
URL:
Keywords: ice-on-valid-code
Depends on:
Blocks:
 
Reported: 2021-11-23 19:32 UTC by Jakub Jelinek
Modified: 2022-04-28 15:02 UTC (History)
11 users (show)

See Also:
Host:
Target: armv7hl-linux-gnueabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-11-24 00:00:00


Attachments
qemuice.i.xz (124.19 KB, application/octet-stream)
2021-11-24 13:18 UTC, Jakub Jelinek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2021-11-23 19:32:46 UTC
Since GCC 5 (r208990 still works fine, r215478 ICEs already up to current trunk)
the following testcase ICEs on armv7hl-linux-gnueabi with -O2:
void
foo (void)
{
  __asm__ ("\n\n\n\n\n\n\n\n\n\n"
	   "\n\n\n\n\n\n\n\n\n\n"
	   "\n\n\n\n\n\n\n\n\n\n"
	   "\n\n\n\n\n\n\n\n\n\n"
	   "\n\n\n\n\n\n\n\n\n\n"
	   "\n\n\n\n\n\n\n\n\n\n"
           "\n" : : "nor" (0), "nor" (0));
}
Removing just one newline makes the ICE go away.  get_attr_length for such inline asm is 248 bytes (estimation) and the arm minipool code is apparently upset about it.
Comment 1 Richard W.M. Jones 2021-11-23 22:07:29 UTC
Nice reproducer!

Here's the original thread where the bug was reported when compiling qemu on Fedora Rawhide for armv7: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/GD3ABSWD6HHTNEKV2EJY4PXABQ245UCZ/
Comment 2 Andrew Pinski 2021-11-24 09:32:37 UTC
Confirmed.
Comment 3 Jakub Jelinek 2021-11-24 09:40:03 UTC
Note, in qemu it isn't just those newlines but simply a long inline asm.
It can very well be something that is empty or very short in the .text section,
e.g. can .pushsection; .section whatever; put lots of stuff in there; .popsection.
But as inline asm is a black box to gcc, we have only very rough estimation for the inline asm text length, which just counts how many newlines and typically ; characters there are and multiple that by some insn size.
Comment 4 Richard Earnshaw 2021-11-24 12:16:48 UTC
I suspect the best we're likely to be able to do is to downgrade the ICE into an error if there's a long inline ASM in the code, much like the way we handle invalid constraints in inline ASMs.
Comment 5 Jakub Jelinek 2021-11-24 12:25:50 UTC
That will still mean qemu will not compile.
I admit I know next to nothing about the arm minipool stuff, but can't whatever needs to be inserted be inserted either before or after the inline asm, if it is needed for the asm inputs then likely before that?
Comment 6 Richard Earnshaw 2021-11-24 13:05:08 UTC
It depends on the the reference.  Some minipool references can only be forwards due to the nature of the instruction making the reference.  I'll need to take a look, and I might also need a real testcase, not just the reduced one below.
Comment 7 Jakub Jelinek 2021-11-24 13:18:25 UTC
Created attachment 51867 [details]
qemuice.i.xz

Unreduced preprocessed source.
Comment 8 Richard Earnshaw 2021-11-24 16:35:26 UTC
OK, so the real problem in the test case is the constraint ("nor") is meaningless on Arm (because there is no instruction in the architecture that can accept both a constant and a memref) and this confuses the minipool code because it exploits this restriction to detect insns that need to be reworked by the md_reorg pass.

When processing an ASM we allow only a forward literal pool reference and it must be less than 250 bytes beyond the /start/ of the pattern (because we don't know where in the asm it gets used).  So we have to deduct from that range 4 bytes for every asm statement: add too many lines to the asm and we reach the point where it is impossible to place the literal pool even directly after the asm.

So I think really this is an ICE on invalid, because the constraint is not meaningful on Arm.
Comment 9 Jakub Jelinek 2021-11-24 16:52:48 UTC
CCing Frank as this is systemtap sys/sdt.h
which has:
# ifndef STAP_SDT_ARG_CONSTRAINT
# if defined __powerpc__
# define STAP_SDT_ARG_CONSTRAINT        nZr
# else
# define STAP_SDT_ARG_CONSTRAINT        nor
# endif
# endif

All of n, o and r are generic constraints and const0_rtx is valid for the "n" constraint, so why is the backend trying to put it into memory at all?
What is systemtap trying to do is not use those operands in any instruction, but note for the debugger how to find out the value of the asm input operand (read some register, some memory or the immediate constant).
Comment 10 Richard Earnshaw 2021-11-24 18:17:21 UTC
It's been this way now for over 20 years.  A single instruction cannot take a constant and a memory for the same operand, so this has been used in the backend to trigger the minipool generation: a constant in an operand that takes a memory triggers a minipool spill to be pushed.

If we changed this now we risk breaking existing inline asms that exploit this feature (good or bad) and expect a constant to be pushed into a minipool entry.
Comment 11 Jakub Jelinek 2021-11-24 18:57:22 UTC
Inline asm that only works with memory but in constraints says it accepts both immediate constant and memory is IMNSHO just broken, it is just fine if the compiler makes a different choice.
If "nor" with constant input on arm has meant actually just "or", then sure, systemtap could be changed and after a couple of years it will propagate to all stap copies used in the wild, but it is quite severe misoptimization of one of the most common cases.  The systemtap macros don't really know what argument will be passed to them, whether a constant, something that lives in memory, something that lives in a register and ideally wants as few actual instructions before those macros as possible to arrange the arguments so that debugger can inspect them.
Comment 12 Jakub Jelinek 2021-11-24 18:58:54 UTC
Note, sys/sdt.h is using the "nor" constraints for 11 years already.
Comment 13 Florian Weimer 2021-11-25 10:03:07 UTC
Maybe it's possible to provide specific, architecture-independent constraints for Systemtap-like use cases?
Comment 14 Jakub Jelinek 2021-11-25 10:16:45 UTC
If it can be proven that all gcc versions until now treat "nor" constraint as ignoring the n in there and pushing all constants into constant pool, I think it could be changed into "or" for arm32.  But it would be IMNSHO unnecessary pessimization (but it could e.g. be done for GCC < 12 or whenever this would be fixed).
Another option is to tweak whatever generates those large inline asms.
In the qemu case it is created with
/usr/bin/python3 ../scripts/tracetool.py --backend=dtrace --group=util --format=h /builddir/build/BUILD/qemu-6.1.0/util/trace-events trace/trace-util.h                               
whatever that is (but that means I haven't actually seen what it generates).
Note, apparently several other packages are affected, so not sure what changed recently in systemtap-sdt-devel or whatever else that adds up to this.
In the preprocessed source I got I see several blocks of
   ".ascii \"\\x20\""
   "\n"
   "_SDT_SIGN %n[_SDT_S4]"
   "\n"
   "_SDT_SIZE %n[_SDT_S4]"
   "\n"
   "_SDT_TYPE %n[_SDT_S4]"
   "\n"
   ".ascii \"%[_SDT_A4]\""
   "\n"
It already uses assembler macros, perhaps adding a macro to do all 3 at once or perhaps with something extra could bring the number of newlines down...
Comment 15 Jakub Jelinek 2021-11-25 11:11:57 UTC
Apparently the change on the systemtap side was:
https://sourceware.org/git/?p=systemtap.git;a=commit;f=includes/sys/sdt.h;h=eaa15b047688175a94e3ae796529785a3a0af208
which indeed adds a lot of newlines to the inline asms.
But when already using gas macros, I wonder if all that
#define _SDT_ASM_TEMPLATE_1             _SDT_ARGFMT(1)
#define _SDT_ASM_TEMPLATE_2             _SDT_ASM_TEMPLATE_1 _SDT_ASM_BLANK _SDT_ARGFMT(2)
#define _SDT_ASM_TEMPLATE_3             _SDT_ASM_TEMPLATE_2 _SDT_ASM_BLANK _SDT_ARGFMT(3)
#define _SDT_ASM_TEMPLATE_4             _SDT_ASM_TEMPLATE_3 _SDT_ASM_BLANK _SDT_ARGFMT(4)
#define _SDT_ASM_TEMPLATE_5             _SDT_ASM_TEMPLATE_4 _SDT_ASM_BLANK _SDT_ARGFMT(5)
#define _SDT_ASM_TEMPLATE_6             _SDT_ASM_TEMPLATE_5 _SDT_ASM_BLANK _SDT_ARGFMT(6)
#define _SDT_ASM_TEMPLATE_7             _SDT_ASM_TEMPLATE_6 _SDT_ASM_BLANK _SDT_ARGFMT(7)
#define _SDT_ASM_TEMPLATE_8             _SDT_ASM_TEMPLATE_7 _SDT_ASM_BLANK _SDT_ARGFMT(8)
#define _SDT_ASM_TEMPLATE_9             _SDT_ASM_TEMPLATE_8 _SDT_ASM_BLANK _SDT_ARGFMT(9)
#define _SDT_ASM_TEMPLATE_10            _SDT_ASM_TEMPLATE_9 _SDT_ASM_BLANK _SDT_ARGFMT(10)
#define _SDT_ASM_TEMPLATE_11            _SDT_ASM_TEMPLATE_10 _SDT_ASM_BLANK _SDT_ARGFMT(11)
#define _SDT_ASM_TEMPLATE_12            _SDT_ASM_TEMPLATE_11 _SDT_ASM_BLANK _SDT_ARGFMT(12)
couldn't be rewritten into use of another .macro _SDT_ASM_TEMPLATE that just takes emits it all.
See the
             .macro  sum from=0, to=5
             .long   \from
             .if     \to-\from
             sum     "(\from+1)",\to
             .endif
             .endm
macro from gas documentation for inspiration.
Comment 16 Jakub Jelinek 2021-11-25 11:25:27 UTC
Note, the %n[_SDT_S##no] in there need to stay (dunno about the
_SDT_ASM_SUBSTR(_SDT_ARGTMPL(_SDT_A##no)) stuff), but that could be achieved
by giving the macro from, to, arg, args:vararg arguments and use it like:
  _SDT_ASM_TEMPLATE 1, 4, %n[_SDT_S1], %n[_SDT_S2], %n[_SDT_S3], %n[_SDT_S4]
Comment 17 Daniel Berrange 2021-11-26 09:38:42 UTC
FYI, I've opened a bug against systemtap in Fedora to track this problem on that side too https://bugzilla.redhat.com/show_bug.cgi?id=2026858
Comment 18 Jakub Jelinek 2021-11-26 13:46:29 UTC
Note that we document how the size of asm is estimated:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
and unfortunately asm inline ("..." ...) makes the size estimation 0 only for inlining purposes and not for others too.
So, for systemtap it is still desirable to use as few newlines/; characters in the pattern as possible.  If one macro is used to handle 1-12 operands through recursions, one way to save a few newlines would be avoid all those other 5 macros, each one is used just once for each parameter and avoiding their .macro, .endm and .purgem lines will get rid of 15 lines...
Similarly, not doing .pushsection/.popsection 3 times for each argument but just once would help etc.