Created attachment 48416 [details] reduced testcase Output: $ aarch64-unknown-linux-gnu-gcc -O -fno-merge-constants -fno-split-wide-types -fno-tree-fre testcase.c -static $ ./a.out `�+%��@� The generated code passes wrong data as the "format" argument; I wasn't able to reproduce this without using __builtin_printf() quickly. ... and w1, w1, 255 //, MEM[(volatile unsigned char *)&x] adrp x0, .LANCHOR0 // tmp98, add x0, x0, :lo12:.LANCHOR0 //, tmp98, bl printf // ... .set .LANCHOR0,. + 0 .LC0: .xword -846514461162629792 .xword 0 .LC1: .string "%02x" ... $ aarch64-unknown-linux-gnu-gcc -v Using built-in specs. COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r10-8064-20200430095601-g31e6f829336-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/10.0.1/lto-wrapper Target: aarch64-unknown-linux-gnu Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu --with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld --with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r10-8064-20200430095601-g31e6f829336-checking-yes-rtl-df-extra-aarch64 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 10.0.1 20200430 (experimental) (GCC)
Started with r8-7309-g74a9301d6128c0c7a4a8570545b9db40505c93f2
Testcase without printf: __attribute__((noipa)) void foo (const char *p, int q) { if (p[0] != '%' || p[1] != '0' || p[2] != '2' || p[3] != 'x' || p[4] != '\0' || (unsigned char) q != 0x95) __builtin_abort (); } int main () { char x = ((union { __int128 a; char b[16]; }){ .a = 0xF4409395252B9560}).b[1]; for (unsigned i = 0; i < sizeof (x); i++) foo ("%02x", i[(volatile unsigned char *) &x]); return 0; } The difference (before that commit and with that) is: --- pr94873.s1 2020-04-30 14:31:25.834578823 +0200 +++ pr94873.s2 2020-04-30 14:32:46.452456073 +0200 @@ -19,8 +19,11 @@ main: ret .size main, .-main .section .rodata - .align 3 + .align 4 .set .LANCHOR0,. + 0 +.LC0: + .xword -846514461162629792 + .xword 0 .LC1: .string "%02x" .ident "GCC: (GNU) 8.0.1 20180315 (experimental)" i.e. the unused constant somehow made it into the .rodata section.
I think this goes wrong during combine, auto-inc-dec makes (insn 29 6 7 2 (set (reg/f:DI 106) (reg/f:DI 97)) "pr94873.c":11:48 -1 (expr_list:REG_DEAD (reg/f:DI 97) (nil))) (insn 7 29 8 2 (set (reg:TI 95 [ D.3590 ]) (mem/u/c:TI (post_inc:DI (reg/f:DI 106)) [0 S16 A128])) "pr94873.c":11:48 58 {*movti_aarch64} (expr_list:REG_INC (reg/f:DI 106) (expr_list:REG_EQUAL (const_wide_int 0x0f4409395252b9560) (nil)))) ... (insn 19 18 20 2 (set (reg:DI 0 x0) (reg/f:DI 106)) "pr94873.c":13:5 53 {*movdi_aarch64} (expr_list:REG_DEAD (reg/f:DI 106) (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) (const_int 16 [0x10]))) (nil)))) where pseudo 97 holds .LANCHOR0 address, at which there is the 16-byte constant followed by the string literal. But combine optimizes that REG_INC insn away and we end up with (insn 19 18 20 2 (set (reg:DI 0 x0) (lo_sum:DI (reg/f:DI 98) (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]))) "pr94873.c":13:5 1054 {add_losym_di} (expr_list:REG_DEAD (reg/f:DI 98) (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) (const_int 16 [0x10]))) (nil)))) as you can see, the actual value in the insn isn't equal to what REG_EQUAL note says.
I think this boils down whether it is valid to have REG_EQUAL note on a REG_INC insn or not. The documentation says: "This note is only valid on an insn that sets only one register" but REG_INC insns actually set two registers, the result and increment/decrement some other register. The combiner has code to count the autoincs before/after transformation and punts if they disagree, but because of the REG_EQUAL note the: 1498 /* Temporarily replace the set's source with the 1499 contents of the REG_EQUAL note. The insn will 1500 be deleted or recognized by try_combine. */ ... code just replaces the (insn 7 29 8 2 (set (reg:TI 95 [ D.3590 ]) (mem/u/c:TI (post_inc:DI (reg/f:DI 106)) [0 S16 A128])) "pr94873.c":11:48 58 {*movti_aarch64} (expr_list:REG_INC (reg/f:DI 106) (expr_list:REG_EQUAL (const_wide_int 0x0f4409395252b9560) (nil)))) insn so that it is (insn 7 29 8 2 (set (reg:TI 95 [ D.3590 ]) (const_wide_int 0x0f4409395252b9560)) "pr94873.c":11:48 58 {*movti_aarch64} (expr_list:REG_INC (reg/f:DI 106) (expr_list:REG_EQUAL (const_wide_int 0x0f4409395252b9560) (nil)))) and when counting auto_inc on that, we don't find anything, so happily drop the auto-inc side-effect. So, if REG_EQUAL is invalid on REG_INC insns, the auto-inc pass should throw away those notes from insns which it optimizes, if it is not invalid, combiner needs to ignore REG_EQUAL notes on REG_INC insns or something similar (or count the auto-inc effects on the original vs. replacement and if they disagree, punt).
I can't see how a REG_EQUAL note on an insn with multiple outputs can possibly work -- we wouldn't know what output the REG_EQUAL note refers to. And we have to consider an embedded side effect as having an output. Or to think of it another way, any embedded side effect can be implemented with a parallel at which point it's painfully obvious the insn has multiple outputs and a REG_EQUAL note would be inappropriate.
(In reply to Jeffrey A. Law from comment #5) > I can't see how a REG_EQUAL note on an insn with multiple outputs can > possibly work -- we wouldn't know what output the REG_EQUAL note refers to. > And we have to consider an embedded side effect as having an output. > > Or to think of it another way, any embedded side effect can be implemented > with a parallel at which point it's painfully obvious the insn has multiple > outputs and a REG_EQUAL note would be inappropriate. Yeah, I can see that argument, but to play devil's advocate: I think the requirement for having a single REG SET_DEST makes sense because the REG_EQUAL note would be genuinely ambiguous if there were multiple REG SET_DESTs. But in the case of a REG_INC insn with a single REG SET_DEST, there's no ambiguity about which register is meant. I guess there's also the problem that stack pushes don't need a REG_INC note, so anything we do can't just be keyed off REG_INC. The only sure way to check whether a register is set as a side-effect is to look at the complete pattern (like dse.c:check_for_inc_dec). So I think there's the argument that optimisers have to be wary of this in the same way that they need to be wary of folding: (set (reg X) (and (mem (pre_inc (reg Y))) (reg Z))) into (set (reg X) (const_int 0)) when Z can be proven to be zero.
REG_EQ* is documented as only being allowed on insns that set only one register. If you want to change that, you'll have to check *all* code that consumes this, see if they rely on that fact or not, and if so, change that.
(In reply to Segher Boessenkool from comment #7) > REG_EQ* is documented as only being allowed on insns that set only one > register. If you want to change that, you'll have to check *all* code > that consumes this, see if they rely on that fact or not, and if so, > change that. But the point is that the word "set" is ambiguous. Does it mean set by a SET or set by any means? I think it can be read both ways. After all, a CLOBBER is a form of set too, but that's clearly meant to be excluded. Either way will need auditing. If we say that this kind of REG_EQUAL is wrong, we'd in theory need to audit everything that adds REG_EQUAL notes to make sure it has an appropriate check, or doesn't need one. I'm also not sure if we really are concerned about multiple registers in this particular case, or whether it's more the case that we don't want REG_EQUAL notes on insns with side effects.
"clobber" is a red herring; it is impossible to make a REG_EQ* note for a clobber, a clobber does not set a new value (that is the whole point of a clobber). I think we could allow auto-modify, sure, just as long as it stays clear what lhs the REG_EQ* note is talking about; and, as you say, everything needs to be audited for it :-/
Oh, and ideally, we would replace the whole REG_EQ* stuff with a more powerful interface that is to-the-side, not embedded in the instruction stream. For known exact values, nonzero_bits, known ranges, the works.
(In reply to Segher Boessenkool from comment #9) > "clobber" is a red herring; it is impossible to make a REG_EQ* note for > a clobber, a clobber does not set a new value (that is the whole point > of a clobber). It's not possible to attach a REG_EQ* note to an auto-inc-dec either though (that was my point). So a clobber doesn't seem any less of a red herring than the auto-inc-dec itself. Both set registers in the sense of changing them, and neither can be described by a REG_EQ* note. > I think we could allow auto-modify, sure, just as long as it stays clear > what lhs the REG_EQ* note is talking about; and, as you say, everything > needs to be audited for it :-/ Yeah. But to be clear: I don't think this is more obviously a change from the status quo than going in the other direction would be. The point is that the status quo is ambiguous: the documentation can be read either way, and the implementation isn't consistent (hence the bug). So it's a question of how we resolve the ambiguity. If we want passes to be able to assume without checking that insns with REG_EQ* notes don't also include auto inc-dec, we'll need to audit places that create the notes, or that update insns with existing notes. I think it comes down to what the REG_EQ* notes are supposed to achieve (conceptually, ignoring documentation and the current implementation for now). The "weak" guarantee is that the SET_DEST has the specified value after the instruction. The "strong" guarantee extends the weak guarantee by saying that the SET_SRC of the definition can be replaced by the REG_EQ* note without changing behaviour. Having auto-inc-dec in the SET_SRC of the definition is OK for the weak guarantee but not the strong guarantee. But the same would be true of any SET_SRC with side effects. So to frame the question in a different way: let's assume there's a target-specific intrinsic that has side effects that can be described using unspec_volatile, and that the intrinsic also sets a register. Normally this would be described as: (set (reg X) (unspec_volatile ...)) But if, in a particular context, the target could predict what the value of X was, could it attach a REG_EQ* note to say that? It would then be valid to simplify later uses of X, even though the definition of X can't change. IMO this is easier to answer for REG_EQUIV. That mostly exists to allow the RA to rematerialise a value instead of reloading it. So it's all about replacing the uses of the register rather than about replacing the definition. (I'm not saying that the RA would handle a REG_EQUIV note on the above unspec_volatile correctly -- haven't checked either way -- but in principle it could.) The weak guarantee makes life harder for consumers of the notes that want the strong guarantee, since they then have to check for side effects themselves. The weak guarantee is easy for producers of the notes and for consumers that only care about users of the register. The strong guarantee makes life harder for producers of the notes, or for optimisations that modify insns with existing notes. The strong guarantee is easy for consumers of the notes because it's more conservative. The weak guarantee potentially allows more optimisation than the strong guarantee. I don't think there's much in it. But I guess I personally prefer the weak guarantee for the "more optimisation" reason. There's also a very tenuous analogy with REG_RETURNED, which is explicitly for saying what the return value is, rather than saying how the definition can be rewritten. But whichever we go for, I think it should be a decision about side effects vs. no side effects, with auto inc-dec being just one of several potential side effects. I don't see any reason why the auto-inc-dec case would be different from the unspec_volatile case.
Just noticed that Eric wasn't on cc: (although he might have preferred that it stay that way).
Since Richard kindly invited me to the party, I feel entitled to voice my personal opinion :-) which is apparently aligned with Richard's. I think that we should allow REG_EQUAL notes for insns with exactly one SET of a register, the contents of the note being the value present in this register after the execution of the insn at run time, and disregarding side effects. IMO that's the spirit of the current implementation and thus also probably the most straightforward way out.
So, hrm, we could in principle attach a REG_EQ* note to any single_set instruction?
> So, hrm, we could in principle attach a REG_EQ* note to any single_set > instruction? Yes, I think that's what is currently implemented modulo bugs, although of course we do not create a REG_EQUAL note for every single_set insn in practice.
Ok, so what we do about this bug then if it ought to be combine.c that needs changing? For REG_EQUAL notes in combine_instructions check for the auto-incdec side-effects in the pattern (I'd hope we don't have them in REG_EQUAL notes content) and if there are any, punt? At least for backporting that seems like the right solution, and given that it seems try_combine also punts on these if we remove or add any in the patterns, it seems in line with what the rest of combiner does. Or, shall it e.g. queue the side-effects in some new argument to try_combine and if the combination would succeed, add the side-effect as yet another instruction (if it can match and is cheaper than what we have previously)?
> Ok, so what we do about this bug then if it ought to be combine.c that needs > changing? For REG_EQUAL notes in combine_instructions check for the > auto-incdec side-effects in the pattern (I'd hope we don't have them in > REG_EQUAL notes content) and if there are any, punt? At least for > backporting that seems like the right solution, and given that it seems > try_combine also punts on these if we remove or add any in the patterns, it > seems in line with what the rest of combiner does. At least the fix for the branches seems to be clear: do not do the temporary replacement in combine_instructions if the SET_SRC has side effects; there are a bunch of similar side_effects_p tests in the combiner. > Or, shall it e.g. queue the side-effects in some new argument to try_combine > and if the combination would succeed, add the side-effect as yet another > instruction (if it can match and is cheaper than what we have previously)? That seems overkill to me, even on the mainline, but others might disagree.
Created attachment 48451 [details] gcc11-pr94873.patch Untested patch then.
(In reply to Jakub Jelinek from comment #18) > Created attachment 48451 [details] > gcc11-pr94873.patch > > Untested patch then. This one-liner is pre-approved. Thank you!
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:f14848aea70066777faf201c0b6eb3c5520bfab9 commit r11-127-gf14848aea70066777faf201c0b6eb3c5520bfab9 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed May 6 09:31:19 2020 +0200 combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873] There were some discussions about whether REG_EQUAL notes are valid on insns with a single set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that it should be valid. So, this patch fixes the combiner to punt in that case, because otherwise the auto-inc-dec side-effects from the SET_SRC are lost. 2020-05-06 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/94873 * combine.c (combine_instructions): Don't optimize using REG_EQUAL note if SET_SRC (set) has side-effects. * gcc.dg/pr94873.c: New test.
Fixed on the trunk so far.
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:8982e39b46b1e4a4b09022ddebd758b77ab73bac commit r10-8112-g8982e39b46b1e4a4b09022ddebd758b77ab73bac Author: Jakub Jelinek <jakub@redhat.com> Date: Wed May 6 09:31:19 2020 +0200 combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873] There were some discussions about whether REG_EQUAL notes are valid on insns with a single set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that it should be valid. So, this patch fixes the combiner to punt in that case, because otherwise the auto-inc-dec side-effects from the SET_SRC are lost. 2020-05-06 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/94873 * combine.c (combine_instructions): Don't optimize using REG_EQUAL note if SET_SRC (set) has side-effects. * gcc.dg/pr94873.c: New test.
Fixed for 10.2+ too.
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:0f717ba5975ab42e1176db4cd2384f1862872519 commit r9-8895-g0f717ba5975ab42e1176db4cd2384f1862872519 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed May 6 09:31:19 2020 +0200 combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873] There were some discussions about whether REG_EQUAL notes are valid on insns with a single set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that it should be valid. So, this patch fixes the combiner to punt in that case, because otherwise the auto-inc-dec side-effects from the SET_SRC are lost. 2020-05-06 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/94873 * combine.c (combine_instructions): Don't optimize using REG_EQUAL note if SET_SRC (set) has side-effects. * gcc.dg/pr94873.c: New test. (cherry picked from commit 8982e39b46b1e4a4b09022ddebd758b77ab73bac)
Fixed for 8.5 in r8-10501-g10f44fe62fc461d4333290bfdadee4d2ea1f79d4 and by the above commit for 9.4+ too.