Bug 94873 - [8/9 Regression] wrong code with -O -fno-merge-constants -fno-split-wide-types -fno-tree-fre
Summary: [8/9 Regression] wrong code with -O -fno-merge-constants -fno-split-wide-type...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 10.0
: P2 normal
Target Milestone: 8.5
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2020-04-30 10:44 UTC by Zdenek Sojka
Modified: 2020-09-17 17:49 UTC (History)
9 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: aarch64-unknown-linux-gnu
Build:
Known to work:
Known to fail: 10.0, 8.4.1, 9.3.1
Last reconfirmed: 2020-04-30 00:00:00


Attachments
reduced testcase (182 bytes, text/plain)
2020-04-30 10:44 UTC, Zdenek Sojka
Details
gcc11-pr94873.patch (898 bytes, patch)
2020-05-05 10:22 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2020-04-30 10:44:33 UTC
Created attachment 48416 [details]
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc -O -fno-merge-constants -fno-split-wide-types -fno-tree-fre testcase.c -static
$ ./a.out 
`�+%��@�

The generated code passes wrong data as the "format" argument; I wasn't able to reproduce this without using __builtin_printf() quickly.

...
	and	w1, w1, 255	//, MEM[(volatile unsigned char *)&x]
	adrp	x0, .LANCHOR0	// tmp98,
	add	x0, x0, :lo12:.LANCHOR0	//, tmp98,
	bl	printf		//
...
	.set	.LANCHOR0,. + 0
.LC0:
	.xword	-846514461162629792
	.xword	0
.LC1:
	.string	"%02x"
...

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r10-8064-20200430095601-g31e6f829336-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/10.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu --with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld --with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r10-8064-20200430095601-g31e6f829336-checking-yes-rtl-df-extra-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.0.1 20200430 (experimental) (GCC)
Comment 1 Jakub Jelinek 2020-04-30 12:36:28 UTC
Started with r8-7309-g74a9301d6128c0c7a4a8570545b9db40505c93f2
Comment 2 Jakub Jelinek 2020-04-30 12:42:57 UTC
Testcase without printf:
__attribute__((noipa)) void
foo (const char *p, int q)
{
  if (p[0] != '%' || p[1] != '0' || p[2] != '2' || p[3] != 'x' || p[4] != '\0' || (unsigned char) q != 0x95)
    __builtin_abort ();
}

int
main ()
{
  char x = ((union { __int128 a; char b[16]; }){ .a = 0xF4409395252B9560}).b[1];
  for (unsigned i = 0; i < sizeof (x); i++)
    foo ("%02x", i[(volatile unsigned char *) &x]);
  return 0;
}

The difference (before that commit and with that) is:
--- pr94873.s1	2020-04-30 14:31:25.834578823 +0200
+++ pr94873.s2	2020-04-30 14:32:46.452456073 +0200
@@ -19,8 +19,11 @@ main:
 	ret
 	.size	main, .-main
 	.section	.rodata
-	.align	3
+	.align	4
 	.set	.LANCHOR0,. + 0
+.LC0:
+	.xword	-846514461162629792
+	.xword	0
 .LC1:
 	.string	"%02x"
 	.ident	"GCC: (GNU) 8.0.1 20180315 (experimental)"
i.e. the unused constant somehow made it into the .rodata section.
Comment 3 Jakub Jelinek 2020-04-30 15:00:58 UTC
I think this goes wrong during combine, auto-inc-dec makes
(insn 29 6 7 2 (set (reg/f:DI 106)
        (reg/f:DI 97)) "pr94873.c":11:48 -1
     (expr_list:REG_DEAD (reg/f:DI 97)
        (nil)))
(insn 7 29 8 2 (set (reg:TI 95 [ D.3590 ])
        (mem/u/c:TI (post_inc:DI (reg/f:DI 106)) [0  S16 A128])) "pr94873.c":11:48 58 {*movti_aarch64}
     (expr_list:REG_INC (reg/f:DI 106)
        (expr_list:REG_EQUAL (const_wide_int 0x0f4409395252b9560)
            (nil))))
...
(insn 19 18 20 2 (set (reg:DI 0 x0)
        (reg/f:DI 106)) "pr94873.c":13:5 53 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg/f:DI 106)
        (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                    (const_int 16 [0x10])))
            (nil))))
where pseudo 97 holds .LANCHOR0 address, at which there is the 16-byte constant followed by the string literal.
But combine optimizes that REG_INC insn away and we end up with
(insn 19 18 20 2 (set (reg:DI 0 x0)
        (lo_sum:DI (reg/f:DI 98)
            (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]))) "pr94873.c":13:5 1054 {add_losym_di}
     (expr_list:REG_DEAD (reg/f:DI 98)
        (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                    (const_int 16 [0x10])))
            (nil))))
as you can see, the actual value in the insn isn't equal to what REG_EQUAL note says.
Comment 4 Jakub Jelinek 2020-05-01 15:11:10 UTC
I think this boils down whether it is valid to have REG_EQUAL note on a REG_INC insn or not.
The documentation says:
"This note is only valid on an insn that sets only one register"
but REG_INC insns actually set two registers, the result and increment/decrement some other register.
The combiner has code to count the autoincs before/after transformation and punts if they disagree, but because of the REG_EQUAL note the:
1498			  /* Temporarily replace the set's source with the
1499			     contents of the REG_EQUAL note.  The insn will
1500			     be deleted or recognized by try_combine.  */
...
code just replaces the
(insn 7 29 8 2 (set (reg:TI 95 [ D.3590 ])
        (mem/u/c:TI (post_inc:DI (reg/f:DI 106)) [0  S16 A128])) "pr94873.c":11:48 58 {*movti_aarch64}
     (expr_list:REG_INC (reg/f:DI 106)
        (expr_list:REG_EQUAL (const_wide_int 0x0f4409395252b9560)
            (nil))))
insn so that it is
(insn 7 29 8 2 (set (reg:TI 95 [ D.3590 ])
        (const_wide_int 0x0f4409395252b9560)) "pr94873.c":11:48 58 {*movti_aarch64}
     (expr_list:REG_INC (reg/f:DI 106)
        (expr_list:REG_EQUAL (const_wide_int 0x0f4409395252b9560)
            (nil))))
and when counting auto_inc on that, we don't find anything, so happily drop the  auto-inc side-effect.

So, if REG_EQUAL is invalid on REG_INC insns, the auto-inc pass should throw away those notes from insns which it optimizes, if it is not invalid, combiner
needs to ignore REG_EQUAL notes on REG_INC insns or something similar (or count the auto-inc effects on the original vs. replacement and if they disagree, punt).
Comment 5 Jeffrey A. Law 2020-05-01 15:21:16 UTC
I can't see how a REG_EQUAL note on an insn with multiple outputs can possibly work -- we wouldn't know what output the REG_EQUAL note refers to.  And we have to consider an embedded side effect as having an output.

Or to think of it another way, any embedded side effect can be implemented with a parallel at which point it's painfully obvious the insn has multiple outputs and a REG_EQUAL note would be inappropriate.
Comment 6 Richard Sandiford 2020-05-01 17:17:40 UTC
(In reply to Jeffrey A. Law from comment #5)
> I can't see how a REG_EQUAL note on an insn with multiple outputs can
> possibly work -- we wouldn't know what output the REG_EQUAL note refers to. 
> And we have to consider an embedded side effect as having an output.
> 
> Or to think of it another way, any embedded side effect can be implemented
> with a parallel at which point it's painfully obvious the insn has multiple
> outputs and a REG_EQUAL note would be inappropriate.

Yeah, I can see that argument, but to play devil's advocate:

I think the requirement for having a single REG SET_DEST makes
sense because the REG_EQUAL note would be genuinely ambiguous if
there were multiple REG SET_DESTs.  But in the case of a REG_INC
insn with a single REG SET_DEST, there's no ambiguity about which
register is meant.

I guess there's also the problem that stack pushes don't need a
REG_INC note, so anything we do can't just be keyed off REG_INC.
The only sure way to check whether a register is set as a side-effect
is to look at the complete pattern (like dse.c:check_for_inc_dec).

So I think there's the argument that optimisers have to be wary
of this in the same way that they need to be wary of folding:

   (set (reg X)
        (and (mem (pre_inc (reg Y)))
             (reg Z)))

into

   (set (reg X) (const_int 0))

when Z can be proven to be zero.
Comment 7 Segher Boessenkool 2020-05-01 18:31:31 UTC
REG_EQ* is documented as only being allowed on insns that set only one
register.  If you want to change that, you'll have to check *all* code
that consumes this, see if they rely on that fact or not, and if so,
change that.
Comment 8 Richard Sandiford 2020-05-01 18:52:30 UTC
(In reply to Segher Boessenkool from comment #7)
> REG_EQ* is documented as only being allowed on insns that set only one
> register.  If you want to change that, you'll have to check *all* code
> that consumes this, see if they rely on that fact or not, and if so,
> change that.

But the point is that the word "set" is ambiguous.  Does it mean
set by a SET or set by any means?  I think it can be read both ways.
After all, a CLOBBER is a form of set too, but that's clearly meant
to be excluded.  

Either way will need auditing.  If we say that this kind of REG_EQUAL
is wrong, we'd in theory need to audit everything that adds REG_EQUAL
notes to make sure it has an appropriate check, or doesn't need one.

I'm also not sure if we really are concerned about multiple registers
in this particular case, or whether it's more the case that we don't
want REG_EQUAL notes on insns with side effects.
Comment 9 Segher Boessenkool 2020-05-04 19:09:50 UTC
"clobber" is a red herring; it is impossible to make a REG_EQ* note for
a clobber, a clobber does not set a new value (that is the whole point
of a clobber).

I think we could allow auto-modify, sure, just as long as it stays clear
what lhs the REG_EQ* note is talking about; and, as you say, everything
needs to be audited for it :-/
Comment 10 Segher Boessenkool 2020-05-04 19:26:10 UTC
Oh, and ideally, we would replace the whole REG_EQ* stuff with a more
powerful interface that is to-the-side, not embedded in the instruction
stream.  For known exact values, nonzero_bits, known ranges, the works.
Comment 11 Richard Sandiford 2020-05-04 19:58:57 UTC
(In reply to Segher Boessenkool from comment #9)
> "clobber" is a red herring; it is impossible to make a REG_EQ* note for
> a clobber, a clobber does not set a new value (that is the whole point
> of a clobber).

It's not possible to attach a REG_EQ* note to an auto-inc-dec
either though (that was my point).  So a clobber doesn't seem
any less of a red herring than the auto-inc-dec itself.  Both
set registers in the sense of changing them, and neither can
be described by a REG_EQ* note.

> I think we could allow auto-modify, sure, just as long as it stays clear
> what lhs the REG_EQ* note is talking about; and, as you say, everything
> needs to be audited for it :-/

Yeah.  But to be clear: I don't think this is more obviously
a change from the status quo than going in the other direction
would be.  The point is that the status quo is ambiguous:
the documentation can be read either way, and the
implementation isn't consistent (hence the bug).

So it's a question of how we resolve the ambiguity.
If we want passes to be able to assume without checking
that insns with REG_EQ* notes don't also include auto
inc-dec, we'll need to audit places that create the notes,
or that update insns with existing notes.

I think it comes down to what the REG_EQ* notes are supposed
to achieve (conceptually, ignoring documentation and the
current implementation for now).  The "weak" guarantee is
that the SET_DEST has the specified value after the
instruction.  The "strong" guarantee extends the weak guarantee
by saying that the SET_SRC of the definition can be replaced
by the REG_EQ* note without changing behaviour.

Having auto-inc-dec in the SET_SRC of the definition is
OK for the weak guarantee but not the strong guarantee.
But the same would be true of any SET_SRC with side effects.
So to frame the question in a different way: let's assume
there's a target-specific intrinsic that has side effects
that can be described using unspec_volatile, and that the
intrinsic also sets a register.  Normally this would be
described as:

  (set (reg X) (unspec_volatile ...))

But if, in a particular context, the target could predict
what the value of X was, could it attach a REG_EQ* note
to say that?  It would then be valid to simplify later
uses of X, even though the definition of X can't change.

IMO this is easier to answer for REG_EQUIV.  That mostly
exists to allow the RA to rematerialise a value instead
of reloading it.  So it's all about replacing the uses
of the register rather than about replacing the definition.
(I'm not saying that the RA would handle a REG_EQUIV note
on the above unspec_volatile correctly -- haven't checked
either way -- but in principle it could.)

The weak guarantee makes life harder for consumers of the
notes that want the strong guarantee, since they then have
to check for side effects themselves.  The weak guarantee
is easy for producers of the notes and for consumers that
only care about users of the register.

The strong guarantee makes life harder for producers of the
notes, or for optimisations that modify insns with existing
notes.  The strong guarantee is easy for consumers of the notes
because it's more conservative.

The weak guarantee potentially allows more optimisation
than the strong guarantee.

I don't think there's much in it.  But I guess I personally
prefer the weak guarantee for the "more optimisation" reason.
There's also a very tenuous analogy with REG_RETURNED,
which is explicitly for saying what the return value is,
rather than saying how the definition can be rewritten.

But whichever we go for, I think it should be a decision
about side effects vs. no side effects, with auto inc-dec
being just one of several potential side effects.  I don't
see any reason why the auto-inc-dec case would be different
from the unspec_volatile case.
Comment 12 Richard Sandiford 2020-05-04 20:47:59 UTC
Just noticed that Eric wasn't on cc: (although he might have
preferred that it stay that way).
Comment 13 Eric Botcazou 2020-05-04 22:12:57 UTC
Since Richard kindly invited me to the party, I feel entitled to voice my personal opinion :-) which is apparently aligned with Richard's.  I think that we should allow REG_EQUAL notes for insns with exactly one SET of a register, the contents of the note being the value present in this register after the execution of the insn at run time, and disregarding side effects.

IMO that's the spirit of the current implementation and thus also probably the most straightforward way out.
Comment 14 Segher Boessenkool 2020-05-04 23:16:24 UTC
So, hrm, we could in principle attach a REG_EQ* note to any single_set
instruction?
Comment 15 Eric Botcazou 2020-05-05 07:24:46 UTC
> So, hrm, we could in principle attach a REG_EQ* note to any single_set
> instruction?

Yes, I think that's what is currently implemented modulo bugs, although of course we do not create a REG_EQUAL note for every single_set insn in practice.
Comment 16 Jakub Jelinek 2020-05-05 07:34:55 UTC
Ok, so what we do about this bug then if it ought to be combine.c that needs changing?  For REG_EQUAL notes in combine_instructions check for the auto-incdec side-effects in the pattern (I'd hope we don't have them in REG_EQUAL notes content) and if there are any, punt?  At least for backporting that seems like the right solution, and given that it seems try_combine also punts on these if we remove or add any in the patterns, it seems in line with what the rest of combiner does.  Or, shall it e.g. queue the side-effects in some new argument to try_combine and if the combination would succeed, add the side-effect as yet another instruction (if it can match and is cheaper than what we have previously)?
Comment 17 Eric Botcazou 2020-05-05 08:04:25 UTC
> Ok, so what we do about this bug then if it ought to be combine.c that needs
> changing?  For REG_EQUAL notes in combine_instructions check for the
> auto-incdec side-effects in the pattern (I'd hope we don't have them in
> REG_EQUAL notes content) and if there are any, punt?  At least for
> backporting that seems like the right solution, and given that it seems
> try_combine also punts on these if we remove or add any in the patterns, it
> seems in line with what the rest of combiner does.

At least the fix for the branches seems to be clear: do not do the temporary replacement in combine_instructions if the SET_SRC has side effects; there are a bunch of similar side_effects_p tests in the combiner.

> Or, shall it e.g. queue the side-effects in some new argument to try_combine
> and if the combination would succeed, add the side-effect as yet another 
> instruction (if it can match and is cheaper than what we have previously)?

That seems overkill to me, even on the mainline, but others might disagree.
Comment 18 Jakub Jelinek 2020-05-05 10:22:17 UTC
Created attachment 48451 [details]
gcc11-pr94873.patch

Untested patch then.
Comment 19 Segher Boessenkool 2020-05-05 15:02:17 UTC
(In reply to Jakub Jelinek from comment #18)
> Created attachment 48451 [details]
> gcc11-pr94873.patch
> 
> Untested patch then.

This one-liner is pre-approved.  Thank you!
Comment 20 GCC Commits 2020-05-06 07:34:44 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f14848aea70066777faf201c0b6eb3c5520bfab9

commit r11-127-gf14848aea70066777faf201c0b6eb3c5520bfab9
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed May 6 09:31:19 2020 +0200

    combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873]
    
    There were some discussions about whether REG_EQUAL notes are valid on insns with a single
    set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that
    it should be valid.  So, this patch fixes the combiner to punt in that case, because otherwise
    the auto-inc-dec side-effects from the SET_SRC are lost.
    
    2020-05-06  Jakub Jelinek  <jakub@redhat.com>
    
            PR rtl-optimization/94873
            * combine.c (combine_instructions): Don't optimize using REG_EQUAL
            note if SET_SRC (set) has side-effects.
    
            * gcc.dg/pr94873.c: New test.
Comment 21 Jakub Jelinek 2020-05-06 11:50:31 UTC
Fixed on the trunk so far.
Comment 22 GCC Commits 2020-05-07 13:28:00 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:8982e39b46b1e4a4b09022ddebd758b77ab73bac

commit r10-8112-g8982e39b46b1e4a4b09022ddebd758b77ab73bac
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed May 6 09:31:19 2020 +0200

    combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873]
    
    There were some discussions about whether REG_EQUAL notes are valid on insns with a single
    set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that
    it should be valid.  So, this patch fixes the combiner to punt in that case, because otherwise
    the auto-inc-dec side-effects from the SET_SRC are lost.
    
    2020-05-06  Jakub Jelinek  <jakub@redhat.com>
    
            PR rtl-optimization/94873
            * combine.c (combine_instructions): Don't optimize using REG_EQUAL
            note if SET_SRC (set) has side-effects.
    
            * gcc.dg/pr94873.c: New test.
Comment 23 Jakub Jelinek 2020-05-07 13:33:48 UTC
Fixed for 10.2+ too.
Comment 24 GCC Commits 2020-09-16 19:21:51 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:0f717ba5975ab42e1176db4cd2384f1862872519

commit r9-8895-g0f717ba5975ab42e1176db4cd2384f1862872519
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed May 6 09:31:19 2020 +0200

    combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873]
    
    There were some discussions about whether REG_EQUAL notes are valid on insns with a single
    set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that
    it should be valid.  So, this patch fixes the combiner to punt in that case, because otherwise
    the auto-inc-dec side-effects from the SET_SRC are lost.
    
    2020-05-06  Jakub Jelinek  <jakub@redhat.com>
    
            PR rtl-optimization/94873
            * combine.c (combine_instructions): Don't optimize using REG_EQUAL
            note if SET_SRC (set) has side-effects.
    
            * gcc.dg/pr94873.c: New test.
    
    (cherry picked from commit 8982e39b46b1e4a4b09022ddebd758b77ab73bac)
Comment 25 Jakub Jelinek 2020-09-17 17:49:05 UTC
Fixed for 8.5 in r8-10501-g10f44fe62fc461d4333290bfdadee4d2ea1f79d4 and by the above commit for 9.4+ too.