Bug 100342 - [10 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
Summary: [10 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 11.1.1
: P2 normal
Target Milestone: 10.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2021-04-29 20:10 UTC by Zdenek Sojka
Modified: 2022-05-10 10:45 UTC (History)
4 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build:
Known to work: 12.0, 9.3.1
Known to fail: 10.3.1, 11.1.1
Last reconfirmed: 2021-04-30 00:00:00


Attachments
semi-reduced testcase (1.27 KB, text/plain)
2021-04-29 20:10 UTC, Zdenek Sojka
Details
gcc11-pr100342.patch (1.57 KB, patch)
2021-05-05 15:12 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2021-04-29 20:10:11 UTC
Created attachment 50712 [details]
semi-reduced testcase

I am failing to further reduce the testcase. I checked the code and I think the behavior is fully defined. Also I am unable to find out what exactly goes wrong; please close the PR if you don't want to spend time on this.

On master, it stopped failing between r12-100 (BAD) and r12-159 (OK); but it might have been just papered over.

$ x86_64-pc-linux-gnu-gcc -O2 -fno-dse -fno-forward-propagate -mno-sse2 testcase.c -Wno-psabi
$ ./a.out 
Aborted

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-11-branch/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-11-branch/binary-11-branch-20210427124134-gfb7c736c2f1-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-11-branch//configure --enable-languages=c,c++ --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libsanitizer --disable-libstdcxx-pch --prefix=/repo/gcc-11-branch//binary-11-branch-20210427124134-gfb7c736c2f1-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.1 20210427 (GCC)
Comment 1 Richard Biener 2021-04-30 06:51:00 UTC
Confirmed.
Comment 2 Jakub Jelinek 2021-04-30 08:54:28 UTC
Started with r10-3542-g0b92cf305dcf34387a8e2564e55ca8948df3b47a
Stopped on the trunk with r12-139-g7d6bb80931b429631f63e0fd27bee95f32eb57a9
So most likely was latent before and after those.
Comment 3 Uroš Bizjak 2021-05-04 12:50:02 UTC
For some reason the *input* value at BSWAP insn is truncated to 32bits.

  v256u128 v256u128_1 =
    SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 < v256u128_0)) <=
	  0, v256u128_0);

u128_0 is an argument to foo0, passes value of 123842323652213865 (decimal).

The binary has only one BSWAP insn, so:

$ objdump -dr a.out | grep bswap
  401ddd:       48 0f ca                bswap  %rdx

(gdb) b *0x401ddd
Breakpoint 1 at 0x401ddd: file pr100342.c, line 72.

Breakpoint 1, 0x0000000000401ddd in foo0 (u8_0=u8_0@entry=0 '\000', v128u8_0=v128u8_0@entry=..., v512u8_0=..., u16_0=u16_0@entry=10, 
    v128u16_0=v128u16_0@entry=..., u32_0=u32_0@entry=4, u64_0=u64_0@entry=2, v512u64_0=..., u128_0=<optimized out>, v256u128_0=..., v512u128_0=...)
    at zzz.c:72
72          SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 < v256u128_0)) <=
(gdb) p $rdx
$2 = 3983735913
(gdb) p/x $rdx
$3 = 0xed72fc69

We can "fix" the clobbered value manually in gdb session:

(gdb) set $rdx = 123842323652213865
(gdb) p/x $rdx
$4 = 0x1b7f9efed72fc69
(gdb) c
Continuing.
[Inferior 1 (process 20630) exited normally]
Comment 4 Uroš Bizjak 2021-05-04 13:08:46 UTC
The problematic insn is:

  401cec:	44 89 f6             	mov    %r14d,%esi

This one should be 64 bit wide,

	movl	%r14d, %esi	# 613	[c=4 l=3]  *movqi_internal/2

but is actually a QImode move from r14 to esi.
Comment 5 Uroš Bizjak 2021-05-04 15:14:01 UTC
The problem can be seen in _.pro_and_epilogue pass:

Starting with:

_.cmpelim

 2741: r14:DI=[sp:DI+0x38]
  ...
  368: di:DI=r14:DI
  ...
  613: si:QI=r14:QI
  ...
 2737: bp:DI=r14:DI
  ...
  658: strict_low_part(ax:QI)=bp:QI
 2275: dx:DI=si:DI
  ...
 2287: dx:DI=bp:DI
  710: dx:DI=bswap(dx:DI)


The pass converts:

_.pro_and_epilogue

rescanning insn with uid = 658.
insn 658: replaced reg 6 with 4
verify found no changes in insn with uid = 658.
verify found no changes in insn with uid = 2275.
deleting insn with uid = 2287.


 2741: r14:DI=[sp:DI+0x38]
  ...
  368: di:DI=r14:DI
  ...
  613: si:QI=r14:QI
  ...
 2737: bp:DI=r14:DI
  ...
  658: strict_low_part(ax:QI)=si:QI
 2275: dx:DI=si:DI
  ...
 2287: ?
  710: dx:DI=bswap(dx:DI)

Please note how the optimization removes insn 2287, assuming that rsi holds the whole 64bit value ... Which is not true!
Comment 6 Jakub Jelinek 2021-05-04 15:29:36 UTC
That seems to happen during shrink-wrapping:
#0  df_insn_delete (insn=0x7fffe9e3a6c0) at ../../gcc/df-scan.c:932
#1  0x0000000000978328 in delete_insn (insn=0x7fffe9e3a6c0) at ../../gcc/cfgrtl.c:178
#2  0x0000000000d5a270 in copyprop_hardreg_forward_1 (bb=<basic_block 0x7fffe9f763a8 (2)>, vd=<optimized out>) at ../../gcc/regcprop.c:825
#3  0x0000000000d5b772 in copyprop_hardreg_forward_bb_without_debug_insn (bb=<basic_block 0x7fffe9f763a8 (2)>) at ../../gcc/regcprop.c:1211
#4  0x0000000000db616c in prepare_shrink_wrap (entry_block=<basic_block 0x7fffe9f763a8 (2)>) at ../../gcc/shrink-wrap.c:451
#5  try_shrink_wrapping (entry_edge=0x7fffffffd518, prologue_seq=0x7fffe9e81440) at ../../gcc/shrink-wrap.c:674
#6  0x0000000000ae7ffc in thread_prologue_and_epilogue_insns () at ../../gcc/function.c:6025
#7  0x0000000000ae86c3 in rest_of_handle_thread_prologue_and_epilogue () at ../../gcc/function.c:6510
Comment 7 Uroš Bizjak 2021-05-04 15:33:24 UTC
I have traced a bit where (insn 2275) and (insn 2287) come from.

In _.ira, we have:

  613: r125:QI=r2067:DI#0
  ...
  659: zero_extract(r2080:DI,0x8,0x8)=r125:QI#0

And in _.reload, a DImode reload is inserted before zero_extract RTX:

  613: si:QI=r14:QI
  ...
 2275: dx:DI=si:DI
  659: zero_extract(ax:DI,0x8,0x8)=dx:DI

However, %rsi is only initialized with QImode value in (insn 613).

This word-mode reload for zero_extract RTX interferes with register propagation in subsequent passes, as shown in the comment #5.
Comment 8 Uroš Bizjak 2021-05-04 15:34:48 UTC
FYI, this whole analysis was done with Fedora 33 system compiler:

gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC)
Comment 9 Jakub Jelinek 2021-05-04 16:36:18 UTC
Looking at current 10 branch (previously looked at 11), I see:
(insn 2741 1965 368 2 (set (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])
        (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 56 [0x38])) [9 %sfp+-1032 S8 A64])) "pr100342.c":65:24 66 {*movdi_internal}
     (nil))
...
(insn 613 612 1970 2 (set (reg:QI 4 si [orig:125 _98 ] [125])
        (reg:QI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:15 69 {*movqi_internal}
     (nil))
...
(insn 2737 1971 615 2 (set (reg:DI 6 bp [orig:2067 u128_0 ] [2067])
        (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:12 66 {*movdi_internal}
     (nil))
...
(insn 2275 658 659 2 (set (reg:DI 1 dx [2225])
        (reg:DI 4 si [orig:125 _98 ] [125])) "pr100342.c":68:12 66 {*movdi_internal}
     (nil))
...
(insn 2287 2286 710 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
        (reg:DI 6 bp [orig:2067 u128_0 ] [2067])) "pr100342.c":72:5 66 {*movdi_internal}
     (nil))
(insn 710 2287 2172 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
        (bswap:DI (reg:DI 1 dx [orig:131 _104 ] [131]))) "pr100342.c":72:5 872 {*bswapdi2}
     (nil))
in the reload dump and I must say I don't find anything wrong on that,
what looks wrong is that regcprop thinks that the insn 2287 is a redundant move
because it thinks that at that point %rdx contains the same value as %rbp -
      /* Detect noop sets and remove them before processing side effects.  */
      if (set && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set)))
        {
          unsigned int regno = REGNO (SET_SRC (set));
          rtx r1 = find_oldest_value_reg (REGNO_REG_CLASS (regno),
                                          SET_DEST (set), vd);
          rtx r2 = find_oldest_value_reg (REGNO_REG_CLASS (regno),
                                          SET_SRC (set), vd);
          if (rtx_equal_p (r1 ? r1 : SET_DEST (set), r2 ? r2 : SET_SRC (set)))
            {
              bool last = insn == BB_END (bb);
              delete_insn (insn);
r1 is %rbp and r2 is NULL, so it is compared against SET_SRC (set) of %rbp.
Something misses that only the lowpart QImode is equivalent that way.
Comment 10 Jakub Jelinek 2021-05-04 16:57:11 UTC
And there is one more important insn in between 2737 and 2275, in particular
(insn 2911 2867 2853 2 (set (reg:DI 42 r14 [2223])
        (const_int 72057594037927935 [0xffffffffffffff])) "pr100342.c":68:12 66 {*movdi_internal}
     (nil))
On that one the old value of r14 is lost.
Comment 11 Jakub Jelinek 2021-05-05 10:59:34 UTC
(insn 2741 1965 368 2 (set (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])
        (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 56 [0x38])) [9 %sfp+-1032 S8 A64])) "pr100342.c":65:24 66 {*movdi_internal}
     (nil))
...
(insn 613 612 1970 2 (set (reg:QI 4 si [orig:125 _98 ] [125])
        (reg:QI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:15 69 {*movqi_internal}
     (nil))
...
(insn 2737 1971 615 2 (set (reg:DI 6 bp [orig:2067 u128_0 ] [2067])
        (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:12 66 {*movdi_internal}
     (nil))
...
(insn 2829 2749 2855 2 (set (reg:DI 42 r14 [2222])
        (const_int -71776119061217281 [0xff00ffffffffffff])) "pr100342.c":68:12 66 {*movdi_internal}
     (nil))
...
(insn 2275 658 659 2 (set (reg:DI 1 dx [2225])
        (reg:DI 4 si [orig:125 _98 ] [125])) "pr100342.c":68:12 66 {*movdi_internal}
     (nil))
...
(insn 2910 2860 677 2 (set (reg:DI 4 si [2227])
        (const_int -1095216660481 [0xffffff00ffffffff])) "pr100342.c":68:12 66 {*movdi_internal}
     (nil))
...
(insn 2287 2286 710 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
        (reg:DI 6 bp [orig:2067 u128_0 ] [2067])) "pr100342.c":72:5 66 {*movdi_internal}
     (nil))
(insn 710 2287 2172 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
        (bswap:DI (reg:DI 1 dx [orig:131 _104 ] [131]))) "pr100342.c":72:5 872 {*bswapdi2}
     (nil))
In copyprop_hardreg_forward_1 at the start of that insn 2737 we have:
(gdb) p vd->e[1]
$32 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$33 = {mode = E_QImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[6]
$34 = {mode = E_SImode, oldest_regno = 6, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$35 = {mode = E_DImode, oldest_regno = 42, next_regno = 4, debug_insn_changes = 0x0}
which is I think ok, it says %sil is QImode copy of %r14, all other regs we care about here contain their values.
At the end of that we have:
(gdb) p vd->e[1]
$36 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$37 = {mode = E_QImode, oldest_regno = 42, next_regno = 6, debug_insn_changes = 0x0}
(gdb) p vd->e[6]
$38 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$39 = {mode = E_DImode, oldest_regno = 42, next_regno = 4, debug_insn_changes = 0x0}
as well as before the insn 2829.  That changes it to:
(gdb) p vd->e[1]
$51 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$52 = {mode = E_QImode, oldest_regno = 4, next_regno = 6, debug_insn_changes = 0x0}
(gdb) p vd->e[6]
$53 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$54 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0}
Same thing at the start of insn 2275.  But after that we have:
(gdb) p vd->e[1]
$61 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$62 = {mode = E_QImode, oldest_regno = 4, next_regno = 6, debug_insn_changes = 0x0}
(gdb) p vd->e[6]
$63 = {mode = E_DImode, oldest_regno = 4, next_regno = 1, debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$64 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0}
I think this step is what looks wrong, it is true that the movq %rsi, %rdx
copied all 64-bits of %rsi, but vd->e[4].mode was just QImode, so I think vd->e[1].mode
should be also QImode.
That remains until before insn 2910, which changes that to:
(gdb) p vd->e[1]
$84 = {mode = E_DImode, oldest_regno = 6, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$85 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295, debug_insn_changes = 0x0}
(gdb) p vd->e[6]
$86 = {mode = E_DImode, oldest_regno = 6, next_regno = 1, debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$87 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0}
and at that point, the fact that %rdx doesn't contain the same value as %rbp - only
%dl is guaranteed to be the same as %bpl - is lost.
Then when we look up the oldest value of %rdx on insn 2287 we are told it is %rbp.
Comment 12 Jakub Jelinek 2021-05-05 15:12:33 UTC
Created attachment 50761 [details]
gcc11-pr100342.patch

Untested fix.  But what do I know about regcprop?
Comment 13 Hongtao.liu 2021-05-06 05:22:58 UTC
(In reply to Jakub Jelinek from comment #12)
> Created attachment 50761 [details]
> gcc11-pr100342.patch
> 
> Untested fix.  But what do I know about regcprop?

I just want to confirm that this patch alone also fixed pr98694.
Comment 14 GCC Commits 2021-05-15 08:14:37 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f

commit r12-808-g425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat May 15 10:12:11 2021 +0200

    regcprop: Fix another cprop_hardreg bug [PR100342]
    
    On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote:
    > Ah, ok, thanks for the extra context.
    >
    > So AIUI the problem when recording xmm2<-di isn't just:
    >
    >  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
    >
    > but also that:
    >
    >  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
    >
    > For example, all registers in this sequence can be part of the same chain:
    >
    >     (set (reg:HI R1) (reg:HI R0))
    >     (set (reg:SI R2) (reg:SI R1)) // [A]
    >     (set (reg:DI R3) (reg:DI R2)) // [A]
    >     (set (reg:SI R4) (reg:SI R[0-3]))
    >     (set (reg:HI R5) (reg:HI R[0-4]))
    >
    > But:
    >
    >     (set (reg:SI R1) (reg:SI R0))
    >     (set (reg:HI R2) (reg:HI R1))
    >     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
    >
    > is problematic because it dips below the precision of the oldest regno
    > and then increases again.
    >
    > When this happens, I guess we have two choices:
    >
    > (1) what the patch does: treat R3 as the start of a new chain.
    > (2) pretend that the copy occured in vd->e[sr].mode instead
    >     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
    >
    > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
    > Maybe the optimisation provided by (2) compared to (1) isn't common
    > enough to be worth the complication.
    >
    > I think we should test [B] as well as [A] though.  The pass is set
    > up to do some quite elaborate mode changes and I think rejecting
    > [A] on its own would make some of the other code redundant.
    > It also feels like it should be a seperate âifâ or âelse ifâ,
    > with its own comment.
    
    Unfortunately, we now have a testcase that shows that testing also [B]
    is a problem (unfortunately now latent on the trunk, only reproduces
    on 10 and 11 branches).
    
    The comment in the patch tries to list just the interesting instructions,
    we have a 64-bit value, copy low 8 bit of those to another register,
    copy full 64 bits to another register and then clobber the original register.
    Before that (set (reg:DI r14) (const_int ...)) we have a chain
    DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so
    we have QI si, DI bp , si being the oldest_regno.
    Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
    the rest is unspecified, but we would add DI dx into that same chain at the
    end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
    DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
    as redundant, because we think bp and dx are already equivalent, when in
    reality that is true only for the lowpart 8 bits.
    I believe the [*] marked step above is where the bug is.
    
    The committed regcprop.c (copy_value) change (but only committed to
    trunk/11, not to 10) added
      else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
               && partial_subreg_p (vd->e[sr].mode,
                                    vd->e[vd->e[sr].oldest_regno].mode))
        return;
    and while the first partial_subreg_p call returns true, the second one
    doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
    true and we'd return, but as that reg got clobbered, si became the oldest
    regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
    and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
    But as the testcase shows, what is the oldest_regno in the chain is
    something that changes over time, so relying on it for anything is
    problematic, something could have a different oldest_regno and later
    on get a different oldest_regno (perhaps with different mode) because
    the oldest_regno got overwritten and it can change both ways.
    
    The following patch effectively implements your (2) above.
    
    2021-05-15  Jakub Jelinek  <jakub@redhat.com>
    
            PR rtl-optimization/100342
            * regcprop.c (copy_value): When copying a source reg in a wider
            mode than it has recorded for the value, adjust recorded destination
            mode too or punt if !REG_CAN_CHANGE_MODE_P.
    
            * gcc.target/i386/pr100342.c: New test.
Comment 15 GCC Commits 2021-05-31 14:08:24 UTC
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:81c2cd08fafcdc0ff48f6cf440bef86f98822a23

commit r11-8486-g81c2cd08fafcdc0ff48f6cf440bef86f98822a23
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat May 15 10:12:11 2021 +0200

    regcprop: Fix another cprop_hardreg bug [PR100342]
    
    On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote:
    > Ah, ok, thanks for the extra context.
    >
    > So AIUI the problem when recording xmm2<-di isn't just:
    >
    >  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
    >
    > but also that:
    >
    >  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
    >
    > For example, all registers in this sequence can be part of the same chain:
    >
    >     (set (reg:HI R1) (reg:HI R0))
    >     (set (reg:SI R2) (reg:SI R1)) // [A]
    >     (set (reg:DI R3) (reg:DI R2)) // [A]
    >     (set (reg:SI R4) (reg:SI R[0-3]))
    >     (set (reg:HI R5) (reg:HI R[0-4]))
    >
    > But:
    >
    >     (set (reg:SI R1) (reg:SI R0))
    >     (set (reg:HI R2) (reg:HI R1))
    >     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
    >
    > is problematic because it dips below the precision of the oldest regno
    > and then increases again.
    >
    > When this happens, I guess we have two choices:
    >
    > (1) what the patch does: treat R3 as the start of a new chain.
    > (2) pretend that the copy occured in vd->e[sr].mode instead
    >     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
    >
    > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
    > Maybe the optimisation provided by (2) compared to (1) isn't common
    > enough to be worth the complication.
    >
    > I think we should test [B] as well as [A] though.  The pass is set
    > up to do some quite elaborate mode changes and I think rejecting
    > [A] on its own would make some of the other code redundant.
    > It also feels like it should be a seperate âifâ or âelse ifâ,
    > with its own comment.
    
    Unfortunately, we now have a testcase that shows that testing also [B]
    is a problem (unfortunately now latent on the trunk, only reproduces
    on 10 and 11 branches).
    
    The comment in the patch tries to list just the interesting instructions,
    we have a 64-bit value, copy low 8 bit of those to another register,
    copy full 64 bits to another register and then clobber the original register.
    Before that (set (reg:DI r14) (const_int ...)) we have a chain
    DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so
    we have QI si, DI bp , si being the oldest_regno.
    Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
    the rest is unspecified, but we would add DI dx into that same chain at the
    end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
    DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
    as redundant, because we think bp and dx are already equivalent, when in
    reality that is true only for the lowpart 8 bits.
    I believe the [*] marked step above is where the bug is.
    
    The committed regcprop.c (copy_value) change (but only committed to
    trunk/11, not to 10) added
      else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
               && partial_subreg_p (vd->e[sr].mode,
                                    vd->e[vd->e[sr].oldest_regno].mode))
        return;
    and while the first partial_subreg_p call returns true, the second one
    doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
    true and we'd return, but as that reg got clobbered, si became the oldest
    regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
    and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
    But as the testcase shows, what is the oldest_regno in the chain is
    something that changes over time, so relying on it for anything is
    problematic, something could have a different oldest_regno and later
    on get a different oldest_regno (perhaps with different mode) because
    the oldest_regno got overwritten and it can change both ways.
    
    The following patch effectively implements your (2) above.
    
    2021-05-15  Jakub Jelinek  <jakub@redhat.com>
    
            PR rtl-optimization/100342
            * regcprop.c (copy_value): When copying a source reg in a wider
            mode than it has recorded for the value, adjust recorded destination
            mode too or punt if !REG_CAN_CHANGE_MODE_P.
    
            * gcc.target/i386/pr100342.c: New test.
    
    (cherry picked from commit 425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f)
Comment 16 Jakub Jelinek 2021-05-31 14:56:15 UTC
Fixed also for GCC 11.2.
Comment 17 GCC Commits 2022-05-10 08:18:03 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:90330d0e046b11817382db6545bfd8e882f21d41

commit r10-10612-g90330d0e046b11817382db6545bfd8e882f21d41
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat May 15 10:12:11 2021 +0200

    regcprop: Fix another cprop_hardreg bug [PR100342]
    
    On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote:
    > Ah, ok, thanks for the extra context.
    >
    > So AIUI the problem when recording xmm2<-di isn't just:
    >
    >  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
    >
    > but also that:
    >
    >  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
    >
    > For example, all registers in this sequence can be part of the same chain:
    >
    >     (set (reg:HI R1) (reg:HI R0))
    >     (set (reg:SI R2) (reg:SI R1)) // [A]
    >     (set (reg:DI R3) (reg:DI R2)) // [A]
    >     (set (reg:SI R4) (reg:SI R[0-3]))
    >     (set (reg:HI R5) (reg:HI R[0-4]))
    >
    > But:
    >
    >     (set (reg:SI R1) (reg:SI R0))
    >     (set (reg:HI R2) (reg:HI R1))
    >     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
    >
    > is problematic because it dips below the precision of the oldest regno
    > and then increases again.
    >
    > When this happens, I guess we have two choices:
    >
    > (1) what the patch does: treat R3 as the start of a new chain.
    > (2) pretend that the copy occured in vd->e[sr].mode instead
    >     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
    >
    > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
    > Maybe the optimisation provided by (2) compared to (1) isn't common
    > enough to be worth the complication.
    >
    > I think we should test [B] as well as [A] though.  The pass is set
    > up to do some quite elaborate mode changes and I think rejecting
    > [A] on its own would make some of the other code redundant.
    > It also feels like it should be a seperate âifâ or âelse ifâ,
    > with its own comment.
    
    Unfortunately, we now have a testcase that shows that testing also [B]
    is a problem (unfortunately now latent on the trunk, only reproduces
    on 10 and 11 branches).
    
    The comment in the patch tries to list just the interesting instructions,
    we have a 64-bit value, copy low 8 bit of those to another register,
    copy full 64 bits to another register and then clobber the original register.
    Before that (set (reg:DI r14) (const_int ...)) we have a chain
    DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so
    we have QI si, DI bp , si being the oldest_regno.
    Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
    the rest is unspecified, but we would add DI dx into that same chain at the
    end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
    DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
    as redundant, because we think bp and dx are already equivalent, when in
    reality that is true only for the lowpart 8 bits.
    I believe the [*] marked step above is where the bug is.
    
    The committed regcprop.c (copy_value) change (but only committed to
    trunk/11, not to 10) added
      else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
               && partial_subreg_p (vd->e[sr].mode,
                                    vd->e[vd->e[sr].oldest_regno].mode))
        return;
    and while the first partial_subreg_p call returns true, the second one
    doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
    true and we'd return, but as that reg got clobbered, si became the oldest
    regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
    and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
    But as the testcase shows, what is the oldest_regno in the chain is
    something that changes over time, so relying on it for anything is
    problematic, something could have a different oldest_regno and later
    on get a different oldest_regno (perhaps with different mode) because
    the oldest_regno got overwritten and it can change both ways.
    
    The following patch effectively implements your (2) above.
    
    2021-05-15  Jakub Jelinek  <jakub@redhat.com>
    
            PR rtl-optimization/100342
            * regcprop.c (copy_value): When copying a source reg in a wider
            mode than it has recorded for the value, adjust recorded destination
            mode too or punt if !REG_CAN_CHANGE_MODE_P.
    
            * gcc.target/i386/pr100342.c: New test.
    
    (cherry picked from commit 425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f)
Comment 18 Jakub Jelinek 2022-05-10 10:45:52 UTC
Fixed for 10.4 too.