Created attachment 50712 [details] semi-reduced testcase I am failing to further reduce the testcase. I checked the code and I think the behavior is fully defined. Also I am unable to find out what exactly goes wrong; please close the PR if you don't want to spend time on this. On master, it stopped failing between r12-100 (BAD) and r12-159 (OK); but it might have been just papered over. $ x86_64-pc-linux-gnu-gcc -O2 -fno-dse -fno-forward-propagate -mno-sse2 testcase.c -Wno-psabi $ ./a.out Aborted $ x86_64-pc-linux-gnu-gcc -v Using built-in specs. COLLECT_GCC=/repo/gcc-11-branch/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc COLLECT_LTO_WRAPPER=/repo/gcc-11-branch/binary-11-branch-20210427124134-gfb7c736c2f1-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.1.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /repo/gcc-11-branch//configure --enable-languages=c,c++ --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libsanitizer --disable-libstdcxx-pch --prefix=/repo/gcc-11-branch//binary-11-branch-20210427124134-gfb7c736c2f1-checking-yes-rtl-df-extra-amd64 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.1.1 20210427 (GCC)
Confirmed.
Started with r10-3542-g0b92cf305dcf34387a8e2564e55ca8948df3b47a Stopped on the trunk with r12-139-g7d6bb80931b429631f63e0fd27bee95f32eb57a9 So most likely was latent before and after those.
For some reason the *input* value at BSWAP insn is truncated to 32bits. v256u128 v256u128_1 = SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 < v256u128_0)) <= 0, v256u128_0); u128_0 is an argument to foo0, passes value of 123842323652213865 (decimal). The binary has only one BSWAP insn, so: $ objdump -dr a.out | grep bswap 401ddd: 48 0f ca bswap %rdx (gdb) b *0x401ddd Breakpoint 1 at 0x401ddd: file pr100342.c, line 72. Breakpoint 1, 0x0000000000401ddd in foo0 (u8_0=u8_0@entry=0 '\000', v128u8_0=v128u8_0@entry=..., v512u8_0=..., u16_0=u16_0@entry=10, v128u16_0=v128u16_0@entry=..., u32_0=u32_0@entry=4, u64_0=u64_0@entry=2, v512u64_0=..., u128_0=<optimized out>, v256u128_0=..., v512u128_0=...) at zzz.c:72 72 SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 < v256u128_0)) <= (gdb) p $rdx $2 = 3983735913 (gdb) p/x $rdx $3 = 0xed72fc69 We can "fix" the clobbered value manually in gdb session: (gdb) set $rdx = 123842323652213865 (gdb) p/x $rdx $4 = 0x1b7f9efed72fc69 (gdb) c Continuing. [Inferior 1 (process 20630) exited normally]
The problematic insn is: 401cec: 44 89 f6 mov %r14d,%esi This one should be 64 bit wide, movl %r14d, %esi # 613 [c=4 l=3] *movqi_internal/2 but is actually a QImode move from r14 to esi.
The problem can be seen in _.pro_and_epilogue pass: Starting with: _.cmpelim 2741: r14:DI=[sp:DI+0x38] ... 368: di:DI=r14:DI ... 613: si:QI=r14:QI ... 2737: bp:DI=r14:DI ... 658: strict_low_part(ax:QI)=bp:QI 2275: dx:DI=si:DI ... 2287: dx:DI=bp:DI 710: dx:DI=bswap(dx:DI) The pass converts: _.pro_and_epilogue rescanning insn with uid = 658. insn 658: replaced reg 6 with 4 verify found no changes in insn with uid = 658. verify found no changes in insn with uid = 2275. deleting insn with uid = 2287. 2741: r14:DI=[sp:DI+0x38] ... 368: di:DI=r14:DI ... 613: si:QI=r14:QI ... 2737: bp:DI=r14:DI ... 658: strict_low_part(ax:QI)=si:QI 2275: dx:DI=si:DI ... 2287: ? 710: dx:DI=bswap(dx:DI) Please note how the optimization removes insn 2287, assuming that rsi holds the whole 64bit value ... Which is not true!
That seems to happen during shrink-wrapping: #0 df_insn_delete (insn=0x7fffe9e3a6c0) at ../../gcc/df-scan.c:932 #1 0x0000000000978328 in delete_insn (insn=0x7fffe9e3a6c0) at ../../gcc/cfgrtl.c:178 #2 0x0000000000d5a270 in copyprop_hardreg_forward_1 (bb=<basic_block 0x7fffe9f763a8 (2)>, vd=<optimized out>) at ../../gcc/regcprop.c:825 #3 0x0000000000d5b772 in copyprop_hardreg_forward_bb_without_debug_insn (bb=<basic_block 0x7fffe9f763a8 (2)>) at ../../gcc/regcprop.c:1211 #4 0x0000000000db616c in prepare_shrink_wrap (entry_block=<basic_block 0x7fffe9f763a8 (2)>) at ../../gcc/shrink-wrap.c:451 #5 try_shrink_wrapping (entry_edge=0x7fffffffd518, prologue_seq=0x7fffe9e81440) at ../../gcc/shrink-wrap.c:674 #6 0x0000000000ae7ffc in thread_prologue_and_epilogue_insns () at ../../gcc/function.c:6025 #7 0x0000000000ae86c3 in rest_of_handle_thread_prologue_and_epilogue () at ../../gcc/function.c:6510
I have traced a bit where (insn 2275) and (insn 2287) come from. In _.ira, we have: 613: r125:QI=r2067:DI#0 ... 659: zero_extract(r2080:DI,0x8,0x8)=r125:QI#0 And in _.reload, a DImode reload is inserted before zero_extract RTX: 613: si:QI=r14:QI ... 2275: dx:DI=si:DI 659: zero_extract(ax:DI,0x8,0x8)=dx:DI However, %rsi is only initialized with QImode value in (insn 613). This word-mode reload for zero_extract RTX interferes with register propagation in subsequent passes, as shown in the comment #5.
FYI, this whole analysis was done with Fedora 33 system compiler: gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC)
Looking at current 10 branch (previously looked at 11), I see: (insn 2741 1965 368 2 (set (reg:DI 42 r14 [orig:2067 u128_0 ] [2067]) (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int 56 [0x38])) [9 %sfp+-1032 S8 A64])) "pr100342.c":65:24 66 {*movdi_internal} (nil)) ... (insn 613 612 1970 2 (set (reg:QI 4 si [orig:125 _98 ] [125]) (reg:QI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:15 69 {*movqi_internal} (nil)) ... (insn 2737 1971 615 2 (set (reg:DI 6 bp [orig:2067 u128_0 ] [2067]) (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:12 66 {*movdi_internal} (nil)) ... (insn 2275 658 659 2 (set (reg:DI 1 dx [2225]) (reg:DI 4 si [orig:125 _98 ] [125])) "pr100342.c":68:12 66 {*movdi_internal} (nil)) ... (insn 2287 2286 710 2 (set (reg:DI 1 dx [orig:131 _104 ] [131]) (reg:DI 6 bp [orig:2067 u128_0 ] [2067])) "pr100342.c":72:5 66 {*movdi_internal} (nil)) (insn 710 2287 2172 2 (set (reg:DI 1 dx [orig:131 _104 ] [131]) (bswap:DI (reg:DI 1 dx [orig:131 _104 ] [131]))) "pr100342.c":72:5 872 {*bswapdi2} (nil)) in the reload dump and I must say I don't find anything wrong on that, what looks wrong is that regcprop thinks that the insn 2287 is a redundant move because it thinks that at that point %rdx contains the same value as %rbp - /* Detect noop sets and remove them before processing side effects. */ if (set && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))) { unsigned int regno = REGNO (SET_SRC (set)); rtx r1 = find_oldest_value_reg (REGNO_REG_CLASS (regno), SET_DEST (set), vd); rtx r2 = find_oldest_value_reg (REGNO_REG_CLASS (regno), SET_SRC (set), vd); if (rtx_equal_p (r1 ? r1 : SET_DEST (set), r2 ? r2 : SET_SRC (set))) { bool last = insn == BB_END (bb); delete_insn (insn); r1 is %rbp and r2 is NULL, so it is compared against SET_SRC (set) of %rbp. Something misses that only the lowpart QImode is equivalent that way.
And there is one more important insn in between 2737 and 2275, in particular (insn 2911 2867 2853 2 (set (reg:DI 42 r14 [2223]) (const_int 72057594037927935 [0xffffffffffffff])) "pr100342.c":68:12 66 {*movdi_internal} (nil)) On that one the old value of r14 is lost.
(insn 2741 1965 368 2 (set (reg:DI 42 r14 [orig:2067 u128_0 ] [2067]) (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int 56 [0x38])) [9 %sfp+-1032 S8 A64])) "pr100342.c":65:24 66 {*movdi_internal} (nil)) ... (insn 613 612 1970 2 (set (reg:QI 4 si [orig:125 _98 ] [125]) (reg:QI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:15 69 {*movqi_internal} (nil)) ... (insn 2737 1971 615 2 (set (reg:DI 6 bp [orig:2067 u128_0 ] [2067]) (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:12 66 {*movdi_internal} (nil)) ... (insn 2829 2749 2855 2 (set (reg:DI 42 r14 [2222]) (const_int -71776119061217281 [0xff00ffffffffffff])) "pr100342.c":68:12 66 {*movdi_internal} (nil)) ... (insn 2275 658 659 2 (set (reg:DI 1 dx [2225]) (reg:DI 4 si [orig:125 _98 ] [125])) "pr100342.c":68:12 66 {*movdi_internal} (nil)) ... (insn 2910 2860 677 2 (set (reg:DI 4 si [2227]) (const_int -1095216660481 [0xffffff00ffffffff])) "pr100342.c":68:12 66 {*movdi_internal} (nil)) ... (insn 2287 2286 710 2 (set (reg:DI 1 dx [orig:131 _104 ] [131]) (reg:DI 6 bp [orig:2067 u128_0 ] [2067])) "pr100342.c":72:5 66 {*movdi_internal} (nil)) (insn 710 2287 2172 2 (set (reg:DI 1 dx [orig:131 _104 ] [131]) (bswap:DI (reg:DI 1 dx [orig:131 _104 ] [131]))) "pr100342.c":72:5 872 {*bswapdi2} (nil)) In copyprop_hardreg_forward_1 at the start of that insn 2737 we have: (gdb) p vd->e[1] $32 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[4] $33 = {mode = E_QImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[6] $34 = {mode = E_SImode, oldest_regno = 6, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[42] $35 = {mode = E_DImode, oldest_regno = 42, next_regno = 4, debug_insn_changes = 0x0} which is I think ok, it says %sil is QImode copy of %r14, all other regs we care about here contain their values. At the end of that we have: (gdb) p vd->e[1] $36 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[4] $37 = {mode = E_QImode, oldest_regno = 42, next_regno = 6, debug_insn_changes = 0x0} (gdb) p vd->e[6] $38 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[42] $39 = {mode = E_DImode, oldest_regno = 42, next_regno = 4, debug_insn_changes = 0x0} as well as before the insn 2829. That changes it to: (gdb) p vd->e[1] $51 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[4] $52 = {mode = E_QImode, oldest_regno = 4, next_regno = 6, debug_insn_changes = 0x0} (gdb) p vd->e[6] $53 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[42] $54 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0} Same thing at the start of insn 2275. But after that we have: (gdb) p vd->e[1] $61 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[4] $62 = {mode = E_QImode, oldest_regno = 4, next_regno = 6, debug_insn_changes = 0x0} (gdb) p vd->e[6] $63 = {mode = E_DImode, oldest_regno = 4, next_regno = 1, debug_insn_changes = 0x0} (gdb) p vd->e[42] $64 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0} I think this step is what looks wrong, it is true that the movq %rsi, %rdx copied all 64-bits of %rsi, but vd->e[4].mode was just QImode, so I think vd->e[1].mode should be also QImode. That remains until before insn 2910, which changes that to: (gdb) p vd->e[1] $84 = {mode = E_DImode, oldest_regno = 6, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[4] $85 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295, debug_insn_changes = 0x0} (gdb) p vd->e[6] $86 = {mode = E_DImode, oldest_regno = 6, next_regno = 1, debug_insn_changes = 0x0} (gdb) p vd->e[42] $87 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295, debug_insn_changes = 0x0} and at that point, the fact that %rdx doesn't contain the same value as %rbp - only %dl is guaranteed to be the same as %bpl - is lost. Then when we look up the oldest value of %rdx on insn 2287 we are told it is %rbp.
Created attachment 50761 [details] gcc11-pr100342.patch Untested fix. But what do I know about regcprop?
(In reply to Jakub Jelinek from comment #12) > Created attachment 50761 [details] > gcc11-pr100342.patch > > Untested fix. But what do I know about regcprop? I just want to confirm that this patch alone also fixed pr98694.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f commit r12-808-g425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f Author: Jakub Jelinek <jakub@redhat.com> Date: Sat May 15 10:12:11 2021 +0200 regcprop: Fix another cprop_hardreg bug [PR100342] On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote: > Ah, ok, thanks for the extra context. > > So AIUI the problem when recording xmm2<-di isn't just: > > [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) > > but also that: > > [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode) > > For example, all registers in this sequence can be part of the same chain: > > (set (reg:HI R1) (reg:HI R0)) > (set (reg:SI R2) (reg:SI R1)) // [A] > (set (reg:DI R3) (reg:DI R2)) // [A] > (set (reg:SI R4) (reg:SI R[0-3])) > (set (reg:HI R5) (reg:HI R[0-4])) > > But: > > (set (reg:SI R1) (reg:SI R0)) > (set (reg:HI R2) (reg:HI R1)) > (set (reg:SI R3) (reg:SI R2)) // [A] && [B] > > is problematic because it dips below the precision of the oldest regno > and then increases again. > > When this happens, I guess we have two choices: > > (1) what the patch does: treat R3 as the start of a new chain. > (2) pretend that the copy occured in vd->e[sr].mode instead > (i.e. copy vd->e[sr].mode to vd->e[dr].mode) > > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P. > Maybe the optimisation provided by (2) compared to (1) isn't common > enough to be worth the complication. > > I think we should test [B] as well as [A] though. The pass is set > up to do some quite elaborate mode changes and I think rejecting > [A] on its own would make some of the other code redundant. > It also feels like it should be a seperate âifâ or âelse ifâ, > with its own comment. Unfortunately, we now have a testcase that shows that testing also [B] is a problem (unfortunately now latent on the trunk, only reproduces on 10 and 11 branches). The comment in the patch tries to list just the interesting instructions, we have a 64-bit value, copy low 8 bit of those to another register, copy full 64 bits to another register and then clobber the original register. Before that (set (reg:DI r14) (const_int ...)) we have a chain DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so we have QI si, DI bp , si being the oldest_regno. Next DI si is copied into DI dx. Only the low 8 bits of that are defined, the rest is unspecified, but we would add DI dx into that same chain at the end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it as redundant, because we think bp and dx are already equivalent, when in reality that is true only for the lowpart 8 bits. I believe the [*] marked step above is where the bug is. The committed regcprop.c (copy_value) change (but only committed to trunk/11, not to 10) added else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) && partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)) return; and while the first partial_subreg_p call returns true, the second one doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be true and we'd return, but as that reg got clobbered, si became the oldest regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false. But as the testcase shows, what is the oldest_regno in the chain is something that changes over time, so relying on it for anything is problematic, something could have a different oldest_regno and later on get a different oldest_regno (perhaps with different mode) because the oldest_regno got overwritten and it can change both ways. The following patch effectively implements your (2) above. 2021-05-15 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/100342 * regcprop.c (copy_value): When copying a source reg in a wider mode than it has recorded for the value, adjust recorded destination mode too or punt if !REG_CAN_CHANGE_MODE_P. * gcc.target/i386/pr100342.c: New test.
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:81c2cd08fafcdc0ff48f6cf440bef86f98822a23 commit r11-8486-g81c2cd08fafcdc0ff48f6cf440bef86f98822a23 Author: Jakub Jelinek <jakub@redhat.com> Date: Sat May 15 10:12:11 2021 +0200 regcprop: Fix another cprop_hardreg bug [PR100342] On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote: > Ah, ok, thanks for the extra context. > > So AIUI the problem when recording xmm2<-di isn't just: > > [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) > > but also that: > > [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode) > > For example, all registers in this sequence can be part of the same chain: > > (set (reg:HI R1) (reg:HI R0)) > (set (reg:SI R2) (reg:SI R1)) // [A] > (set (reg:DI R3) (reg:DI R2)) // [A] > (set (reg:SI R4) (reg:SI R[0-3])) > (set (reg:HI R5) (reg:HI R[0-4])) > > But: > > (set (reg:SI R1) (reg:SI R0)) > (set (reg:HI R2) (reg:HI R1)) > (set (reg:SI R3) (reg:SI R2)) // [A] && [B] > > is problematic because it dips below the precision of the oldest regno > and then increases again. > > When this happens, I guess we have two choices: > > (1) what the patch does: treat R3 as the start of a new chain. > (2) pretend that the copy occured in vd->e[sr].mode instead > (i.e. copy vd->e[sr].mode to vd->e[dr].mode) > > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P. > Maybe the optimisation provided by (2) compared to (1) isn't common > enough to be worth the complication. > > I think we should test [B] as well as [A] though. The pass is set > up to do some quite elaborate mode changes and I think rejecting > [A] on its own would make some of the other code redundant. > It also feels like it should be a seperate âifâ or âelse ifâ, > with its own comment. Unfortunately, we now have a testcase that shows that testing also [B] is a problem (unfortunately now latent on the trunk, only reproduces on 10 and 11 branches). The comment in the patch tries to list just the interesting instructions, we have a 64-bit value, copy low 8 bit of those to another register, copy full 64 bits to another register and then clobber the original register. Before that (set (reg:DI r14) (const_int ...)) we have a chain DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so we have QI si, DI bp , si being the oldest_regno. Next DI si is copied into DI dx. Only the low 8 bits of that are defined, the rest is unspecified, but we would add DI dx into that same chain at the end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it as redundant, because we think bp and dx are already equivalent, when in reality that is true only for the lowpart 8 bits. I believe the [*] marked step above is where the bug is. The committed regcprop.c (copy_value) change (but only committed to trunk/11, not to 10) added else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) && partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)) return; and while the first partial_subreg_p call returns true, the second one doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be true and we'd return, but as that reg got clobbered, si became the oldest regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false. But as the testcase shows, what is the oldest_regno in the chain is something that changes over time, so relying on it for anything is problematic, something could have a different oldest_regno and later on get a different oldest_regno (perhaps with different mode) because the oldest_regno got overwritten and it can change both ways. The following patch effectively implements your (2) above. 2021-05-15 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/100342 * regcprop.c (copy_value): When copying a source reg in a wider mode than it has recorded for the value, adjust recorded destination mode too or punt if !REG_CAN_CHANGE_MODE_P. * gcc.target/i386/pr100342.c: New test. (cherry picked from commit 425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f)
Fixed also for GCC 11.2.
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:90330d0e046b11817382db6545bfd8e882f21d41 commit r10-10612-g90330d0e046b11817382db6545bfd8e882f21d41 Author: Jakub Jelinek <jakub@redhat.com> Date: Sat May 15 10:12:11 2021 +0200 regcprop: Fix another cprop_hardreg bug [PR100342] On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote: > Ah, ok, thanks for the extra context. > > So AIUI the problem when recording xmm2<-di isn't just: > > [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) > > but also that: > > [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode) > > For example, all registers in this sequence can be part of the same chain: > > (set (reg:HI R1) (reg:HI R0)) > (set (reg:SI R2) (reg:SI R1)) // [A] > (set (reg:DI R3) (reg:DI R2)) // [A] > (set (reg:SI R4) (reg:SI R[0-3])) > (set (reg:HI R5) (reg:HI R[0-4])) > > But: > > (set (reg:SI R1) (reg:SI R0)) > (set (reg:HI R2) (reg:HI R1)) > (set (reg:SI R3) (reg:SI R2)) // [A] && [B] > > is problematic because it dips below the precision of the oldest regno > and then increases again. > > When this happens, I guess we have two choices: > > (1) what the patch does: treat R3 as the start of a new chain. > (2) pretend that the copy occured in vd->e[sr].mode instead > (i.e. copy vd->e[sr].mode to vd->e[dr].mode) > > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P. > Maybe the optimisation provided by (2) compared to (1) isn't common > enough to be worth the complication. > > I think we should test [B] as well as [A] though. The pass is set > up to do some quite elaborate mode changes and I think rejecting > [A] on its own would make some of the other code redundant. > It also feels like it should be a seperate âifâ or âelse ifâ, > with its own comment. Unfortunately, we now have a testcase that shows that testing also [B] is a problem (unfortunately now latent on the trunk, only reproduces on 10 and 11 branches). The comment in the patch tries to list just the interesting instructions, we have a 64-bit value, copy low 8 bit of those to another register, copy full 64 bits to another register and then clobber the original register. Before that (set (reg:DI r14) (const_int ...)) we have a chain DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so we have QI si, DI bp , si being the oldest_regno. Next DI si is copied into DI dx. Only the low 8 bits of that are defined, the rest is unspecified, but we would add DI dx into that same chain at the end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it as redundant, because we think bp and dx are already equivalent, when in reality that is true only for the lowpart 8 bits. I believe the [*] marked step above is where the bug is. The committed regcprop.c (copy_value) change (but only committed to trunk/11, not to 10) added else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) && partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)) return; and while the first partial_subreg_p call returns true, the second one doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be true and we'd return, but as that reg got clobbered, si became the oldest regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false. But as the testcase shows, what is the oldest_regno in the chain is something that changes over time, so relying on it for anything is problematic, something could have a different oldest_regno and later on get a different oldest_regno (perhaps with different mode) because the oldest_regno got overwritten and it can change both ways. The following patch effectively implements your (2) above. 2021-05-15 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/100342 * regcprop.c (copy_value): When copying a source reg in a wider mode than it has recorded for the value, adjust recorded destination mode too or punt if !REG_CAN_CHANGE_MODE_P. * gcc.target/i386/pr100342.c: New test. (cherry picked from commit 425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f)
Fixed for 10.4 too.