Since r10-6426-g5f0653a8b75a5ad5a5405a27dd92d3a5759eed4c we on x86_64-linux miscompile following testcase at -O2: struct S { unsigned a:1, b:1, c:1, d:1, e:14, f:14; }; __attribute__((noipa)) int foo (struct S x) { if (x.a != 0 || x.b != 1 || x.c != 0 || x.d != 1 || x.e != 7239 || x.f != 6474) __builtin_abort (); } __attribute__((noipa)) void bar (struct S x, struct S y) { if (x.a != 0 || x.b != 1 || x.c != 0 || x.d != 1 || x.e != 7239 || x.f != 6474) __builtin_abort (); if (y.a != 0 || y.b != 1 || y.c != 1 || y.d != 1 || y.e != 16320 || y.f != 7315) __builtin_abort (); } __attribute__((noipa)) void baz (struct S x) { if (x.a != 1 || x.b != 1 || x.c != 1 || x.d != 1 || x.e != 16320 || x.f != 7315) __builtin_abort (); } __attribute__((noipa)) void qux (struct S x, struct S y, unsigned z) { struct S a = x, b; for (unsigned i = 0; i < z; ++i) foo (x); if (x.a && x.e == 16) a.e = 32; b = a; b.c = y.c; b.e = y.e; b.f = y.f; bar (a, b); a = b; __asm volatile ("" : : : "ax", "bx", "cx", "dx", "si", "di", #ifdef __OPTIMIZE__ "bp", #endif "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"); a.a = 1; a.c = 1; baz (a); } int main () { struct S x = { 0, 1, 0, 1, 7239, 6474 }; struct S y = { 1, 0, 1, 0, 16320, 7315 }; qux (x, y, 1); return 0; } (in original source obviously there was no inline asm, but instead the function was large enough that the variable got spilled). I believe this is a RA bug though. In *.ira we have: (insn 67 66 68 3 (parallel [ (set (strict_low_part (subreg:QI (reg/v:SI 94 [ a ]) 0)) (ior:QI (subreg:QI (reg/v:SI 94 [ a ]) 0) (const_int 5 [0x5]))) (clobber (reg:CC 17 flags)) ]) "gallivm2.c":51:3 492 {*iorqi_1_slp} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 68 67 69 3 (set (reg:SI 5 di) (reg/v:SI 94 [ a ])) "gallivm2.c":51:3 67 {*movsi_internal} (expr_list:REG_DEAD (reg/v:SI 94 [ a ]) (nil))) which looks good to me, the strict_low_part in there says that the low 8 bits of pseudo 94 are ored with 5 and the upper 24 bits stay as is (so, in the end it is the same as SImode |= 5, not really sure why we ended up with that, e.g. if the loop calling foo is changed into a single call to foo, it doesn't happen). But LRA changes this into: (insn 120 66 67 3 (set (reg:QI 0 ax [137]) (mem/c:QI (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8])) [4 %sfp+-8 S1 A32])) "gallivm2.c":51:3 69 {*movqi_internal} (nil)) (insn 67 120 121 3 (parallel [ (set (strict_low_part (reg:QI 0 ax [137])) (ior:QI (reg:QI 0 ax [137]) (const_int 5 [0x5]))) (clobber (reg:CC 17 flags)) ]) "gallivm2.c":51:3 492 {*iorqi_1_slp} (nil)) (insn 121 67 122 3 (set (reg:QI 5 di [orig:94 a ] [94]) (reg:QI 0 ax [137])) "gallivm2.c":51:3 69 {*movqi_internal} (nil)) which is not equivalent, because it makes the upper 24 bits undefined instead of loaded from the spill slot.
Jakub, thank you for working on the PR and providing the test case. It seems to me that the problem occurs in inheritance sub-pass of LRA. It is a very complicated sub-pass. Making a fix and testing it can take a few days. But I hope to fix the PR on this week.
I wonder where we'd simplify things like (insn 67 66 68 3 (parallel [ (set (strict_low_part (subreg:QI (reg/v:SI 94 [ a ]) 0)) (ior:QI (subreg:QI (reg/v:SI 94 [ a ]) 0) (const_int 5 [0x5]))) (clobber (reg:CC 17 flags)) ]) "gallivm2.c":51:3 492 {*iorqi_1_slp} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) I guess simplify_rtx would just see the SET_SRC? The above might appear quite frequently from bit fiddling RTL expansion I guess.
Perhaps define_split? For xor and and too.
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>: https://gcc.gnu.org/g:5261cf8ce824bfc75eb6f12ad5e3716c085b6f9a commit r11-937-g5261cf8ce824bfc75eb6f12ad5e3716c085b6f9a Author: Vladimir N. Makarov <vmakarov@redhat.com> Date: Thu Jun 4 12:04:48 2020 -0400 Add processing STRICT_LOW_PART for matched reloads. 2020-06-04 Vladimir Makarov <vmakarov@redhat.com> PR middle-end/95464 * lra.c (lra_emit_move): Add processing STRICT_LOW_PART. * lra-constraints.c (match_reload): Use STRICT_LOW_PART in output reload if the original insn has it too.
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>: https://gcc.gnu.org/g:e7ef9a40cd0c688cd331bc26224d1fbe360c1fe6 commit r11-951-ge7ef9a40cd0c688cd331bc26224d1fbe360c1fe6 Author: Vladimir N. Makarov <vmakarov@redhat.com> Date: Thu Jun 4 13:32:24 2020 -0400 Add test for PR95464.c. 2020-06-04 Vladimir Makarov <vmakarov@redhat.com> PR middle-end/95464 * gcc.target/i386/pr95464.c: New.
GCC 10.2 is released, adjusting target milestone.
Vlad, do you plan to backport this to 10.3?
(In reply to Jakub Jelinek from comment #7) > Vlad, do you plan to backport this to 10.3? I guess so. We had enough time to test it. I don't see any complaints about this patch. I'll backport it today into release/gcc-10 branch.
The releases/gcc-10 branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>: https://gcc.gnu.org/g:038b65f378b36624b1fd3867fa5e578c1bfa50cc commit r10-8799-g038b65f378b36624b1fd3867fa5e578c1bfa50cc Author: Vladimir N. Makarov <vmakarov@redhat.com> Date: Thu Jun 4 12:04:48 2020 -0400 Add processing STRICT_LOW_PART for matched reloads. 2020-06-04 Vladimir Makarov <vmakarov@redhat.com> PR middle-end/95464 * lra.c (lra_emit_move): Add processing STRICT_LOW_PART. * lra-constraints.c (match_reload): Use STRICT_LOW_PART in output reload if the original insn has it too. (cherry picked from commit 5261cf8ce824bfc75eb6f12ad5e3716c085b6f9a)
The releases/gcc-10 branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>: https://gcc.gnu.org/g:957e37ac288fe6c40ee0bb98a0b06a85be8a35e3 commit r10-8800-g957e37ac288fe6c40ee0bb98a0b06a85be8a35e3 Author: Vladimir N. Makarov <vmakarov@redhat.com> Date: Thu Jun 4 13:32:24 2020 -0400 Add test for PR95464.c. 2020-06-04 Vladimir Makarov <vmakarov@redhat.com> PR middle-end/95464 * gcc.target/i386/pr95464.c: New. (cherry picked from commit e7ef9a40cd0c688cd331bc26224d1fbe360c1fe6)
(In reply to Jakub Jelinek from comment #7) > Vlad, do you plan to backport this to 10.3? Unfortunately, a few days ago people reported a serious problem with the patch (see PR97313). I've just submitted a patch fixing the new problem https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=bb37ad8cc0fc937c7afcdab471e5d65d176041c3 I believe this new patch should be also cherry-picked to gcc-10 branch.
Certainly.
(In reply to Jakub Jelinek from comment #12) > Certainly. I've just committed it into the branch https://gcc.gnu.org/g:70a66ff0228277b4dd89263a663c0a87eb5d782f
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
Fixed.