I noticed earlier that ffmpeg's test suite started failing in the 'camellia' test. I've not dug into this yet other than bisecting to r14-4968-g89e5d902fc55ad. Needs x86_64-pc-linux-gnu-gcc -m32 (not tried native 32-bit x86). It fails as (the ref'd .err file is the same as the long output underneath it): ``` /var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1/tests/fate-run.sh fate-camellia "" "" "/var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1-abi_x86_32.x86" 'run libavutil/tests/camellia' 'n ull' '' '' '1' '' '' '' '' '' '' '' '' '' '' /var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1-abi_x86_32.x86/libavutil/tests/camellia Test camellia failed. Look at tests/data/fate/camellia.err for details. 0 67 d0 1 67 ad 2 31 5d 3 38 69 4 54 7e 5 96 27 6 69 89 7 73 81 8 08 a8 9 57 2b 10 06 24 11 56 09 12 48 44 13 ea f7 14 be 51 15 43 a1 0 01 bc 1 23 aa 2 45 43 3 67 36 4 89 51 5 ab e3 6 cd dd 7 ef fb 8 fe 55 9 dc 9f 10 ba f1 11 98 ea 12 76 f8 13 54 c5 14 32 c9 15 10 39 0 b4 cc 1 99 40 2 34 11 3 01 5a 4 b3 a7 5 e9 90 6 96 98 7 f8 b7 8 4e 34 9 e5 f2 10 ce c8 11 e7 1d 12 d7 b0 11 e7 1d 12 d7 b0 13 9b 4e 14 09 30 15 b9 a3 0 01 79 1 23 90 2 45 41 3 67 b1 4 89 95 5 ab 9b 6 cd 13 7 ef 21 8 fe 96 9 dc ac 10 ba 0b 11 98 4b 12 76 87 13 54 d4 14 32 80 15 10 07 0 9a 54 1 cc 2c 2 23 62 3 7d 5a 4 ff 33 5 16 61 6 d7 4e 7 6c c8 8 20 74 9 ef 34 10 7c e0 11 91 da 12 9e 75 13 3a ed 14 75 c5 15 09 d1 0 01 48 1 23 a8 2 45 0a 3 67 18 4 89 13 5 ab af 6 cd 69 7 ef 9a 8 fe 48 9 dc 0c 10 ba fe 11 98 94 12 76 9b 13 54 03 14 32 ba 15 10 3a ``` The large reproducer for now is: ``` git clone https://git.videolan.org/git/ffmpeg.git # also repro'd with 6.0.1 release ./configure --cc="gcc -m32" --disable-asm --disable-stripping --extra-cflags="-march=znver2" make -j$(nproc) V=1 make fate-camellia V=1 ```
ubsan doesn't find anything in the camellia test. The general test suite suffers from an issue in the teardown code though. I'll upload both binaries now.
Jakub is testing a fix.
Not for this, for the r14-5385 introduced issues.
Created attachment 56581 [details] camellia-PR112526.tar.xz Binaries were built with: * bad: 89e5d902fc55ad375f149f25a84c516ad360a606 (r14-4968-g89e5d902fc55ad) * good: 8697d3a1dcf32750a3b9dc007586eb5f9ba5f17a (r14-4967-g8697d3a1dcf327) The bad object looks like camellia.o as: rm ./libavutil/camellia.o ./libavutil/tests/camellia.o ./libavutil/tests/camellia -f ; gcc -m32 ... # rebuild camelia.o ; make fate-camellia fixes it (and can toggle it on/off by changing -march)
Created attachment 56582 [details] reduced.c First stab at a reduction.
Created attachment 56583 [details] reduced.i
$ gcc uhoh-camellia.i -O2 -m32 -o uhoh ; ./uhoh $ gcc uhoh-camellia.i -O2 -m32 -o uhoh -march=znver2 ; ./uhoh Aborted (core dumped)
Created attachment 56584 [details] reduced.c
Created attachment 56585 [details] reduced.i
I think --- gcc/config/i386/i386.md.jj 2023-11-14 21:38:38.667046713 +0100 +++ gcc/config/i386/i386.md 2023-11-15 16:06:30.638353592 +0100 @@ -9918,7 +9918,10 @@ && REGNO (operands[0]) != REGNO (operands[3]) && (REGNO (operands[0]) == REGNO (operands[4]) || REGNO (operands[0]) == REGNO (operands[5]) - || peep2_reg_dead_p (3, operands[0]))" + || peep2_reg_dead_p (3, operands[0])) + && (REGNO (operands[2]) == REGNO (operands[4]) + || REGNO (operands[2]) == REGNO (operands[5]) + || peep2_reg_dead_p (3, operands[2]))" [(set (match_dup 2) (match_dup 1)) (parallel [(set (match_dup 4) (mult:DWIH (match_dup 2) (match_dup 3))) ought to fix this. Unlike the normal imul case where the destination is always %[re]dx:%[re]ax and operands[2] must be %[re]ax, so it is naturally dead at the end and all we need is verify operands[0], in the mulx case the destination is arbitrary pair of registers, so we need to check that both operands[0] and operands[2] (the latter must be %[re]dx) are dead so that we can optimize, because we stop initializing operands[0] altogether and set operands[2] to the immediate rather than the previous value. The question is if I manage to create a small reproducer or not for the testsuite.
Created attachment 56593 [details] gcc14-pr112526.patch Full untested fix.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:f158bd511df1f55ebbbc0df3dee52c4400291984 commit r14-5519-gf158bd511df1f55ebbbc0df3dee52c4400291984 Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Nov 16 08:33:18 2023 +0100 i386: Fix mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi peephole2 [PR112526] The following testcase is miscompiled on x86_64 since PR110551 r14-4968 commit. That commit added 2 peephole2s, one for mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY which I believe is ok, and another one for mov imm,%rXX; mov %rYY,%rdx; mulx %rXX, %rZZ, %rWW -> mov imm,%rdx; mulx %rYY, %rZZ, %rWW which is wrong. Both peephole2s verify that %rXX above is dead at the end of the pattern, by checking if %rXX is either one of the registers overwritten in the multiplication (%rdx:%rax in the first case, the 2 destination registers of mulx in the latter case), because we no longer set %rXX to that immediate (we set %rax resp. %rdx to it instead) when the peephole2 replaces it. But, we also need to ensure that the other register previously set to the value of %rYY and newly to imm isn't used after the multiplication, and neither of the peephole2s does that. Now, for the first one (at least assuming in the % pattern the matching operand (i.e. hardcoded %rax resp. %rdx) after RA will always go first) I think it is always the case, because operands[2] if it must be %rax register will be overwritten by mulq writing to %rdx:%rax. But in the second case, there is no reason why %rdx couldn't be used after the pattern, and if it is (like in the testcase), we can't make those changes. So, the patch checks similarly to operands[0] that operands[2] (which ought to be %rdx if RA puts the % match_dup operand first and nothing swaps it afterwards) is either the same register as one of the destination registers of mulx or dies at the end of the multiplication. 2023-11-16 Jakub Jelinek <jakub@redhat.com> PR target/112526 * config/i386/i386.md (mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi): Verify in define_peephole2 that operands[2] dies or is overwritten at the end of multiplication. * gcc.target/i386/bmi2-pr112526.c: New test.
Should be fixed now.
Created attachment 56606 [details] reduced_smaller.c Here's a smaller (cvise'd) C testcase. I can add it to the testsuite if it's ok.
*** Bug 112518 has been marked as a duplicate of this bug. ***