Bug 112526 - [14 regression] Miscompilation of ffmpeg test for -m32 since r14-4968-g89e5d902fc55ad
Summary: [14 regression] Miscompilation of ffmpeg test for -m32 since r14-4968-g89e5d9...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 14.0
: P3 normal
Target Milestone: 14.0
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
: 112518 (view as bug list)
Depends on:
Blocks:
 
Reported: 2023-11-14 09:07 UTC by Sam James
Modified: 2023-11-22 10:34 UTC (History)
3 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-11-15 00:00:00


Attachments
camellia-PR112526.tar.xz (21.47 KB, application/x-xz)
2023-11-14 09:50 UTC, Sam James
Details
reduced.c (5.51 KB, text/plain)
2023-11-14 10:19 UTC, Sam James
Details
reduced.i (17.56 KB, text/plain)
2023-11-14 10:20 UTC, Sam James
Details
reduced.c (4.98 KB, text/plain)
2023-11-14 11:57 UTC, Sam James
Details
reduced.i (16.69 KB, text/plain)
2023-11-14 11:58 UTC, Sam James
Details
gcc14-pr112526.patch (877 bytes, patch)
2023-11-15 16:11 UTC, Jakub Jelinek
Details | Diff
reduced_smaller.c (2.11 KB, text/plain)
2023-11-16 16:28 UTC, Sam James
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James 2023-11-14 09:07:50 UTC
I noticed earlier that ffmpeg's test suite started failing in the 'camellia' test.

I've not dug into this yet other than bisecting to r14-4968-g89e5d902fc55ad. Needs x86_64-pc-linux-gnu-gcc -m32 (not tried native 32-bit x86).

It fails as (the ref'd .err file is the same as the long output underneath it):
```
/var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1/tests/fate-run.sh fate-camellia "" "" "/var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1-abi_x86_32.x86" 'run libavutil/tests/camellia' 'n
ull' '' '' '1' '' '' '' '' '' '' '' '' '' ''
 /var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1-abi_x86_32.x86/libavutil/tests/camellia
Test camellia failed. Look at tests/data/fate/camellia.err for details.
0 67 d0
1 67 ad
2 31 5d
3 38 69
4 54 7e
5 96 27
6 69 89
7 73 81
8 08 a8
9 57 2b
10 06 24
11 56 09
12 48 44
13 ea f7
14 be 51
15 43 a1
0 01 bc
1 23 aa
2 45 43
3 67 36
4 89 51
5 ab e3
6 cd dd
7 ef fb
8 fe 55
9 dc 9f
10 ba f1
11 98 ea
12 76 f8
13 54 c5
14 32 c9
15 10 39
0 b4 cc
1 99 40
2 34 11
3 01 5a
4 b3 a7
5 e9 90
6 96 98
7 f8 b7
8 4e 34
9 e5 f2
10 ce c8
11 e7 1d
12 d7 b0
11 e7 1d
12 d7 b0
13 9b 4e
14 09 30
15 b9 a3
0 01 79
1 23 90
2 45 41
3 67 b1
4 89 95
5 ab 9b
6 cd 13
7 ef 21
8 fe 96
9 dc ac
10 ba 0b
11 98 4b
12 76 87
13 54 d4
14 32 80
15 10 07
0 9a 54
1 cc 2c
2 23 62
3 7d 5a
4 ff 33
5 16 61
6 d7 4e
7 6c c8
8 20 74
9 ef 34
10 7c e0
11 91 da
12 9e 75
13 3a ed
14 75 c5
15 09 d1
0 01 48
1 23 a8
2 45 0a
3 67 18
4 89 13
5 ab af
6 cd 69
7 ef 9a
8 fe 48
9 dc 0c
10 ba fe
11 98 94
12 76 9b
13 54 03
14 32 ba
15 10 3a
```

The large reproducer for now is:
```
git clone https://git.videolan.org/git/ffmpeg.git # also repro'd with 6.0.1 release
./configure --cc="gcc -m32" --disable-asm --disable-stripping --extra-cflags="-march=znver2"
make -j$(nproc) V=1
make fate-camellia V=1
```
Comment 1 Sam James 2023-11-14 09:08:33 UTC
ubsan doesn't find anything in the camellia test. The general test suite suffers from an issue in the teardown code though.

I'll upload both binaries now.
Comment 2 Richard Biener 2023-11-14 09:28:32 UTC
Jakub is testing a fix.
Comment 3 Jakub Jelinek 2023-11-14 09:30:05 UTC
Not for this, for the r14-5385 introduced issues.
Comment 4 Sam James 2023-11-14 09:50:57 UTC
Created attachment 56581 [details]
camellia-PR112526.tar.xz

Binaries were built with:
* bad: 89e5d902fc55ad375f149f25a84c516ad360a606 (r14-4968-g89e5d902fc55ad)
* good: 8697d3a1dcf32750a3b9dc007586eb5f9ba5f17a (r14-4967-g8697d3a1dcf327)

The bad object looks like camellia.o as:
rm ./libavutil/camellia.o ./libavutil/tests/camellia.o ./libavutil/tests/camellia -f ; gcc -m32 ... # rebuild camelia.o ; make fate-camellia fixes it (and can toggle it on/off by changing -march)
Comment 5 Sam James 2023-11-14 10:19:47 UTC
Created attachment 56582 [details]
reduced.c

First stab at a reduction.
Comment 6 Sam James 2023-11-14 10:20:11 UTC
Created attachment 56583 [details]
reduced.i
Comment 7 Sam James 2023-11-14 10:20:49 UTC
$ gcc uhoh-camellia.i -O2 -m32 -o uhoh ; ./uhoh
$ gcc uhoh-camellia.i -O2 -m32 -o uhoh -march=znver2 ; ./uhoh
Aborted (core dumped)
Comment 8 Sam James 2023-11-14 11:57:45 UTC
Created attachment 56584 [details]
reduced.c
Comment 9 Sam James 2023-11-14 11:58:55 UTC
Created attachment 56585 [details]
reduced.i
Comment 10 Jakub Jelinek 2023-11-15 15:16:07 UTC
I think
--- gcc/config/i386/i386.md.jj	2023-11-14 21:38:38.667046713 +0100
+++ gcc/config/i386/i386.md	2023-11-15 16:06:30.638353592 +0100
@@ -9918,7 +9918,10 @@
    && REGNO (operands[0]) != REGNO (operands[3])
    && (REGNO (operands[0]) == REGNO (operands[4])
        || REGNO (operands[0]) == REGNO (operands[5])
-       || peep2_reg_dead_p (3, operands[0]))"
+       || peep2_reg_dead_p (3, operands[0]))
+   && (REGNO (operands[2]) == REGNO (operands[4])
+       || REGNO (operands[2]) == REGNO (operands[5])
+       || peep2_reg_dead_p (3, operands[2]))"
   [(set (match_dup 2) (match_dup 1))
    (parallel [(set (match_dup 4)
 		   (mult:DWIH (match_dup 2) (match_dup 3)))
ought to fix this.  Unlike the normal imul case where the destination is always
%[re]dx:%[re]ax and operands[2] must be %[re]ax, so it is naturally dead at the end and all we need is verify operands[0], in the mulx case the destination is arbitrary pair of registers, so we need to check that both operands[0] and operands[2] (the latter must be %[re]dx) are dead so that we can optimize, because we stop initializing operands[0] altogether and set operands[2] to the immediate rather than the previous value.

The question is if I manage to create a small reproducer or not for the testsuite.
Comment 11 Jakub Jelinek 2023-11-15 16:11:32 UTC
Created attachment 56593 [details]
gcc14-pr112526.patch

Full untested fix.
Comment 12 GCC Commits 2023-11-16 07:33:57 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f158bd511df1f55ebbbc0df3dee52c4400291984

commit r14-5519-gf158bd511df1f55ebbbc0df3dee52c4400291984
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 16 08:33:18 2023 +0100

    i386: Fix mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi peephole2 [PR112526]
    
    The following testcase is miscompiled on x86_64 since PR110551 r14-4968
    commit.  That commit added 2 peephole2s, one for
    mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY
    which I believe is ok, and another one for
    mov imm,%rXX; mov %rYY,%rdx; mulx %rXX, %rZZ, %rWW -> mov imm,%rdx; mulx %rYY, %rZZ, %rWW
    which is wrong.  Both peephole2s verify that %rXX above is dead at
    the end of the pattern, by checking if %rXX is either one of the
    registers overwritten in the multiplication (%rdx:%rax in the first
    case, the 2 destination registers of mulx in the latter case), because
    we no longer set %rXX to that immediate (we set %rax resp. %rdx to it
    instead) when the peephole2 replaces it.  But, we also need to ensure
    that the other register previously set to the value of %rYY and newly
    to imm isn't used after the multiplication, and neither of the peephole2s
    does that.  Now, for the first one (at least assuming in the % pattern
    the matching operand (i.e. hardcoded %rax resp. %rdx) after RA will always go
    first) I think it is always the case, because operands[2] if it must be %rax
    register will be overwritten by mulq writing to %rdx:%rax.  But in the
    second case, there is no reason why %rdx couldn't be used after the pattern,
    and if it is (like in the testcase), we can't make those changes.
    So, the patch checks similarly to operands[0] that operands[2] (which ought
    to be %rdx if RA puts the % match_dup operand first and nothing swaps it
    afterwards) is either the same register as one of the destination registers
    of mulx or dies at the end of the multiplication.
    
    2023-11-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/112526
            * config/i386/i386.md
            (mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi):
            Verify in define_peephole2 that operands[2] dies or is overwritten
            at the end of multiplication.
    
            * gcc.target/i386/bmi2-pr112526.c: New test.
Comment 13 Jakub Jelinek 2023-11-16 07:34:45 UTC
Should be fixed now.
Comment 14 Sam James 2023-11-16 16:28:09 UTC
Created attachment 56606 [details]
reduced_smaller.c

Here's a smaller (cvise'd) C testcase. I can add it to the testsuite if it's ok.
Comment 15 Jakub Jelinek 2023-11-22 10:34:44 UTC
*** Bug 112518 has been marked as a duplicate of this bug. ***