112526 – [14 regression] Miscompilation of ffmpeg test for -m32 since r14-4968-g89e5d902fc55ad

Bug 112526 - [14 regression] Miscompilation of ffmpeg test for -m32 since r14-4968-g89e5d902fc55ad

Summary: [14 regression] Miscompilation of ffmpeg test for -m32 since r14-4968-g89e5d9...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	14.0

Importance:	P3 normal
Target Milestone:	14.0
Assignee:	Jakub Jelinek

URL:
Keywords:	wrong-code

Duplicates (1):	112518 (view as bug list)
Depends on:
Blocks:

Reported:	2023-11-14 09:07 UTC by Sam James
Modified:	2023-11-22 10:34 UTC (History)
CC List:	3 users (show)

See Also:	https://bugs.gentoo.org/show_bug.cgi?id=917322
Host:
Target:	i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2023-11-15 00:00:00

Attachments
camellia-PR112526.tar.xz (21.47 KB, application/x-xz) 2023-11-14 09:50 UTC, Sam James	Details
reduced.c (5.51 KB, text/plain) 2023-11-14 10:19 UTC, Sam James	Details
reduced.i (17.56 KB, text/plain) 2023-11-14 10:20 UTC, Sam James	Details
reduced.c (4.98 KB, text/plain) 2023-11-14 11:57 UTC, Sam James	Details
reduced.i (16.69 KB, text/plain) 2023-11-14 11:58 UTC, Sam James	Details
gcc14-pr112526.patch (877 bytes, patch) 2023-11-15 16:11 UTC, Jakub Jelinek	Details \| Diff
reduced_smaller.c (2.11 KB, text/plain) 2023-11-16 16:28 UTC, Sam James	Details
Show Obsolete (2) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Sam James 2023-11-14 09:07:50 UTC

I noticed earlier that ffmpeg's test suite started failing in the 'camellia' test.

I've not dug into this yet other than bisecting to r14-4968-g89e5d902fc55ad. Needs x86_64-pc-linux-gnu-gcc -m32 (not tried native 32-bit x86).

It fails as (the ref'd .err file is the same as the long output underneath it):
```
/var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1/tests/fate-run.sh fate-camellia "" "" "/var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1-abi_x86_32.x86" 'run libavutil/tests/camellia' 'n
ull' '' '' '1' '' '' '' '' '' '' '' '' '' ''
 /var/tmp/portage/media-video/ffmpeg-6.0.1/work/ffmpeg-6.0.1-abi_x86_32.x86/libavutil/tests/camellia
Test camellia failed. Look at tests/data/fate/camellia.err for details.
0 67 d0
1 67 ad
2 31 5d
3 38 69
4 54 7e
5 96 27
6 69 89
7 73 81
8 08 a8
9 57 2b
10 06 24
11 56 09
12 48 44
13 ea f7
14 be 51
15 43 a1
0 01 bc
1 23 aa
2 45 43
3 67 36
4 89 51
5 ab e3
6 cd dd
7 ef fb
8 fe 55
9 dc 9f
10 ba f1
11 98 ea
12 76 f8
13 54 c5
14 32 c9
15 10 39
0 b4 cc
1 99 40
2 34 11
3 01 5a
4 b3 a7
5 e9 90
6 96 98
7 f8 b7
8 4e 34
9 e5 f2
10 ce c8
11 e7 1d
12 d7 b0
11 e7 1d
12 d7 b0
13 9b 4e
14 09 30
15 b9 a3
0 01 79
1 23 90
2 45 41
3 67 b1
4 89 95
5 ab 9b
6 cd 13
7 ef 21
8 fe 96
9 dc ac
10 ba 0b
11 98 4b
12 76 87
13 54 d4
14 32 80
15 10 07
0 9a 54
1 cc 2c
2 23 62
3 7d 5a
4 ff 33
5 16 61
6 d7 4e
7 6c c8
8 20 74
9 ef 34
10 7c e0
11 91 da
12 9e 75
13 3a ed
14 75 c5
15 09 d1
0 01 48
1 23 a8
2 45 0a
3 67 18
4 89 13
5 ab af
6 cd 69
7 ef 9a
8 fe 48
9 dc 0c
10 ba fe
11 98 94
12 76 9b
13 54 03
14 32 ba
15 10 3a
```

The large reproducer for now is:
```
git clone https://git.videolan.org/git/ffmpeg.git # also repro'd with 6.0.1 release
./configure --cc="gcc -m32" --disable-asm --disable-stripping --extra-cflags="-march=znver2"
make -j$(nproc) V=1
make fate-camellia V=1
```

Comment 1 Sam James 2023-11-14 09:08:33 UTC

ubsan doesn't find anything in the camellia test. The general test suite suffers from an issue in the teardown code though.

I'll upload both binaries now.

Comment 2 Richard Biener 2023-11-14 09:28:32 UTC

Jakub is testing a fix.

Comment 3 Jakub Jelinek 2023-11-14 09:30:05 UTC

Not for this, for the r14-5385 introduced issues.

Comment 4 Sam James 2023-11-14 09:50:57 UTC

Created attachment 56581 [details]
camellia-PR112526.tar.xz

Binaries were built with:
* bad: 89e5d902fc55ad375f149f25a84c516ad360a606 (r14-4968-g89e5d902fc55ad)
* good: 8697d3a1dcf32750a3b9dc007586eb5f9ba5f17a (r14-4967-g8697d3a1dcf327)

The bad object looks like camellia.o as:
rm ./libavutil/camellia.o ./libavutil/tests/camellia.o ./libavutil/tests/camellia -f ; gcc -m32 ... # rebuild camelia.o ; make fate-camellia fixes it (and can toggle it on/off by changing -march)

Comment 5 Sam James 2023-11-14 10:19:47 UTC

Created attachment 56582 [details]
reduced.c

First stab at a reduction.

Comment 6 Sam James 2023-11-14 10:20:11 UTC

Created attachment 56583 [details]
reduced.i

Comment 7 Sam James 2023-11-14 10:20:49 UTC

$ gcc uhoh-camellia.i -O2 -m32 -o uhoh ; ./uhoh
$ gcc uhoh-camellia.i -O2 -m32 -o uhoh -march=znver2 ; ./uhoh
Aborted (core dumped)

Comment 8 Sam James 2023-11-14 11:57:45 UTC

Created attachment 56584 [details]
reduced.c

Comment 9 Sam James 2023-11-14 11:58:55 UTC

Created attachment 56585 [details]
reduced.i

Comment 10 Jakub Jelinek 2023-11-15 15:16:07 UTC

I think
--- gcc/config/i386/i386.md.jj	2023-11-14 21:38:38.667046713 +0100
+++ gcc/config/i386/i386.md	2023-11-15 16:06:30.638353592 +0100
@@ -9918,7 +9918,10 @@
    && REGNO (operands[0]) != REGNO (operands[3])
    && (REGNO (operands[0]) == REGNO (operands[4])
        || REGNO (operands[0]) == REGNO (operands[5])
-       || peep2_reg_dead_p (3, operands[0]))"
+       || peep2_reg_dead_p (3, operands[0]))
+   && (REGNO (operands[2]) == REGNO (operands[4])
+       || REGNO (operands[2]) == REGNO (operands[5])
+       || peep2_reg_dead_p (3, operands[2]))"
   [(set (match_dup 2) (match_dup 1))
    (parallel [(set (match_dup 4)
 		   (mult:DWIH (match_dup 2) (match_dup 3)))
ought to fix this.  Unlike the normal imul case where the destination is always
%[re]dx:%[re]ax and operands[2] must be %[re]ax, so it is naturally dead at the end and all we need is verify operands[0], in the mulx case the destination is arbitrary pair of registers, so we need to check that both operands[0] and operands[2] (the latter must be %[re]dx) are dead so that we can optimize, because we stop initializing operands[0] altogether and set operands[2] to the immediate rather than the previous value.

The question is if I manage to create a small reproducer or not for the testsuite.

Comment 11 Jakub Jelinek 2023-11-15 16:11:32 UTC

Created attachment 56593 [details]
gcc14-pr112526.patch

Full untested fix.

Comment 12 GCC Commits 2023-11-16 07:33:57 UTC

The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f158bd511df1f55ebbbc0df3dee52c4400291984

commit r14-5519-gf158bd511df1f55ebbbc0df3dee52c4400291984
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 16 08:33:18 2023 +0100

    i386: Fix mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi peephole2 [PR112526]
    
    The following testcase is miscompiled on x86_64 since PR110551 r14-4968
    commit.  That commit added 2 peephole2s, one for
    mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY
    which I believe is ok, and another one for
    mov imm,%rXX; mov %rYY,%rdx; mulx %rXX, %rZZ, %rWW -> mov imm,%rdx; mulx %rYY, %rZZ, %rWW
    which is wrong.  Both peephole2s verify that %rXX above is dead at
    the end of the pattern, by checking if %rXX is either one of the
    registers overwritten in the multiplication (%rdx:%rax in the first
    case, the 2 destination registers of mulx in the latter case), because
    we no longer set %rXX to that immediate (we set %rax resp. %rdx to it
    instead) when the peephole2 replaces it.  But, we also need to ensure
    that the other register previously set to the value of %rYY and newly
    to imm isn't used after the multiplication, and neither of the peephole2s
    does that.  Now, for the first one (at least assuming in the % pattern
    the matching operand (i.e. hardcoded %rax resp. %rdx) after RA will always go
    first) I think it is always the case, because operands[2] if it must be %rax
    register will be overwritten by mulq writing to %rdx:%rax.  But in the
    second case, there is no reason why %rdx couldn't be used after the pattern,
    and if it is (like in the testcase), we can't make those changes.
    So, the patch checks similarly to operands[0] that operands[2] (which ought
    to be %rdx if RA puts the % match_dup operand first and nothing swaps it
    afterwards) is either the same register as one of the destination registers
    of mulx or dies at the end of the multiplication.
    
    2023-11-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/112526
            * config/i386/i386.md
            (mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi):
            Verify in define_peephole2 that operands[2] dies or is overwritten
            at the end of multiplication.
    
            * gcc.target/i386/bmi2-pr112526.c: New test.

Comment 13 Jakub Jelinek 2023-11-16 07:34:45 UTC

Should be fixed now.

Comment 14 Sam James 2023-11-16 16:28:09 UTC

Created attachment 56606 [details]
reduced_smaller.c

Here's a smaller (cvise'd) C testcase. I can add it to the testsuite if it's ok.

Comment 15 Jakub Jelinek 2023-11-22 10:34:44 UTC

*** Bug 112518 has been marked as a duplicate of this bug. ***