Bug 67736 - Wrong optimization with -fexpensive-optimizations on mips64el
Summary: Wrong optimization with -fexpensive-optimizations on mips64el
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 5.2.1
: P3 normal
Target Milestone: 5.3
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2015-09-27 21:48 UTC by Aurelien Jarno
Modified: 2023-07-07 04:32 UTC (History)
1 user (show)

See Also:
Host: mips64el-unknown-linux-gnu
Target: mips64el-unknown-linux-gnu
Build: mips64el-unknown-linux-gnu
Known to work: 5.3.0, 6.0
Known to fail: 4.8.5, 4.9.3, 5.2.1
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Aurelien Jarno 2015-09-27 21:48:58 UTC
The following code is wrongly compiled on a mips64el target when using -fexpensive-optimizations:

#include <inttypes.h>
#include <stdio.h>

int compare(uint64_t state, uint32_t *last, uint8_t buf)
{
    if (*last == ((state | buf) & 0xFFFFFFFF)) {
        printf("B %"PRIx64" %"PRIx32"\n", state, *last);
        return 0;
    }
    return 1;
}

It produces the following code:

00000000000009a0 <compare>:
 9a0:   30c200ff        andi    v0,a2,0xff
 9a4:   8ca60000        lw      a2,0(a1)
 9a8:   00441025        or      v0,v0,a0
 9ac:   10460004        beq     v0,a2,9c0 <compare+0x20>
 9b0:   24020001        li      v0,1
 9b4:   03e00008        jr      ra
 9b8:   00000000        nop
 9bc:   00000000        nop
...

Note how the comparison is done incorrectly, dropping the & 0xFFFFFFFF and sign-extending the value when loading *last.

Using -fno-expensive-optimizations produces the following valid code:
00000000000009a0 <compare>:
 9a0:   30c200ff        andi    v0,a2,0xff
 9a4:   8ca60000        lw      a2,0(a1)
 9a8:   00441025        or      v0,v0,a0
 9ac:   7cc3f803        dext    v1,a2,0x0,0x20
 9b0:   7c42f803        dext    v0,v0,0x0,0x20
 9b4:   10620004        beq     v1,v0,9c8 <compare+0x28>
 9b8:   24020001        li      v0,1
 9bc:   03e00008        jr      ra
 9c0:   00000000        nop
 9c4:   00000000        nop
 9c8:   67bdfff0        daddiu  sp

Here both values are zero extended to 32-bit before the comparison.

Alternatively, when removing the line with the printf, GCC also produces correct code, which is more optimized:

0000000000000960 <compare>:
 960:   30c600ff        andi    a2,a2,0xff
 964:   00c42025        or      a0,a2,a0
 968:   9ca20000        lwu     v0,0(a1)
 96c:   7c86f803        dext    a2,a0,0x0,0x20
 970:   00c21026        xor     v0,a2,v0
 974:   03e00008        jr      ra
 978:   0002102b        sltu    v0,zero,v0
 97c:   00000000        nop

Note that at least gcc 4.8.5, 4.9.3 and 5.2.1 are affected.
Comment 1 Andrew Pinski 2015-09-27 23:26:20 UTC
Can you try the patch in https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00401.html ?

I never got around to updating it for the comments.
Comment 2 Aurelien Jarno 2015-09-29 10:39:48 UTC
Thanks for the patch. I have just tested it, and it indeed fixes the issue from the testcase. I am currently trying to build ffmpeg, from which the testcase has been extracted to see if all the issues are really fixed.
Comment 3 Aurelien Jarno 2015-09-29 12:09:32 UTC
The ffmpeg testsuite is successful on mips64el-linux-gnu with this patch applied.
Comment 4 Steve Ellcey 2015-10-23 15:56:47 UTC
Author: sje
Date: Fri Oct 23 15:56:15 2015
New Revision: 229259

URL: https://gcc.gnu.org/viewcvs?rev=229259&root=gcc&view=rev
Log:
2015-10-23  Steve Ellcey  <sellcey@imgtec.com>
	    Andrew Pinski  <apinski@cavium.com>

	PR rtl-optimization/67736
	* combine.c (simplify_comparison): Use gen_lowpart_or_truncate instead
	of gen_lowpart.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
Comment 5 Steve Ellcey 2015-10-23 15:59:05 UTC
Author: sje
Date: Fri Oct 23 15:58:33 2015
New Revision: 229260

URL: https://gcc.gnu.org/viewcvs?rev=229260&root=gcc&view=rev
Log:
2015-10-23  Steve Ellcey  <sellcey@imgtec.com>
	    Andrew Pinski  <apinski@cavium.com>

	PR rtl-optimization/67736
	* gcc.dg/torture/pr67736.c: New test.
	* gcc.dg/combine-subregs.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/combine-subregs.c
    trunk/gcc/testsuite/gcc.dg/torture/pr67736.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 6 Steve Ellcey 2015-11-02 21:05:05 UTC
Author: sje
Date: Mon Nov  2 21:04:33 2015
New Revision: 229677

URL: https://gcc.gnu.org/viewcvs?rev=229677&root=gcc&view=rev
Log:
2015-11-02  Steve Ellcey  <sellcey@imgtec.com>

	Backport from mainline
	2015-10-23  Steve Ellcey  <sellcey@imgtec.com>
	            Andrew Pinski  <apinski@cavium.com>

	PR rtl-optimization/67736
	* combine.c (simplify_comparison): Use gen_lowpart_or_truncate instead
	of gen_lowpart.

Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/combine.c
Comment 7 Steve Ellcey 2015-11-02 21:08:33 UTC
Author: sje
Date: Mon Nov  2 21:08:02 2015
New Revision: 229678

URL: https://gcc.gnu.org/viewcvs?rev=229678&root=gcc&view=rev
Log:
2015-11-02  Steve Ellcey  <sellcey@imgtec.com>

	2015-10-23  Steve Ellcey  <sellcey@imgtec.com>
		    Andrew Pinski  <apinski@cavium.com>

	PR rtl-optimization/67736
	* gcc.dg/torture/pr67736.c: New test.
	* gcc.dg/combine-subregs.c: New test.

Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.dg/combine-subregs.c
    branches/gcc-5-branch/gcc/testsuite/gcc.dg/torture/pr67736.c
Modified:
    branches/gcc-5-branch/gcc/testsuite/ChangeLog
Comment 8 Steve Ellcey 2015-12-15 18:05:19 UTC
Patch checked in on ToT for 6.0 and on 5.* branch.