Bug 78438 - [7 Regression] incorrect comparison optimization
Summary: [7 Regression] incorrect comparison optimization
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 7.0
: P1 normal
Target Milestone: 7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2016-11-21 00:33 UTC by Dmitry Babokin
Modified: 2021-11-01 23:07 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-11-21 00:00:00


Attachments
reproducer (286 bytes, application/x-gzip)
2016-11-21 00:33 UTC, Dmitry Babokin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Babokin 2016-11-21 00:33:55 UTC
Created attachment 40093 [details]
reproducer

> g++ -O0 main.cpp func.cpp -o no_opt; ./no_opt
0
> g++ -O1 main.cpp func.cpp -o opt; ./opt
2

> cat main.cpp
#include <stdio.h>

char C = 0;
int I = 197412621;

void foo();

int main() {
  foo();
  printf("%d\n", C);
}

> cat func.cpp
extern char C;
extern int I;
void foo() {
  C = 0 > (short) (I >> 11);
}

> g++ -v

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/home/dybaboki/gcc/bin/libexec/gcc/x86_64-pc-linux-gnu/7.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --prefix=/home/dybaboki/gcc/bin --enable-languages=c,c++,fortran,lto : (reconfigured) ../gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --prefix=/home/dybaboki/gcc/bin --enable-languages=c,c++,fortran,lto : (reconfigured) ../gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --prefix=/home/dybaboki/gcc/bin --enable-languages=c,c++,fortran,lto : (reconfigured) ../gcc_github/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --prefix=/home/dybaboki/gcc/bin --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --prefix=/home/dybaboki/gcc/bin --enable-languages=c,c++,fortran,lto --no-create --no-recursion
Thread model: posix
gcc version 7.0.0 20161119 (experimental) (GCC)
Comment 1 Jakub Jelinek 2016-11-21 08:37:44 UTC
Started with r242414.
Simplified testcase:
char a = 0;
int b = 197412621;

__attribute__ ((noinline, noclone))
void foo ()
{
  a = 0 > (short) (b >> 11);
}

int
main ()
{
  asm volatile ("" : : : "memory");
  foo ();
  if (a != 0)
    __builtin_abort ();
  return 0;
}
Comment 2 Jakub Jelinek 2016-11-21 08:43:49 UTC
Before combine we have:
(insn 6 5 7 2 (parallel [
            (set (reg:SI 92)
                (ashiftrt:SI (reg:SI 93 [ b ])
                    (const_int 11 [0xb])))
            (clobber (reg:CC 17 flags))
        ]) "pr78438.c":7 563 {*ashrsi3_1}
     (expr_list:REG_DEAD (reg:SI 93 [ b ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_EQUAL (ashiftrt:SI (mem/c:SI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7f8ffbb66ea0 b>) [1 b+0 S4 A32])
                    (const_int 11 [0xb]))
                (nil)))))
(insn 7 6 8 2 (parallel [
            (set (reg:HI 94)
                (lshiftrt:HI (subreg:HI (reg:SI 92) 0)
                    (const_int 15 [0xf])))
            (clobber (reg:CC 17 flags))
        ]) "pr78438.c":7 572 {*lshrhi3_1}
     (expr_list:REG_DEAD (reg:SI 92)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
combine turns that into non-equivalent:
(insn 7 6 8 2 (parallel [
            (set (reg:SI 94)
                (ashiftrt:SI (reg:SI 93 [ b ])
                    (const_int 26 [0x1a])))
            (clobber (reg:CC 17 flags))
        ]) "pr78438.c":7 563 {*ashrsi3_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:SI 93 [ b ])
            (nil))))
Comment 3 Jakub Jelinek 2016-11-21 09:25:19 UTC
The over-sized bitfield isn't really needed for this, making it more severe:
struct S {
  long int : 23;
  long int a : 24;
  long int b : 10;
  long int c : 24;
  signed char d : 8;
} s;

__attribute__((noinline, noclone)) void
foo ()
{
  s.d = 0;
  s.c = -1193165L;
}

int
main ()
{
  foo ();
  if (s.d != 0)
    __builtin_abort ();
  return 0;
}
Comment 4 Jakub Jelinek 2016-11-21 09:40:55 UTC
Oops, #c3 was meant for PR78436.
Comment 5 Segher Boessenkool 2016-11-22 01:58:53 UTC
Why is this wrong?  This isn't the same reg 94 (it is reused):

Failed to match this instruction:
(set (mem/c:QI (symbol_ref:DI ("a") [flags 0x2]  <var_decl 0x3fffa1d60cf0 a>) [0 a+0 S1 A8])
    (subreg:QI (ashiftrt:SI (reg:SI 93 [ b ])
            (const_int 26 [0x1a])) 0))
Successfully matched this instruction:
(set (reg:SI 94)
    (ashiftrt:SI (reg:SI 93 [ b ])
        (const_int 26 [0x1a])))
Successfully matched this instruction:
(set (mem/c:QI (symbol_ref:DI ("a") [flags 0x2]  <var_decl 0x3fffa1d60cf0 a>) [0 a+0 S1 A8])
    (subreg:QI (reg:SI 94) 0))

The low 8 bits of  r93 a>> 26  are exactly what we need to store afaics?
Comment 6 Segher Boessenkool 2016-11-22 02:04:12 UTC
n/m, too tired I guess, please ignore comment 5.
Comment 7 Segher Boessenkool 2016-11-23 14:33:46 UTC
Author: segher
Date: Wed Nov 23 14:33:13 2016
New Revision: 242757

URL: https://gcc.gnu.org/viewcvs?rev=242757&root=gcc&view=rev
Log:
combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390)

r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
It all has the same root cause: that patch makes combine convert every
lowpart subreg of a logical shift right to a zero_extract.  This cannot
work at all if it is not a constant shift, and it has to be a bit more
careful exactly which bits it extracts.


	PR target/77881
	PR bootstrap/78390
	PR target/78438
	PR bootstrap/78477
	* combine.c (make_compound_operation_int): Do not convert a subreg of
	a non-constant logical shift right to a zero_extract.  Handle the case
	where some zero bits have been shifted into the range covered by that
	subreg.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
Comment 8 Jakub Jelinek 2016-11-25 19:24:54 UTC
Author: jakub
Date: Fri Nov 25 19:24:22 2016
New Revision: 242883

URL: https://gcc.gnu.org/viewcvs?rev=242883&root=gcc&view=rev
Log:
	PR rtl-optimization/78438
	PR rtl-optimization/78477
	* gcc.c-torture/execute/pr78438.c: New test.
	* gcc.c-torture/execute/pr78477.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr78438.c
    trunk/gcc/testsuite/gcc.c-torture/execute/pr78477.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 9 Jakub Jelinek 2016-11-25 19:25:37 UTC
Fixed.