81423 – [7 Regression] Wrong code at -O2

Bug 81423 - [7 Regression] Wrong code at -O2

Summary: [7 Regression] Wrong code at -O2

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	8.0

Importance:	P2 normal
Target Milestone:	8.0
Assignee:	Segher Boessenkool

URL:
Keywords:	wrong-code

Depends on:
Blocks:	yarpgen
	Show dependency tree / graph

Reported:	2017-07-13 00:12 UTC by Dmitry Babokin
Modified:	2021-11-01 23:07 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:	8.0
Known to fail:	7.5.0
Last reconfirmed:	2017-07-13 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dmitry Babokin 2017-07-13 00:12:45 UTC

gcc trunk, rev250140, x86_64.

Test case has no undefined behavior, but -O2 produces incorrect result.

> cat f.cpp
#include <iostream>

unsigned long long int ll = 0;
unsigned long long int ull1 = 1ULL;
unsigned long long int ull2 = 12008284144813806346ULL;
unsigned long long int ull3;

void foo() {
  ll = -5597998501375493990LL;

  ll = unsigned(5677365550390624949L - ll) - (ull1 > 0);
  ull3 = unsigned(2067854353L << (((ll + -2129105131L) ^ 10280750144413668236ULL) - 10280750143997242009ULL)) >> ((2873442921854271231ULL | ull2) - 12098357307243495419ULL);
}

int main() {
  foo();
  std::cout << ull3 << " (expected 3998784)\n";
  return 0;
}

> g++ f.cpp -O0 -o out; ./out
3998784 (expected 3998784)

> g++ f.cpp -O2 -o out; ./out
3493659712 (expected 3998784)

Comment 1 Martin Liška 2017-07-13 08:04:51 UTC

Confirmed, started with r225463.

Comment 2 Uroš Bizjak 2017-07-13 08:20:24 UTC

The referred commit enabled new extzv<mode> and insv<mode> patterns, and in the past, it exposed a bunch of bugs in the generic RTL part.

Let me analyse what is going on here.

Comment 3 Uroš Bizjak 2017-07-13 11:13:58 UTC

Following is a c testcase:

--cut here--
unsigned long long int ll = 0;
unsigned long long int ull1 = 1ULL;
unsigned long long int ull2 = 12008284144813806346ULL;
unsigned long long int ull3;

void
foo ()
{
  ll = -5597998501375493990LL;

  ll = (5677365550390624949L - ll) - (ull1 > 0);
  ull3 = (unsigned int)
    (2067854353L <<
     (((ll + -2129105131L) ^ 10280750144413668236ULL) -
      10280750143997242009ULL)) >> ((2873442921854271231ULL | ull2)
				    - 12098357307243495419ULL);
}

int
main ()
{
  foo ();
  printf ("%llu expected 3998784)\n", ull3);
  printf ("%x expected 3d0440)\n", ull3);
  return 0;
}
--cut here--

The problem is in combine pass, which combines following sequence in the main function:

(insn 17 16 18 2 (parallel [
            (set (reg:SI 114)
                (ior:SI (subreg:SI (reg:DI 113 [ ull2 ]) 0)
                    (const_int -8651009 [0xffffffffff7bfeff])))
            (clobber (reg:CC 17 flags))
        ]) "pr81423.c":15 415 {*iorsi_1}
     (expr_list:REG_DEAD (reg:DI 113 [ ull2 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 18 17 19 2 (parallel [
            (set (reg:SI 115)
                (plus:SI (reg:SI 114)
                    (const_int 5 [0x5])))
            (clobber (reg:CC 17 flags))
        ]) "pr81423.c":16 217 {*addsi_1}
     (expr_list:REG_DEAD (reg:SI 114)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 19 18 20 2 (parallel [
            (set (reg:SI 116)
                (lshiftrt:SI (subreg:SI (reg:DI 111) 0)
                    (subreg:QI (reg:SI 115) 0)))
            (clobber (reg:CC 17 flags))
        ]) "pr81423.c":15 544 {*lshrsi3_1}
     (expr_list:REG_DEAD (reg:SI 115)
        (expr_list:REG_DEAD (reg:DI 111)
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(insn 20 19 21 2 (set (reg:DI 103 [ _20 ])
        (zero_extend:DI (reg:SI 116))) "pr81423.c":15 131 {*zero_extendsidi2}
     (expr_list:REG_DEAD (reg:SI 116)
        (nil)))


Trying 17, 18, 19 -> 20:
Failed to match this instruction:
(set (reg:DI 103 [ _20 ])
    (zero_extract:DI (reg:DI 111)
        (const_int 32 [0x20])
        (const_int 4 [0x4])))
Failed to match this instruction:
(set (reg:DI 103 [ _20 ])
    (and:DI (lshiftrt:DI (reg:DI 111)
            (const_int 4 [0x4]))
        (const_int 4294967295 [0xffffffff])))
Successfully matched this instruction:
(set (reg:DI 116)
    (ashift:DI (reg:DI 111)
        (const_int 28 [0x1c])))
Successfully matched this instruction:
(set (reg:DI 103 [ _20 ])
    (lshiftrt:DI (reg:DI 116)
        (const_int 32 [0x20])))
allowing combination of insns 17, 18, 19 and 20
original costs 6 + 4 + 4 + 1 = 15
replacement costs 4 + 4 = 8
deferring deletion of insn with uid = 18.
deferring deletion of insn with uid = 17.
deferring deletion of insn with uid = 16.
modifying insn i2    19: {r116:DI=r111:DI<<0x1c;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r111:DI
deferring rescan insn with uid = 19.
modifying insn i3    20: {r103:DI=r116:DI 0>>0x20;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r116:DI
deferring rescan insn with uid = 20.


Please note that we have in the original sequence:

(insn 19 18 20 2 (parallel [
            (set (reg:SI 116)
                (lshiftrt:SI (subreg:SI (reg:DI 111) 0)
                    (subreg:QI (reg:SI 115) 0)))
            (clobber (reg:CC 17 flags))
        ]) "pr81423.c":15 544 {*lshrsi3_1}
     (expr_list:REG_DEAD (reg:SI 115)
        (expr_list:REG_DEAD (reg:DI 111)
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

in effect, this insn is shifting value, truncated to 32 bits, from

rax            0x1ed03d04400    2117482857472

by 4 via:

   0x0000000000400557 <+71>:    add    $0x5,%ecx
   0x000000000040055a <+74>:    shr    %cl,%eax

resulting in:

rax            0x3d0440 3998784
rcx            0x4      4

However, combine tries to create:

(set (reg:DI 103 [ _20 ])
    (zero_extract:DI (reg:DI 111)
        (const_int 32 [0x20])
        (const_int 4 [0x4])))

trying to extract 32bit value from position 4. This is not correct, due to 32bit  left shift  in the sequence, the combined insn should zero-extract 28 bits only.

Comment 4 Uroš Bizjak 2017-07-13 11:40:47 UTC

CC maintainer.

Comment 5 Segher Boessenkool 2017-07-17 09:35:29 UTC

There are a few issues here.

1) I cannot reproduce it: a trunk x86_64-linux compiler stores to
memory in that insn 20 (the testcase from comment 3).

2) 17, 18 -> 19 should already match.  Instead we get

Trying 17, 18 -> 19:
Failed to match this instruction:
(parallel [
        (set (reg:SI 114)
            (lshiftrt:SI (subreg:SI (reg:DI 109) 0)
                (plus:QI (subreg:QI (ior:SI (subreg:SI (reg:DI 111 [ ull2 ]) 0)
                            (const_int -8651009 [0xffffffffff7bfeff])) 0)
                    (const_int 5 [0x5]))))
        (clobber (reg:CC 17 flags))
    ])

If the subreg:QI there was simplified properly (to just 0xff) the
plus:QI could be simplified as well.

3) The reported issue.  Confirmed.

Comment 6 Uroš Bizjak 2017-07-17 12:23:46 UTC

(In reply to Segher Boessenkool from comment #5)
> There are a few issues here.
> 
> 1) I cannot reproduce it: a trunk x86_64-linux compiler stores to
> memory in that insn 20 (the testcase from comment 3).

Just re-checked with todays SVN. There are two (insn 20) RTXes; one in the function itself, the second one (the problematic one) in the inlined copy in the main function.

Comment 7 Segher Boessenkool 2017-07-19 19:29:13 UTC

Author: segher
Date: Wed Jul 19 19:28:41 2017
New Revision: 250363

URL: https://gcc.gnu.org/viewcvs?rev=250363&root=gcc&view=rev
Log:
simplify-rtx: The truncation of an IOR can have all bits set (PR81423)

... if it is an IOR with a constant with all bits set in the mode
that is truncated to, for example.  Handle that case.


	PR rtl-optimization/81423
	* simplify-rtx.c (simplify_truncation): Handle truncating an IOR
	with a constant that is -1 in the truncated to mode.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/simplify-rtx.c

Comment 8 Segher Boessenkool 2017-07-19 19:31:58 UTC

Author: segher
Date: Wed Jul 19 19:31:26 2017
New Revision: 250365

URL: https://gcc.gnu.org/viewcvs?rev=250365&root=gcc&view=rev
Log:
combine: Fix for PR81423

We here have an AND of a SUBREG of an LSHIFTRT.  If that SUBREG is
paradoxical, the extraction we form is the length of the size of the
inner mode, which includes some bits that should not be in the result.
Just give up in that case.


	PR rtl-optimization/81423
	* combine.c (make_compound_operation_int): Don't try to optimize
	the AND of a SUBREG of an LSHIFTRT if that SUBREG is paradoxical.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c

Comment 9 Segher Boessenkool 2017-08-09 21:06:16 UTC

Author: segher
Date: Wed Aug  9 21:05:45 2017
New Revision: 251004

URL: https://gcc.gnu.org/viewcvs?rev=251004&root=gcc&view=rev
Log:
Testcase for PR81423

gcc/testsuite/
	PR rtl-optimization/81423
	* gcc.c-torture/execute/pr81423.c: New testcase.

Modified:
    trunk/gcc/testsuite/ChangeLog

Comment 10 Segher Boessenkool 2017-08-09 21:53:02 UTC

Author: segher
Date: Wed Aug  9 21:52:30 2017
New Revision: 251011

URL: https://gcc.gnu.org/viewcvs?rev=251011&root=gcc&view=rev
Log:
This time with the file added.

Testcase for PR81423

gcc/testsuite/
	PR rtl-optimization/81423
	* gcc.c-torture/execute/pr81423.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr81423.c

Comment 11 Segher Boessenkool 2017-08-09 22:33:21 UTC

Fixed on trunk.

Comment 12 Aldy Hernandez 2017-09-13 16:45:21 UTC

Author: aldyh
Date: Wed Sep 13 16:44:48 2017
New Revision: 252366

URL: https://gcc.gnu.org/viewcvs?rev=252366&root=gcc&view=rev
Log:
Testcase for PR81423

gcc/testsuite/
	PR rtl-optimization/81423
	* gcc.c-torture/execute/pr81423.c: New testcase.

Modified:
    branches/range-gen2/gcc/testsuite/ChangeLog

Comment 13 Aldy Hernandez 2017-09-13 16:47:11 UTC

Author: aldyh
Date: Wed Sep 13 16:46:39 2017
New Revision: 252374

URL: https://gcc.gnu.org/viewcvs?rev=252374&root=gcc&view=rev
Log:
This time with the file added.

Testcase for PR81423

gcc/testsuite/
	PR rtl-optimization/81423
	* gcc.c-torture/execute/pr81423.c: New testcase.

Added:
    branches/range-gen2/gcc/testsuite/gcc.c-torture/execute/pr81423.c

Comment 14 Jakub Jelinek 2017-10-11 10:45:30 UTC

Assuming this is fixed on the trunk now.

Comment 15 Davin McCall 2017-10-14 07:49:50 UTC

Apologies if I'm wrong and just making noise but doesn't the test case involve a left shift of a value greater than the bit width of the left operand and therefore invoke UB?

  ll = -5597998501375493990LL;

  ll = (5677365550390624949L - ll) - (ull1 > 0);
  //ull3 = (unsigned int)
  //  (2067854353L <<
  //   (((ll + -2129105131L) ^ 10280750144413668236ULL) -
  //    10280750143997242009ULL)) >> ((2873442921854271231ULL | ull2)
  //				    - 12098357307243495419ULL);

  // Above broken down:

  unsigned long long int t1 = ll + -2129105131L;
  unsigned long long int t2 = t1 ^ 10280750144413668236ULL;
  unsigned long long int t3 = t2 - 10280750143997242009ULL;
  // t3 = 9523455470476460042 !!!
  long t4 = 2067854353L << t3;  // UB!!
  unsigned int t5 = (unsigned int)t4;
  
  unsigned long long int t6 = 2873442921854271231ULL | ull2;
  unsigned long long int t7 = t6 - 12098357307243495419ULL;
  ull3 = t5 >> t7;

Fixed test case would be:

--- begin ---
unsigned long long int ll = 0;
unsigned long long int ull1 = 1ULL;
unsigned long long int ull2 = 12008284144813806346ULL;
unsigned long long int ull3;

void
foo ()
{
  ll = -5597998501375493990LL;

  ll = (5677365550390624949L - ll) - (ull1 > 0);
  ull3 = (unsigned int)
    (2067854353L <<
     (((ll + -2129105131L) ^ 10280750144413668236ULL) +
      17089282532945401191ULL)) >> ((2873442921854271231ULL | ull2)
				    - 12098357307243495419ULL);
}

int
main ()
{
  foo ();
  printf ("%llu expected 3998784)\n", ull3);
  printf ("%llx expected 3d0440)\n", ull3);
  return 0;
}
--- end ---

This still demonstrates the bug.

Comment 16 Dmitry Babokin 2017-10-14 08:27:47 UTC

ll = -5597998501375493990LL;

// result is 2595916314 here.
ll = unsigned(5677365550390624949L - ll) - (ull1 > 0);

So:
  // t1 is 466811183
  unsigned long long int t1 = ll + -2129105131L;
  // t2 is 10280750143997242019
  unsigned long long int t2 = t1 ^ 10280750144413668236ULL;
  // t3 is 10
  unsigned long long int t3 = t2 - 10280750143997242009ULL;

So the test case looks correct to me.

UBSAN also doesn't complain.

Comment 17 Jakub Jelinek 2017-10-14 08:34:42 UTC

Author: jakub
Date: Sat Oct 14 08:34:11 2017
New Revision: 253749

URL: https://gcc.gnu.org/viewcvs?rev=253749&root=gcc&view=rev
Log:
	PR rtl-optimization/81423
	* gcc.c-torture/execute/pr81423.c (foo): Add missing cast.  Change L
	suffixes to LL.
	(main): Punt if either long long isn't 64-bit or int isn't 32-bit.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.c-torture/execute/pr81423.c

Comment 18 Jakub Jelinek 2017-10-14 08:39:08 UTC

(In reply to Davin McCall from comment #15)
> Apologies if I'm wrong and just making noise but doesn't the test case
> involve a left shift of a value greater than the bit width of the left
> operand and therefore invoke UB?

The test as committed invokes UB, because the #c3 transcription into C omitted
the cast unsigned(5677365550390624949L - ll) from the original testcase.

Fixed thusly, I've additionally replaced all spots using L suffixes with LL for consistent behavior on 32-bit targets, and add a test that int is 32-bit, for say 16-bit int and 64-bit long long the testcase would have UB again (as well as end up with a different than expected value anyway, as 3998784 doesn't fit into 16-bit int.

Comment 19 Jakub Jelinek 2018-10-26 10:12:04 UTC

GCC 6 branch is being closed

Comment 20 Richard Biener 2019-11-14 10:13:10 UTC

Fixed in GCC 8.