Bug 82192 - [6/7 Regression] gcc produces incorrect code with -O2 and bit-field
Summary: [6/7 Regression] gcc produces incorrect code with -O2 and bit-field
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 8.0
: P2 normal
Target Milestone: 6.5
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2017-09-12 15:16 UTC by Vsevolod Livinskii
Modified: 2021-11-01 23:07 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-09-12 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vsevolod Livinskii 2017-09-12 15:16:39 UTC
gcc produces incorrect code with -O2 and higher and unsigned int : 13 bit-field

>$ g++ -O2 driver.cpp func.cpp ; ./a.out 
6930

>$ g++ -O0 driver.cpp func.cpp ; ./a.out 
0

>$ cat init.h 
extern const unsigned long int a;

struct struct_t {
    unsigned int memb : 13;
};

extern struct_t b;

>$ cat driver.cpp 
#include <stdio.h>
#include "init.h"

const unsigned long int a = 10798855141994013410UL;

struct_t b;

extern void foo ();

int main () {
    foo ();
    printf("%llu\n", b.memb);
    return 0;
}

>$ cat func.cpp 
#include "init.h"
void foo() {
  b.memb = unsigned(a) >> (7227976781724269559 | a & ~3739384568U) - 7227976781724531672;
}

>$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/gcc-dev/bin-trunk/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /gcc-dev/trunk/configure --prefix=/gcc-dev/bin-trunk --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20170912 (Rev. 252003) (GCC)
Comment 1 Andrew Pinski 2017-09-12 15:19:26 UTC
Does -fsantize=undefined show anything?

I am suspecting you have undefined behavior with respect to the shift.
Comment 2 Vsevolod Livinskii 2017-09-12 15:30:16 UTC
(In reply to Andrew Pinski from comment #1)
> Does -fsantize=undefined show anything?
> 
> I am suspecting you have undefined behavior with respect to the shift.

Test doesn't contain undefined behavior, and sanitizer verifies it.
(7227976781724269559 | a & ~3739384568U) gives 7227976781724531703, and (7227976781724269559 | a & ~3739384568U) - 7227976781724531672 gives 31, 
so shift is OK.
Comment 3 jsm-csl@polyomino.org.uk 2017-09-12 15:54:55 UTC
On Tue, 12 Sep 2017, vsevolod.livinskij at frtk dot ru wrote:

> struct struct_t {
>     unsigned int memb : 13;
> };
> 
> extern struct_t b;

>     printf("%llu\n", b.memb);

unsigned int : 13 (promoted to int - I think C++ promotes 
narrower-than-int bit-fields to int, though for C++ the bit-field width is 
not considered part of the type) is not valid for printf %llu.  You need 
to explicitly cast to unsigned long long.
Comment 4 Vsevolod Livinskii 2017-09-12 16:50:53 UTC
(In reply to joseph@codesourcery.com from comment #3)
> On Tue, 12 Sep 2017, vsevolod.livinskij at frtk dot ru wrote:
> 
> > struct struct_t {
> >     unsigned int memb : 13;
> > };
> > 
> > extern struct_t b;
> 
> >     printf("%llu\n", b.memb);
> 
> unsigned int : 13 (promoted to int - I think C++ promotes 
> narrower-than-int bit-fields to int, though for C++ the bit-field width is 
> not considered part of the type) is not valid for printf %llu.  You need 
> to explicitly cast to unsigned long long.

printf("%llu\n", (unsigned long long int)b.memb)
doesn't eliminate the error
Comment 5 Jakub Jelinek 2017-09-12 17:32:02 UTC
Simplified testcase:
unsigned long long int a = 0x95dd3d896f7422e2ULL;
struct S { unsigned int m : 13; } b;

__attribute__((noinline, noclone)) void
foo (void)
{
  b.m = ((unsigned)a) >> (0x644eee9667723bf7LL | a & ~0xdee27af8U) - 0x644eee9667763bd8LL;
}

int
main ()
{
  if (__INT_MAX__ != 0x7fffffffULL)
    return 0;
  foo ();
  if (b.m != 0)
    __builtin_abort ();
  return 0;
}

Started with r191928 I believe (optimized dump is identical, r191925 works, r191931 doesn't).  I'll have a look tomorrow.
Comment 6 Jakub Jelinek 2017-09-12 18:37:50 UTC
So, the shift count is computed in the end in QImode and we have essentially
(unsigned char)(((unsigned char) a) | 0xf7) + 0x28) as the shift count.
The only two results of that, whatever a is, are 0x1f or 0x27, and perhaps something in the combiner or simplify-rtx during combine figures out that from those two only 0x1f is valid shift count, but then something decides to emit instead a left shift by -12 and right shift by 19 instead of left shift by 31.
Will need to step through in detail tomorrow.
Comment 7 Jakub Jelinek 2017-09-12 18:39:18 UTC
(In reply to Jakub Jelinek from comment #6)
> emit instead a left shift by -12 and right shift by 19 instead of left shift
> by 31.

Instead of right shift by 31 obviously.
Comment 8 Jakub Jelinek 2017-09-13 10:16:36 UTC
On the trunk, I believe the first invalid transformation is when we subst:
(insn 14 10 15 2 (parallel [
            (set (reg:HI 102)
                (and:HI (subreg:HI (reg:SI 98) 0)
                    (const_int 8191 [0x1fff])))
            (clobber (reg:CC 17 flags))
        ]) "pr82192-2.c":7 391 {*andhi_1}
     (expr_list:REG_DEAD (reg:SI 98)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
from (reg:SI 98) to
(lshiftrt:SI (subreg:SI (reg:DI 94 [ a ]) 0)
    (subreg:QI (reg:SI 97) 0))
we generate:
(parallel [
        (set (subreg:DI (reg:HI 102) 0)
            (zero_extract:DI (reg:DI 94 [ a ])
                (const_int 13 [0xd])
                (zero_extend:SI (subreg:QI (reg:SI 97) 0))))
        (clobber (reg:CC 17 flags))
    ])
which is not the same thing if (subreg:QI (reg:SI 97) 0) is ever bigger than 32 - 13 - such as for the case when at runtime it is 31.  For the original lshiftrt + and, say if (reg:DI 94) is all ones, for shift count 31 we get 1, while for the zero_extract 0x1fff, similarly for shift count 29 we get 7, while for the zero_extract 0x1fff.  If we have a guarantee that the shift count is smaller or equal than 32 - 13, then the expressions are equivalent.
Comment 9 Jakub Jelinek 2017-09-13 13:34:12 UTC
So, we call make_compound_extraction_int on
(and:HI (subreg:HI (lshiftrt:SI (subreg:SI (reg:DI 94 [ a ]) 0)
            (subreg:QI (reg:SI 97) 0)) 0)
    (const_int 8191 [0x1fff]))
and that calls make_extraction (SImode, (subreg:SI (reg:DI 94 [ a ]) 0),
0, (subreg:QI (reg:SI 97) 0), 13, 1, 0, false);
I'm afraid I have no idea what are the requirements/behavior of ZERO_EXTRACT or make_extraction when POS + LEN is bigger than the bitsize of the operand and therefore no idea where the bug is.
If there is a requirement that POS + LEN must be always <= bitsize of the operand, then it is invalid to handle LSHIFTRT with variable shift count through make_extraction, because we actually want to extract fewer than LEN bits for some POS values and the rest are 0.  If POS + LEN may be larger than bitsize and the remaining bits are zero, then it is invalid to 
  if (GET_CODE (inner) == SUBREG && subreg_lowpart_p (inner))
    {
      /* If going from (subreg:SI (mem:QI ...)) to (mem:QI ...),
         consider just the QI as the memory to extract from.
         The subreg adds or removes high bits; its mode is
         irrelevant to the meaning of this extraction,
         since POS and LEN count from the lsb.  */
      if (MEM_P (SUBREG_REG (inner)))
        is_mode = GET_MODE (SUBREG_REG (inner));
      inner = SUBREG_REG (inner);
    }
for non-paradoxical subregs and pos_rtx != NULL_RTX and non-constant or constant, but with POS + LEN larger than original GET_MODE (inner).
We can (zero_extract:SI (subreg:SI (reg:DI 94 [ a ]) 0)
          (const_int 13 [0xd])
          (zero_extend:SI (subreg:QI (reg:SI 97) 0)))
but not (zero_extract:DI (reg:DI 94 [ a ])
          (const_int 13 [0xd])
          (zero_extend:SI (subreg:QI (reg:SI 97) 0)))
because for the pos_rtx larger than 32-13 the former extracts 12 to 1 bits out of the SImode and zero extends that, while for the latter it always extracts 13 bits out of the 64-bit pseudo.
Comment 10 Jeffrey A. Law 2017-09-13 14:45:01 UTC
Does the oddity that shifts truncate on x86, but bit operations do not come into play here?

ie, 

(lshiftrt:SI (subreg:SI (reg:DI 94 [ a ]) 0)
    (subreg:QI (reg:SI 97) 0))

Has a well defined meaning for the hardware (setting aside the RTL issues momentarily).  THe shift count will be truncated in the expected way.

When we substitute into:

(insn 14 10 15 2 (parallel [
            (set (reg:HI 102)
                (and:HI (subreg:HI (reg:SI 98) 0)
                    (const_int 8191 [0x1fff])))
            (clobber (reg:CC 17 flags))
        ]) "pr82192-2.c":7 391 {*andhi_1}
     (expr_list:REG_DEAD (reg:SI 98)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

Generating:

(parallel [
        (set (subreg:DI (reg:HI 102) 0)
            (zero_extract:DI (reg:DI 94 [ a ])
                (const_int 13 [0xd])
                (zero_extend:SI (subreg:QI (reg:SI 97) 0))))
        (clobber (reg:CC 17 flags))
    ])

We no longer have the same behavior because we're not emitting a shift, but instead a bit instruction which have different semantics at the hardware level.


So maybe what's needed is just guarding the transformation on SHIFT_COUNT_TRUNCATED?

I don't really have time to dig into this today, but wanted to get my initial thoughts recorded.
Comment 11 Jakub Jelinek 2017-09-14 14:27:25 UTC
(In reply to Jeffrey A. Law from comment #10)
> Does the oddity that shifts truncate on x86, but bit operations do not come
> into play here?

x86 doesn't define SHIFT_COUNT_TRUNCATED (though in this testcase there is no out of bound shift count anyway) and I believe the requirement is that if it is defined then all the shift/insv/extv and similar instructions have to have the truncation behavior, not just some, so I think this macro isn't relevant to this PR.  The shift count is fine, but shift count + len is or might be too big and the question is what happens with the upper bits.
Comment 12 Segher Boessenkool 2017-09-14 16:19:48 UTC
As far as I know this is undefined; combine should avoid making such
out-of-range patterns (unless the existing insns are already like that,
it will happily make even bigger garbage then).
Comment 13 Jakub Jelinek 2017-09-14 16:25:17 UTC
So like this (untested) or somewhere else?

--- gcc/combine.c.jj	2017-09-14 10:04:56.000000000 +0200
+++ gcc/combine.c	2017-09-14 16:59:28.529783572 +0200
@@ -7444,7 +7444,14 @@ make_extraction (machine_mode mode, rtx
   if (pos_rtx && CONST_INT_P (pos_rtx))
     pos = INTVAL (pos_rtx), pos_rtx = 0;
 
-  if (GET_CODE (inner) == SUBREG && subreg_lowpart_p (inner))
+  if (GET_CODE (inner) == SUBREG
+      && subreg_lowpart_p (inner)
+      && (paradoxical_subreg_p (inner)
+	  /* If trying or potentionally trying to extract
+	     bits outside of is_mode, don't look through
+	     non-paradoxical SUBREGs.  See PR82192.  */
+	  || (pos_rtx == NULL_RTX
+	      && pos + len <= GET_MODE_PRECISION (is_mode))))
     {
       /* If going from (subreg:SI (mem:QI ...)) to (mem:QI ...),
 	 consider just the QI as the memory to extract from.
@@ -7470,7 +7477,12 @@ make_extraction (machine_mode mode, rtx
       if (new_rtx != 0)
 	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
     }
-  else if (GET_CODE (inner) == TRUNCATE)
+  else if (GET_CODE (inner) == TRUNCATE
+	   /* If trying or potentionally trying to extract
+	      bits outside of is_mode, don't look through
+	      TRUNCATE.  See PR82192.  */
+	   && pos_rtx == NULL_RTX
+	   && pos + len <= GET_MODE_PRECISION (is_mode))
     inner = XEXP (inner, 0);
 
   inner_mode = GET_MODE (inner);
--- gcc/testsuite/gcc.c-torture/execute/pr82192.c.jj	2017-09-14 17:02:54.281234432 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr82192.c	2017-09-14 17:02:39.000000000 +0200
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/82192 */
+
+unsigned long long int a = 0x95dd3d896f7422e2ULL;
+struct S { unsigned int m : 13; } b;
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  b.m = ((unsigned) a) >> (0x644eee9667723bf7LL
+			   | a & ~0xdee27af8U) - 0x644eee9667763bd8LL;
+}
+
+int
+main ()
+{
+  if (__INT_MAX__ != 0x7fffffffULL)
+    return 0;
+  foo ();
+  if (b.m != 0)
+    __builtin_abort ();
+  return 0;
+}
Comment 14 Segher Boessenkool 2017-09-14 17:07:40 UTC
I'll check if this patch regresses code quality on any target.  Looks
good though, thanks!
Comment 15 Jakub Jelinek 2017-09-15 07:49:20 UTC
#c13 passed bootstrap/regtest on x86_64-linux and i686-linux for me.
71876582 49700491 14917420 5101491
is the grand total of combine.c's total_attempts, total_merges, total_extras and total_successes across those 2 bootstraps/regtests without the patch and
71928009 49744171 14941291 5102988
with the patch (both were profiledbootstraps and the combine.c code was slightly different in between).  Out of this there is 4 times a:
35 35 4 3
to
31 31 4 2
change in gcc.c-torture/execute/pr82192.c with -m64, 2x
38605 33326 9261 1923
to
38653 33375 9285 1932
change in combine.c with -m32 and 2x
71226 33524 9201 2720
to
71276 33575 9205 2729
in combine.c with -m64.  Comparing the sorted dumps which contain word bitsize, filename and these 4 numbers is a little bit hard, because some filenames keep changing (lto related, or e.g. the target-supports.exp created snippets which have pid appended to them), I see some cases of gcc.dg/compat/generate-random.c
having different number of lines (due to make -jN check and randomness coming out of that).  After some scripting and eyeballing the results, I believe pr82192.c and combine.c are the only spots that have real changes in those numbers, the rest is just whether some effective target tests or similar has been compiled 20 or 30 times depending on how many runtests have been invoked for the particular *.exp file.
Also, stripped gcc/*.o files from the latest stage between non-patched and patched differ just in the expected combine.o and cc1*checksum.o.
Comment 16 Segher Boessenkool 2017-09-15 14:42:00 UTC
(In reply to Jakub Jelinek from comment #15)
> Also, stripped gcc/*.o files from the latest stage between non-patched and
> patched differ just in the expected combine.o and cc1*checksum.o.

I see no code changes at at all in my usual "build compiler + Linux for 31
architectures" either, so no unexpected changes either :-)  The patch is
preapproved.
Comment 17 Jakub Jelinek 2017-09-15 15:54:35 UTC
Author: jakub
Date: Fri Sep 15 15:54:04 2017
New Revision: 252824

URL: https://gcc.gnu.org/viewcvs?rev=252824&root=gcc&view=rev
Log:
	PR rtl-optimization/82192
	* combine.c (make_extraction): Don't look through non-paradoxical
	SUBREGs or TRUNCATE if pos + len is or might be bigger than
	inner's mode.

	* gcc.c-torture/execute/pr82192.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr82192.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog
Comment 18 Jakub Jelinek 2017-09-26 10:20:36 UTC
Fixed for 8+ so far.
Comment 19 Jakub Jelinek 2017-10-10 13:27:05 UTC
GCC 5 branch is being closed
Comment 20 Jakub Jelinek 2017-10-27 20:32:52 UTC
Author: jakub
Date: Fri Oct 27 20:32:21 2017
New Revision: 254177

URL: https://gcc.gnu.org/viewcvs?rev=254177&root=gcc&view=rev
Log:
	Backported from mainline
	2017-09-15  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/82192
	* combine.c (make_extraction): Don't look through non-paradoxical
	SUBREGs or TRUNCATE if pos + len is or might be bigger than
	inner's mode.

	* gcc.c-torture/execute/pr82192.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.c-torture/execute/pr82192.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/combine.c
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 21 Jakub Jelinek 2018-06-25 16:46:21 UTC
Author: jakub
Date: Mon Jun 25 16:45:10 2018
New Revision: 262028

URL: https://gcc.gnu.org/viewcvs?rev=262028&root=gcc&view=rev
Log:
	Backported from mainline
	2017-09-15  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/82192
	* combine.c (make_extraction): Don't look through non-paradoxical
	SUBREGs or TRUNCATE if pos + len is or might be bigger than
	inner's mode.

	* gcc.c-torture/execute/pr82192.c: New test.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.c-torture/execute/pr82192.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/combine.c
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
Comment 22 Jakub Jelinek 2018-06-25 18:06:52 UTC
Fixed for 6.5 and 7.4+ too.