gcc produces incorrect code with -O2 and higher and unsigned int : 13 bit-field >$ g++ -O2 driver.cpp func.cpp ; ./a.out 6930 >$ g++ -O0 driver.cpp func.cpp ; ./a.out 0 >$ cat init.h extern const unsigned long int a; struct struct_t { unsigned int memb : 13; }; extern struct_t b; >$ cat driver.cpp #include <stdio.h> #include "init.h" const unsigned long int a = 10798855141994013410UL; struct_t b; extern void foo (); int main () { foo (); printf("%llu\n", b.memb); return 0; } >$ cat func.cpp #include "init.h" void foo() { b.memb = unsigned(a) >> (7227976781724269559 | a & ~3739384568U) - 7227976781724531672; } >$ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/gcc-dev/bin-trunk/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /gcc-dev/trunk/configure --prefix=/gcc-dev/bin-trunk --disable-bootstrap Thread model: posix gcc version 8.0.0 20170912 (Rev. 252003) (GCC)
Does -fsantize=undefined show anything? I am suspecting you have undefined behavior with respect to the shift.
(In reply to Andrew Pinski from comment #1) > Does -fsantize=undefined show anything? > > I am suspecting you have undefined behavior with respect to the shift. Test doesn't contain undefined behavior, and sanitizer verifies it. (7227976781724269559 | a & ~3739384568U) gives 7227976781724531703, and (7227976781724269559 | a & ~3739384568U) - 7227976781724531672 gives 31, so shift is OK.
On Tue, 12 Sep 2017, vsevolod.livinskij at frtk dot ru wrote: > struct struct_t { > unsigned int memb : 13; > }; > > extern struct_t b; > printf("%llu\n", b.memb); unsigned int : 13 (promoted to int - I think C++ promotes narrower-than-int bit-fields to int, though for C++ the bit-field width is not considered part of the type) is not valid for printf %llu. You need to explicitly cast to unsigned long long.
(In reply to joseph@codesourcery.com from comment #3) > On Tue, 12 Sep 2017, vsevolod.livinskij at frtk dot ru wrote: > > > struct struct_t { > > unsigned int memb : 13; > > }; > > > > extern struct_t b; > > > printf("%llu\n", b.memb); > > unsigned int : 13 (promoted to int - I think C++ promotes > narrower-than-int bit-fields to int, though for C++ the bit-field width is > not considered part of the type) is not valid for printf %llu. You need > to explicitly cast to unsigned long long. printf("%llu\n", (unsigned long long int)b.memb) doesn't eliminate the error
Simplified testcase: unsigned long long int a = 0x95dd3d896f7422e2ULL; struct S { unsigned int m : 13; } b; __attribute__((noinline, noclone)) void foo (void) { b.m = ((unsigned)a) >> (0x644eee9667723bf7LL | a & ~0xdee27af8U) - 0x644eee9667763bd8LL; } int main () { if (__INT_MAX__ != 0x7fffffffULL) return 0; foo (); if (b.m != 0) __builtin_abort (); return 0; } Started with r191928 I believe (optimized dump is identical, r191925 works, r191931 doesn't). I'll have a look tomorrow.
So, the shift count is computed in the end in QImode and we have essentially (unsigned char)(((unsigned char) a) | 0xf7) + 0x28) as the shift count. The only two results of that, whatever a is, are 0x1f or 0x27, and perhaps something in the combiner or simplify-rtx during combine figures out that from those two only 0x1f is valid shift count, but then something decides to emit instead a left shift by -12 and right shift by 19 instead of left shift by 31. Will need to step through in detail tomorrow.
(In reply to Jakub Jelinek from comment #6) > emit instead a left shift by -12 and right shift by 19 instead of left shift > by 31. Instead of right shift by 31 obviously.
On the trunk, I believe the first invalid transformation is when we subst: (insn 14 10 15 2 (parallel [ (set (reg:HI 102) (and:HI (subreg:HI (reg:SI 98) 0) (const_int 8191 [0x1fff]))) (clobber (reg:CC 17 flags)) ]) "pr82192-2.c":7 391 {*andhi_1} (expr_list:REG_DEAD (reg:SI 98) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) from (reg:SI 98) to (lshiftrt:SI (subreg:SI (reg:DI 94 [ a ]) 0) (subreg:QI (reg:SI 97) 0)) we generate: (parallel [ (set (subreg:DI (reg:HI 102) 0) (zero_extract:DI (reg:DI 94 [ a ]) (const_int 13 [0xd]) (zero_extend:SI (subreg:QI (reg:SI 97) 0)))) (clobber (reg:CC 17 flags)) ]) which is not the same thing if (subreg:QI (reg:SI 97) 0) is ever bigger than 32 - 13 - such as for the case when at runtime it is 31. For the original lshiftrt + and, say if (reg:DI 94) is all ones, for shift count 31 we get 1, while for the zero_extract 0x1fff, similarly for shift count 29 we get 7, while for the zero_extract 0x1fff. If we have a guarantee that the shift count is smaller or equal than 32 - 13, then the expressions are equivalent.
So, we call make_compound_extraction_int on (and:HI (subreg:HI (lshiftrt:SI (subreg:SI (reg:DI 94 [ a ]) 0) (subreg:QI (reg:SI 97) 0)) 0) (const_int 8191 [0x1fff])) and that calls make_extraction (SImode, (subreg:SI (reg:DI 94 [ a ]) 0), 0, (subreg:QI (reg:SI 97) 0), 13, 1, 0, false); I'm afraid I have no idea what are the requirements/behavior of ZERO_EXTRACT or make_extraction when POS + LEN is bigger than the bitsize of the operand and therefore no idea where the bug is. If there is a requirement that POS + LEN must be always <= bitsize of the operand, then it is invalid to handle LSHIFTRT with variable shift count through make_extraction, because we actually want to extract fewer than LEN bits for some POS values and the rest are 0. If POS + LEN may be larger than bitsize and the remaining bits are zero, then it is invalid to if (GET_CODE (inner) == SUBREG && subreg_lowpart_p (inner)) { /* If going from (subreg:SI (mem:QI ...)) to (mem:QI ...), consider just the QI as the memory to extract from. The subreg adds or removes high bits; its mode is irrelevant to the meaning of this extraction, since POS and LEN count from the lsb. */ if (MEM_P (SUBREG_REG (inner))) is_mode = GET_MODE (SUBREG_REG (inner)); inner = SUBREG_REG (inner); } for non-paradoxical subregs and pos_rtx != NULL_RTX and non-constant or constant, but with POS + LEN larger than original GET_MODE (inner). We can (zero_extract:SI (subreg:SI (reg:DI 94 [ a ]) 0) (const_int 13 [0xd]) (zero_extend:SI (subreg:QI (reg:SI 97) 0))) but not (zero_extract:DI (reg:DI 94 [ a ]) (const_int 13 [0xd]) (zero_extend:SI (subreg:QI (reg:SI 97) 0))) because for the pos_rtx larger than 32-13 the former extracts 12 to 1 bits out of the SImode and zero extends that, while for the latter it always extracts 13 bits out of the 64-bit pseudo.
Does the oddity that shifts truncate on x86, but bit operations do not come into play here? ie, (lshiftrt:SI (subreg:SI (reg:DI 94 [ a ]) 0) (subreg:QI (reg:SI 97) 0)) Has a well defined meaning for the hardware (setting aside the RTL issues momentarily). THe shift count will be truncated in the expected way. When we substitute into: (insn 14 10 15 2 (parallel [ (set (reg:HI 102) (and:HI (subreg:HI (reg:SI 98) 0) (const_int 8191 [0x1fff]))) (clobber (reg:CC 17 flags)) ]) "pr82192-2.c":7 391 {*andhi_1} (expr_list:REG_DEAD (reg:SI 98) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) Generating: (parallel [ (set (subreg:DI (reg:HI 102) 0) (zero_extract:DI (reg:DI 94 [ a ]) (const_int 13 [0xd]) (zero_extend:SI (subreg:QI (reg:SI 97) 0)))) (clobber (reg:CC 17 flags)) ]) We no longer have the same behavior because we're not emitting a shift, but instead a bit instruction which have different semantics at the hardware level. So maybe what's needed is just guarding the transformation on SHIFT_COUNT_TRUNCATED? I don't really have time to dig into this today, but wanted to get my initial thoughts recorded.
(In reply to Jeffrey A. Law from comment #10) > Does the oddity that shifts truncate on x86, but bit operations do not come > into play here? x86 doesn't define SHIFT_COUNT_TRUNCATED (though in this testcase there is no out of bound shift count anyway) and I believe the requirement is that if it is defined then all the shift/insv/extv and similar instructions have to have the truncation behavior, not just some, so I think this macro isn't relevant to this PR. The shift count is fine, but shift count + len is or might be too big and the question is what happens with the upper bits.
As far as I know this is undefined; combine should avoid making such out-of-range patterns (unless the existing insns are already like that, it will happily make even bigger garbage then).
So like this (untested) or somewhere else? --- gcc/combine.c.jj 2017-09-14 10:04:56.000000000 +0200 +++ gcc/combine.c 2017-09-14 16:59:28.529783572 +0200 @@ -7444,7 +7444,14 @@ make_extraction (machine_mode mode, rtx if (pos_rtx && CONST_INT_P (pos_rtx)) pos = INTVAL (pos_rtx), pos_rtx = 0; - if (GET_CODE (inner) == SUBREG && subreg_lowpart_p (inner)) + if (GET_CODE (inner) == SUBREG + && subreg_lowpart_p (inner) + && (paradoxical_subreg_p (inner) + /* If trying or potentionally trying to extract + bits outside of is_mode, don't look through + non-paradoxical SUBREGs. See PR82192. */ + || (pos_rtx == NULL_RTX + && pos + len <= GET_MODE_PRECISION (is_mode)))) { /* If going from (subreg:SI (mem:QI ...)) to (mem:QI ...), consider just the QI as the memory to extract from. @@ -7470,7 +7477,12 @@ make_extraction (machine_mode mode, rtx if (new_rtx != 0) return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); } - else if (GET_CODE (inner) == TRUNCATE) + else if (GET_CODE (inner) == TRUNCATE + /* If trying or potentionally trying to extract + bits outside of is_mode, don't look through + TRUNCATE. See PR82192. */ + && pos_rtx == NULL_RTX + && pos + len <= GET_MODE_PRECISION (is_mode)) inner = XEXP (inner, 0); inner_mode = GET_MODE (inner); --- gcc/testsuite/gcc.c-torture/execute/pr82192.c.jj 2017-09-14 17:02:54.281234432 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr82192.c 2017-09-14 17:02:39.000000000 +0200 @@ -0,0 +1,22 @@ +/* PR rtl-optimization/82192 */ + +unsigned long long int a = 0x95dd3d896f7422e2ULL; +struct S { unsigned int m : 13; } b; + +__attribute__((noinline, noclone)) void +foo (void) +{ + b.m = ((unsigned) a) >> (0x644eee9667723bf7LL + | a & ~0xdee27af8U) - 0x644eee9667763bd8LL; +} + +int +main () +{ + if (__INT_MAX__ != 0x7fffffffULL) + return 0; + foo (); + if (b.m != 0) + __builtin_abort (); + return 0; +}
I'll check if this patch regresses code quality on any target. Looks good though, thanks!
#c13 passed bootstrap/regtest on x86_64-linux and i686-linux for me. 71876582 49700491 14917420 5101491 is the grand total of combine.c's total_attempts, total_merges, total_extras and total_successes across those 2 bootstraps/regtests without the patch and 71928009 49744171 14941291 5102988 with the patch (both were profiledbootstraps and the combine.c code was slightly different in between). Out of this there is 4 times a: 35 35 4 3 to 31 31 4 2 change in gcc.c-torture/execute/pr82192.c with -m64, 2x 38605 33326 9261 1923 to 38653 33375 9285 1932 change in combine.c with -m32 and 2x 71226 33524 9201 2720 to 71276 33575 9205 2729 in combine.c with -m64. Comparing the sorted dumps which contain word bitsize, filename and these 4 numbers is a little bit hard, because some filenames keep changing (lto related, or e.g. the target-supports.exp created snippets which have pid appended to them), I see some cases of gcc.dg/compat/generate-random.c having different number of lines (due to make -jN check and randomness coming out of that). After some scripting and eyeballing the results, I believe pr82192.c and combine.c are the only spots that have real changes in those numbers, the rest is just whether some effective target tests or similar has been compiled 20 or 30 times depending on how many runtests have been invoked for the particular *.exp file. Also, stripped gcc/*.o files from the latest stage between non-patched and patched differ just in the expected combine.o and cc1*checksum.o.
(In reply to Jakub Jelinek from comment #15) > Also, stripped gcc/*.o files from the latest stage between non-patched and > patched differ just in the expected combine.o and cc1*checksum.o. I see no code changes at at all in my usual "build compiler + Linux for 31 architectures" either, so no unexpected changes either :-) The patch is preapproved.
Author: jakub Date: Fri Sep 15 15:54:04 2017 New Revision: 252824 URL: https://gcc.gnu.org/viewcvs?rev=252824&root=gcc&view=rev Log: PR rtl-optimization/82192 * combine.c (make_extraction): Don't look through non-paradoxical SUBREGs or TRUNCATE if pos + len is or might be bigger than inner's mode. * gcc.c-torture/execute/pr82192.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr82192.c Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c trunk/gcc/testsuite/ChangeLog
Fixed for 8+ so far.
GCC 5 branch is being closed
Author: jakub Date: Fri Oct 27 20:32:21 2017 New Revision: 254177 URL: https://gcc.gnu.org/viewcvs?rev=254177&root=gcc&view=rev Log: Backported from mainline 2017-09-15 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/82192 * combine.c (make_extraction): Don't look through non-paradoxical SUBREGs or TRUNCATE if pos + len is or might be bigger than inner's mode. * gcc.c-torture/execute/pr82192.c: New test. Added: branches/gcc-7-branch/gcc/testsuite/gcc.c-torture/execute/pr82192.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/combine.c branches/gcc-7-branch/gcc/testsuite/ChangeLog
Author: jakub Date: Mon Jun 25 16:45:10 2018 New Revision: 262028 URL: https://gcc.gnu.org/viewcvs?rev=262028&root=gcc&view=rev Log: Backported from mainline 2017-09-15 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/82192 * combine.c (make_extraction): Don't look through non-paradoxical SUBREGs or TRUNCATE if pos + len is or might be bigger than inner's mode. * gcc.c-torture/execute/pr82192.c: New test. Added: branches/gcc-6-branch/gcc/testsuite/gcc.c-torture/execute/pr82192.c Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/combine.c branches/gcc-6-branch/gcc/testsuite/ChangeLog
Fixed for 6.5 and 7.4+ too.