Bug 85805 - [7/8 Regression] Wrong code for 64 bit comparisons on avr-gcc
Summary: [7/8 Regression] Wrong code for 64 bit comparisons on avr-gcc
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 8.1.0
: P2 normal
Target Milestone: 7.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2018-05-16 10:49 UTC by Sandor Zsuga
Modified: 2019-02-19 17:51 UTC (History)
4 users (show)

See Also:
Host:
Target: avr
Build:
Known to work: 6.4.1
Known to fail: 7.1.1, 8.0.1
Last reconfirmed: 2018-05-16 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sandor Zsuga 2018-05-16 10:49:38 UTC
GCC version:

avr-gcc (GCC) 4.8.1

Compiling with either -O1 or -O2 optimizations enabled, tested with some ATMega and XMega targets. Test case:


#include <stdint.h>

uint8_t volatile tmp;

__attribute__((noinline)) void test_64(uint64_t d64)
{
 if ((d64 & 0xFF800000UL) == 0xFF800000UL){
  tmp ++;
 }
}

__attribute__((noinline)) void test_32(uint32_t d32)
{
 if ((d32 & 0xFF800000UL) == 0xFF800000UL){
  tmp ++;
 }
}

int main(void)
{
 test_64(0);
 test_32(0);

 while(1);
}


A cut from the output assembly, showing the critical part (the code generated for test_64, test_32 and main):


00000228 <test_64>:
 228:   08 95           ret

0000022a <test_32>:
 22a:   66 27           eor r22, r22
 22c:   77 27           eor r23, r23
 22e:   80 78           andi    r24, 0x80   ; 128
 230:   61 15           cp  r22, r1
 232:   71 05           cpc r23, r1
 234:   80 48           sbci    r24, 0x80   ; 128
 236:   9f 4f           sbci    r25, 0xFF   ; 255
 238:   09 f0           breq    .+2         ; 0x23c <test_32+0x12>
 23a:   08 95           ret
 23c:   80 91 00 20     lds r24, 0x2000
 240:   8f 5f           subi    r24, 0xFF   ; 255
 242:   80 93 00 20     sts 0x2000, r24
 246:   08 95           ret

00000248 <main>:
 248:   20 e0           ldi r18, 0x00   ; 0
 24a:   30 e0           ldi r19, 0x00   ; 0
 24c:   40 e0           ldi r20, 0x00   ; 0
 24e:   50 e0           ldi r21, 0x00   ; 0
 250:   60 e0           ldi r22, 0x00   ; 0
 252:   70 e0           ldi r23, 0x00   ; 0
 254:   80 e0           ldi r24, 0x00   ; 0
 256:   90 e0           ldi r25, 0x00   ; 0
 258:   0e 94 14 01     call    0x228   ; 0x228 <test_64>
 25c:   60 e0           ldi r22, 0x00   ; 0
 25e:   70 e0           ldi r23, 0x00   ; 0
 260:   cb 01           movw    r24, r22
 262:   0e 94 15 01     call    0x22a   ; 0x22a <test_32>
 266:   ff cf           rjmp    .-2         ; 0x266 <main+0x1e>


It seems like the compiler incorrectly determines that the "if" is always false in the 64 bit case, and produces a corresponding result (a function doing nothing). A native version of GCC produces the expected code (the comparison being performed).
Comment 1 Richard Biener 2018-05-16 11:28:36 UTC
Note GCC 4.8 is long out of maintainance and GCC 4.8.1 is a particularly old version from the branch.  Please try a still supported compiler which would
be GCC 6.4, GCC 7.3 or GCC 8.1.  If you can't do that please try at least
the latest GCC 4.8 based compiler which is GCC 4.8.5.
Comment 2 Sandor Zsuga 2018-05-16 14:32:11 UTC
Tested on a different machine:

avr-gcc (GCC) 4.9.2

This is what comes with Debian Jessie. The behavior is present (function compiles to a single "ret").
Comment 3 Sandor Zsuga 2018-05-16 14:42:56 UTC
I don't have reasonably easy access to a newer version as it doesn't seem like there were precompiled binaries available for Linux which I could try without much hassle.

If someone had one laying around, I would appreciate if he gave a go to the sample with an

    avr-gcc -S -O1 sample.c

where sample.c contained the code above. The resulting assembly file would show whether the particular GCC version was affected.
Comment 4 Sandor Zsuga 2018-05-25 12:21:54 UTC
I tried it with the package offered by Microchip, which has avr-gcc 5.4.0, the behavior is the same, bug is present.
Comment 5 Sandor Zsuga 2018-05-30 09:35:44 UTC
I received a test report with avr-gcc 8.1.0 , -O2 optimization level: The behavior is present ( https://www.avrfreaks.net/comment/2477081#comment-2477081 ).
Comment 6 Georg-Johann Lay 2018-07-16 13:34:53 UTC
Lokks like a bug in insn combiner, hence rtl optimization issue, not a target issue.

Test case:

typedef __UINT64_TYPE__ uint64_t;

char tmp;

void test_64 (uint64_t d64)
{
    if ((d64 & 0xFF800000UL) == 0xFF800000UL)
        tmp++;
}

Compiling with v8.0.1

$ avr-gcc foo.c -Os -c -mmcu=avr5 -save-temps -dap

.combine dump reads:

Trying 30 -> 31:
   30: {cc0=cmp(r18:DI,0xff800000);clobber scratch;}
      REG_DEAD r18:DI
   31: pc={(cc0!=0)?L38:pc}
      REG_BR_PROB 708669604
Successfully matched this instruction:
(set (pc)
    (label_ref:HI 38))
allowing combination of insns 30 and 31

i.e. combiner combines the 64-bit comparison of reg:DI 18 against the constant with the conditional jump on CC0 to an UNCONDITIONAL jump.  Hence anything that is used to set CC0 becomes unused and is thrown away in the remainder...

with -fdisable-rtl-combine the final asm looks correct and reads:

test_64:
	andi r20,lo8(-128)	 ;  16	[c=4 l=1]  andqi3/1
	ldi r18,0		 ;  22	[c=4 l=1]  movqi_insn/0
	ldi r19,0		 ;  23	[c=4 l=1]  movqi_insn/0
	ldi r22,0		 ;  26	[c=4 l=1]  movqi_insn/0
	ldi r23,0		 ;  27	[c=4 l=1]  movqi_insn/0
	ldi r24,0		 ;  28	[c=4 l=1]  movqi_insn/0
	ldi r25,0		 ;  29	[c=4 l=1]  movqi_insn/0
	cp r18,__zero_reg__	 ;  30	[c=4 l=8]  compare_const_di2
	cpc r19,__zero_reg__
	sbci r20,-128
	sbci r21,-1
	cpc r22,__zero_reg__
	cpc r23,__zero_reg__
	cpc r24,__zero_reg__
	cpc r25,__zero_reg__
	brne .L1		 ;  31	[c=16 l=1]  branch
	lds r24,tmp	 ;  33	[c=4 l=2]  movqi_insn/3
	subi r24,lo8(-(1))	 ;  34	[c=4 l=1]  addqi3/1
	sts tmp,r24	 ;  35	[c=4 l=2]  movqi_insn/2
.L1:
	ret		 ;  53	[c=0 l=1]  return

Insns 16..29 perform the AND of the 64-bit value held in r18...r25, insn 30 performs the comparisons against the constant and sets CC0.
Comment 7 Segher Boessenkool 2018-07-19 16:34:36 UTC
I suspect this is because we have hard regs here, not pseudos.  Not a good
idea in general, which is why other targets don't do this.

Perhaps it is a mode mixup in the known value tracking?

Confirmed.
Comment 8 Segher Boessenkool 2018-07-26 10:17:20 UTC
Author: segher
Date: Thu Jul 26 10:16:48 2018
New Revision: 262994

URL: https://gcc.gnu.org/viewcvs?rev=262994&root=gcc&view=rev
Log:
combine: Another hard register problem (PR85805)

The current code in reg_nonzero_bits_for_combine allows using the
reg_stat info when last_set_mode is a different integer mode.  This is
completely wrong for non-pseudos.  For example, as in the PR, a value
in a DImode hard register is set by eight writes to its constituent
QImode parts.  The value written to the DImode is not the same as that
written to the lowest-numbered QImode!


	PR rtl-optimization/85805
	* combine.c (reg_nonzero_bits_for_combine): Only use the last set
	value for hard registers if that was written in the same mode.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
Comment 9 Jakub Jelinek 2018-08-27 13:25:37 UTC
Fixed for 9+ then?
Comment 10 Segher Boessenkool 2018-08-27 15:01:53 UTC
It is fixed for 9 yes, and I am still pondering it for 8.  I guess that's
not going to happen.
Comment 11 Martin Liška 2018-11-20 08:26:38 UTC
Segher: Then let's close it?
Comment 12 Segher Boessenkool 2018-11-20 12:19:32 UTC
Yup, closing.
Comment 13 Segher Boessenkool 2019-02-14 18:46:50 UTC
Author: segher
Date: Thu Feb 14 18:46:18 2019
New Revision: 268888

URL: https://gcc.gnu.org/viewcvs?rev=268888&root=gcc&view=rev
Log:
	Backport from trunk
	2018-07-26  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/85805
	* combine.c (reg_nonzero_bits_for_combine): Only use the last set
	value for hard registers if that was written in the same mode.

Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/combine.c
Comment 14 Segher Boessenkool 2019-02-14 19:57:10 UTC
I backported it anyway.