Bug 50063 - [4.6/4.7/4.8 Regression] DSE: wrong code for gcc.dg/torture/pta-ptrarith-3.c
Summary: [4.6/4.7/4.8 Regression] DSE: wrong code for gcc.dg/torture/pta-ptrarith-3.c
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.6.1
: P4 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on: 49330
Blocks: 58036
  Show dependency treegraph
 
Reported: 2011-08-12 20:14 UTC by Georg-Johann Lay
Modified: 2024-01-20 23:31 UTC (History)
5 users (show)

See Also:
Host:
Target: avr
Build:
Known to work: 4.5.2, 4.7.0
Known to fail: 4.6.1, 4.6.3
Last reconfirmed: 2011-08-12 00:00:00


Attachments
asm-no-dse.s (2.08 KB, text/plain)
2011-08-13 17:04 UTC, Georg-Johann Lay
Details
asm-dse.s (1.98 KB, text/plain)
2011-08-13 17:05 UTC, Georg-Johann Lay
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Georg-Johann Lay 2011-08-12 20:14:57 UTC
Testcase gcc.dg/torture/pta-ptrarith-3.c
   http://gcc.gnu.org/viewcvs/trunk/gcc/testsuite/gcc.dg/torture/pta-ptrarith-3.c?revision=145490&view=markup
produces wrong code for current trunk and avr-gcc 4.6.1

struct X 
{
  int *p;
  int *q;
  int *r;
};

int __attribute__((noinline))
foo (int i, int j, int k, int off)
{
  struct X x;
  int **p, *q;
  x.p = &i;
  x.q = &j;
  x.r = &k;
  p = &x.q;
  p += off;
  /* *p points to { i, j, k } */
  q = *p;
  return *q;
}

With -Os -mmcu=atmega88 4.6.1 output is (wrong):

foo:
	push r28	 ;  42	*pushqi/1	[length = 1]
	push r29	 ;  43	*pushqi/1	[length = 1]
	in r28,__SP_L__	 ;  44	*movhi_sp/2	[length = 2]
	in r29,__SP_H__
	sbiw r28,12	 ;  45	*addhi3/3	[length = 1]
	in __tmp_reg__,__SREG__	 ;  46	*movhi_sp/1	[length = 5]
	cli
	out __SP_H__,r29
	out __SREG__,__tmp_reg__
	out __SP_L__,r28
/* prologue: function */
/* frame size = 12 */
/* stack size = 14 */
.L__stack_usage = 14
	std Y+8,r25	 ;  2	*movhi/3	[length = 2]
	std Y+7,r24
	std Y+10,r23	 ;  3	*movhi/3	[length = 2]
	std Y+9,r22
	std Y+12,r21	 ;  4	*movhi/3	[length = 2]
	std Y+11,r20
	lsl r18	 ;  55	*ashlhi3_const/2	[length = 2]
	rol r19
	add r18,r28	 ;  16	*addhi3/1	[length = 2]
	adc r19,r29
	movw r26,r18	 ;  41	*movhi/1	[length = 1]
	adiw r26,3	 ;  18	*movhi/2	[length = 4]
	ld r30,X+
	ld r31,X
	sbiw r26,3+1
	ld r24,Z	 ;  35	*movqi/4	[length = 1]
	ldd r25,Z+1	 ;  36	*movqi/4	[length = 1]
/* epilogue start */
	adiw r28,12	 ;  49	*addhi3/2	[length = 1]
	in __tmp_reg__,__SREG__	 ;  50	*movhi_sp/1	[length = 5]
	cli
	out __SP_H__,r29
	out __SREG__,__tmp_reg__
	out __SP_L__,r28
	pop r29	 ;  51	popqi	[length = 1]
	pop r28	 ;  52	popqi	[length = 1]
	ret	 ;  53	return_from_epilogue	[length = 1]



With 4.5.2 and same options, the test case runs on exit:


foo:
	push r29	 ;  42	*pushhi/1	[length = 2]
	push r28
	in r28,__SP_L__	 ;  43	*movhi_sp/2	[length = 2]
	in r29,__SP_H__
	sbiw r28,12	 ;  44	*addhi3/3	[length = 1]
	in __tmp_reg__,__SREG__	 ;  45	*movhi_sp/1	[length = 5]
	cli
	out __SP_H__,r29
	out __SREG__,__tmp_reg__
	out __SP_L__,r28
/* prologue: function */
/* frame size = 12 */
/* stack size = 14 */
.L__stack_usage = 14
	std Y+8,r25	 ;  2	*movhi/3	[length = 2]
	std Y+7,r24
	std Y+10,r23	 ;  3	*movhi/3	[length = 2]
	std Y+9,r22
	std Y+12,r21	 ;  4	*movhi/3	[length = 2]
	std Y+11,r20
	movw r24,r28	 ;  38	*movhi/1	[length = 1]
	adiw r24,7	 ;  9	*addhi3/2	[length = 1]
	std Y+2,r25	 ;  10	*movhi/3	[length = 2]
	std Y+1,r24
	movw r24,r28	 ;  39	*movhi/1	[length = 1]
	adiw r24,9	 ;  11	*addhi3/2	[length = 1]
	std Y+4,r25	 ;  12	*movhi/3	[length = 2]
	std Y+3,r24
	movw r24,r28	 ;  40	*movhi/1	[length = 1]
	adiw r24,11	 ;  13	*addhi3/2	[length = 1]
	std Y+6,r25	 ;  14	*movhi/3	[length = 2]
	std Y+5,r24
	lsl r18	 ;  53	*ashlhi3_const/2	[length = 2]
	rol r19
	add r18,r28	 ;  16	*addhi3/1	[length = 2]
	adc r19,r29
	movw r26,r18	 ;  41	*movhi/1	[length = 1]
	adiw r26,3	 ;  18	*movhi/2	[length = 4]
	ld r30,X+
	ld r31,X
	sbiw r26,3+1
	ld r24,Z	 ;  35	*movqi/4	[length = 1]
	ldd r25,Z+1	 ;  36	*movqi/4	[length = 1]
/* epilogue start */
	adiw r28,12	 ;  48	*addhi3/2	[length = 1]
	in __tmp_reg__,__SREG__	 ;  49	*movhi_sp/1	[length = 5]
	cli
	out __SP_H__,r29
	out __SREG__,__tmp_reg__
	out __SP_L__,r28
	pop r28	 ;  50	pophi	[length = 2]
	pop r29
	ret	 ;  51	return_from_epilogue	[length = 1]
Comment 1 Georg-Johann Lay 2011-08-12 20:33:32 UTC
Setting status to NEW, see gcc-testresults:

4.7.0 20110810 (experimental)
   http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg01258.html

4.6.2 20110805 (prerelease)
   http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00698.html
Comment 2 Georg-Johann Lay 2011-08-13 17:04:10 UTC
Created attachment 25004 [details]
asm-no-dse.s

Compiler output with -Os -mmcu=atmega8 -fno-dse
Comment 3 Georg-Johann Lay 2011-08-13 17:05:39 UTC
Created attachment 25005 [details]
asm-dse.s

Compiler output with -Os -mmcu=atmega8 (-fdse by default)
Comment 4 Georg-Johann Lay 2011-08-13 17:09:49 UTC
This is not a target issue.

RTL Dead Store Elimination (DSE), activated by -fdse, deletes insns that are not dead.
Comment 5 Richard Biener 2011-08-14 09:52:28 UTC
Sounds like some of the latent RTL alias issues we have with regarding to
find_base_value and friends (see some i?86 bugreport I fail to remember right now).
Comment 6 Georg-Johann Lay 2011-08-15 08:07:29 UTC
(In reply to comment #5)
> Sounds like some of the latent RTL alias issues we have with regarding to
> find_base_value and friends (see some i?86 bugreport I fail to remember right
> now).

You mean PR49330?
Comment 7 Richard Biener 2011-08-15 11:05:08 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Sounds like some of the latent RTL alias issues we have with regarding to
> > find_base_value and friends (see some i?86 bugreport I fail to remember right
> > now).
> 
> You mean PR49330?

Yes.
Comment 8 Steven Bosscher 2011-09-11 15:25:10 UTC
Already wrong in the .expand dump:
;; MEM[(volatile unsigned int *)&var][arg_1(D)] ={v} v_2;

(insn 9 8 10 (set (reg:DI 63)
        (sign_extend:DI (reg/v:SI 60 [ argD.1604 ]))) t.c:6 -1
     (nil))

(insn 10 9 11 (set (reg/f:DI 64)
        (symbol_ref:DI ("var") <var_decl 0x7f4b8054f140 var>)) t.c:6 -1
     (nil))

(insn 11 10 0 (set (mem/s:SI (plus:DI (mult:DI (reg:DI 63)
                    (const_int 4 [0x4]))
                (reg/f:DI 64)) [2 varD.1603 S4 A32])
        (reg/v:SI 59 [ vD.1607 ])) t.c:6 -1
     (nil))

It seems to me that the MEM in insn 11 should be mem/s/v.
Comment 9 Steven Bosscher 2011-09-11 15:46:22 UTC
(In reply to comment #8)
> Already wrong in the .expand dump:

This comment somehow ended up in the wrong PR. It should be in bug 50078.
Comment 10 Jakub Jelinek 2011-10-26 17:13:44 UTC
GCC 4.6.2 is being released.
Comment 11 Richard Biener 2011-10-27 09:59:55 UTC
Candidate for downgrading again if only avr-* is affected.
Comment 12 Jakub Jelinek 2011-12-19 18:34:16 UTC
I believe this is just because of very weird target avr stuff, either it is a target bug that can be fixed up in the backend somehow, or perhaps would need some middle-end help, but still it is avr specific.
Doesn't seem to have anything to do with PR49330.

The problem seems to be that the prologue here looks like:
(insn/f 43 7 44 2 (set (mem/c:QI (post_dec:HI (reg/f:HI 32 __SP_L__)) [5 S1 A8])
        (reg:QI 28 r28)) pr50063.c:9 -1
     (nil))
(insn/f 44 43 45 2 (set (mem/c:QI (post_dec:HI (reg/f:HI 32 __SP_L__)) [5 S1 A8])
        (reg:QI 29 r29)) pr50063.c:9 -1
     (expr_list:REG_DEAD (reg:QI 29 r29)
        (nil)))
(insn 45 44 46 2 (set (reg/f:HI 28 r28)
        (reg/f:HI 32 __SP_L__)) pr50063.c:9 -1
     (nil))
(insn/f 46 45 47 2 (set (reg/f:HI 28 r28)
        (plus:HI (reg/f:HI 28 r28)
            (const_int -12 [0xfffffffffffffff4]))) pr50063.c:9 -1
     (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:HI 28 r28)
            (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int -12 [0xfffffffffffffff4])))
        (nil)))
(insn 47 46 48 2 (set (reg/f:HI 32 __SP_L__)
        (reg/f:HI 28 r28)) pr50063.c:9 -1
     (nil))

where reg 28 is frame pointer and reg 32 is stack pointer.  When canon_true_dependence is called by DSE, one of the mem addresses is r28 plus CONST_INT, the other is a VALUE, which is in the end a VALUE of r28 plus some offset.  But r28 (frame pointer) and r32 (stack pointer) have the same VALUE in this case, and because of the initialization of sp from fp r32 is actually before r28 in that VALUE's locs list.  So, find_base_term for that VALUE returns (address r32), while find_base_term for r28 plus CONST_INT doesn't use cselib values at all and thus returns (address r28) and base_alias_check just fails.  The question is why avr does this, and if it really has to do that, it should make sure somehow that for the alias analysis either REG_BASE_VALUE of r28 is the same as REG_BASE_VALUE of r32.  E.g. it could define FIND_BASE_TERM and do something for r28/r32 if they are known to be the same.
init_alias_analysis ignores prologue and epilogue insns after reload, which is probably why record_set isn't called here on the r32 = r28 set.
Comment 13 Georg-Johann Lay 2011-12-19 18:56:49 UTC
(In reply to comment #12)
> I believe this is just because of very weird target avr stuff, either it is a
> target bug that can be fixed up in the backend somehow, or perhaps would need
> some middle-end help, but still it is avr specific.

The insns describe exactly what the machine is doing:

insn 43/44: Save FP (r28/29) to Stack. This in done in two QI pushes and not in one HI. SP push is post-decrement and with a HI push the resulting address would be wrong. AVR is 8-bit machine with 16-bit PMODE.

insn 45: Initialize FP with SP

insn 46: Set up a frame (12 bytes here). AVR's SP cannot be changed directly, not even atomically so changing SP is quite expensive and IRQs must be turned off. Therefore, prologue generation works out two methods of setting up frame/changing SP and picks the most efficient:

* For big offsets it is:
    FP = SP;
    FP -= const;
    SP = FP

* For small offsets SP is adjusted by dummy pushes/pops, for example:
    SP -= 2 as of: push(dummy); push(dummy);
    FP = SP

Similar applies to epilogue generation.

This example exercises the first approach. The 3rd step is (SP = FP):

insn 47: Write back SP

If the generic analysis ignores prologue/epilogue but optimizers optimize against prologue/epilogue using that incorrect information based on lazy analysis, then the problem is in generic code.
Comment 14 Jakub Jelinek 2011-12-19 19:12:56 UTC
Well, it is certainly desirable not to process the prologue insns during init_alias_analysis.  The fact that stack pointer has the same value as frame pointer after the prologue is not usual and something the generic code isn't prepared for (usually either frame pointer is eliminated (of course then the register can be used for something else) or frame pointer is initialized from stack pointer and then decremented (or incremented) still inside of the prologue).
So you need to tell the alias analysis about that, as the prologue isn't scanned, it is the backend responsibility to tell that.
Comment 15 Georg-Johann Lay 2011-12-19 23:24:13 UTC
(In reply to comment #14)
> Well, it is certainly desirable not to process the prologue insns during
> init_alias_analysis.  The fact that stack pointer has the same value as frame
> pointer after the prologue is not usual and something the generic code isn't
> prepared for.

So bottom line is that GCC is not ready for AVR?

> So you need to tell the alias analysis about that, as the prologue isn't
> scanned, it is the backend responsibility to tell that.

Ya, I skimmed the hooks once again... What hook needs to be implemented/adjusted by AVR backend to ship that information if prologue and elimination information is not enough already...?
Comment 16 Jakub Jelinek 2011-12-20 00:22:19 UTC
Well, you or whomever wants to fix this bug needs to propose some solution.
I'm not familiar with avr, so I don't know if avr is doing something like this
(starting function with r32 == r28) always, or always for functions that require frame pointer, conditionally only for some CPUs, ...
If it is always that way, you might want to consider just in some backend initialization that is run after init_alias_target call change
this_target_rtl->x_static_reg_base_value[STACK_POINTER_REGNUM]
to this_target_rtl->x_static_reg_base_value[FRAME_POINTER_REGNUM] (or vice versa), so that the RTL alias analysis doesn't disambiguate r28 from r32 based MEM accesses.  Or a target hook somewhere in init_alias_analysis, or
#define FIND_BASE_TERM(X) avr_find_base_term (X)
where that function would return under the conditions where body starts with r32 == r28 would if (REG_P (x) && REGNO (x) == STACK_POINTER_REGNUM) return find_base_term (frame_pointer_rtx); else return NULL_RTX;
Comment 17 Jakub Jelinek 2011-12-20 06:54:00 UTC
FIND_BASE_TERM is actually supposed just an alternate rtx expression, not its find_base_term value, so something like (perhaps with more conditions, when r28 is never equal to r32 in the body or r32 isn't initialized from r28, it doesn't need to be done):
if (reload_completed && frame_pointer_needed && REG_P (x) && REGNO (x) == STACK_POINTER_REGNUM) return frame_pointer_rtx;
return NULL_RTX;
Comment 18 Georg-Johann Lay 2011-12-22 00:03:08 UTC
From what you wrote the internals documentation need to be fixed, i.e. there should be a disclaimer in expand_prologue documentation that SP=FP is an illegal configuration that breaks GCC.

Moreover there is:

> FIND_BASE_TERM (x): It is always safe for this macro to not be defined.

Which is obviously wrong.

I don't know enough of alias internals,  but I get more and more the impression that implementing FIND_BASE_TERM is just working around problem in generic code and instead of backend hacking around it the generic code should be made robust.

At the moment I tend to deactivate malicous pass(es) in the backend until they use robust approach and don't value performance higher than correctness.
Comment 19 Georg-Johann Lay 2012-02-20 13:41:39 UTC
FYI, after updating to SVN 184386 (2012-02-20) I see 14 new FAILs in the avr test suite; all of which can be cured with -fno-dse:

FAIL: gcc.c-torture/execute/20020215-1.c execution,  -O1 
FAIL: gcc.c-torture/execute/20020215-1.c execution,  -Os 

FAIL: gcc.c-torture/execute/930126-1.c execution,  -O1 

FAIL: gcc.c-torture/execute/991118-1.c execution,  -O1 
FAIL: gcc.c-torture/execute/991118-1.c execution,  -Os 

FAIL: gcc.c-torture/execute/bf64-1.c execution,  -O1 
FAIL: gcc.c-torture/execute/bf64-1.c execution,  -Os 

FAIL: gcc.c-torture/execute/pr51877.c execution,  -O1 
FAIL: gcc.c-torture/execute/pr51877.c execution,  -O2 
FAIL: gcc.c-torture/execute/pr51877.c execution,  -O3 -fomit-frame-pointer 
FAIL: gcc.c-torture/execute/pr51877.c execution,  -O3 -g 
FAIL: gcc.c-torture/execute/pr51877.c execution,  -Os 
plugin -flto-partition=none 
FAIL: gcc.c-torture/execute/pr51877.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
FAIL: gcc.c-torture/execute/pr51877.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
Comment 20 Jakub Jelinek 2012-02-20 13:52:50 UTC
Please just fix up your backend, e.g. by turning that sp = hfp move (insn 47 above) into an UNSPEC move.
Comment 21 Georg-Johann Lay 2012-02-22 09:25:48 UTC
Author: gjl
Date: Wed Feb 22 09:25:35 2012
New Revision: 184461

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184461
Log:
	PR rtl-optimization/50063
	* config/avr/avr.md (movhi_sp_r): Handle -1 (unknown IRQ state)
	and 2 (8-bit SP) in operand 2.
	* config/avr/avr.c (avr_prologue_setup_frame): Adjust prologue
	setup to use movhi_sp_r instead of vanilla move to write SP.
	Adjust REG_CFA notes to superseed unspec.
	(expand_epilogue): Adjust epilogue setup to use movhi_sp_r instead
	of vanilla move.
	As function body might contain CLI or SEI: Use irq_state 0 (IRQ
	known to be off) only with TARGET_NO_INTERRUPTS. Never use
	irq_state 1 (IRQ known to be on) here.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/avr/avr.c
    trunk/gcc/config/avr/avr.md
Comment 22 Jakub Jelinek 2012-03-01 14:38:50 UTC
GCC 4.6.3 is being released.
Comment 23 Georg-Johann Lay 2012-03-22 16:10:43 UTC
Bumping milestone to 4.7.0 where the issue is fixed.