Testcase gcc.dg/torture/pta-ptrarith-3.c http://gcc.gnu.org/viewcvs/trunk/gcc/testsuite/gcc.dg/torture/pta-ptrarith-3.c?revision=145490&view=markup produces wrong code for current trunk and avr-gcc 4.6.1 struct X { int *p; int *q; int *r; }; int __attribute__((noinline)) foo (int i, int j, int k, int off) { struct X x; int **p, *q; x.p = &i; x.q = &j; x.r = &k; p = &x.q; p += off; /* *p points to { i, j, k } */ q = *p; return *q; } With -Os -mmcu=atmega88 4.6.1 output is (wrong): foo: push r28 ; 42 *pushqi/1 [length = 1] push r29 ; 43 *pushqi/1 [length = 1] in r28,__SP_L__ ; 44 *movhi_sp/2 [length = 2] in r29,__SP_H__ sbiw r28,12 ; 45 *addhi3/3 [length = 1] in __tmp_reg__,__SREG__ ; 46 *movhi_sp/1 [length = 5] cli out __SP_H__,r29 out __SREG__,__tmp_reg__ out __SP_L__,r28 /* prologue: function */ /* frame size = 12 */ /* stack size = 14 */ .L__stack_usage = 14 std Y+8,r25 ; 2 *movhi/3 [length = 2] std Y+7,r24 std Y+10,r23 ; 3 *movhi/3 [length = 2] std Y+9,r22 std Y+12,r21 ; 4 *movhi/3 [length = 2] std Y+11,r20 lsl r18 ; 55 *ashlhi3_const/2 [length = 2] rol r19 add r18,r28 ; 16 *addhi3/1 [length = 2] adc r19,r29 movw r26,r18 ; 41 *movhi/1 [length = 1] adiw r26,3 ; 18 *movhi/2 [length = 4] ld r30,X+ ld r31,X sbiw r26,3+1 ld r24,Z ; 35 *movqi/4 [length = 1] ldd r25,Z+1 ; 36 *movqi/4 [length = 1] /* epilogue start */ adiw r28,12 ; 49 *addhi3/2 [length = 1] in __tmp_reg__,__SREG__ ; 50 *movhi_sp/1 [length = 5] cli out __SP_H__,r29 out __SREG__,__tmp_reg__ out __SP_L__,r28 pop r29 ; 51 popqi [length = 1] pop r28 ; 52 popqi [length = 1] ret ; 53 return_from_epilogue [length = 1] With 4.5.2 and same options, the test case runs on exit: foo: push r29 ; 42 *pushhi/1 [length = 2] push r28 in r28,__SP_L__ ; 43 *movhi_sp/2 [length = 2] in r29,__SP_H__ sbiw r28,12 ; 44 *addhi3/3 [length = 1] in __tmp_reg__,__SREG__ ; 45 *movhi_sp/1 [length = 5] cli out __SP_H__,r29 out __SREG__,__tmp_reg__ out __SP_L__,r28 /* prologue: function */ /* frame size = 12 */ /* stack size = 14 */ .L__stack_usage = 14 std Y+8,r25 ; 2 *movhi/3 [length = 2] std Y+7,r24 std Y+10,r23 ; 3 *movhi/3 [length = 2] std Y+9,r22 std Y+12,r21 ; 4 *movhi/3 [length = 2] std Y+11,r20 movw r24,r28 ; 38 *movhi/1 [length = 1] adiw r24,7 ; 9 *addhi3/2 [length = 1] std Y+2,r25 ; 10 *movhi/3 [length = 2] std Y+1,r24 movw r24,r28 ; 39 *movhi/1 [length = 1] adiw r24,9 ; 11 *addhi3/2 [length = 1] std Y+4,r25 ; 12 *movhi/3 [length = 2] std Y+3,r24 movw r24,r28 ; 40 *movhi/1 [length = 1] adiw r24,11 ; 13 *addhi3/2 [length = 1] std Y+6,r25 ; 14 *movhi/3 [length = 2] std Y+5,r24 lsl r18 ; 53 *ashlhi3_const/2 [length = 2] rol r19 add r18,r28 ; 16 *addhi3/1 [length = 2] adc r19,r29 movw r26,r18 ; 41 *movhi/1 [length = 1] adiw r26,3 ; 18 *movhi/2 [length = 4] ld r30,X+ ld r31,X sbiw r26,3+1 ld r24,Z ; 35 *movqi/4 [length = 1] ldd r25,Z+1 ; 36 *movqi/4 [length = 1] /* epilogue start */ adiw r28,12 ; 48 *addhi3/2 [length = 1] in __tmp_reg__,__SREG__ ; 49 *movhi_sp/1 [length = 5] cli out __SP_H__,r29 out __SREG__,__tmp_reg__ out __SP_L__,r28 pop r28 ; 50 pophi [length = 2] pop r29 ret ; 51 return_from_epilogue [length = 1]
Setting status to NEW, see gcc-testresults: 4.7.0 20110810 (experimental) http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg01258.html 4.6.2 20110805 (prerelease) http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00698.html
Created attachment 25004 [details] asm-no-dse.s Compiler output with -Os -mmcu=atmega8 -fno-dse
Created attachment 25005 [details] asm-dse.s Compiler output with -Os -mmcu=atmega8 (-fdse by default)
This is not a target issue. RTL Dead Store Elimination (DSE), activated by -fdse, deletes insns that are not dead.
Sounds like some of the latent RTL alias issues we have with regarding to find_base_value and friends (see some i?86 bugreport I fail to remember right now).
(In reply to comment #5) > Sounds like some of the latent RTL alias issues we have with regarding to > find_base_value and friends (see some i?86 bugreport I fail to remember right > now). You mean PR49330?
(In reply to comment #6) > (In reply to comment #5) > > Sounds like some of the latent RTL alias issues we have with regarding to > > find_base_value and friends (see some i?86 bugreport I fail to remember right > > now). > > You mean PR49330? Yes.
Already wrong in the .expand dump: ;; MEM[(volatile unsigned int *)&var][arg_1(D)] ={v} v_2; (insn 9 8 10 (set (reg:DI 63) (sign_extend:DI (reg/v:SI 60 [ argD.1604 ]))) t.c:6 -1 (nil)) (insn 10 9 11 (set (reg/f:DI 64) (symbol_ref:DI ("var") <var_decl 0x7f4b8054f140 var>)) t.c:6 -1 (nil)) (insn 11 10 0 (set (mem/s:SI (plus:DI (mult:DI (reg:DI 63) (const_int 4 [0x4])) (reg/f:DI 64)) [2 varD.1603 S4 A32]) (reg/v:SI 59 [ vD.1607 ])) t.c:6 -1 (nil)) It seems to me that the MEM in insn 11 should be mem/s/v.
(In reply to comment #8) > Already wrong in the .expand dump: This comment somehow ended up in the wrong PR. It should be in bug 50078.
GCC 4.6.2 is being released.
Candidate for downgrading again if only avr-* is affected.
I believe this is just because of very weird target avr stuff, either it is a target bug that can be fixed up in the backend somehow, or perhaps would need some middle-end help, but still it is avr specific. Doesn't seem to have anything to do with PR49330. The problem seems to be that the prologue here looks like: (insn/f 43 7 44 2 (set (mem/c:QI (post_dec:HI (reg/f:HI 32 __SP_L__)) [5 S1 A8]) (reg:QI 28 r28)) pr50063.c:9 -1 (nil)) (insn/f 44 43 45 2 (set (mem/c:QI (post_dec:HI (reg/f:HI 32 __SP_L__)) [5 S1 A8]) (reg:QI 29 r29)) pr50063.c:9 -1 (expr_list:REG_DEAD (reg:QI 29 r29) (nil))) (insn 45 44 46 2 (set (reg/f:HI 28 r28) (reg/f:HI 32 __SP_L__)) pr50063.c:9 -1 (nil)) (insn/f 46 45 47 2 (set (reg/f:HI 28 r28) (plus:HI (reg/f:HI 28 r28) (const_int -12 [0xfffffffffffffff4]))) pr50063.c:9 -1 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:HI 28 r28) (plus:HI (reg/f:HI 32 __SP_L__) (const_int -12 [0xfffffffffffffff4]))) (nil))) (insn 47 46 48 2 (set (reg/f:HI 32 __SP_L__) (reg/f:HI 28 r28)) pr50063.c:9 -1 (nil)) where reg 28 is frame pointer and reg 32 is stack pointer. When canon_true_dependence is called by DSE, one of the mem addresses is r28 plus CONST_INT, the other is a VALUE, which is in the end a VALUE of r28 plus some offset. But r28 (frame pointer) and r32 (stack pointer) have the same VALUE in this case, and because of the initialization of sp from fp r32 is actually before r28 in that VALUE's locs list. So, find_base_term for that VALUE returns (address r32), while find_base_term for r28 plus CONST_INT doesn't use cselib values at all and thus returns (address r28) and base_alias_check just fails. The question is why avr does this, and if it really has to do that, it should make sure somehow that for the alias analysis either REG_BASE_VALUE of r28 is the same as REG_BASE_VALUE of r32. E.g. it could define FIND_BASE_TERM and do something for r28/r32 if they are known to be the same. init_alias_analysis ignores prologue and epilogue insns after reload, which is probably why record_set isn't called here on the r32 = r28 set.
(In reply to comment #12) > I believe this is just because of very weird target avr stuff, either it is a > target bug that can be fixed up in the backend somehow, or perhaps would need > some middle-end help, but still it is avr specific. The insns describe exactly what the machine is doing: insn 43/44: Save FP (r28/29) to Stack. This in done in two QI pushes and not in one HI. SP push is post-decrement and with a HI push the resulting address would be wrong. AVR is 8-bit machine with 16-bit PMODE. insn 45: Initialize FP with SP insn 46: Set up a frame (12 bytes here). AVR's SP cannot be changed directly, not even atomically so changing SP is quite expensive and IRQs must be turned off. Therefore, prologue generation works out two methods of setting up frame/changing SP and picks the most efficient: * For big offsets it is: FP = SP; FP -= const; SP = FP * For small offsets SP is adjusted by dummy pushes/pops, for example: SP -= 2 as of: push(dummy); push(dummy); FP = SP Similar applies to epilogue generation. This example exercises the first approach. The 3rd step is (SP = FP): insn 47: Write back SP If the generic analysis ignores prologue/epilogue but optimizers optimize against prologue/epilogue using that incorrect information based on lazy analysis, then the problem is in generic code.
Well, it is certainly desirable not to process the prologue insns during init_alias_analysis. The fact that stack pointer has the same value as frame pointer after the prologue is not usual and something the generic code isn't prepared for (usually either frame pointer is eliminated (of course then the register can be used for something else) or frame pointer is initialized from stack pointer and then decremented (or incremented) still inside of the prologue). So you need to tell the alias analysis about that, as the prologue isn't scanned, it is the backend responsibility to tell that.
(In reply to comment #14) > Well, it is certainly desirable not to process the prologue insns during > init_alias_analysis. The fact that stack pointer has the same value as frame > pointer after the prologue is not usual and something the generic code isn't > prepared for. So bottom line is that GCC is not ready for AVR? > So you need to tell the alias analysis about that, as the prologue isn't > scanned, it is the backend responsibility to tell that. Ya, I skimmed the hooks once again... What hook needs to be implemented/adjusted by AVR backend to ship that information if prologue and elimination information is not enough already...?
Well, you or whomever wants to fix this bug needs to propose some solution. I'm not familiar with avr, so I don't know if avr is doing something like this (starting function with r32 == r28) always, or always for functions that require frame pointer, conditionally only for some CPUs, ... If it is always that way, you might want to consider just in some backend initialization that is run after init_alias_target call change this_target_rtl->x_static_reg_base_value[STACK_POINTER_REGNUM] to this_target_rtl->x_static_reg_base_value[FRAME_POINTER_REGNUM] (or vice versa), so that the RTL alias analysis doesn't disambiguate r28 from r32 based MEM accesses. Or a target hook somewhere in init_alias_analysis, or #define FIND_BASE_TERM(X) avr_find_base_term (X) where that function would return under the conditions where body starts with r32 == r28 would if (REG_P (x) && REGNO (x) == STACK_POINTER_REGNUM) return find_base_term (frame_pointer_rtx); else return NULL_RTX;
FIND_BASE_TERM is actually supposed just an alternate rtx expression, not its find_base_term value, so something like (perhaps with more conditions, when r28 is never equal to r32 in the body or r32 isn't initialized from r28, it doesn't need to be done): if (reload_completed && frame_pointer_needed && REG_P (x) && REGNO (x) == STACK_POINTER_REGNUM) return frame_pointer_rtx; return NULL_RTX;
From what you wrote the internals documentation need to be fixed, i.e. there should be a disclaimer in expand_prologue documentation that SP=FP is an illegal configuration that breaks GCC. Moreover there is: > FIND_BASE_TERM (x): It is always safe for this macro to not be defined. Which is obviously wrong. I don't know enough of alias internals, but I get more and more the impression that implementing FIND_BASE_TERM is just working around problem in generic code and instead of backend hacking around it the generic code should be made robust. At the moment I tend to deactivate malicous pass(es) in the backend until they use robust approach and don't value performance higher than correctness.
FYI, after updating to SVN 184386 (2012-02-20) I see 14 new FAILs in the avr test suite; all of which can be cured with -fno-dse: FAIL: gcc.c-torture/execute/20020215-1.c execution, -O1 FAIL: gcc.c-torture/execute/20020215-1.c execution, -Os FAIL: gcc.c-torture/execute/930126-1.c execution, -O1 FAIL: gcc.c-torture/execute/991118-1.c execution, -O1 FAIL: gcc.c-torture/execute/991118-1.c execution, -Os FAIL: gcc.c-torture/execute/bf64-1.c execution, -O1 FAIL: gcc.c-torture/execute/bf64-1.c execution, -Os FAIL: gcc.c-torture/execute/pr51877.c execution, -O1 FAIL: gcc.c-torture/execute/pr51877.c execution, -O2 FAIL: gcc.c-torture/execute/pr51877.c execution, -O3 -fomit-frame-pointer FAIL: gcc.c-torture/execute/pr51877.c execution, -O3 -g FAIL: gcc.c-torture/execute/pr51877.c execution, -Os plugin -flto-partition=none FAIL: gcc.c-torture/execute/pr51877.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none FAIL: gcc.c-torture/execute/pr51877.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
Please just fix up your backend, e.g. by turning that sp = hfp move (insn 47 above) into an UNSPEC move.
Author: gjl Date: Wed Feb 22 09:25:35 2012 New Revision: 184461 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184461 Log: PR rtl-optimization/50063 * config/avr/avr.md (movhi_sp_r): Handle -1 (unknown IRQ state) and 2 (8-bit SP) in operand 2. * config/avr/avr.c (avr_prologue_setup_frame): Adjust prologue setup to use movhi_sp_r instead of vanilla move to write SP. Adjust REG_CFA notes to superseed unspec. (expand_epilogue): Adjust epilogue setup to use movhi_sp_r instead of vanilla move. As function body might contain CLI or SEI: Use irq_state 0 (IRQ known to be off) only with TARGET_NO_INTERRUPTS. Never use irq_state 1 (IRQ known to be on) here. Modified: trunk/gcc/ChangeLog trunk/gcc/config/avr/avr.c trunk/gcc/config/avr/avr.md
GCC 4.6.3 is being released.
Bumping milestone to 4.7.0 where the issue is fixed.