Hi, I have observed a wrong code bug that I judge to be so serious that IMHO one should discourage use of the avr port for 4.x.x until it is resolved. Unfortunately the bug showed up in a deeply embedded code segment of *really* confidential code owned by my employer. I unfortunately cannot pass the test case I am having to the public. I am working hard for reproducing the bug in a non-confidential stripped file. Meanwhile, I have the following diagnosis: ( I, know that this will be difficult, but maybe someone could never the less give me an indication already on this base.) 1.) Background info for those not too familiar with the AVR port: AVR uses the hard reg 28 (also called Y-Register) as frame pointer, when necessary. If a pseudo happens to end up on a stack slot and needs to be loaded into a register of class POINTER_REG (one of the hard regs 26 (reg X), 28 (reg Y) or 30(reg Z)), reload sometimes generates code where the frame pointer 28 (reg Y) itself is used as spill register. GCC, however, still uses the Y register as frame pointer and does not recognize that it no longer points to the stack. 2.) Description of the bug: Excerpt from the debugging dump originating from the reload pass for my testcase (test.c.23.greg): Reload 0: reload_in (HI) = (reg/v/f:HI 65 [ pointer_to_next_C ]) reload_out (HI) = (reg/v/f:HI 65 [ pointer_to_next_C ]) POINTER_REGS, RELOAD_OTHER (opnum = 1) reload_in_reg: (reg/v/f:HI 65 [ pointer_to_next_C ]) reload_out_reg: (reg/v/f:HI 65 [ pointer_to_next_C ]) reload_reg_rtx: (reg:HI 28 r28) The corresponding asm output reads (Problem case plus part of the prologue for illustration. descriptive comments added by me): ;; Section of the prologue allocating space on the stack for a HImode ;; variable. in r28,__SP_L__ in r29,__SP_H__ sbiw r28,2 in __tmp_reg__,__SREG__ cli out __SP_H__,r29 out __SREG__,__tmp_reg__ out __SP_L__,r28 ;; Stack pointer and frame pointer r28:r29 are now increased by 2 for ;; making place for a pointer variable. Y+1 and Y+2 point to the ;; lsb and the msb of this pointer variable. ... some complicated code ... ;; Now the variable on the stack slot is needed in a spill register ;; of class "pointer registers" (i.e. r26-r31). ;; GCC chooses to use the Frame pointer r28 as spill register. ;; r28 is overwritten by the value of the variable on the stack ;; (GCC uses special pattern for loading Y from an address Y is ;; pointing to by use of the tmp reg. ldd __tmp_reg__,Y+1 ldd r29,Y+2 mov r28,__tmp_reg__ ; Now the frame pointer no longer points to the stack! ; Use Y for accessing the memory the pointer variable on the stack ; points to. ld r18,Y+ ld r19,Y+ ld r20,Y+ ld r21,Y+ ;; GCC now assumes that it could write Y back to the stack ;; by the following two commands. BUG!!! Also all of the subsequent ;; Accesses to the stack by the frame pointer will fail. std Y+2,r29 std Y+1,r28 The bug exists already in January 05 and in today's head 4.1. It was present in all of the intermediate snapshots I have analyzed so far. I was able to reproduce the problem identically on a cygwin-woe200 host, an i386-linux host and an amd64-linux host. I am working hard to reduce the test case so that it will be possible for me to post it here. Meanwhile, I'd appreciate very much any advice on where to look for the origin. Yours, Björn
Waiting for a testcase, it could be anything really, a target bug or even a reload one.
Hi, here is the promised test case. I unfortunately had to use a number of asm statements in order to reproduce the register contraint combination exposing the bug. The test case compiles only with -morder1 (== best configuration for AVR when using many SImode variables) but ICEs otherwise. Start of testcase: // Testcase to be compiled with avr-gcc -Os -morder1 #include <inttypes.h> typedef struct { uint32_t HLFG; } special_struct; void foo3 (special_struct *a, const special_struct b) { asm ( ";" : "+r" (a->HLFG) : "r" (b.HLFG) ); } uint16_t inline static foo (const uint32_t i, const special_struct f) { uint16_t ui16_resultat; asm volatile ( ";" : "=&r" (ui16_resultat) : "r" (i), "r" (f.HLFG) ); return ui16_resultat; } special_struct inline static foo2 (const special_struct m1, const uint16_t m2) { special_struct resultat; asm volatile ( ";" : "=&r" (resultat.HLFG) : "r" (m1.HLFG), "r" (m2) ); if (m1.HLFG & 0x00020000) { resultat.HLFG = ~resultat.HLFG; } return resultat; } special_struct inline static foo7 (uint16_t t) { special_struct resultat; asm volatile (";" : "=r" (resultat.HLFG):); if (t & 0x020) t = ~(t); if (t & 0x0100) { t = ~t; asm volatile (";" : "+r" (resultat.HLFG):); } asm volatile (";" : "+d" (t):); if (t > 9300) { t = 12743 - ( (int16_t) t); uint16_t ui16_r = 7480; asm volatile ( ";" : "+&r" (resultat.HLFG) : "r" (t), "r" (ui16_r) ); } else if (t > 9211) { uint16_t ui16_r = 1108; asm volatile ( ";" : "+&r" (resultat.HLFG) : "r" (t), "r" (ui16_r) ); } else { uint16_t r = 440; asm volatile ( ";" : "+&d" (resultat.HLFG), "+&r" (r) : "r" (t) ); } return resultat; } typedef struct { uint16_t q; uint16_t a; } lp_t; typedef struct { lp_t l[19]; uint32_t sf; } vmt; vmt m; special_struct sp_C[19]; special_struct problem (uint32_t s) { special_struct *p_C; special_struct ga; ga.HLFG = 0; p_C = &sp_C[0]; uint8_t lz; for (lz=0; lz < 23; lz ++) { special_struct ac; { special_struct C; C = *p_C; p_C ++; uint16_t t; t = m.l [lz].q; ac = foo7 ( foo (s, C) - t); } { uint16_t e; e = m.l [lz].a; foo3 (&ga, foo2 (ac, e)); } } return ga; }
Created attachment 9061 [details] Modified test case. Hi, advancing further, I recognized that all of the asm statements in the test case could be replaced by standard C expression except for a single one. Look at the attached file. It seems that the immediate wrong code issue is triggered by constraint asm volatile ( ";" : "+&d" (resultat.HLFG), "+&r" (r) : "r" (t) ); Reload seems to have difficulties with this "two combined input/output operands" expression. I would consider it to be perfectly acceptable if reload refuses to handle this case (i.e. triggers an ICE). However, what I am observing is that when using above asm statement for generatating stress for reload, reload generates code that alters the frame pointer! This is, IMO, extremely suspicious. I have serious concerns regarding the correctness of the code as soon as frame pointers are used. (There were other reports stating that applications used to work with the 3.4.x series and failed with 4.0.0 as soon as optimizations were switched on.) Yours, Björn
Hi, a couple of weeks after identifying this issue, I have continued working with 4.0.0. I never observed a similar failure in our production code. The issue seems to be strongly linked to the constraint requiring simultaneously two operands with the "+&" modifier. The back-end, itself never generates such a constraint pattern right now. It seems to me that the real problem has to do with register constraint matching. I personally do not have sufficient knowledge so that I would consider to start thinking about daring to eventually touch reload.c and comrades. On the other hand, I still feel uneasy about this bug. For this reason I am thinking about a workaround solely within the back-end. While refraining from requiring that code that contains the mentioned difficult constraits compiles, I presently think that one might already be better of with a situation where it triggers an ICE instead of generating wrong code. IMO, a possible workaround might be to simply disable within the machine description the move patterns that meet the two following two conditions 1.) the move uses a memory reference stored in Y 2.) the move target is the Y register itself My hope is that in absence of this move pattern, reload will no longer attempt to use the frame pointer register Y as spill register. This workaround certainly would be far from ideal, but better than nothing. I'd appreciate comments on this issue. Yours, Björn
I have re-investigated this bug today and observed that it no longer appears in mainline. It is also absent in today's cvs state of the 4.0 branch. Dunno whether the original problem has been fixed or whether something else has changed such that the bug is no longer exposed: Since the register allocator now seems to be smarter on both branches than it used to be, the function exposing the bug no longer needs the frame pointer. Yours, Björn
Hi, I now think that the true origin of this PR is related to the fact that for AVR we need *two* registers for holding the frame pointer: Recently, I have played around with modifying avr-gcc by choosing different registers for the frame pointer and the stack pointer. Namely I have experimented with the calling convention used by the IAR system (use two seperate stacks for the return addresses and data). It turned out that the stack pointer and frame pointer register number during my experiments that was used by gcc was not 28 but 29! Possibly one would have to re-investigate avr_hard_regno_mode_ok(REGNO, MODE) function in avr.c that implements the HARD_REGNO_MODE_OK target macro. IIUC, there have been a couple of problems in the past with similar results: clobbered frame pointer. Maybe the fact that this bug does not show up frequently has to do with that the constraints for the Y register implemented in avr_hard_regno_mode_ok are fairly restrictive right now, so that the register allocator and reload almost never are tempted to use it. Important information would be wether avr_hard_regno_mode_ok is evaluated dynamically for each function. In this case one could probably prevent clobbering of the frame pointer by denying any mode in regs 28 and 29 in case that a frame pointer is needed. This way one possibly could also lift the present restriction that only pointers (Pmode objects) may be held in the y register. So I think that we are right now having a hidden problem in avr-gcc that shows up only in very rare occasions. Most probably it's also present in the 3.4.4 series. Yours, Björn
Created attachment 9662 [details] Patch for avr_hard_regno_mode_ok
Indeed IMHO the problem stems from the patch: +2005-01-22 Roger Sayle <roger@eyesopen.com> + + PR middle-end/19378 + * config/avr/avr.c (avr_hard_regno_mode_ok): Rewrite. + IMO the problem stems from the change - /* Reload can use r28:r29 for reload register and for frame pointer - in one insn. It's wrong. We must disable it. */ - if (mode != Pmode && reload_in_progress && frame_pointer_required_p () - && regno <= REG_Y && (regno + GET_MODE_SIZE (mode)) >= (REG_Y + 1)) + /* Otherwise disallow all regno/mode combinations that span r28:r29. */ + if (regno <= (REG_Y + 1) && (regno + GET_MODE_SIZE (mode)) >= (REG_Y + 1)) return 0; Denis Chertykov's older version using "frame_pointer_required_p ()" has now vanished. A problem during register allocation has been removed by Roger Sayle's patch. However, the old frame pointer bug has been re-introduced. My suggestion is to use the following implementation. (according patch to today's head attached) This patch has passed regression tests on head without new regressions on head on the c-testsuite for compilation and simulator excecution for the atmega128. I also verified on an old snapshot that had exposed PR21990 that this patch resolves PR21990. I'll post this message also on gcc-patches. Yours, Bjoern
Roger, want to have a look at this?
Created attachment 9663 [details] Corrected typo. replaces previous patch.
I just realized that the patch posted here had just posted had a <= where should have been a < for two comparisons. The patch on gcc-patches@gcc.gnu.org is already correct.
(In reply to comment #11) > I just realized that the patch posted here had just posted had a <= where > should have been a < for two comparisons. The patch on gcc-patches@gcc.gnu.org > is already correct. Bjoern, 1. Is this patch already applied? 2. If yes, what versions is it applied to? 3. If no, can you post the link to the patch on gcc-patches? Or can you attach the corrected patch to this bug report? Thanks
Hello Eric, IIRC, the bug never was really resolved. The true place to fix the issue was, IMO, the most dreaded source file of the entire GCC source tree: reload. My now quite old patch tried to fix the immediate problem without touching reload and it used a fairly crude method by denying gcc to make use of Y in many situations. My test case generated code that was very difficult to handle by the register allocator. This code triggered the bug. Later on some change in some of the passes just before the 4.0.0 release removed the immediate problem. With 4.0 and afterwards I no longer succeeded to reproduce the bug: The reason might be, that the original reload bug was fixed. The reason might as well be that the reload bug is still there, but that it's no longer exposed due to some other modification in the compiler. The message is: There stand good chances that this bug is resolved, but I cannot prove it. In any case, I'd like to suggest *NOT* to make use of my old crude patch. IMO it's better to just hope that the problem is fixed. Until now, I never again stepped over this bug, so I have good confidence. HTH, Bjoern
Thanks, Bjoern, for responding in detail. If this bug cannot be reproduced, can we go ahead and close this bug report? Eric
Closing bug as WORKSFORME based on Bjoern's observations in comment #13.