Bug 45291 - avr miscompilations related to frame pointer registers
Summary: avr miscompilations related to frame pointer registers
Status: RESOLVED DUPLICATE of bug 46779
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.5.1
: P3 normal
Target Milestone: 4.6.2
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2010-08-15 20:19 UTC by otaylor
Modified: 2011-07-08 17:52 UTC (History)
4 users (show)

See Also:
Host:
Target: avr
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Patch removing the frame pointer from general regs (2.65 KB, patch)
2010-08-15 20:27 UTC, otaylor
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description otaylor 2010-08-15 20:19:52 UTC
Opening caveats: This comes from a day of digging into why the Arduino environment wasn't working correctly on my Fedora 13 system with avr-gcc-4.5. I'm not a GCC hacker and not intending to become one, so forgive sloppy terminology and if the right fix is one of the larger things I mention below, someone else is going to have to do it. I didn't turn up any previous work on this other than people on the Arduino forums recommending downgrading to gcc-4.3, but I also didn't actively ask around, so I'm 3/4's expecting a fix for this to already exist somewhere on gcc-patches or on someone's hard drive. If so, at least I learned a bit more about GCC internals. :-)

OK, with that out of the way, the following code is miscompiling on AVR (-O1 or -O2) with gcc-4.5.1:

===
unsigned char func1(void);
void funct2(int l);

void dummy(void) {
  int l = 256 * func1() + func1();
  if (l > 256) {
      return;
  }
  func1();
  funct2(l);
}
===

Producing the assembly for the body of the function:

===
        rcall func1
        rcall func1
        ldi r18,lo8(0)
        mov r28,r18
        mov r29,r19
        add r28,r24
        adc r29,__zero_reg__
        ldi r19,hi8(257)
        cpi r28,lo8(257)
        cpc r29,r19
        brge .L1
        rcall func1
        mov r24,r28
        mov r25,r29
        rcall funct2
===

The obvious place where things go wrong is in the IRA pass. THe insns:

===
(insn 35 10 33 2 foo.c:5 (set (reg:HI 50 [ D.1955 ])
        (const_int 0 [0x0])) 10 {*movhi} (nil))

(insn 33 35 34 2 foo.c:5 (set (subreg:QI (reg:HI 50 [ D.1955 ]) 1)
        (reg:QI 41 [ D.1955 ])) 4 {*movqi} (expr_list:REG_DEAD (reg:QI 41 [ D.1955 ])
        (nil)))

(insn 34 33 13 2 foo.c:5 (set (subreg:QI (reg:HI 50 [ D.1955 ]) 0)
        (const_int 0 [0x0])) 4 {*movqi} (nil))
===

Get rewritten after register assignment (r41 => r17, r50 => r28) to:

===
(insn 35 10 37 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50])
        (const_int 0 [0x0])) 10 {*movhi} (expr_list:REG_EQUAL (const_int 0 [0x0])
        (nil)))

(insn 37 35 33 2 foo.c:5 (set (reg:HI 14 r14)
        (reg:HI 28 r28 [orig:50 D.1955 ] [50])) 10 {*movhi} (nil))

(insn 33 37 39 2 foo.c:5 (set (reg:QI 18 r18)
        (reg:QI 17 r17 [orig:41 D.1955 ] [41])) 4 {*movqi} (nil))

(insn 39 33 38 2 foo.c:5 (set (reg:QI 15 r15 [+1 ])
        (reg:QI 18 r18)) 4 {*movqi} (nil))

(insn 38 39 34 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50])
        (reg:HI 14 r14)) 10 {*movhi} (nil))

(insn 34 38 40 2 foo.c:5 (set (reg:QI 18 r18)
        (const_int 0 [0x0])) 4 {*movqi} (nil))

(insn 40 34 13 2 foo.c:5 (set (reg:HI 28 r28 [orig:50 D.1955 ] [50])
        (reg:HI 18 r18)) 10 {*movhi} (nil))
===

Note the the way that insn 34 is transformed into insn 34 and 40, which
at first glance aren't doing the same thing at all and overwrite the
earlier computation of r29.

There are basically two problems here:

 Problem 1: there are severe restrictions on the use of r28/r29 (which
 is the frame pointer) from checks in avr_hard_regno_mode_ok() and
 in simplify_gen_subreg. These checks prevent subregister access to
 it by splitting it into two pieces from being allowed.

 See:

    http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00834.html

 for some background information. But the register allocator happily
 puts stuff into r28/r29 and reload has to bend over backwards to
 try and sort things out.

 Problem 2: supposedly, according to the description of (set...) in rtl.texi

    [...], if @var{lval} is a @code{subreg} whose machine mode is
    narrower than the mode of the register, the rest of the register can
    be changed in an undefined way.

 So actually the transform from insn 34 to insn 34/40 is correct. It's the
 generation of insn 40 by lower-subreg.c:resolve_shift_zext() that was
 wrong; for dest_zero it generates a (subreg ...) expression when setting
 the low part after setting the high part.

 This may be the only place where there is a problem, but considering
 the enormous amount of simplify_gen_subreg(), and the very limited use
 of (strict-lower-part...) my guess is that there are a bunch of similar
 problems.

 The insn 34 => insn 34/40 transformation comes from code in push_reload() which
 turns a subreg set into a set of the full parent register if that would
 be valid according to the above rule and direct subreg access isn't
 allowed.

 (The slightly smaller of the two gigantic if statements in that function,
 with the comment "Similarly for paradoxical and problematical SUBREGs
 on the output." I think this is the "problematic" part of that, and
 is modelled by the '!HARD_REGNO_MODE_OK (subreg_regno (out), outmode)'
 check within that function.)

So, ideas about fixing:

 1) Remove r28/r29 (the framepointer) from GENERAL_REGS on AVR.
    Then (almost?) nothing will ever be allocated to it, and it should
    be at least very hard to trigger the problem. I'll attach a patch
    implementing this. It seems to work in basic testing; libgcc compiles,
    the test case and the larger code that it comes from now works
    correctly. Code generation may be a bit worse from having only 2/3
    pointer register pairs to work with, but it will at least be correct.

 2) Split apart HImode from Pmode on AVR. The current code in
    avr_hard_regno_mode_ok() seems to be written on the assumption that
    this is the case since it talks about only allowing Pmode.
    I'm not sure how to do this without bloating avr.md a lot.

 3) Try to fix the framepointer situation so that it gets correctly
    accounted for and you can't accidentally write over half of it
    if you use half the register at the wrong time. Sounds like the
    right thing to do. Sounds a project I'd have no idea how to start.

 4) Make the code in resolve_shift_zext() and anywhere else use
    (strict_lower_part ) when it doesn't want the upper part to be
    overwritten. Again, sounds like the right thing to do. I tried to do
    this, was beyond my GCC hacking skills. Also, I suspect that it might
    have gigantic unexpected consequences on generated code
    on various platforms.

 5) Try to split apart 'HARD_REGNO_MODE_OK' into two concepts:

    - Whether it's OK to allocate registers in this mode
    - Whether it's OK to read and write this register as a subreg
      of a larger register.

    The second check could then be used in simplify_subreg_regno()
    and the push_reload() optimization. I don't think GCC really
    needs the extra complexity.

For comparison, With the attached patch the body of the function is the
much more reasonable:

===
        rcall func1
        mov r15,r24
        rcall func1
        mov r17,r15
        ldi r16,lo8(0)
        add r16,r24
        adc r17,__zero_reg__
        ldi r24,hi8(257)
        cpi r16,lo8(257)
        cpc r17,r24
        brge .L1
        rcall func1
        mov r24,r16
        mov r25,r17
        rcall funct2
===
Comment 1 otaylor 2010-08-15 20:27:39 UTC
Created attachment 21484 [details]
Patch removing the frame pointer from general regs

Here's the patch described; I don't have any sort of extensive framework for testing the AVR backend - doing such tests would likely be a good idea before applying this if the general idea is right.

(Patch is git formatted and against 4.5.1 but it should apply to the trunk of svn fine.)
Comment 2 Georg-Johann Lay 2010-11-04 16:04:02 UTC
insn 34 renders insn 33 invalid. This is due to subreg semantics. There are several places in several versions of avr backend that use this technique and will introduce similar bugs that might look different or become visible in different passes. Amongst them are: 

o bytewise shift
o bytewise rotate
o zero-extending
o bytewise insert
o perhaps many more

Without tracking it down, this particular case semms to be generated by a zero-extend.

Moreover, there are several optimization flaws that are connected with this resp. optimization opportunities not implemented yet. As some recrafting will be needed to fix this anyway, I propose to write the backend so that these optimization issues will be kicked out (or will vanish without any effort if patterns are defined appropriately).

As I see Erik CC'ed and I think he has a better overview of bugs/optimization flaws related to this issue, maybe it's best if he picks one bug and marks the others as duplicate of it.
Comment 3 Georg-Johann Lay 2011-04-14 18:50:15 UTC
Author: gjl
Date: Thu Apr 14 18:50:02 2011
New Revision: 172442

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172442
Log:
	PR target/46779
	PR target/45291
	PR target/41894
	* gcc.target/avr/pr46779-1.c: New test case
	* gcc.target/avr/pr46779-2.c: New test case


Added:
    trunk/gcc/testsuite/gcc.target/avr/pr46779-1.c
    trunk/gcc/testsuite/gcc.target/avr/pr46779-2.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 4 Georg-Johann Lay 2011-07-08 17:52:08 UTC
This is same artifact as PR46779: IRA generates subreg of hard-reg that's not handled by reload.

*** This bug has been marked as a duplicate of bug 46779 ***