Bug 51002 - SP_H register is used even on targets that do not have it (eg attiny26)
Summary: SP_H register is used even on targets that do not have it (eg attiny26)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.6.2
: P5 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on: 51345 52737
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-06 21:05 UTC by Wim Lewis
Modified: 2012-03-28 11:03 UTC (History)
4 users (show)

See Also:
Host:
Target: avr
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-11-26 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wim Lewis 2011-11-06 21:05:57 UTC
Not all of the AVR parts implement a full 16-bit-wide stack pointer; on (for example) the ATtiny26/ATtiny261, the stack pointer is 8 bits wide and the SP_H register (at 0x3E) is not implemented. GCC 4.6.2 is smart enough to manipulate only the lower 8 bits of the stack pointer in the function prologue/epilogue, but it still attempts to load the upper 8 bits from the nonexistent SP_H register, resulting in the upper 8 bits of any stack-relative pointer containing a garbage value:


extern void bar(char *c);
void foo()
{
    char buf[16];
    bar(buf);
}

$ avr-gcc -Os -fverbose-asm -S -mmcu=attiny26 stkbuf.c  

foo:
        push r28         ; 
        push r29         ; 
        in r28,__SP_L__  ; 
        in r29,__SP_H__  ; 
        subi r28,lo8(-(-16))     ; ,
        out __SP_L__,r28         ; 
/* prologue: function */
/* frame size = 16 */
/* stack size = 18 */
.L__stack_usage = 18
        mov r24,r28      ; ,
        mov r25,r29      ; ,
        adiw r24,1       ; ,
        rcall bar        ; 
[etc]
Comment 1 Georg-Johann Lay 2011-11-26 20:31:49 UTC
Confirmed.

Would "clr r29" be okay instead of "in r29,__SP_H__"?
Comment 2 Wim Lewis 2011-11-26 21:02:55 UTC
I think that would produce correct code, at least.

It seems like it would be more efficient for gcc to treat the stack pointer as mode QI on these parts, and promote it to HI when it needs to use it as a general purpose pointer; there's no reason to allocate r29 since it will always contain 0. But that might not be the minimal bugfix.
Comment 3 Georg-Johann Lay 2011-12-02 19:14:20 UTC
Author: gjl
Date: Fri Dec  2 19:14:15 2011
New Revision: 181936

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181936
Log:
	PR target/51002
	PR target/51345
	* config/avr/libgcc.S (__prologue_saves__, __epilogue_restores__):
	Enclose parts using __SP_H__ in !defined (__AVR_HAVE_8BIT_SP__).
	Add FIXME comments.
	* config/avr/avr.md (movhi_sp_r_irq_off, movhi_sp_r_irq_on): Set
	insn condition to !AVR_HAVE_8BIT_SP.
	* config/avr/avr.c (output_movhi): "clr%B0" instead of "in
	%B0,__SP_H__" if AVR_HAVE_8BIT_SP.
	(avr_file_start): Only print "__SP_H__ = 0x3e" if !AVR_HAVE_8BIT_SP.
	* config/avr/avr-devices.c (avr_mcu_types): ATtiny4313 and
	AT86RF401 have a 16-bit SP (their manual is bogus).


Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/config/avr/avr-devices.c
    branches/gcc-4_6-branch/gcc/config/avr/avr.c
    branches/gcc-4_6-branch/gcc/config/avr/avr.md
    branches/gcc-4_6-branch/gcc/config/avr/libgcc.S
Comment 4 Georg-Johann Lay 2011-12-06 15:04:17 UTC
Author: gjl
Date: Tue Dec  6 15:04:09 2011
New Revision: 182052

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182052
Log:
libgcc/
	Forward-port from gcc-4_6-branch r181936 2011-12-02.

	PR target/51345
	PR target/51002
	* config/avr/lib1funcs.S (__prologue_saves__,
	__epilogue_restores__, __divdi3_moddi3): Enclose parts using
	__SP_H__ in !defined (__AVR_HAVE_8BIT_SP__).  Add FIXME comments.

gcc/
	Forward-port from gcc-4_6-branch r181936 2011-12-02.

	PR target/51002
	* config/avr/avr.md (movhi_sp_r): Set insn condition to
	!AVR_HAVE_8BIT_SP.
	* config/avr/avr.c (output_movhi): Use "clr%B0" instead of "in
	%B0,__SP_H__" if AVR_HAVE_8BIT_SP.
	(avr_file_start): Only print "__SP_H__ = 0x3e" if !AVR_HAVE_8BIT_SP.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/avr/avr.c
    trunk/gcc/config/avr/avr.md
    trunk/libgcc/ChangeLog
    trunk/libgcc/config/avr/lib1funcs.S
Comment 5 Georg-Johann Lay 2012-01-04 15:28:16 UTC
Fixed in 4.6.3+

For a complete fix covering libgcc, however, PR51345 is needed which is fixed in 4.7.0
Comment 6 Claus-Justus Heine 2012-03-18 14:50:15 UTC
Jut a comment: at least I was using -mtiny-stack on targets with 16-bit stack to reduce the size of the prologue/epilogue of a function. This worked quite well, of course only if the actual stack usage really stayed below 256 bytes. With this "fix" this is no longer possible, because now the prologue only loads the lower 8 bits of the stack pointer into the _frame_pointer_.  Then the frame-pointer will refer the wrong address region. As a hack-around I'm using (in my own hacked gcc-4.7 version)

avr_current_device->short_sp instead of AVR_HAVE_8BIT_SP

to decide whether to load SP_H or not into the frame-pointer.

Best,

Claus
Comment 7 Georg-Johann Lay 2012-03-18 20:12:43 UTC
(In reply to comment #6)
> Jut a comment: at least I was using -mtiny-stack on targets with 16-bit stack
> to reduce the size of the prologue/epilogue of a function. This worked quite
> well, of course only if the actual stack usage really stayed below 256 bytes.

I don't think that works reliably, even if the stack usage is below 256 bytes. Crucial point is the crossing of a 256-byte boundary, i.e. changing SP.H, for example while crossing the 0xff/0x100 border.

As there is code in libgcc that has to set the frame pointer, it has to load the high byte of FP, too. libgcc (just as gcc) assumes that SP.H is always zero.

Otherwise, the multilibs had to be split even further, i.e. not only on basis of SP size but also on basis of RAM size.

> With this "fix" this is no longer possible, because now the prologue only
> loads the lower 8 bits of the stack pointer into the _frame_pointer_.

As written above, the high byte of FP must be initialized, too. There is no other meaningful value than 0 for 8-bit FP devices.

> Then the frame-pointer will refer the wrong address region.
> As a hack-around I'm using (in my own hacked gcc-4.7 version)

Maybe the startup-code from AVR-Libc has to be adapted to initialize SP with 0xff instead of with RAMEND, see

http://savannah.nongnu.org/bugs/?35407#comment3

> avr_current_device->short_sp instead of AVR_HAVE_8BIT_SP
> 
> to decide whether to load SP_H or not into the frame-pointer.
Comment 8 Claus-Justus Heine 2012-03-18 20:30:06 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Jut a comment: at least I was using -mtiny-stack on targets with 16-bit stack
> > to reduce the size of the prologue/epilogue of a function. This worked quite
> > well, of course only if the actual stack usage really stayed below 256 bytes.
> 
> I don't think that works reliably, even if the stack usage is below 256 bytes.
> Crucial point is the crossing of a 256-byte boundary, i.e. changing SP.H, for
> example while crossing the 0xff/0x100 border.

It should work reliably if the stack-top is aligned to a 256 bytes, which is the case for many devices, but of course one has to check this individually.

> As there is code in libgcc that has to set the frame pointer, it has to load
> the high byte of FP, too. libgcc (just as gcc) assumes that SP.H is always
> zero.

Yes. Exactly  this is the problem: when artificially restricting the stack-space to 256 bytes, then nevertheless the frame pointer high-byte should be initialized with the high-byte of the stack-pointer.

> Otherwise, the multilibs had to be split even further, i.e. not only on basis
> of SP size but also on basis of RAM size.

No. When -mtiny-stack just restricts the stack usage to 256 bytes, then there is no need for additional multi-lib targets. The ABI is not affected. 

Prerequisites are:

- the application really does not need more than 256 bytes stack space
- stack-top is aligned to 256 bytes

Then: during stack updates no carry will occur from SP_L to SP_H.

I can be wrong but I think in the beginning -mtiny-stack was just meant for this kind of optimization. There is also a related commit, which breaks the distinction between 8-bit hardware and -mtiny-stack-8-bit-restricted-stack-space, this was

http://gcc.gnu.org/viewcvs?view=revision&revision=150801

However, can be that I'm the only fool which uses -mtiny-stack this way. So what. Still I think originally -mtiny-stack was just introduced for this: reduce the prologue size if one knows that stack usage stays below 256 bytes (and the stack-top is aligned properly).

Best,

cH
Comment 9 Georg-Johann Lay 2012-03-18 21:37:43 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Jut a comment: at least I was using -mtiny-stack on targets with 16-bit
> > > stack to reduce the size of the prologue/epilogue of a function.
> > > This worked quite well, of course only if the actual stack usage
> > > really stayed below 256 bytes.
> > I don't think that works reliably, even if the stack usage is below
> > 256 bytes. Crucial point is the crossing of a 256-byte boundary, i.e.
> > changing SP.H, for example while crossing the 0xff/0x100 border.
> 
> It should work reliably if the stack-top is aligned to a 256 bytes,

No, that's not sufficient. The reason is that gcc does not only change
SP. In some situations the cmopiler has to set up a frame pointer and
build a 16-bit pointer. If the stack pointer in only 8 bits wide, there
is no other sensible value than 0.

SP.H is a reserved register so that libgcc cannot tell if accessing SP.H
is prohibited because SP.H is reserved or if hard SP is actually 16 bits
wide and accessing SP.H would be in order.

See __prologue_saves in ./libgcc/config/avr/lib1funcs.S

> which is the case for many devices, but of course one has to check
> this individually.
>
> > As there is code in libgcc that has to set the frame pointer, it has to load
> > the high byte of FP, too. libgcc (just as gcc) assumes that SP.H is always
> > zero.
> 
> Yes. Exactly  this is the problem: when artificially restricting the
> stack-space to 256 bytes, then nevertheless the frame pointer high-byte should
> be initialized with the high-byte of the stack-pointer.

As written above there is no SP.H as the multilibs do not differenciate
between SP.H=reserved and SP.H=not reserved but never changes
 
> > Otherwise, the multilibs had to be split even further, i.e. not
> > only on basis of SP size but also on basis of RAM size.
> 
> No. When -mtiny-stack just restricts the stack usage to 256 bytes, then there
> is no need for additional multi-lib targets. The ABI is not affected. 
> 
> Prerequisites are:
> 
> - the application really does not need more than 256 bytes stack space
> - stack-top is aligned to 256 bytes
> 
> Then: during stack updates no carry will occur from SP_L to SP_H.

As written above: It's not only about changing SP.H, it's also about initializing the 16-bit FP from an 8-bit SP.

> I can be wrong but I think in the beginning -mtiny-stack was just meant for
> this kind of optimization. There is also a related commit, which breaks the
> distinction between 8-bit hardware and
> -mtiny-stack-8-bit-restricted-stack-space, this was
> 
> http://gcc.gnu.org/viewcvs?view=revision&revision=150801

I think it's never a good idea to lie to the compiler.

Let's see what Dmitry writes concerning crt*.o for the case of fake 8-bit SP.

This should fix the problem and your code work smooth again.
Comment 10 Claus-Justus Heine 2012-03-18 21:56:35 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > (In reply to comment #6)
> > > > Jut a comment: at least I was using -mtiny-stack on targets with 16-bit
> > > > stack to reduce the size of the prologue/epilogue of a function.
> > > > This worked quite well, of course only if the actual stack usage
> > > > really stayed below 256 bytes.
> > > I don't think that works reliably, even if the stack usage is below
> > > 256 bytes. Crucial point is the crossing of a 256-byte boundary, i.e.
> > > changing SP.H, for example while crossing the 0xff/0x100 border.
> > 
> > It should work reliably if the stack-top is aligned to a 256 bytes,
> 
> No, that's not sufficient. The reason is that gcc does not only change
> SP. In some situations the cmopiler has to set up a frame pointer and
> build a 16-bit pointer. If the stack pointer in only 8 bits wide, there
> is no other sensible value than 0.

Sorry, but my comments are just about this: loading the frame pointer with the complete 16bit value of the stack pointer. I will stop the discussion at this point, because I have the impression talk at cross-purposes. I never suggested to tweak the stack pointer, but just realized that recent gcc versions stopped to load the frame pointer with the complete stack pointer value in the presence of -mtiny-stack. I still think that the origin of the "problem" was to "unify" the -mtiny-stack switch with the architectural hardware-constraint of having only an 8 bit stack pointer.
Comment 11 Georg-Johann Lay 2012-03-27 18:08:19 UTC
Rostpone this until 4.7.1 where proper SP_H handling is available as of PR51345 and PR52737.
Comment 12 Georg-Johann Lay 2012-03-28 10:25:27 UTC
Fixed in 4.7.1