Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug
Bug#: 33970
Product:  
Component:  
Status: NEW
Resolution:
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Host:
Reported against  
Priority:  
Severity:  
Target Milestone:  
 
 
Target:
Reporter: Mike Henning <henning.m@insightbb.com>
Add CC:
CC:
Remove selected CCs
Build:
URL:
Summary:
Keywords:
Known to work:
Known to fail:

Attachment Description Type Created Size Actions
test.i Preprocessed testcase. text/plain 2007-11-01 17:28 635 bytes Edit
test.s Assembly output of test case using 4.1.2. text/plain 2007-11-01 17:45 1.61 KB Edit
Create a New Attachment (proposed patch, testcase, etc.) View All

Bug 33970 depends on: Show dependency tree
Show dependency graph
Bug 33970 blocks:

Additional Comments:





Mark bug as waiting for feedback
Mark bug as suspended




View Bug Activity   |   Format For Printing   |   Clone This Bug


Description:   Last confirmed: 2007-11-04 23:28 Opened: 2007-11-01 13:44
When using an unsigned char in a loop with limited range the variable is
promoted to a 16 bit integer if the variable is passed to another function
within the loop by value. 


Tested using -Os on ATMEGA162:

#include <avr/io.h>


int sub2(uint8_t);

int main(void) {

static uint8_t  x;
volatile uint8_t y;

while(1)
 {

   for(x=0;x<128; x++)
     {
        y+=sub2(x);
     }
  }

  return 0;
}



int sub2(uint8_t xyz) {

volatile uint8_t abc;

while(xyz < 64)
 {
    xyz++;
    abc+=xyz;
 }
  return abc;
}

------- Comment #1 From Richard Guenther 2007-11-01 14:10 -------
I guess you want -fno-tree-loop-ivcanon.

------- Comment #2 From Andrew Pinski 2007-11-01 17:09 -------
I don't see it being promoted on x86-linux-gnu at the tree level.

------- Comment #3 From Eric Weddington 2007-11-01 17:28 -------
Created an attachment (id=14454) [edit]
Preprocessed testcase.

------- Comment #4 From Eric Weddington 2007-11-01 17:45 -------
Created an attachment (id=14455) [edit]
Assembly output  of test case using 4.1.2.

Maybe I'm wrong, but I don't even see it being promoted on the AVR. See
attached assembly output from 4.1.2 with -dP.

------- Comment #5 From Eric Weddington 2007-11-01 17:47 -------
Mike, can you provide additional information as to where the bug is?

------- Comment #6 From Mike Henning 2007-11-01 21:26 -------
(In reply to comment #5)
> Mike, can you provide additional information as to where the bug is?
> 
This is the assembly output I get:

Note that r14,r15 is being reserved for variable x when only a single reg is
needed. Strange enough if call sub2 with (x+1) then only one reg is used and
the code decreases by 12 bytes.

000000e8 <main>:
  e8:   ef 92           push    r14
  ea:   ff 92           push    r15
  ec:   1f 93           push    r17
  ee:   cf 93           push    r28
  f0:   df 93           push    r29
  f2:   cd b7           in      r28, 0x3d       ; 61
  f4:   de b7           in      r29, 0x3e       ; 62
  f6:   21 97           sbiw    r28, 0x01       ; 1
  f8:   0f b6           in      r0, 0x3f        ; 63
  fa:   f8 94           cli
  fc:   de bf           out     0x3e, r29       ; 62
  fe:   0f be           out     0x3f, r0        ; 63
 100:   cd bf           out     0x3d, r28       ; 61
 102:   ee 24           eor     r14, r14
 104:   ff 24           eor     r15, r15
 106:   19 81           ldd     r17, Y+1        ; 0x01
 108:   8e 2d           mov     r24, r14
 10a:   0e 94 57 00     call    0xae    ; 0xae <sub2>
 10e:   18 0f           add     r17, r24
 110:   19 83           std     Y+1, r17        ; 0x01
 112:   08 94           sec
 114:   e1 1c           adc     r14, r1
 116:   f1 1c           adc     r15, r1
 118:   80 e8           ldi     r24, 0x80       ; 128
 11a:   e8 16           cp      r14, r24
 11c:   f1 04           cpc     r15, r1
 11e:   99 f7           brne    .-26            ; 0x106 <main+0x1e>
 120:   f0 cf           rjmp    .-32            ; 0x102 <main+0x1a>

00000122 <_exit>:
 122:   ff cf           rjmp    .-2             ; 0x122 <_exit>

------- Comment #7 From Eric Weddington 2007-11-04 23:28 -------
With Mike's description in comment #6, confirmed on 4.1.2 and 4.2.2. AVR GCC
4.2.2 is worse than 4.1.2, in that even if sub2 is called with (x+1), the
variable is still 16 bits.

------- Comment #8 From Wouter van Gulik 2007-11-05 22:48 -------
(In reply to comment #7)
> With Mike's description in comment #6, confirmed on 4.1.2 and 4.2.2. AVR GCC
> 4.2.2 is worse than 4.1.2, in that even if sub2 is called with (x+1), the
> variable is still 16 bits.
> 

There is something more going on, this is the assembler output when sub2 is not
in the same file, and calling sub2(x), i.e not x+1:
===================================================================

        push r17
        push r28
        push r29
        in r28,__SP_L__
        in r29,__SP_H__
        sbiw r28,1
        in __tmp_reg__,__SREG__
        cli
        out __SP_H__,r29
        out __SREG__,__tmp_reg__
        out __SP_L__,r28
/* prologue end (size=11) */
.L3:
        sts x.1261,__zero_reg__
        ldi r24,lo8(0)
.L4:
        ldd r17,Y+1
        call sub2
        add r17,r24
        std Y+1,r17
        lds r24,x.1261
        subi r24,lo8(-(1))
        sts x.1261,r24
        sbrs r24,7
        rjmp .L4
        rjmp .L3
===========================================================

Note that x is now living in memory (as one would expect when it's static in a
function). However it's not when looking at the original report. Is this ok?
The original post assembler places x in r14,r15.  Does this still adhere the
static keyword? I don't know, since it's value is always set at the start
what's the point of being static? Unless main is called twice? But then x may
not be in r14/r15!

So it seems to me this is some inline optimation problem.

Note that moving the sub2 function above main also gives other results?!? Is
this standard behaviour?

Also note that when making sub2 static will completely remove it but also
remove x as static, is this allowed?

Tested against avr-gcc 4.1.2

------- Comment #9 From Mike Henning 2007-11-06 12:37 -------
(In reply to comment #8)
> (In reply to comment #7)
> > With Mike's description in comment #6, confirmed on 4.1.2 and 4.2.2. AVR GCC
> > 4.2.2 is worse than 4.1.2, in that even if sub2 is called with (x+1), the
> > variable is still 16 bits.
> > 
> 
> There is something more going on, this is the assembler output when sub2 is not
> in the same file, and calling sub2(x), i.e not x+1:
> ===================================================================

I think you will also find that if x is changed from ststic to auto the same
problem appears.

------- Comment #10 From Wouter van Gulik 2007-11-06 19:34 -------
(In reply to comment #9)
> 
> I think you will also find that if x is changed from ststic to auto the same
> problem appears.
> 

Ok, I tried to find the minimum test case.
And it has nothing todo with static/volatile/inline etc.

==============================================
int sub2(unsigned char); // external function

void foo(void) {
  unsigned char x;
  for(x=0;x<128; x++)
  {
    //sub2(x); //x is becomes a int (16bit)
    sub2(x+1); //x is char (8bit)
  }
}

All is compiled using 4.1.2 and  "avr-gcc -Wall -Os -mmcu=avr5 -S test.c"

The output when compiling with "sub2(x)"
=================================================
/* prologue: frame size=0 */
        push r28
        push r29
/* prologue end (size=2) */
        ldi r28,lo8(0)
        ldi r29,hi8(0)
.L2:
        mov r24,r28 ; << loads as a byte!
        call sub2
        adiw r28,1           ; << increment as a int
        cpi r28,128          ; << compare as a int
        cpc r29,__zero_reg__
        brne .L2
/* epilogue: frame size=0 */
        pop r29
        pop r28
        ret

The output when compiling with "sub2(x+1)"
================================================================
/* prologue: frame size=0 */
        push r17
/* prologue end (size=1) */
        ldi r17,lo8(0)
.L2:
        subi r17,lo8(-(1))
        mov r24,r17
        call sub2
        cpi r17,lo8(-128)
        brne .L2
/* epilogue: frame size=0 */
        pop r17
        ret
===========================================================


From compiling with x as a int I got a sort of hint. When using x+1, x is
actually loaded with 1 (and not zero). 

So i tried:
======================================================
void foo(void) {
  unsigned char x;
  for(x=1;x<129; x++)
    sub2(x);
}


And it gives the assembler loop check as for the "sub(x+1)" variant of course
the loop is now effectivley the same as "sub2(x+1)". But the the incrementing
of x is now done after the call, so it proofs that this is not the problem. I
also tried different loop lengths, thinking 128 is the nasty one, but it did
not help.

======================================================
/* prologue: frame size=0 */
        push r17
/* prologue end (size=1) */
        ldi r17,lo8(1)
.L2:
        mov r24,r17
        call sub2
        subi r17,lo8(-(1))
        cpi r17,lo8(-127)
        brne .L2
/* epilogue: frame size=0 */
        pop r17
        ret


I am out of knowledge now, I do not know how to further debug GCC on this. Just
hoping this will help someone to pinpoint the problem.

------- Comment #11 From Wouter van Gulik 2007-11-06 21:01 -------
I just realised I did not tried hard enough to find the smallest case:

===========================================
volatile unsigned char bar;
void foo(void) {
  unsigned char x;
  for(x=0;x<128; x++) {
    //bar = x+1;
    bar = x;
  }
}

Looks like using the loop counter inside the loop causes the problem. Because
when not using x, but just e.g. calling a function is also gives the smallest
loop.

Looking at the RTL output of test.c.00.expand. it allready is a QI vs HI
there... so it's going wrong some where inside gcc I really have no knowledge
off...

Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug