Bug#: 29524
Status: NEW
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Reporter: Francesco Sacchi <batt@develer.com>

Description:   Last confirmed: 2007-07-23 22:58 Opened: 2006-10-20 12:26
I noticed that the amount of RAM used has increased by 256 bytes compared to
the code generated by gcc 4.1.1, and found that this is due to the __clz_tab
array being linked in at the start of RAM.

------- Comment #1 From Richard Guenther 2006-10-20 14:39 -------
Confirmed.

------- Comment #2 From Andrew Pinski 2006-10-20 15:58 -------
First off, this should not matter, as it should not be linked in when it is not
used at all.  Do you have a testcase in which it gets linked in?

------- Comment #3 From Andrew Pinski 2006-10-20 16:19 -------
In fact this works correctly on the SPU target so I think avr is broken or you
are really using __builtin_clz.

------- Comment #4 From Francesco Sacchi 2006-10-23 19:09 -------
I have 3 projects involving gcc and avr, and all of these have an increased RAM
usage due to __clz_tab linking after switching from gcc 4.1.1 to 4.2.
I will try as soon as possible to find a suitable testcase.

------- Comment #5 From Francesco Sacchi 2006-11-07 11:10 -------
Here's the testcase:

======================================
int main(int argc, char *argv[])
{
       float O1 = argc;
       float V1 = argc+2;

       int ret = O1 * V1;
       return ret;
}
======================================

batt@murphy:~/src$ /usr/local/avr-3.4.4-install/bin/avr-gcc -Os
-Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map

batt@murphy:~/src$ /usr/local/avr-4.1.1-batt/bin/avr-gcc -Os
-Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map

batt@murphy:~/src$ /usr/local/avr-4.2-20061014-install/bin/avr-gcc -Os
-Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map
                0x00000434                __clzsi2
                0x00800068                __clz_tab


These are the three compilers:

batt@murphy:~/src$ /usr/local/avr-3.4.4-install/bin/avr-gcc -v
Reading specs from /usr/local/avr-3.4.4-install/bin/../lib/gcc/avr/3.4.4/specs
Configured with: ../configure --prefix=/usr/local/avr-3.4.4-install/
--target=avr --enable-languages=c,c++ --enable-multilib --with-dwarf2
--disable-libmudflap --enable-target-optspace --enable-threads=single
--with-gnu-ld --enable-install-libbfd --disable-werror --disable-gdbtk
--disable-libmudflap --disable-nls --disable-__cxa_atexit --disable-clocale
--disable-c-mbchar --disable-long-long --without-newlib
Thread model: single
gcc version 3.4.4

batt@murphy:~/src$ /usr/local/avr-4.1.1-batt/bin/avr-gcc -v
Using built-in specs.
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr
--enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2 :
(reconfigured) ../configure --prefix=/usr/local/avr --target=avr --disable-nls
--disable-libssp --with-dwarf2 --enable-languages=c,c++ --no-create
--no-recursion
Thread model: single
gcc version 4.1.2 20061030 (prerelease)-batt-avr1281

batt@murphy:~/src$ /usr/local/avr-4.2-20061014-install/bin/avr-gcc -v
Using built-in specs.
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr
--enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.2.0 20061014 (experimental)


As you can see, the clz_tab is linked in only with the SVN version of 4.2, and
it's not linked in with 4.1.1. I also used -Os (which is the compiler switch I
care about).

------- Comment #6 From Andrew Pinski 2006-12-03 21:41 -------
Someone else is going to have to look into this as this works just fine on
spu-elf.

------- Comment #7 From Giovanni Bajo 2007-04-02 22:47 -------
Anatoly, can you have a look? It's a regression in 4.2 for AVR!

------- Comment #8 From Mark Mitchell 2007-05-14 22:28 -------
Will not be fixed in 4.2.0; retargeting at 4.2.1.

------- Comment #9 From Eric Weddington 2007-07-23 22:57 -------
Here's what I see:

The array __clz_tab is used in a macro, count_leading_zeros, which is called in
the function __clzSI2 in libgcc2.c, which (AFAICT) gets compiled to the
function __clzsi2 and aggregated in libgcc. The __clzsi2 function is called
from the function clzusi() (in fp-bit.c) which is also included in libgcc. The
clzusi() function is called from si_to_float() and usi_to_float() (also in
fp-bit.c and included in libgcc). AFAICT, these two functions are used to
convert an int or unsigned int to float. 

The test case does exactly this type of conversion in main() in comment #5.
Testing with gcc 4.2.1 shows that, once all int-to-float conversions are
removed, __clz_tab is correctly not linked into the application.

The clzusi() function was created in revision 107345, on Nov 22, 2005:
http://gcc.gnu.org/viewcvs?view=rev&revision=107345

This seems like it was an intended change. However, it is unfortunate that a
256-byte array is used in the count_leading_zeros macro. While using a table is
fast and the size is negligible on larger platforms, using up 256 bytes is very
significant on the AVR where 4K, 2K or even 1K of RAM is common. What is really
needed is an alternative implementation (non-array) that is perhaps specific to
the AVR.
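
A non-array alternative of the kind asked for here can be as simple as a shift
loop. The following is a minimal sketch in C, not libgcc code: the name
clz32_loop is hypothetical, and returning 32 for a zero input is an assumption
(__builtin_clz leaves that case undefined). It trades the 256 bytes of table
for up to 32 loop iterations.

```c
#include <stdint.h>

/* Hypothetical table-free count of leading zeros for a 32-bit value.
   Assumption: returns 32 when x == 0. */
uint8_t clz32_loop(uint32_t x)
{
    uint8_t n = 0;

    /* Shift left until the top bit is set, counting the shifts. */
    while (n < 32 && (x & UINT32_C(0x80000000)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

For example, clz32_loop(1) yields 31 and clz32_loop(0x80000000) yields 0.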

------- Comment #10 From Andrew Patrikalakis 2007-09-08 21:44 -------
(In reply to comment #9)

Here's an untested function (I'm going to try to figure out how to get it to
build into the AVR build) that replaces the definition of clz_tab with six
instructions:

; r2 in, r3 out
; r2 clobbered
; Z, C, N, V clobbered
clz_compute:
        ldi r3, 0x09           ; preload output
        clc                    ; clear C (guarantees termination with 8 loops)
clz_compute_loop1:
        rol r2                 ; push MSB into C
        dec r3                 ; dec output
        brcs clz_end           ; if C is set (msb was set), we're done
        rjmp clz_compute_loop1 ; otherwise, repeat
clz_end:

------- Comment #11 From Andrew Patrikalakis 2007-09-08 21:48 -------
(In reply to comment #10)

And the first bug of the day: clc should be sec, since brcs will only jump out
if C is set. On to prodding gcc...
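
To see what the routine from comment #10 computes once clc is replaced by sec,
here is a small C model of the loop. It is only a sketch: the function name
clz_compute_model and the explicit carry variable are illustrative and do not
come from the thread. With the carry preset to 1, the loop returns 8 minus the
number of leading zeros and stops after at most 9 iterations, even for a zero
input. With the carry preset to 0, a zero input would never set the carry and
the loop would not terminate.

```c
#include <stdint.h>

/* Hypothetical C model of the corrected AVR loop (sec instead of clc). */
uint8_t clz_compute_model(uint8_t in)
{
    uint8_t out = 9;     /* ldi r3, 0x09 */
    uint8_t carry = 1;   /* sec: sentinel bit that ends the loop */

    do {
        uint8_t msb = (uint8_t)(in >> 7);
        in = (uint8_t)((in << 1) | carry);   /* rol r2 */
        carry = msb;
        out--;                               /* dec r3 (carry untouched) */
    } while (carry == 0);                    /* brcs clz_end / rjmp loop */

    return out;          /* 8 minus the number of leading zeros */
}
```

For example, an input of 0x80 gives 8, an input of 0x01 gives 1, and an input
of 0 gives 0.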

------- Comment #12 From Mark Mitchell 2007-10-09 19:22 -------
Change target milestone to 4.2.3, as 4.2.2 has been released.

------- Comment #13 From Wouter van Gulik 2007-10-24 11:16 -------
(In reply to comment #10)

Something like this is smaller, faster and works for all registers (no need for
LD_regs). It could also easily be written as an insn:

; rOut: output register
; rIn:  input register
; rIn, Z, N are clobbered, C is set
clzqi_init:
    clr rOut           ; clear to zero
    com rOut           ; make 0xFF (-1) and set C (neg of 0 leaves C clear); C guarantees termination
clzqi_loop1:
    inc rOut           ; inc output (C not touched)
    rol rIn            ; push MSB into C
    brcc clzqi_loop1   ; if C is cleared (msb was not set), continue loop
clzqi_end:

A clz on a hi/si/di would be almost the same. Extend the "rol rIn" to a rol per
sub_reg.
Of course there can be speed optimisation for hi/si/di, but for the AVR the
optimizer is in most cases set for size.
A library call to this is shorter, but it may require extra mov instructions to
fit the register constraints.
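
The extension to a wider operand ("a rol per sub_reg") can be modelled in C as
follows. This is a sketch under assumptions, not code from the thread: rol8 and
clzhi_model are hypothetical names, the output starts at 0xFF so that the first
increment yields 0, and the carry starts at 1 so that it acts as the sentinel
that ends the loop. A zero input yields 16.

```c
#include <stdint.h>

/* Model of one AVR "rol": rotate *reg left through the carry flag. */
void rol8(uint8_t *reg, uint8_t *carry)
{
    uint8_t msb = (uint8_t)(*reg >> 7);
    *reg = (uint8_t)((*reg << 1) | *carry);
    *carry = msb;
}

/* Hypothetical 16-bit variant: leading zeros of the value hi:lo. */
uint8_t clzhi_model(uint8_t hi, uint8_t lo)
{
    uint8_t out = 0xFF;   /* -1, so the first increment yields 0 */
    uint8_t carry = 1;    /* sentinel bit, guarantees termination */

    do {
        out++;               /* inc rOut (carry untouched) */
        rol8(&lo, &carry);   /* low byte first: its MSB moves into carry */
        rol8(&hi, &carry);   /* high byte takes that bit; its MSB ends in carry */
    } while (carry == 0);    /* brcc: continue while the MSB was clear */

    return out;
}
```

For example, clzhi_model(0x80, 0x00) yields 0 and clzhi_model(0x00, 0x01)
yields 15.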

------- Comment #14 From Wouter van Gulik 2007-11-30 14:59 -------
Note that the use of clz for the avr is avoided by using avr-libc's math
library.
See http://lists.gnu.org/archive/html/avr-libc-dev/2007-11/msg00048.html for
more details.

------- Comment #15 From Joerg Wunsch 2007-12-22 17:15 -------
(In reply to comment #14)

> Note that the use of clz for the avr is avoided by using avr-libc's math
> library.

Not confirmed.  A simple test program using a floating point number:

#include <avr/io.h>
#include <math.h>

volatile float    a;

int main (void) 
{
 a=ADCH; 
}

results in 256 bytes of RAM allocation for __clz_tab[].

------- Comment #16 From Wouter van Gulik 2007-12-23 20:15 -------
> (In reply to comment #14)
> 
> > Note that the use of clz for the avr is avoided by using avr-libc's math
> > library.
> 
> Not confirmed.  A simple test program using a floating point number:
> 

This is probably due to some naming problems in the latest avr-libc (1.6.x)
concerning __floatunsisf/undisf.
I tested against the 1.4.x version of the library which does not have this
problem.

------- Comment #17 From Paulo Marques 2008-01-18 17:30 -------
I just found out what's causing this confusion. If you compile your program
like this:

avr-gcc -Os -mmcu=atmega168 -lm main.c -o main.elf

__clz_tab gets included. But if you compile like this:

avr-gcc -Os -mmcu=atmega168 main.c -lm -o main.elf

it doesn't!!!

So, the order you pass -lm matters to the final outcome.

Tested with gcc 4.2.2, libc 1.4.6 and libc 1.6.1.

------- Comment #18 From Joseph S. Myers 2008-02-01 16:53 -------
4.2.3 is being released now, changing milestones of open bugs to 4.2.4.

------- Comment #19 From Joseph S. Myers 2008-05-19 20:22 -------
4.2.4 is being released, changing milestones to 4.2.5.

------- Comment #20 From Joseph S. Myers 2009-03-31 19:48 -------
Closing 4.2 branch.

------- Comment #21 From Richard Guenther 2009-08-04 12:28 -------
GCC 4.3.4 is being released, adjusting target milestone.
