29524 – [4.3/4.4/4.5/4.6 Regression] Too much RAM used: __clz_tab[] linked

Bug 29524 - [4.3/4.4/4.5/4.6 Regression] Too much RAM used: __clz_tab[] linked

Summary: [4.3/4.4/4.5/4.6 Regression] Too much RAM used: __clz_tab[] linked

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.2.0

Importance:	P5 normal
Target Milestone:	4.7.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2006-10-20 12:26 UTC by Francesco Sacchi
Modified:	2011-11-07 15:07 UTC (History)
CC List:	9 users (show)

See Also:
Host:
Target:	avr--
Build:
Known to work:	4.7.0
Known to fail:
Last reconfirmed:	2007-07-23 22:58:29

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Francesco Sacchi 2006-10-20 12:26:38 UTC

I noticed that the amount of RAM used, compared to the code generated by gcc 4.1.1, is increased by 256 bytes and found that this it's due to the __clz_tab array linked in at RAM start.

Comment 1 Richard Biener 2006-10-20 14:39:55 UTC

Confirmed.

Comment 2 Andrew Pinski 2006-10-20 15:58:55 UTC

First off this should not matter as it should not be linked in as it is not used at all.  Do you have a testcase which it links it in?

Comment 3 Andrew Pinski 2006-10-20 16:19:49 UTC

In fact this works correctly on the SPU target so I think avr is broken or you are really using __builtin_clz.

Comment 4 Francesco Sacchi 2006-10-23 19:09:36 UTC

I have 3 projects involving gcc and avr, and all of these have an increased RAM usage due to __clz_tab linking after switching from gcc 4.1.1 to 4.2.
I will try as soon as possible to find a suitable testcase.

Comment 5 Francesco Sacchi 2006-11-07 11:10:48 UTC

Here's the testcase:

======================================
int main(int argc, char *argv[])
{
       float O1 = argc;
       float V1 = argc+2;

       int ret = O1 * V1;
       return ret;
}
======================================

batt@murphy:~/src$ /usr/local/avr-3.4.4-install/bin/avr-gcc -Os -Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map

batt@murphy:~/src$ /usr/local/avr-4.1.1-batt/bin/avr-gcc -Os -Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map

batt@murphy:~/src$ /usr/local/avr-4.2-20061014-install/bin/avr-gcc -Os -Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map
                0x00000434                __clzsi2
                0x00800068                __clz_tab


These are the three compilers:

batt@murphy:~/src$ /usr/local/avr-3.4.4-install/bin/avr-gcc -v
Reading specs from /usr/local/avr-3.4.4-install/bin/../lib/gcc/avr/3.4.4/specs
Configured with: ../configure --prefix=/usr/local/avr-3.4.4-install/ --target=avr --enable-languages=c,c++ --enable-multilib --with-dwarf2 --disable-libmudflap --enable-target-optspace --enable-threads=single --with-gnu-ld --enable-install-libbfd --disable-werror --disable-gdbtk --disable-libmudflap --disable-nls --disable-__cxa_atexit --disable-clocale --disable-c-mbchar --disable-long-long --without-newlib
Thread model: single
gcc version 3.4.4

batt@murphy:~/src$ /usr/local/avr-4.1.1-batt/bin/avr-gcc -v
Using built-in specs.
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2 : (reconfigured) ../configure --prefix=/usr/local/avr --target=avr --disable-nls --disable-libssp --with-dwarf2 --enable-languages=c,c++ --no-create --no-recursion
Thread model: single
gcc version 4.1.2 20061030 (prerelease)-batt-avr1281

batt@murphy:~/src$ /usr/local/avr-4.2-20061014-install/bin/avr-gcc -v
Using built-in specs.
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.2.0 20061014 (experimental)


As you can see, the clz_tab is linked in only with the SVN version of 4.2, and it's not linked in with 4.1.1. I also used -Os (which is the compiler switch I care about).

Comment 6 Andrew Pinski 2006-12-03 21:41:00 UTC

Someone else is going to have to look into this as this works just fine on spu-elf.

Comment 7 Giovanni Bajo 2007-04-02 22:47:40 UTC

Anatoly, can you have a look? It's a regression in 4.2 for AVR!

Comment 8 Mark Mitchell 2007-05-14 22:28:42 UTC

Will not be fixed in 4.2.0; retargeting at 4.2.1.

Comment 9 Eric Weddington 2007-07-23 22:57:27 UTC

Here's what I see:

The array __clz_tab is used in a macro, count_leading_zeros, which is called in the function __clzSI2 in libgcc2.c, which (AFAICT) gets compiled to the function __clzsi2 and aggregated in libgcc. The __clzsi2 function is called from the function clzusi() (in fp-bit.c) which is also included in libgcc. The clzusi() function is called from si_to_float() and usi_to_float() (also in fp-bit.c and included in libgcc). AFAICT, these two functions are used to convert an int or unsigned int to float. 

The test case does exactly this type of conversion in main() in comment #5. Testing shows that with gcc 4.2.1, and all int-to-float conversions removed, that __clz_tab is correctly not linked into the application.

The clzusi() function was created in revision 107345, on Nov 22, 2005:
http://gcc.gnu.org/viewcvs?view=rev&revision=107345

This seems like it was an intended change. However, it is unfortunate that a 256-byte array is used in the count_leading_zeros macro. While using a table is fast and the size is neglible on larger platforms, using up 256 bytes is very significant on the AVR where 4K, 2K or even 1K of RAM is common. What is really needed is an alternative implementation (non-array) that is perhaps specific to the AVR.

Comment 10 Andrew Patrikalakis 2007-09-08 21:44:44 UTC

(In reply to comment #9)
> Here's what I see:
> 
> The array __clz_tab is used in a macro, count_leading_zeros, which is called in
> the function __clzSI2 in libgcc2.c, which (AFAICT) gets compiled to the
> function __clzsi2 and aggregated in libgcc. The __clzsi2 function is called
> from the function clzusi() (in fp-bit.c) which is also included in libgcc. The
> clzusi() function is called from si_to_float() and usi_to_float() (also in
> fp-bit.c and included in libgcc). AFAICT, these two functions are used to
> convert an int or unsigned int to float. 
> 
> The test case does exactly this type of conversion in main() in comment #5.
> Testing shows that with gcc 4.2.1, and all int-to-float conversions removed,
> that __clz_tab is correctly not linked into the application.
> 
> The clzusi() function was created in revision 107345, on Nov 22, 2005:
> http://gcc.gnu.org/viewcvs?view=rev&revision=107345
> 
> This seems like it was an intended change. However, it is unfortunate that a
> 256-byte array is used in the count_leading_zeros macro. While using a table is
> fast and the size is neglible on larger platforms, using up 256 bytes is very
> significant on the AVR where 4K, 2K or even 1K of RAM is common. What is really
> needed is an alternative implementation (non-array) that is perhaps specific to
> the AVR.

Here's an untested (I'm going to try to figure out how to get it to build into the AVR build) function that replaces the definition of clz_tab with a 6 instruction bit of code:

; r2 in, r3 out
; r2 clobbered
; Z, C, N. V clobbered
clz_compute:
        ldi r3, 0x09           ; preload output
        clc                    ; clear C (guarentees termination with 8 loops)
clz_compute_loop1:
        rol r2                 ; push MSB into C
        dec r3                 ; dec output
        brcs clz_end           ; if C is set (msb was set), we're done
        rjmp clz_compute_loop1 ; otherwise, repeat
clz_end:

Comment 11 Andrew Patrikalakis 2007-09-08 21:48:03 UTC

(In reply to comment #10)
> Here's an untested (I'm going to try to figure out how to get it to build into
> the AVR build) function that replaces the definition of clz_tab with a 6
> instruction bit of code:
> 
> ; r2 in, r3 out
> ; r2 clobbered
> ; Z, C, N. V clobbered
> clz_compute:
>         ldi r3, 0x09           ; preload output
>         clc                    ; clear C (guarentees termination with 8 loops)
> clz_compute_loop1:
>         rol r2                 ; push MSB into C
>         dec r3                 ; dec output
>         brcs clz_end           ; if C is set (msb was set), we're done
>         rjmp clz_compute_loop1 ; otherwise, repeat
> clz_end:
> 

And the first bug of the day, clc should be sec. brcs will only jump out if C is set. On to prodding gcc...

Comment 12 Mark Mitchell 2007-10-09 19:22:24 UTC

Change target milestone to 4.2.3, as 4.2.2 has been released.

Comment 13 Wouter van Gulik 2007-10-24 11:16:47 UTC

(In reply to comment #10)

Something like this is smaller, faster and works for all registers (no need for LD_regs). And could easily be writtin in to a insn:

; rOut: output register
; rIn:  input register
; rIn, Z, N are clobbered, C is set
clzqi_init:
    clr rOut           ; clear to zero
    neg rOut           ; make -1, and set C (C used for garanteed termination)
clzqi_loop1:
    inc rOut           ; inc output (C not touched)
    rol rIn            ; push MSB into C
    brcc clz_loop1     ; if C is cleared (msb was not set), continue loop
clzqi_end:

A clz on a hi/si/di would be almost the same. Extend the "rol rIn" to a rol per sub_reg.
Of course there can be speed optimisation for hi/si/di, but for the AVR the optimizer is in most cases set for size.
A library call to this is shorter but it may impose extra mov instruction to fit the register constraints.

Comment 14 Wouter van Gulik 2007-11-30 14:59:00 UTC

Note that the use of clz for the avr is avoided by using avr-libc's math library.
See http://lists.gnu.org/archive/html/avr-libc-dev/2007-11/msg00048.html for more details.

Comment 15 Joerg Wunsch 2007-12-22 17:15:35 UTC

(In reply to comment #14)

> Note that the use of clz for the avr is avoided by using avr-libc's math
> library.

Not confirmed.  A simple test program using a floating point number:

#include <avr/io.h>
#include <math.h>

volatile float    a;

int main (void) 
{
 a=ADCH; 
}

results in 256 bytes of RAM allocation for __clz_tab[].

Comment 16 Wouter van Gulik 2007-12-23 20:15:01 UTC

> (In reply to comment #14)
> 
> > Note that the use of clz for the avr is avoided by using avr-libc's math
> > library.
> 
> Not confirmed.  A simple test program using a floating point number:
> 

This is probably due to somne naming problems of the latest avr-libc (1.6.x) concerning  __floatunsisf/undisf.
I tested against the 1.4.x version of the library which does not have this problem.

Comment 17 Paulo Marques 2008-01-18 17:30:40 UTC

I just found out what's causing this confusion. If you compile your program like this:

avr-gcc -Os -mmcu=atmega168 -lm main.c -o main.elf

__clz_tab gets included. But if you compile like this:

avr-gcc -Os -mmcu=atmega168 main.c -lm -o main.elf

it doesn't!!!

So, the order you pass -lm matters to the final outcome.

Tested with gcc 4.2.2, libc 1.4.6 and libc 1.6.1.

Comment 18 Joseph S. Myers 2008-02-01 16:53:33 UTC

4.2.3 is being released now, changing milestones of open bugs to 4.2.4.

Comment 19 Joseph S. Myers 2008-05-19 20:22:58 UTC

4.2.4 is being released, changing milestones to 4.2.5.

Comment 20 Joseph S. Myers 2009-03-31 19:48:19 UTC

Closing 4.2 branch.

Comment 21 Richard Biener 2009-08-04 12:28:00 UTC

GCC 4.3.4 is being released, adjusting target milestone.

Comment 22 Richard Biener 2010-05-22 18:11:22 UTC

GCC 4.3.5 is being released, adjusting target milestone.

Comment 23 Georg-Johann Lay 2010-11-06 11:50:23 UTC

longlong.h is plain vanilla for avr. Putting some inline assemler magic there will do the job.

Comment 24 Georg-Johann Lay 2011-06-16 09:06:50 UTC

Author: gjl
Date: Thu Jun 16 09:06:44 2011
New Revision: 175097

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175097
Log:
gcc/
	PR target/49313
	PR target/29524
	* longlong.h: Add AVR support:
	(count_leading_zeros): New macro.
	(count_trailing_zeros): New macro.
	(COUNT_LEADING_ZEROS_0): New macro.
	* config/avr/t-avr (LIB1ASMFUNCS): Add
	_ffssi2, _ffshi2, _loop_ffsqi2,
	_ctzsi2, _ctzhi2, _clzdi2, _clzsi2, _clzhi2, 
	_paritydi2, _paritysi2, _parityhi2,
	_popcounthi2,_popcountsi2, _popcountdi2, _popcountqi2,
	_bswapsi2, _bswapdi2,
	_ashldi3, _ashrdi3, _lshrdi3
	(LIB2FUNCS_EXCLUDE): Add _clz.
	* config/avr/libgcc.S (XCALL): Move up in file.
	(XJMP): New C Macro.
	(DEFUN): New asm macro.
	(ENDF): New asm macro.
	(__ffssi2): New function.
	(__ffshi2): New function.
	(__loop_ffsqi2): New function.
	(__ctzsi2): New function.
	(__ctzhi2): New function.
	(__clzdi2): New function.
	(__clzsi2): New function.
	(__clzhi2): New function.
	(__paritydi2): New function.
	(__paritysi2): New function.
	(__parityhi2): New function.
	(__popcounthi2): New function.
	(__popcountsi2): New function.
	(__popcountdi2): New function.
	(__popcountqi2): New function.
	(__bswapsi2): New function.
	(__bswapdi2): New function.
	(__ashldi3): New function.
	(__ashrdi3): New function.
	(__lshrdi3): New function.
	Fix suspicous lines.

libgcc/
	PR target/49313
	PR target/29524
	* config/avr/t-avr: Fix line endings.
	(intfuncs16): Remove _ffsXX2,  _clzXX2, _ctzXX2, _popcountXX2,
	_parityXX2.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/avr/libgcc.S
    trunk/gcc/config/avr/t-avr
    trunk/gcc/longlong.h
    trunk/libgcc/ChangeLog
    trunk/libgcc/config/avr/t-avr

Comment 25 Georg-Johann Lay 2011-06-16 09:47:09 UTC

Closed as resolved+fixed as using __clz_tab is avoided altogether now.