I noticed that the amount of RAM used, compared to the code generated by gcc 4.1.1, has increased by 256 bytes, and found that this is due to the __clz_tab array being linked in at the start of RAM.
Confirmed.
First off, this should not matter: the table is not used at all, so it should not be linked in. Do you have a testcase that links it in?
In fact this works correctly on the SPU target so I think avr is broken or you are really using __builtin_clz.
I have 3 projects involving gcc and avr, and all of these have an increased RAM usage due to __clz_tab linking after switching from gcc 4.1.1 to 4.2. I will try as soon as possible to find a suitable testcase.
Here's the testcase:

======================================
int main(int argc, char *argv[])
{
    float O1 = argc;
    float V1 = argc+2;
    int ret = O1 * V1;
    return ret;
}
======================================

batt@murphy:~/src$ /usr/local/avr-3.4.4-install/bin/avr-gcc -Os -Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map
batt@murphy:~/src$ /usr/local/avr-4.1.1-batt/bin/avr-gcc -Os -Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map
batt@murphy:~/src$ /usr/local/avr-4.2-20061014-install/bin/avr-gcc -Os -Wl,-Map=bug29524.map,--cref bug29524.c && grep -P "0x0.*__clz" bug29524.map
                0x00000434                __clzsi2
                0x00800068                __clz_tab

These are the three compilers:

batt@murphy:~/src$ /usr/local/avr-3.4.4-install/bin/avr-gcc -v
Reading specs from /usr/local/avr-3.4.4-install/bin/../lib/gcc/avr/3.4.4/specs
Configured with: ../configure --prefix=/usr/local/avr-3.4.4-install/ --target=avr --enable-languages=c,c++ --enable-multilib --with-dwarf2 --disable-libmudflap --enable-target-optspace --enable-threads=single --with-gnu-ld --enable-install-libbfd --disable-werror --disable-gdbtk --disable-libmudflap --disable-nls --disable-__cxa_atexit --disable-clocale --disable-c-mbchar --disable-long-long --without-newlib
Thread model: single
gcc version 3.4.4

batt@murphy:~/src$ /usr/local/avr-4.1.1-batt/bin/avr-gcc -v
Using built-in specs.
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2 : (reconfigured) ../configure --prefix=/usr/local/avr --target=avr --disable-nls --disable-libssp --with-dwarf2 --enable-languages=c,c++ --no-create --no-recursion
Thread model: single
gcc version 4.1.2 20061030 (prerelease)-batt-avr1281

batt@murphy:~/src$ /usr/local/avr-4.2-20061014-install/bin/avr-gcc -v
Using built-in specs.
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.2.0 20061014 (experimental)

As you can see, the clz_tab is linked in only with the SVN version of 4.2, and it's not linked in with 4.1.1. I also used -Os (which is the compiler switch I care about).
Someone else is going to have to look into this as this works just fine on spu-elf.
Anatoly, can you have a look? It's a regression in 4.2 for AVR!
Will not be fixed in 4.2.0; retargeting at 4.2.1.
Here's what I see:

The array __clz_tab is used in a macro, count_leading_zeros, which is used in the function __clzSI2 in libgcc2.c, which (AFAICT) gets compiled to the function __clzsi2 and aggregated in libgcc. The __clzsi2 function is called from the function clzusi() (in fp-bit.c), which is also included in libgcc. The clzusi() function is called from si_to_float() and usi_to_float() (also in fp-bit.c and included in libgcc). AFAICT, these two functions are used to convert an int or unsigned int to float.

The test case does exactly this type of conversion in main() in comment #5. Testing shows that with gcc 4.2.1 and all int-to-float conversions removed, __clz_tab is correctly not linked into the application.

The clzusi() function was created in revision 107345, on Nov 22, 2005:
http://gcc.gnu.org/viewcvs?view=rev&revision=107345

This seems like it was an intended change. However, it is unfortunate that a 256-byte array is used in the count_leading_zeros macro. While using a table is fast and the size is negligible on larger platforms, using up 256 bytes is very significant on the AVR, where 4K, 2K or even 1K of RAM is common. What is really needed is an alternative (non-array) implementation that is perhaps specific to the AVR.
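To make the trade-off concrete, here is a minimal C sketch of the two approaches for a 32-bit value. It is an illustration only, not the libgcc code: tab[], init_tab(), clz32_table() and clz32_loop() are hypothetical names, and the table is filled at run time here, whereas __clz_tab is a statically initialized array. The table variant follows the usual shape of a table-driven count_leading_zeros (pick the top non-zero byte, then look up its bit length); the loop variant needs no RAM for a table and pays for that in cycles.

```c
#include <stdint.h>

/* Illustrative stand-in for __clz_tab: tab[i] holds the number of
   significant bits in i (0 for i == 0, 8 for i >= 128). */
static uint8_t tab[256];

static void init_tab(void)
{
    for (int i = 0; i < 256; i++) {
        int bits = 0;
        for (int v = i; v != 0; v >>= 1)
            bits++;
        tab[i] = (uint8_t)bits;
    }
}

/* Table-driven variant: costs 256 bytes of data, but only a few
   comparisons and one lookup per call. Returns 32 for x == 0. */
static int clz32_table(uint32_t x)
{
    int shift = x < 0x10000UL ? (x < 0x100UL ? 0 : 8)
                              : (x < 0x1000000UL ? 16 : 24);
    return 32 - (tab[x >> shift] + shift);
}

/* Table-free variant: no data at all, up to 31 loop iterations.
   Returns 32 for x == 0. */
static int clz32_loop(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000UL) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

After a call to init_tab(), both functions should agree for every input; on a part with 1K of RAM the loop variant avoids spending a quarter of it on the table.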
(In reply to comment #9)
> Here's what I see:
>
> The array __clz_tab is used in a macro, count_leading_zeros, which is called in
> the function __clzSI2 in libgcc2.c, which (AFAICT) gets compiled to the
> function __clzsi2 and aggregated in libgcc. The __clzsi2 function is called
> from the function clzusi() (in fp-bit.c) which is also included in libgcc. The
> clzusi() function is called from si_to_float() and usi_to_float() (also in
> fp-bit.c and included in libgcc). AFAICT, these two functions are used to
> convert an int or unsigned int to float.
>
> The test case does exactly this type of conversion in main() in comment #5.
> Testing shows that with gcc 4.2.1, and all int-to-float conversions removed,
> that __clz_tab is correctly not linked into the application.
>
> The clzusi() function was created in revision 107345, on Nov 22, 2005:
> http://gcc.gnu.org/viewcvs?view=rev&revision=107345
>
> This seems like it was an intended change. However, it is unfortunate that a
> 256-byte array is used in the count_leading_zeros macro. While using a table is
> fast and the size is negligible on larger platforms, using up 256 bytes is very
> significant on the AVR where 4K, 2K or even 1K of RAM is common. What is really
> needed is an alternative implementation (non-array) that is perhaps specific to
> the AVR.

Here's an untested function (I'm going to try to figure out how to get it to build into the AVR build) that replaces the definition of clz_tab with a 6-instruction bit of code:

; r2 in, r3 out
; r2 clobbered
; Z, C, N, V clobbered
clz_compute:
        ldi r3, 0x09            ; preload output
        clc                     ; clear C (guarantees termination with 8 loops)
clz_compute_loop1:
        rol r2                  ; push MSB into C
        dec r3                  ; dec output
        brcs clz_end            ; if C is set (msb was set), we're done
        rjmp clz_compute_loop1  ; otherwise, repeat
clz_end:
(In reply to comment #10)
> Here's an untested (I'm going to try to figure out how to get it to build into
> the AVR build) function that replaces the definition of clz_tab with a 6
> instruction bit of code:
>
> ; r2 in, r3 out
> ; r2 clobbered
> ; Z, C, N, V clobbered
> clz_compute:
>         ldi r3, 0x09            ; preload output
>         clc                     ; clear C (guarantees termination with 8 loops)
> clz_compute_loop1:
>         rol r2                  ; push MSB into C
>         dec r3                  ; dec output
>         brcs clz_end            ; if C is set (msb was set), we're done
>         rjmp clz_compute_loop1  ; otherwise, repeat
> clz_end:
>

And the first bug of the day: clc should be sec. brcs will only jump out if C is set.

On to prodding gcc...
Change target milestone to 4.2.3, as 4.2.2 has been released.
(In reply to comment #10)

Something like this is smaller, faster and works for all registers (no need for LD_regs). It could also easily be written as an insn:

; rOut: output register
; rIn: input register
; rIn, Z, N are clobbered, C is set
clzqi_init:
        clr rOut                ; clear to zero
        com rOut                ; make -1 (0xFF), and set C (C used for guaranteed termination)
clzqi_loop1:
        inc rOut                ; inc output (C not touched)
        rol rIn                 ; push MSB into C
        brcc clzqi_loop1        ; if C is cleared (msb was not set), continue loop
clzqi_end:

A clz on a hi/si/di would be almost the same: extend the "rol rIn" to one rol per sub_reg. Of course there can be speed optimisations for hi/si/di, but on the AVR the optimizer is in most cases set for size. A library call to this is shorter, but it may impose extra mov instructions to fit the register constraints.
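The byte-wide loop can be checked on a host machine with a small C model. This is a sketch under stated assumptions, not AVR code: clz8_model() is a hypothetical name, the carry flag is modelled as a plain variable, and the starting state is the one the comments describe (output at -1, carry set as the termination sentinel).

```c
#include <stdint.h>

/* Host-side model of the 8-bit loop: returns the number of leading
   zero bits in 'in', and 8 when 'in' is zero. */
static uint8_t clz8_model(uint8_t in)
{
    uint8_t out = 0xFF;      /* output starts at -1 */
    unsigned carry = 1;      /* carry set: sentinel that ends the loop */

    do {
        unsigned msb;

        out++;                               /* inc rOut (carry untouched) */
        msb = (in >> 7) & 1u;                /* bit that rol moves into C */
        in = (uint8_t)((in << 1) | carry);   /* rol rIn: old C enters bit 0 */
        carry = msb;
    } while (carry == 0);                    /* brcc: loop while C is clear */

    return out;
}
```

For an all-zero input the sentinel bit is shifted out on the ninth pass, so the loop terminates with a result of 8 without needing a separate zero check.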
Note that the use of clz for the avr is avoided by using avr-libc's math library. See http://lists.gnu.org/archive/html/avr-libc-dev/2007-11/msg00048.html for more details.
(In reply to comment #14)
> Note that the use of clz for the avr is avoided by using avr-libc's math
> library.

Not confirmed. A simple test program using a floating point number:

#include <avr/io.h>
#include <math.h>

volatile float a;

int main (void)
{
    a=ADCH;
}

results in 256 bytes of RAM allocation for __clz_tab[].
> (In reply to comment #14)
> > Note that the use of clz for the avr is avoided by using avr-libc's math
> > library.
>
> Not confirmed. A simple test program using a floating point number:
>

This is probably due to some naming problems in the latest avr-libc (1.6.x) concerning __floatunsisf/undisf. I tested against the 1.4.x version of the library, which does not have this problem.
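I just found out what's causing this confusion. If you compile your program like this:

avr-gcc -Os -mmcu=atmega168 -lm main.c -o main.elf

__clz_tab gets included. But if you compile like this:

avr-gcc -Os -mmcu=atmega168 main.c -lm -o main.elf

it doesn't!!!

So, the position of -lm on the command line matters to the final outcome. This is consistent with the linker searching libraries in the order given: with -lm placed before main.c, libm is scanned before main.c has introduced any undefined symbols.

Tested with gcc 4.2.2, libc 1.4.6 and libc 1.6.1.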
4.2.3 is being released now, changing milestones of open bugs to 4.2.4.
4.2.4 is being released, changing milestones to 4.2.5.
Closing 4.2 branch.
GCC 4.3.4 is being released, adjusting target milestone.
GCC 4.3.5 is being released, adjusting target milestone.
longlong.h is plain vanilla for avr. Putting some inline assembler magic there will do the job.
Author: gjl
Date: Thu Jun 16 09:06:44 2011
New Revision: 175097

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175097
Log:
gcc/
        PR target/49313
        PR target/29524
        * longlong.h: Add AVR support:
        (count_leading_zeros): New macro.
        (count_trailing_zeros): New macro.
        (COUNT_LEADING_ZEROS_0): New macro.
        * config/avr/t-avr (LIB1ASMFUNCS): Add _ffssi2, _ffshi2,
        _loop_ffsqi2, _ctzsi2, _ctzhi2, _clzdi2, _clzsi2, _clzhi2,
        _paritydi2, _paritysi2, _parityhi2, _popcounthi2, _popcountsi2,
        _popcountdi2, _popcountqi2, _bswapsi2, _bswapdi2, _ashldi3,
        _ashrdi3, _lshrdi3
        (LIB2FUNCS_EXCLUDE): Add _clz.
        * config/avr/libgcc.S (XCALL): Move up in file.
        (XJMP): New C Macro.
        (DEFUN): New asm macro.
        (ENDF): New asm macro.
        (__ffssi2): New function.
        (__ffshi2): New function.
        (__loop_ffsqi2): New function.
        (__ctzsi2): New function.
        (__ctzhi2): New function.
        (__clzdi2): New function.
        (__clzsi2): New function.
        (__clzhi2): New function.
        (__paritydi2): New function.
        (__paritysi2): New function.
        (__parityhi2): New function.
        (__popcounthi2): New function.
        (__popcountsi2): New function.
        (__popcountdi2): New function.
        (__popcountqi2): New function.
        (__bswapsi2): New function.
        (__bswapdi2): New function.
        (__ashldi3): New function.
        (__ashrdi3): New function.
        (__lshrdi3): New function.
        Fix suspicous lines.

libgcc/
        PR target/49313
        PR target/29524
        * config/avr/t-avr: Fix line endings.
        (intfuncs16): Remove _ffsXX2, _clzXX2, _ctzXX2, _popcountXX2,
        _parityXX2.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/avr/libgcc.S
    trunk/gcc/config/avr/t-avr
    trunk/gcc/longlong.h
    trunk/libgcc/ChangeLog
    trunk/libgcc/config/avr/t-avr
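As a rough picture of what a target-specific count_leading_zeros in longlong.h can look like, the sketch below maps the macros onto compiler builtins so that no lookup table is referenced. It is not the committed text: the SKETCH_ prefix and the helper functions leading_zeros() and trailing_zeros() are hypothetical, and it assumes a GCC-compatible compiler that provides __builtin_clz and __builtin_ctz. Whether those builtins end up in the new assembler routines named in the log depends on the target and the operand width.

```c
/* Sketch only: table-free macros in the style of longlong.h,
   defined in terms of compiler builtins (GCC-compatible compilers). */
#define SKETCH_count_leading_zeros(count, x)  ((count) = __builtin_clz(x))
#define SKETCH_count_trailing_zeros(count, x) ((count) = __builtin_ctz(x))

/* Both builtins are undefined for x == 0, so callers must exclude
   zero or handle it separately. */
static int leading_zeros(unsigned int x)
{
    int n;

    SKETCH_count_leading_zeros(n, x);
    return n;
}

static int trailing_zeros(unsigned int x)
{
    int n;

    SKETCH_count_trailing_zeros(n, x);
    return n;
}
```

With definitions of this kind, code that only goes through the macros no longer pulls in a 256-byte table, which matches the closing remark that __clz_tab is avoided altogether.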
Closed as resolved+fixed as using __clz_tab is avoided altogether now.