This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch,AVR]: PR50447: Tweak addhi3
Denis Chertykov wrote:
> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>:
>> Denis Chertykov wrote:
>>> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>:
>>>> Denis Chertykov wrote:
>>>>> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>:
>>>>>> This patch makes some tweaks to addhi3, such as adding a QI scratch register.
>>>>>>
>>>>>> The original *addhi3 insn is still there and is located before the new
>>>>>> addhi3_clobber insn, because addhi3 is special to reload (thanks, Denis, for
>>>>>> this note), so there is one version with and one version without a scratch
>>>>>> register.
>>>>>>
>>>>>> The patch passes testing without regressions.
>>>>>>
>>>>> Which improvements does this patch add?
>>>>>
>>>>> Denis.
>>>> If addhi3 is expanded early, the addition happens with a QI scratch, which
>>>> avoids a reload of the constant if the target register is in NO_LD. It also
>>>> reduces register pressure, as only a QI register is needed instead of
>>>> reloading the constant into an HI register.
>>>>
>>>> Otherwise, there might be sequences like
>>>>
>>>> ldi r31, 2 ; *reload_inhi
>>>> mov r12, r31
>>>> clr r13
>>>>
>>>> add r14, r12 ; *addhi3
>>>> adc r15, r13
>>>>
>>>> which now will be
>>>>
>>>> ldi r31, 2 ; addhi3_clobber
>>>> add r14, r31
>>>> adc r15, __zero_reg__
>>>>
>>>> The same applies if the constant is reloaded into LD regs:
>>>>
>>>> ldi r30, 2 ; *movhi
>>>> clr r31
>>>>
>>>> add r14, r30 ; *addhi3
>>>> adc r15, r31
>>>>
>>>> will become
>>>>
>>>> ldi r30, 2 ; addhi3_clobber
>>>> add r14, r30
>>>> adc r15, __zero_reg__
>>>>
>>>> For *addhi3 insns, register pressure is not reduced, but the insn sequence
>>>> might be smarter if peep2 comes up with a QI scratch, or if it detects a
>>>> *reload_inhi insn just before the addition (and the reg that holds the
>>>> reloaded constant dies after the addition).
>>>>
>>>> As *addhi3 is special to reload, there is still an "ordinary" addhi insn
>>>> without a scratch. This is easier because, e.g., prologue and epilogue
>>>> generation emit add insns not by means of the addhi3 expander but by an
>>>> explicit gen_rtx_PLUS. Still, the addhi3 expander factors out the situations
>>>> where an addhi3 insn is generated via the expander late in the compilation
>>>> process.
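To make the addhi3_clobber sequences above concrete, here is a minimal Python model of `ldi r31, k / add r14, r31 / adc r15, __zero_reg__`. It is only a sketch, not GCC code: the helper names (`add`, `adc`, `addhi3_clobber_model`) are hypothetical, and only the result bytes and the carry flag are modeled. It illustrates why a single QI scratch suffices when the constant's high byte is zero: `__zero_reg__` (kept at 0 by avr-gcc) supplies the high byte, and a scratch is needed at all only because LDI can target just r16 to r31.

```python
# Minimal model of AVR ADD/ADC on 8-bit registers (result byte + carry only).
# Illustrative sketch; not GCC's avr_out_plus_clobber.

def add(rd, rr):
    """ADD Rd, Rr: return (result byte, carry out)."""
    s = rd + rr
    return s & 0xFF, s >> 8

def adc(rd, rr, c):
    """ADC Rd, Rr: add with carry in; return (result byte, carry out)."""
    s = rd + rr + c
    return s & 0xFF, s >> 8

def addhi3_clobber_model(x, k):
    """Add an 8-bit constant k to the 16-bit value x held in r15:r14.

    Models:  ldi r31, k ; add r14, r31 ; adc r15, __zero_reg__
    The constant's high byte is 0, so __zero_reg__ stands in for it and
    only one QI scratch (r31) is needed.
    """
    assert 0 <= k <= 0xFF, "a QI scratch only works if the high byte is 0"
    r14, r15 = x & 0xFF, x >> 8
    r31 = k                      # ldi r31, k   (LDI needs r16..r31)
    r14, c = add(r14, r31)       # add r14, r31
    r15, _ = adc(r15, 0, c)      # adc r15, __zero_reg__
    return (r15 << 8) | r14

# Exhaustive check against ordinary 16-bit addition (wraps modulo 2**16).
for x in range(0x10000):
    for k in (1, 2, 0x7F, 0xFF):
        assert addhi3_clobber_model(x, k) == (x + k) & 0xFFFF
```

The same model covers the *addhi3 variant by replacing `0` in the ADC with the reloaded high byte, which is exactly the register the clobber version saves.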
>>> Please provide a real-world example.
>>>
>>> Denis.
>> Consider avr-libc (under the assumption that it is "real world" code):
>>
>> In avr-libc's build directory, and with the patch integrated:
>>
>> $ cd avr/lib/avr4
>> $ make clean && make CFLAGS='-save-temps -dp -Os'
>> $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment)
>> $ grep 'addhi3_clobber\/2' *.s | wc -l
>> 33
>>
>> This shows that the insns are already there before peep2, so no reload of a
>> 16-bit constant is needed; an 8-bit scratch is sufficient.
>>
>> Alternatively, the implementation could omit the expansion to addhi3_clobber in
>> the addhi3 expander and instead rely completely on peep2. However, that does
>> not reduce register pressure, because a 16-bit register will still be
>> allocated; peep2 just prints the sequence more cleverly and needs only a QI
>> scratch to call avr_out_plus_clobber.
>>
>> For +/-1, the addition with SEC/ADD/ADC or SEC/SBC/SBC, respectively, leaves
>> cc0 in a mess. As most loops use +/-1 on the counter variable, LDI/SUB/SBC is
>> not shorter but is better, because it sets cc0.
>>
>> So, do you like this patch?
>> Or would you prefer a patch that is neutral with respect to the register
>> allocator and just uses peep2 to print things more cleverly?
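A minimal Python sketch of why LDI/SUB/SBC "sets cc0" for +/-1: adding ±1 is done as subtracting the constant -(±1), whose bytes are 0xFF/0xFF for +1 and 0x01/0x00 for -1, so one LDI into a scratch (plus `__zero_reg__` for a zero high byte) covers both cases. Per the AVR instruction set manual, SBC leaves Z unchanged when its result is zero and clears it otherwise, so after the SUB/SBC pair Z tells whether the whole 16-bit result is zero, which a following loop test can reuse. The register choice (r31 as scratch, r15:r14 as counter) and the helper names are assumptions for illustration, not the exact output of avr-gcc.

```python
# Model of AVR SUB/SBC with the carry (borrow) and Z flags.
# Illustrative sketch; register choice and helper names are hypothetical.

def sub(rd, rr):
    """SUB Rd, Rr: return (result byte, C = borrow, Z = result is zero)."""
    d = rd - rr
    r = d & 0xFF
    return r, int(d < 0), int(r == 0)

def sbc(rd, rr, c, z_prev):
    """SBC Rd, Rr: subtract with borrow. Z stays as before if the result
    is zero and is cleared otherwise, so it chains across both bytes."""
    d = rd - rr - c
    r = d & 0xFF
    z = z_prev if r == 0 else 0
    return r, int(d < 0), z

def add_pm1_model(x, step):
    """Compute x + step (step is +1 or -1) on r15:r14 as x - (-step).

    Returns (16-bit result, Z flag after the SUB/SBC pair)."""
    assert step in (1, -1)
    neg = (-step) & 0xFFFF              # +1 -> 0xFFFF, -1 -> 0x0001
    lo_k, hi_k = neg & 0xFF, neg >> 8
    r14, r15 = x & 0xFF, x >> 8
    r31 = lo_k                          # ldi r31, lo8(-step)
    r14, c, z = sub(r14, r31)           # sub r14, r31
    # hi_k is 0xFF (== r31) for +1, so "sbc r15, r31";
    # hi_k is 0 for -1, so "sbc r15, __zero_reg__".
    r15, c, z = sbc(r15, hi_k, c, z)
    return (r15 << 8) | r14, z

# The result matches 16-bit wrap-around arithmetic and Z flags a zero result.
for x in range(0x10000):
    for step in (1, -1):
        r, z = add_pm1_model(x, step)
        assert r == (x + step) & 0xFFFF
        assert z == int(r == 0)
```

Because Z already reflects the 16-bit result, a loop such as `while (--n)` could branch directly on it without a separate 16-bit compare.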
>
> I'm interested in code improvements.
> What is the difference in the size of avr-libc?
>
> Denis.
I have no tool for smart size analysis, so here is just a diff.
After rebuilding avr-libc with each compiler version (original and patched), I
ran, respectively:
$ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt
$ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt
and then
$ diff -U 0 size-orig.txt size-patch.txt > size.diff
As far as I can see, there is no big gain, and the only objects that grow are
realloc.o for avr2, avr3 and avr31 (by 2 bytes each). For some files, like
./avr/lib/avr2/libc.a:dtoa_prf.o, the size gain is 3%. For
./avr/lib/avr4/libc.a:vfprintf_std.o it is 1.7%, and the others improve by only
one to a few instructions (2 to 8 bytes).
Johann
--- size-orig.txt 2011-10-18 19:59:52.000000000 +0200
+++ size-patch.txt 2011-10-18 19:50:59.000000000 +0200
@@ -7 +7 @@
- 750 0 0 750 2ee dtoa_prf.o (ex ./avr/lib/avr51/libc.a)
+ 724 0 0 724 2d4 dtoa_prf.o (ex ./avr/lib/avr51/libc.a)
@@ -11 +11 @@
- 722 6 0 728 2d8 malloc.o (ex ./avr/lib/avr51/libc.a)
+ 720 6 0 726 2d6 malloc.o (ex ./avr/lib/avr51/libc.a)
@@ -15,2 +15,2 @@
- 510 0 0 510 1fe realloc.o (ex ./avr/lib/avr51/libc.a)
- 747 0 0 747 2eb strtod.o (ex ./avr/lib/avr51/libc.a)
+ 506 0 0 506 1fa realloc.o (ex ./avr/lib/avr51/libc.a)
+ 739 0 0 739 2e3 strtod.o (ex ./avr/lib/avr51/libc.a)
@@ -18 +18 @@
- 536 0 0 536 218 strtoul.o (ex ./avr/lib/avr51/libc.a)
+ 530 0 0 530 212 strtoul.o (ex ./avr/lib/avr51/libc.a)
@@ -246,2 +246,2 @@
- 1042 0 0 1042 412 vfprintf_std.o (ex ./avr/lib/avr51/libc.a)
- 1490 0 0 1490 5d2 vfscanf_std.o (ex ./avr/lib/avr51/libc.a)
+ 1026 0 0 1026 402 vfprintf_std.o (ex ./avr/lib/avr51/libc.a)
+ 1488 0 0 1488 5d0 vfscanf_std.o (ex ./avr/lib/avr51/libc.a)
@@ -423 +423 @@
- 688 0 0 688 2b0 dtoa_prf.o (ex ./avr/lib/avr35/libc.a)
+ 670 0 0 670 29e dtoa_prf.o (ex ./avr/lib/avr35/libc.a)
@@ -427 +427 @@
- 708 6 0 714 2ca malloc.o (ex ./avr/lib/avr35/libc.a)
+ 706 6 0 712 2c8 malloc.o (ex ./avr/lib/avr35/libc.a)
@@ -431,3 +431,3 @@
- 440 0 0 440 1b8 realloc.o (ex ./avr/lib/avr35/libc.a)
- 733 0 0 733 2dd strtod.o (ex ./avr/lib/avr35/libc.a)
- 564 0 0 564 234 strtol.o (ex ./avr/lib/avr35/libc.a)
+ 436 0 0 436 1b4 realloc.o (ex ./avr/lib/avr35/libc.a)
+ 725 0 0 725 2d5 strtod.o (ex ./avr/lib/avr35/libc.a)
+ 562 0 0 562 232 strtol.o (ex ./avr/lib/avr35/libc.a)
@@ -662,2 +662,2 @@
- 964 0 0 964 3c4 vfprintf_std.o (ex ./avr/lib/avr35/libc.a)
- 1352 0 0 1352 548 vfscanf_std.o (ex ./avr/lib/avr35/libc.a)
+ 948 0 0 948 3b4 vfprintf_std.o (ex ./avr/lib/avr35/libc.a)
+ 1350 0 0 1350 546 vfscanf_std.o (ex ./avr/lib/avr35/libc.a)
@@ -815 +815 @@
- 682 0 0 682 2aa dtoa_prf.o (ex ./avr/lib/avr25/libc.a)
+ 664 0 0 664 298 dtoa_prf.o (ex ./avr/lib/avr25/libc.a)
@@ -819 +819 @@
- 704 6 0 710 2c6 malloc.o (ex ./avr/lib/avr25/libc.a)
+ 702 6 0 708 2c4 malloc.o (ex ./avr/lib/avr25/libc.a)
@@ -823,3 +823,3 @@
- 426 0 0 426 1aa realloc.o (ex ./avr/lib/avr25/libc.a)
- 713 0 0 713 2c9 strtod.o (ex ./avr/lib/avr25/libc.a)
- 554 0 0 554 22a strtol.o (ex ./avr/lib/avr25/libc.a)
+ 422 0 0 422 1a6 realloc.o (ex ./avr/lib/avr25/libc.a)
+ 705 0 0 705 2c1 strtod.o (ex ./avr/lib/avr25/libc.a)
+ 552 0 0 552 228 strtol.o (ex ./avr/lib/avr25/libc.a)
@@ -1054,2 +1054,2 @@
- 930 0 0 930 3a2 vfprintf_std.o (ex ./avr/lib/avr25/libc.a)
- 1286 0 0 1286 506 vfscanf_std.o (ex ./avr/lib/avr25/libc.a)
+ 914 0 0 914 392 vfprintf_std.o (ex ./avr/lib/avr25/libc.a)
+ 1284 0 0 1284 504 vfscanf_std.o (ex ./avr/lib/avr25/libc.a)
@@ -1447 +1447 @@
- 758 0 0 758 2f6 dtoa_prf.o (ex ./avr/lib/avr31/libc.a)
+ 734 0 0 734 2de dtoa_prf.o (ex ./avr/lib/avr31/libc.a)
@@ -1451 +1451 @@
- 752 6 0 758 2f6 malloc.o (ex ./avr/lib/avr31/libc.a)
+ 750 6 0 756 2f4 malloc.o (ex ./avr/lib/avr31/libc.a)
@@ -1455,4 +1455,4 @@
- 464 0 0 464 1d0 realloc.o (ex ./avr/lib/avr31/libc.a)
- 811 0 0 811 32b strtod.o (ex ./avr/lib/avr31/libc.a)
- 634 0 0 634 27a strtol.o (ex ./avr/lib/avr31/libc.a)
- 616 0 0 616 268 strtoul.o (ex ./avr/lib/avr31/libc.a)
+ 466 0 0 466 1d2 realloc.o (ex ./avr/lib/avr31/libc.a)
+ 809 0 0 809 329 strtod.o (ex ./avr/lib/avr31/libc.a)
+ 630 0 0 630 276 strtol.o (ex ./avr/lib/avr31/libc.a)
+ 614 0 0 614 266 strtoul.o (ex ./avr/lib/avr31/libc.a)
@@ -1686,2 +1686,2 @@
- 1064 0 0 1064 428 vfprintf_std.o (ex ./avr/lib/avr31/libc.a)
- 1582 0 0 1582 62e vfscanf_std.o (ex ./avr/lib/avr31/libc.a)
+ 1046 0 0 1046 416 vfprintf_std.o (ex ./avr/lib/avr31/libc.a)
+ 1580 0 0 1580 62c vfscanf_std.o (ex ./avr/lib/avr31/libc.a)
@@ -1791 +1791 @@
- 750 0 0 750 2ee dtoa_prf.o (ex ./avr/lib/avr6/libc.a)
+ 724 0 0 724 2d4 dtoa_prf.o (ex ./avr/lib/avr6/libc.a)
@@ -1795 +1795 @@
- 722 6 0 728 2d8 malloc.o (ex ./avr/lib/avr6/libc.a)
+ 720 6 0 726 2d6 malloc.o (ex ./avr/lib/avr6/libc.a)
@@ -1799,2 +1799,2 @@
- 508 0 0 508 1fc realloc.o (ex ./avr/lib/avr6/libc.a)
- 747 0 0 747 2eb strtod.o (ex ./avr/lib/avr6/libc.a)
+ 504 0 0 504 1f8 realloc.o (ex ./avr/lib/avr6/libc.a)
+ 739 0 0 739 2e3 strtod.o (ex ./avr/lib/avr6/libc.a)
@@ -1802 +1802 @@
- 536 0 0 536 218 strtoul.o (ex ./avr/lib/avr6/libc.a)
+ 530 0 0 530 212 strtoul.o (ex ./avr/lib/avr6/libc.a)
@@ -2030,2 +2030,2 @@
- 1042 0 0 1042 412 vfprintf_std.o (ex ./avr/lib/avr6/libc.a)
- 1490 0 0 1490 5d2 vfscanf_std.o (ex ./avr/lib/avr6/libc.a)
+ 1026 0 0 1026 402 vfprintf_std.o (ex ./avr/lib/avr6/libc.a)
+ 1488 0 0 1488 5d0 vfscanf_std.o (ex ./avr/lib/avr6/libc.a)
@@ -2135 +2135 @@
- 758 0 0 758 2f6 dtoa_prf.o (ex ./avr/lib/avr3/libc.a)
+ 734 0 0 734 2de dtoa_prf.o (ex ./avr/lib/avr3/libc.a)
@@ -2139 +2139 @@
- 752 6 0 758 2f6 malloc.o (ex ./avr/lib/avr3/libc.a)
+ 750 6 0 756 2f4 malloc.o (ex ./avr/lib/avr3/libc.a)
@@ -2143,4 +2143,4 @@
- 464 0 0 464 1d0 realloc.o (ex ./avr/lib/avr3/libc.a)
- 811 0 0 811 32b strtod.o (ex ./avr/lib/avr3/libc.a)
- 634 0 0 634 27a strtol.o (ex ./avr/lib/avr3/libc.a)
- 616 0 0 616 268 strtoul.o (ex ./avr/lib/avr3/libc.a)
+ 466 0 0 466 1d2 realloc.o (ex ./avr/lib/avr3/libc.a)
+ 809 0 0 809 329 strtod.o (ex ./avr/lib/avr3/libc.a)
+ 630 0 0 630 276 strtol.o (ex ./avr/lib/avr3/libc.a)
+ 614 0 0 614 266 strtoul.o (ex ./avr/lib/avr3/libc.a)
@@ -2374,2 +2374,2 @@
- 1064 0 0 1064 428 vfprintf_std.o (ex ./avr/lib/avr3/libc.a)
- 1582 0 0 1582 62e vfscanf_std.o (ex ./avr/lib/avr3/libc.a)
+ 1046 0 0 1046 416 vfprintf_std.o (ex ./avr/lib/avr3/libc.a)
+ 1580 0 0 1580 62c vfscanf_std.o (ex ./avr/lib/avr3/libc.a)
@@ -2527 +2527 @@
- 688 0 0 688 2b0 dtoa_prf.o (ex ./avr/lib/avr5/libc.a)
+ 670 0 0 670 29e dtoa_prf.o (ex ./avr/lib/avr5/libc.a)
@@ -2531 +2531 @@
- 708 6 0 714 2ca malloc.o (ex ./avr/lib/avr5/libc.a)
+ 706 6 0 712 2c8 malloc.o (ex ./avr/lib/avr5/libc.a)
@@ -2535,2 +2535,2 @@
- 440 0 0 440 1b8 realloc.o (ex ./avr/lib/avr5/libc.a)
- 719 0 0 719 2cf strtod.o (ex ./avr/lib/avr5/libc.a)
+ 436 0 0 436 1b4 realloc.o (ex ./avr/lib/avr5/libc.a)
+ 711 0 0 711 2c7 strtod.o (ex ./avr/lib/avr5/libc.a)
@@ -2538 +2538 @@
- 492 0 0 492 1ec strtoul.o (ex ./avr/lib/avr5/libc.a)
+ 486 0 0 486 1e6 strtoul.o (ex ./avr/lib/avr5/libc.a)
@@ -2766,2 +2766,2 @@
- 960 0 0 960 3c0 vfprintf_std.o (ex ./avr/lib/avr5/libc.a)
- 1352 0 0 1352 548 vfscanf_std.o (ex ./avr/lib/avr5/libc.a)
+ 944 0 0 944 3b0 vfprintf_std.o (ex ./avr/lib/avr5/libc.a)
+ 1350 0 0 1350 546 vfscanf_std.o (ex ./avr/lib/avr5/libc.a)
@@ -3855 +3855 @@
- 682 0 0 682 2aa dtoa_prf.o (ex ./avr/lib/avr4/libc.a)
+ 664 0 0 664 298 dtoa_prf.o (ex ./avr/lib/avr4/libc.a)
@@ -3859 +3859 @@
- 704 6 0 710 2c6 malloc.o (ex ./avr/lib/avr4/libc.a)
+ 702 6 0 708 2c4 malloc.o (ex ./avr/lib/avr4/libc.a)
@@ -3863,2 +3863,2 @@
- 426 0 0 426 1aa realloc.o (ex ./avr/lib/avr4/libc.a)
- 697 0 0 697 2b9 strtod.o (ex ./avr/lib/avr4/libc.a)
+ 422 0 0 422 1a6 realloc.o (ex ./avr/lib/avr4/libc.a)
+ 689 0 0 689 2b1 strtod.o (ex ./avr/lib/avr4/libc.a)
@@ -3866 +3866 @@
- 482 0 0 482 1e2 strtoul.o (ex ./avr/lib/avr4/libc.a)
+ 476 0 0 476 1dc strtoul.o (ex ./avr/lib/avr4/libc.a)
@@ -4094,2 +4094,2 @@
- 930 0 0 930 3a2 vfprintf_std.o (ex ./avr/lib/avr4/libc.a)
- 1286 0 0 1286 506 vfscanf_std.o (ex ./avr/lib/avr4/libc.a)
+ 914 0 0 914 392 vfprintf_std.o (ex ./avr/lib/avr4/libc.a)
+ 1284 0 0 1284 504 vfscanf_std.o (ex ./avr/lib/avr4/libc.a)
@@ -4379 +4379 @@
- 752 0 0 752 2f0 dtoa_prf.o (ex ./avr/lib/avr2/libc.a)
+ 728 0 0 728 2d8 dtoa_prf.o (ex ./avr/lib/avr2/libc.a)
@@ -4383 +4383 @@
- 748 6 0 754 2f2 malloc.o (ex ./avr/lib/avr2/libc.a)
+ 746 6 0 752 2f0 malloc.o (ex ./avr/lib/avr2/libc.a)
@@ -4387,4 +4387,4 @@
- 450 0 0 450 1c2 realloc.o (ex ./avr/lib/avr2/libc.a)
- 791 0 0 791 317 strtod.o (ex ./avr/lib/avr2/libc.a)
- 624 0 0 624 270 strtol.o (ex ./avr/lib/avr2/libc.a)
- 606 0 0 606 25e strtoul.o (ex ./avr/lib/avr2/libc.a)
+ 452 0 0 452 1c4 realloc.o (ex ./avr/lib/avr2/libc.a)
+ 789 0 0 789 315 strtod.o (ex ./avr/lib/avr2/libc.a)
+ 620 0 0 620 26c strtol.o (ex ./avr/lib/avr2/libc.a)
+ 604 0 0 604 25c strtoul.o (ex ./avr/lib/avr2/libc.a)
@@ -4618,2 +4618,2 @@
- 1030 0 0 1030 406 vfprintf_std.o (ex ./avr/lib/avr2/libc.a)
- 1516 0 0 1516 5ec vfscanf_std.o (ex ./avr/lib/avr2/libc.a)
+ 1012 0 0 1012 3f4 vfprintf_std.o (ex ./avr/lib/avr2/libc.a)
+ 1514 0 0 1514 5ea vfscanf_std.o (ex ./avr/lib/avr2/libc.a)
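The comparison above could be automated with a small script. The sketch below is hypothetical (not part of the patch or of avr-libc): it assumes avr-size's default Berkeley output, as captured in size-orig.txt and size-patch.txt, keys each object as archive:member, and reports the relative change of the dec (total) column for every object whose size changed.

```python
import sys

def parse_size(lines):
    """Map 'archive:member' -> dec (total bytes) from Berkeley-format avr-size output."""
    sizes = {}
    for line in lines:
        fields = line.split(None, 5)
        # Skip the "text data bss dec hex filename" header lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        dec, name = int(fields[3]), fields[5].strip()
        # Archive members are printed as "dtoa_prf.o (ex ./avr/lib/avr2/libc.a)".
        if name.endswith(")") and " (ex " in name:
            member, archive = name[:-1].split(" (ex ", 1)
            name = f"{archive}:{member}"
        sizes[name] = dec
    return sizes

def compare(orig, patch):
    """Yield (name, old, new, percent change) for objects whose size changed."""
    for name in sorted(orig.keys() & patch.keys()):
        old, new = orig[name], patch[name]
        if old != new:
            yield name, old, new, 100.0 * (new - old) / old

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_orig, open(sys.argv[2]) as f_patch:
        rows = list(compare(parse_size(f_orig), parse_size(f_patch)))
    # Largest savings first, growth (positive change) last.
    for name, old, new, pct in sorted(rows, key=lambda r: r[3]):
        print(f"{pct:+6.2f}%  {old:6d} -> {new:6d}  {name}")
    print("total change:", sum(new - old for _, old, new, _ in rows), "bytes")
```

Run it as, e.g., `python3 sizecmp.py size-orig.txt size-patch.txt` (the script name is arbitrary). Sorting by percentage puts the largest savings, such as dtoa_prf.o, at the top and any growth, such as realloc.o on avr2, at the bottom.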