This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch,AVR]: PR50447: Tweak addhi3
Denis Chertykov wrote:
> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>:
>> Denis Chertykov wrote:
>>> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>:
>>>> This patch makes some tweaks to addhi3, such as adding a QI scratch register.
>>>>
>>>> The original *addhi3 insn is still there and is located before the new
>>>> addhi3_clobber insn because addhi3 is special to reload (thanks, Denis, for this
>>>> note), so there is a version with and a version without a scratch register.
>>>>
>>>> Patch passes without regressions.
>>>>
>>> Which improvements are added by this patch?
>>>
>>> Denis.
>> If addhi3 is expanded early, the addition happens with a QI scratch, which
>> avoids a reload of the constant if the target register is in NO_LD. It also
>> reduces register pressure because only a QI register is needed instead of a
>> reload of the constant to HI.
>>
>> Otherwise, there might be sequences like
>>
>> ldi r31, 2 ; *reload_inhi
>> mov r12, r31
>> clr r13
>>
>> add r14, r12 ; *addhi3
>> adc r15, r13
>>
>> which now will be
>>
>> ldi r31, 2 ; addhi3_clobber
>> add r14, r31
>> adc r15, __zero_reg__
>>
>> The same applies if the constant is reloaded into LD regs:
>>
>> ldi r30, 2 ; *movhi
>> clr r31
>>
>> add r14, r30 ; *addhi3
>> adc r15, r31
>>
>> will become
>>
>> ldi r30, 2 ; addhi3_clobber
>> add r14, r30
>> adc r15, __zero_reg__
>>
>> For *addhi3 insns the register pressure is not reduced, but the insn sequence
>> might be smarter if peep2 comes up with a QI scratch or if it detects a
>> *reload_inhi insn just prior to the addition (and the reg that holds the
>> reloaded constant dies after the addition).
>>
>> As *addhi3 is special to reload, there is still an "ordinary" addhi insn
>> without scratch. This is easier because, e.g., prologue and epilogue generation
>> generate add insns (not by means of the addhi3 expander but by explicit
>> gen_rtx_PLUS). Yet the addhi3 expander factors out the situations when an
>> addhi3 insn is to be generated via the addhi3 expander late in the compilation process.
>
> Please provide a real-world example.
>
> Denis.
Consider avr-libc (under the assumption that it is "real world" code):
In avr-libc's build directory, and with the patch integrated:
$ cd avr/lib/avr4
$ make clean && make CFLAGS='-save-temps -dp -Os'
$ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment)
$ grep 'addhi3_clobber\/2' *.s | wc -l
33
This shows that the insns are already there before peep2, and thus no reload of
a 16-bit constant is needed; an 8-bit scratch is sufficient.
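To illustrate why an 8-bit scratch is enough for such constants, here is a minimal C sketch that models the three-instruction sequences seen in the grep output. It is not part of the patch: the type reg16 and all function names are hypothetical, and the model only covers constants in the range 0..255 with __zero_reg__ supplying the high byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit register pair such as r15:r14. */
typedef struct { uint8_t lo, hi; } reg16;

/* Models: ldi scratch,k / add lo,scratch / adc hi,__zero_reg__ */
static reg16 add_const_qi_scratch(reg16 r, uint8_t k)
{
    uint8_t scratch = k;                      /* ldi scratch, k        */
    unsigned sum = (unsigned)r.lo + scratch;  /* add lo, scratch       */
    uint8_t carry = (uint8_t)(sum >> 8);      /* carry out of low byte */
    r.lo = (uint8_t)sum;
    r.hi = (uint8_t)(r.hi + 0 + carry);       /* adc hi, __zero_reg__  */
    return r;
}

/* Models: ldi scratch,k / sub lo,scratch / sbc hi,__zero_reg__ */
static reg16 sub_const_qi_scratch(reg16 r, uint8_t k)
{
    uint8_t scratch = k;                      /* ldi scratch, k        */
    uint8_t borrow = r.lo < scratch;          /* borrow from low byte  */
    r.lo = (uint8_t)(r.lo - scratch);         /* sub lo, scratch       */
    r.hi = (uint8_t)(r.hi - 0 - borrow);      /* sbc hi, __zero_reg__  */
    return r;
}

/* Reassemble the pair into one 16-bit value. */
static uint16_t reg16_value(reg16 r)
{
    return (uint16_t)(r.lo | (r.hi << 8));
}

/* Exhaustively compare the byte-wise model against plain 16-bit arithmetic. */
static void check_qi_scratch_model(void)
{
    for (unsigned v = 0; v <= 0xFFFF; v++) {
        for (unsigned k = 0; k <= 0xFF; k++) {
            reg16 r = { (uint8_t)v, (uint8_t)(v >> 8) };
            assert(reg16_value(add_const_qi_scratch(r, (uint8_t)k)) == (uint16_t)(v + k));
            assert(reg16_value(sub_const_qi_scratch(r, (uint8_t)k)) == (uint16_t)(v - k));
        }
    }
}
```

Calling check_qi_scratch_model() walks all 16-bit values and all 8-bit constants; in this model the high byte of the constant never needs a register of its own.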
Alternatively, the implementation could omit the expansion to addhi3_clobber in
the addhi3 expander and instead rely completely on peep2. However, that does not
reduce register pressure because a 16-bit register will still be allocated;
peep2 just prints things smarter and needs only a QI scratch to call
avr_out_plus_clobber.
For +/-1, the addition with SEC/ADD/ADC or SEC/SBC/SBC, respectively, leaves cc0
in a mess. As most loops use +/-1 on the counter variable, LDI/SUB/SBC is not
shorter but better because it sets cc0.
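The cc0 remark can be illustrated with a small C model of the Z flag. This is a sketch under assumptions, not code from the patch: it assumes the carry-based increment is SEC followed by ADC with __zero_reg__ on both bytes, it models only the C and Z flags (with SBC keeping Z set only if Z was already set and the byte result is zero, as the AVR instruction set describes), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: one register pair plus the C and Z flags. */
typedef struct {
    uint8_t lo, hi;  /* register pair, e.g. r15:r14 */
    int c, z;        /* carry and zero flags        */
} state;

/* SUB Rd,Rr: Z reflects only this byte's result. */
static uint8_t sub8(uint8_t rd, uint8_t rr, state *s)
{
    int res = rd - rr;
    s->c = res < 0;
    s->z = ((uint8_t)res == 0);
    return (uint8_t)res;
}

/* SBC Rd,Rr: Z stays set only if it was set before and this byte is zero. */
static uint8_t sbc8(uint8_t rd, uint8_t rr, state *s)
{
    int res = rd - rr - s->c;
    s->c = res < 0;
    s->z = s->z && ((uint8_t)res == 0);
    return (uint8_t)res;
}

/* ADC Rd,Rr: Z reflects only this byte's result. */
static uint8_t adc8(uint8_t rd, uint8_t rr, state *s)
{
    int res = rd + rr + s->c;
    s->c = res > 0xFF;
    s->z = ((uint8_t)res == 0);
    return (uint8_t)res;
}

/* Models: ldi scratch,255 / sub lo,scratch / sbc hi,scratch (adds 1). */
static state inc_sub_sbc(uint16_t v)
{
    state s = { (uint8_t)v, (uint8_t)(v >> 8), 0, 0 };
    s.lo = sub8(s.lo, 0xFF, &s);
    s.hi = sbc8(s.hi, 0xFF, &s);
    return s;
}

/* Models: sec / adc lo,__zero_reg__ / adc hi,__zero_reg__ (adds 1). */
static state inc_sec_adc(uint16_t v)
{
    state s = { (uint8_t)v, (uint8_t)(v >> 8), 1 /* sec */, 0 };
    s.lo = adc8(s.lo, 0, &s);
    s.hi = adc8(s.hi, 0, &s);
    return s;
}

/* Reassemble the pair into one 16-bit value. */
static uint16_t value16(state s)
{
    return (uint16_t)(s.lo | (s.hi << 8));
}
```

For v = 0x0000 both sequences yield 0x0001, but in this model the SEC/ADC/ADC variant ends with Z set because only the high byte is considered, while the SUB/SBC variant ends with Z clear, matching the 16-bit result.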
So do you like this patch?
Or do you prefer a patch that is neutral with respect to the register allocator
and just uses peep2 to print things smarter?
Johann
dtoa_prf.s: ldi r31,3 ; , ; 338 addhi3_clobber/2 [length = 3]
dtoa_prf.s- add r12,r31 ; s,
dtoa_prf.s- adc r13,__zero_reg__ ; s
--
dtoa_prf.s: ldi r31,3 ; , ; 447 addhi3_clobber/2 [length = 3]
dtoa_prf.s- add r12,r31 ; s,
dtoa_prf.s- adc r13,__zero_reg__ ; s
--
fgets.s: ldi r31,1 ; , ; 70 addhi3_clobber/2 [length = 3]
fgets.s- sub r14,r31 ; ivtmp.9,
fgets.s- sbc r15,__zero_reg__ ; ivtmp.9
--
realloc.s: ldi r17,2 ; , ; 80 addhi3_clobber/2 [length = 3]
realloc.s- add r12,r17 ; tmp83,
realloc.s- adc r13,__zero_reg__ ;
--
realloc.s: ldi r18,2 ; , ; 85 addhi3_clobber/2 [length = 3]
realloc.s- add r12,r18 ; tmp84,
realloc.s- adc r13,__zero_reg__ ;
--
strtod.s: ldi r31,1 ; , ; 101 addhi3_clobber/2 [length = 3]
strtod.s- sub r14,r31 ; D.2581,
strtod.s- sbc r15,__zero_reg__ ; D.2581
--
strtod.s: ldi r18,2 ; , ; 110 addhi3_clobber/2 [length = 3]
strtod.s- add r14,r18 ; nptr,
strtod.s- adc r15,__zero_reg__ ; nptr
--
strtod.s: ldi r21,7 ; , ; 120 addhi3_clobber/2 [length = 3]
strtod.s- add r14,r21 ; nptr,
strtod.s- adc r15,__zero_reg__ ; nptr
--
strtod.s: ldi r31,255 ; , ; 175 addhi3_clobber/2 [length = 3]
strtod.s- sub r14,r31 ; exp,
strtod.s- sbc r15,r31 ; exp,
--
strtod.s: ldi r18,1 ; , ; 185 addhi3_clobber/2 [length = 3]
strtod.s- sub r14,r18 ; exp,
strtod.s- sbc r15,__zero_reg__ ; exp
--
strtod.s: ldi r31,24 ; , ; 376 addhi3_clobber/2 [length = 3]
strtod.s- sub r8,r31 ; D.2735,
strtod.s- sbc r9,__zero_reg__ ; D.2735
--
strtol.s: ldi r31,2 ; , ; 128 addhi3_clobber/2 [length = 3]
strtol.s- add r6,r31 ; nptr,
strtol.s- adc r7,__zero_reg__ ; nptr
--
strtol.s: ldi r31,1 ; , ; 242 addhi3_clobber/2 [length = 3]
strtol.s- sub r6,r31 ; tmp117,
strtol.s- sbc r7,__zero_reg__ ;
--
strtol.s: ldi r31,2 ; , ; 252 addhi3_clobber/2 [length = 3]
strtol.s- sub r6,r31 ; tmp119,
strtol.s- sbc r7,__zero_reg__ ;
--
strtoul.s: ldi r31,2 ; , ; 126 addhi3_clobber/2 [length = 3]
strtoul.s- add r14,r31 ; nptr,
strtoul.s- adc r15,__zero_reg__ ; nptr
--
strtoul.s: ldi r31,1 ; , ; 229 addhi3_clobber/2 [length = 3]
strtoul.s- sub r14,r31 ; tmp113,
strtoul.s- sbc r15,__zero_reg__ ;
--
strtoul.s: ldi r31,2 ; , ; 239 addhi3_clobber/2 [length = 3]
strtoul.s- sub r14,r31 ; tmp115,
strtoul.s- sbc r15,__zero_reg__ ;
--
vfprintf.s: ldi r24,4 ; , ; 399 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r24 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r21,10 ; , ; 850 addhi3_clobber/2 [length = 3]
vfprintf.s- sub r10,r21 ; exp,
vfprintf.s- sbc r11,__zero_reg__ ; exp
--
vfprintf.s: ldi r30,2 ; , ; 882 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r30 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 892 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 919 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,1 ; , ; 987 addhi3_clobber/2 [length = 3]
vfprintf.s- sub r8,r31 ; size,
vfprintf.s- sbc r9,__zero_reg__ ; size
--
vfprintf.s: ldi r18,4 ; , ; 1012 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r18 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 1019 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r30,4 ; , ; 1109 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r30 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 1116 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfscanf.s: ldi r27,1 ; , ; 213 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r10,r27 ; width,
vfscanf.s- sbc r11,__zero_reg__ ; width
--
vfscanf.s: ldi r25,255 ; , ; 163 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r25 ; exp,
vfscanf.s- sbc r13,r25 ; exp,
--
vfscanf.s: ldi r30,1 ; , ; 173 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r30 ; exp,
vfscanf.s- sbc r13,__zero_reg__ ; exp
--
vfscanf.s: ldi r25,24 ; , ; 354 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r6,r25 ; D.3471,
vfscanf.s- sbc r7,__zero_reg__ ; D.3471
--
vfscanf.s: ldi r31,1 ; , ; 235 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r31 ; width,
vfscanf.s- sbc r13,__zero_reg__ ; width
--
vfscanf.s: ldi r31,1 ; , ; 334 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r31 ; width,
vfscanf.s- sbc r13,__zero_reg__ ; width