This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Patch,AVR]: PR50447: Tweak addhi3


Denis Chertykov wrote:
> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>:
>> Denis Chertykov wrote:
>>> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>:
>>>> This patch makes some tweaks to addhi3, such as adding a QI scratch register.
>>>>
>>>> The original *addhi3 insn is still there and located prior to the new
>>>> addhi3_clobber insn because addhi3 is special to reload (thanks Denis for this
>>>> note), so that there is a version with and a version without scratch register.
>>>>
>>>> Patch passes without regressions.
>>>>
>>> Which improvements does this patch add?
>>>
>>> Denis.
>> If the addhi3 is expanded early, the addition happens with a QI scratch, which
>> avoids a reload of the constant if the target register is in NO_LD. It also
>> reduces register pressure, as only a QI register is needed and the constant is
>> not reloaded into a HI register.
>>
>> Otherwise, there might be sequences like
>>
>> ldi r31, 2    ; *reload_inhi
>> mov r12, r31
>> clr r13
>>
>> add r14, r12  ; *addhi3
>> adc r15, r13
>>
>> which now will be
>>
>> ldi r31, 2    ; addhi3_clobber
>> add r14, r31
>> adc r15, __zero_reg__
>>
>> The same applies if the constant is reloaded into LD regs:
>>
>> ldi r30, 2    ; *movhi
>> clr r31
>>
>> add r14, r30  ; *addhi3
>> adc r15, r31
>>
>> will become
>>
>> ldi r30, 2    ; addhi3_clobber
>> add r14, r30
>> adc r15, __zero_reg__
>>
>> For *addhi3 insns the register pressure is not reduced but the insn sequence
>> might be smarter if peep2 comes up with a QI scratch or if it detects a
>> *reload_inhi insn just prior to the addition (and the reg that holds the
>> reloaded constant dies after the addition).
>>
>> As *addhi3 is special to reload, there is still an "ordinary" addhi insn
>> without scratch. This is easier because, e.g., prologue and epilogue generation
>> generate add insns (not by means of the addhi3 expander but by explicit
>> gen_rtx_PLUS). Yet the addhi3 expander factors out the situations when an
>> addhi3 insn is to be generated via the addhi3 expander late in the compilation
>> process.
> 
> Please provide a real-world example.
> 
> Denis.

Consider avr-libc (under the assumption that it is "real world" code):

In avr-libc's build directory, and with the patch integrated:

$ cd avr/lib/avr4
$ make clean && make CFLAGS='-save-temps -dp -Os'
$ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment)
$ grep 'addhi3_clobber\/2' *.s | wc -l
33

This shows that the insns are already there before peep2 and thus no reload of
a 16-bit constant is needed; an 8-bit scratch is sufficient.

Alternatively, the implementation could omit the expansion to addhi3_clobber in
the addhi3 expander and instead rely completely on peep2. However, that does not
reduce register pressure because a 16-bit register will still be allocated; the
peep2 just prints things smarter and needs only a QI scratch to call
avr_out_plus_clobber.
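
To make the arithmetic behind the 8-bit scratch concrete, here is a minimal
Python model of a byte-wise 16-bit addition. It is not GCC code: the helper
names (add8, sub8, add_const_hi_reload, add_const_qi_scratch,
add_one_via_sub_ffff) are made up for illustration, and it assumes the usual
ADD/ADC and SUB/SBC semantics and that __zero_reg__ holds 0. It only shows that
one 8-bit scratch yields the same sum as a constant reloaded into a 16-bit
register pair; it says nothing about what the register allocator actually does.

```python
MASK8 = 0xFF

def add8(a, b, carry=0):
    """Model of AVR ADD (carry=0) / ADC: returns (8-bit result, carry out)."""
    total = a + b + carry
    return total & MASK8, total >> 8

def sub8(a, b, borrow=0):
    """Model of AVR SUB (borrow=0) / SBC: returns (8-bit result, borrow out)."""
    total = a - b - borrow
    return total & MASK8, int(total < 0)

def add_const_hi_reload(value, const):
    """Constant reloaded into a 16-bit register pair, then ADD/ADC."""
    c_lo, c_hi = const & MASK8, (const >> 8) & MASK8
    lo, carry = add8(value & MASK8, c_lo)
    hi, _ = add8(value >> 8, c_hi, carry)
    return (hi << 8) | lo

def add_const_qi_scratch(value, const):
    """Constant in 0..255 in one 8-bit scratch: LDI / ADD / ADC __zero_reg__."""
    scratch = const & MASK8      # ldi scratch, const
    zero_reg = 0                 # assumption: __zero_reg__ is 0
    lo, carry = add8(value & MASK8, scratch)
    hi, _ = add8(value >> 8, zero_reg, carry)
    return (hi << 8) | lo

def add_one_via_sub_ffff(value):
    """LDI r,255 / SUB lo,r / SBC hi,r: subtracting 0xFFFF adds 1 mod 2**16."""
    scratch = 0xFF
    lo, borrow = sub8(value & MASK8, scratch)
    hi, _ = sub8(value >> 8, scratch, borrow)
    return (hi << 8) | lo

if __name__ == "__main__":
    for value in (0x0000, 0x00FE, 0x12FF, 0xFFFF):
        assert add_const_qi_scratch(value, 2) == add_const_hi_reload(value, 2)
        assert add_one_via_sub_ffff(value) == (value + 1) & 0xFFFF
    print(hex(add_const_qi_scratch(0x12FF, 2)))   # 0x1301
```

The last helper corresponds to the sequences in the attachment that load 255
and use the same register for both SUB and SBC.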

For +/-1, the addition with SEC/ADC/ADC or SEC/SBC/SBC leaves cc0 in a mess.
As most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter but
better because it sets cc0.
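
The flag argument can be illustrated with a small model as well. The Python
sketch below is an illustration under assumptions, not compiler code: it models
only the C and Z flags, following the AVR instruction set description (SUB and
ADC derive Z from their own 8-bit result; SBC leaves Z set only if it was set
before and its own result is zero), and all function names are hypothetical.

```python
def sub(a, b):
    """SUB: returns (result, C, Z); Z reflects this byte only."""
    r = (a - b) & 0xFF
    return r, int(a < b), int(r == 0)

def sbc(a, b, c, z):
    """SBC: Z stays set only if it was set before and this byte is zero."""
    r = (a - b - c) & 0xFF
    return r, int(a < b + c), int(r == 0 and z == 1)

def adc(a, b, c):
    """ADC: Z reflects this byte only; earlier bytes are forgotten."""
    total = a + b + c
    r = total & 0xFF
    return r, total >> 8, int(r == 0)

def dec_ldi_sub_sbc(x):
    """LDI r,1 / SUB lo,r / SBC hi,__zero_reg__: returns (x - 1, Z)."""
    lo, c, z = sub(x & 0xFF, 1)
    hi, c, z = sbc(x >> 8, 0, c, z)
    return (hi << 8) | lo, z

def dec_sec_sbc_sbc(x, z_before):
    """SEC / SBC lo,__zero_reg__ / SBC hi,__zero_reg__: Z inherits a stale flag."""
    lo, c, z = sbc(x & 0xFF, 0, 1, z_before)   # SEC sets C=1, Z is whatever it was
    hi, c, z = sbc(x >> 8, 0, c, z)
    return (hi << 8) | lo, z

def inc_sec_adc_adc(x):
    """SEC / ADC lo,__zero_reg__ / ADC hi,__zero_reg__: Z describes the high byte."""
    lo, c, z = adc(x & 0xFF, 0, 1)
    hi, c, z = adc(x >> 8, 0, c)
    return (hi << 8) | lo, z

if __name__ == "__main__":
    print(dec_ldi_sub_sbc(1))       # (0, 1): result zero, Z set
    print(dec_sec_sbc_sbc(1, 0))    # (0, 0): result zero, but Z clear
    print(inc_sec_adc_adc(5))       # (6, 1): result non-zero, but Z set
```

In this model only the LDI/SUB/SBC variant produces a Z flag that matches the
full 16-bit result, so a following test against zero can reuse it.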

So do you like this patch?
Or would you prefer a patch that is neutral with respect to the register
allocator and just uses peep2 to print things smarter?

Johann


dtoa_prf.s:	ldi r31,3	 ; ,	 ;  338	addhi3_clobber/2	[length = 3]
dtoa_prf.s-	add r12,r31	 ;  s,
dtoa_prf.s-	adc r13,__zero_reg__	 ;  s
--
dtoa_prf.s:	ldi r31,3	 ; ,	 ;  447	addhi3_clobber/2	[length = 3]
dtoa_prf.s-	add r12,r31	 ;  s,
dtoa_prf.s-	adc r13,__zero_reg__	 ;  s
--
fgets.s:	ldi r31,1	 ; ,	 ;  70	addhi3_clobber/2	[length = 3]
fgets.s-	sub r14,r31	 ;  ivtmp.9,
fgets.s-	sbc r15,__zero_reg__	 ;  ivtmp.9
--
realloc.s:	ldi r17,2	 ; ,	 ;  80	addhi3_clobber/2	[length = 3]
realloc.s-	add r12,r17	 ;  tmp83,
realloc.s-	adc r13,__zero_reg__	 ; 
--
realloc.s:	ldi r18,2	 ; ,	 ;  85	addhi3_clobber/2	[length = 3]
realloc.s-	add r12,r18	 ;  tmp84,
realloc.s-	adc r13,__zero_reg__	 ; 
--
strtod.s:	ldi r31,1	 ; ,	 ;  101	addhi3_clobber/2	[length = 3]
strtod.s-	sub r14,r31	 ;  D.2581,
strtod.s-	sbc r15,__zero_reg__	 ;  D.2581
--
strtod.s:	ldi r18,2	 ; ,	 ;  110	addhi3_clobber/2	[length = 3]
strtod.s-	add r14,r18	 ;  nptr,
strtod.s-	adc r15,__zero_reg__	 ;  nptr
--
strtod.s:	ldi r21,7	 ; ,	 ;  120	addhi3_clobber/2	[length = 3]
strtod.s-	add r14,r21	 ;  nptr,
strtod.s-	adc r15,__zero_reg__	 ;  nptr
--
strtod.s:	ldi r31,255	 ; ,	 ;  175	addhi3_clobber/2	[length = 3]
strtod.s-	sub r14,r31	 ;  exp,
strtod.s-	sbc r15,r31	 ;  exp,
--
strtod.s:	ldi r18,1	 ; ,	 ;  185	addhi3_clobber/2	[length = 3]
strtod.s-	sub r14,r18	 ;  exp,
strtod.s-	sbc r15,__zero_reg__	 ;  exp
--
strtod.s:	ldi r31,24	 ; ,	 ;  376	addhi3_clobber/2	[length = 3]
strtod.s-	sub r8,r31	 ;  D.2735,
strtod.s-	sbc r9,__zero_reg__	 ;  D.2735
--
strtol.s:	ldi r31,2	 ; ,	 ;  128	addhi3_clobber/2	[length = 3]
strtol.s-	add r6,r31	 ;  nptr,
strtol.s-	adc r7,__zero_reg__	 ;  nptr
--
strtol.s:	ldi r31,1	 ; ,	 ;  242	addhi3_clobber/2	[length = 3]
strtol.s-	sub r6,r31	 ;  tmp117,
strtol.s-	sbc r7,__zero_reg__	 ; 
--
strtol.s:	ldi r31,2	 ; ,	 ;  252	addhi3_clobber/2	[length = 3]
strtol.s-	sub r6,r31	 ;  tmp119,
strtol.s-	sbc r7,__zero_reg__	 ; 
--
strtoul.s:	ldi r31,2	 ; ,	 ;  126	addhi3_clobber/2	[length = 3]
strtoul.s-	add r14,r31	 ;  nptr,
strtoul.s-	adc r15,__zero_reg__	 ;  nptr
--
strtoul.s:	ldi r31,1	 ; ,	 ;  229	addhi3_clobber/2	[length = 3]
strtoul.s-	sub r14,r31	 ;  tmp113,
strtoul.s-	sbc r15,__zero_reg__	 ; 
--
strtoul.s:	ldi r31,2	 ; ,	 ;  239	addhi3_clobber/2	[length = 3]
strtoul.s-	sub r14,r31	 ;  tmp115,
strtoul.s-	sbc r15,__zero_reg__	 ; 
--
vfprintf.s:	ldi r24,4	 ; ,	 ;  399	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r24	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfprintf.s:	ldi r21,10	 ; ,	 ;  850	addhi3_clobber/2	[length = 3]
vfprintf.s-	sub r10,r21	 ;  exp,
vfprintf.s-	sbc r11,__zero_reg__	 ;  exp
--
vfprintf.s:	ldi r30,2	 ; ,	 ;  882	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r30	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfprintf.s:	ldi r31,2	 ; ,	 ;  892	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r31	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfprintf.s:	ldi r31,2	 ; ,	 ;  919	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r31	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfprintf.s:	ldi r31,1	 ; ,	 ;  987	addhi3_clobber/2	[length = 3]
vfprintf.s-	sub r8,r31	 ;  size,
vfprintf.s-	sbc r9,__zero_reg__	 ;  size
--
vfprintf.s:	ldi r18,4	 ; ,	 ;  1012	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r18	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfprintf.s:	ldi r31,2	 ; ,	 ;  1019	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r31	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfprintf.s:	ldi r30,4	 ; ,	 ;  1109	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r30	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfprintf.s:	ldi r31,2	 ; ,	 ;  1116	addhi3_clobber/2	[length = 3]
vfprintf.s-	add r4,r31	 ;  ap,
vfprintf.s-	adc r5,__zero_reg__	 ;  ap
--
vfscanf.s:	ldi r27,1	 ; ,	 ;  213	addhi3_clobber/2	[length = 3]
vfscanf.s-	sub r10,r27	 ;  width,
vfscanf.s-	sbc r11,__zero_reg__	 ;  width
--
vfscanf.s:	ldi r25,255	 ; ,	 ;  163	addhi3_clobber/2	[length = 3]
vfscanf.s-	sub r12,r25	 ;  exp,
vfscanf.s-	sbc r13,r25	 ;  exp,
--
vfscanf.s:	ldi r30,1	 ; ,	 ;  173	addhi3_clobber/2	[length = 3]
vfscanf.s-	sub r12,r30	 ;  exp,
vfscanf.s-	sbc r13,__zero_reg__	 ;  exp
--
vfscanf.s:	ldi r25,24	 ; ,	 ;  354	addhi3_clobber/2	[length = 3]
vfscanf.s-	sub r6,r25	 ;  D.3471,
vfscanf.s-	sbc r7,__zero_reg__	 ;  D.3471
--
vfscanf.s:	ldi r31,1	 ; ,	 ;  235	addhi3_clobber/2	[length = 3]
vfscanf.s-	sub r12,r31	 ;  width,
vfscanf.s-	sbc r13,__zero_reg__	 ;  width
--
vfscanf.s:	ldi r31,1	 ; ,	 ;  334	addhi3_clobber/2	[length = 3]
vfscanf.s-	sub r12,r31	 ;  width,
vfscanf.s-	sbc r13,__zero_reg__	 ;  width
