[Patch,AVR]: PR49687 (better widening 32-bit mul)
Georg-Johann Lay
avr@gjlay.de
Mon Jul 25 17:06:00 GMT 2011
Weddington, Eric wrote:
>
>> Eric, can you review the assembler routines and say if such reuse is ok or if you'd prefer a
>> speed-optimized version of __mulsi3 like in the current libgcc?
>
> Hi Johann,
>
> Typically a penalty on speed is preferred over a penalty on code size. Do you already have
> information on how it compares on code size with the old routines?
>
> Eric
The old sizes (in bytes) are:
    62  __mulsi3
    26  __mulhisi3
    22  __umulhisi3
    10  __xmulhisi3
where __mulhisi3 and __umulhisi3 drag in __xmulhisi3, and the insns don't
combine with constants.
The new implementation has more fragments; the indented modules are dragged
in, i.e. used, by the respective function:
    12  __mulhisi3
            __umulhisi3
            __usmulhisi3_tail
    30  __umulhisi3
    02  __usmulhisi3
    10  __usmulhisi3_tail
    20  __muluhisi3
            __umulhisi3
    08  __mulohisi3
    04  __mulshisi3
            __muluhisi3
    30  __mulsi3
            __muluhisi3
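The size totals follow from this listing: an entry point costs its own size plus that of every module it drags in, transitively, and each module is linked at most once. The C sketch below is a hypothetical bookkeeping aid, not part of the patch; the enum names and footprint() are made up, and the dependency table encodes only the indentation shown in the listing (so, for instance, __usmulhisi3 is entered without dependencies).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module ids, one per line of the listing. */
enum { MULHISI3, UMULHISI3, USMULHISI3, USMULHISI3_TAIL,
       MULUHISI3, MULOHISI3, MULSHISI3, MULSI3, N_MODULES };

/* Sizes in bytes, as listed in the mail. */
static const int sizes[N_MODULES] = { 12, 30, 2, 10, 20, 8, 4, 30 };

/* Modules dragged in by each module, terminated by -1.  Only what the
   listing shows is encoded here; the real link-time dependencies are
   whatever the libgcc assembler modules actually reference. */
static const int deps[N_MODULES][3] = {
    [MULHISI3]        = { UMULHISI3, USMULHISI3_TAIL, -1 },
    [UMULHISI3]       = { -1 },
    [USMULHISI3]      = { -1 },
    [USMULHISI3_TAIL] = { -1 },
    [MULUHISI3]       = { UMULHISI3, -1 },
    [MULOHISI3]       = { -1 },
    [MULSHISI3]       = { MULUHISI3, -1 },
    [MULSI3]          = { MULUHISI3, -1 },
};

/* Mark module m and everything it drags in, transitively. */
static void pull_in(int m, bool linked[N_MODULES])
{
    if (linked[m])
        return;                 /* already linked: count it only once */
    linked[m] = true;
    for (size_t i = 0; deps[m][i] != -1; i++)
        pull_in(deps[m][i], linked);
}

/* Total bytes linked when a program calls the given entry points. */
int footprint(const int *entries, size_t n)
{
    bool linked[N_MODULES] = { false };
    int total = 0;

    for (size_t i = 0; i < n; i++)
        pull_in(entries[i], linked);
    for (int m = 0; m < N_MODULES; m++)
        if (linked[m])
            total += sizes[m];
    return total;
}
```

Passing just { MULSI3 } yields 30 + 20 + 30 = 80 bytes; passing every entry point simply sums the whole column to 116.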
This means that a program using only __mulsi3 links 30 + 30 + 20 = 80 bytes
(+18 compared to the old 62). If all functions are used, they occupy
116 bytes (-4 compared to the old 120), so they actually save a little space
when all of them are used, with the added benefit that they can also
one-extend, handle the widening forms 32 = 16*32 as well as 32 = 16*16, and
work with small (17-bit signed) constants.
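Why 17-bit signed constants fit: a constant k in [-65536, 65535] is either a zero-extended 16-bit value (k >= 0) or a one-extended one (k < 0, upper 16 bits all ones), i.e. k = u - 65536 with u = k & 0xFFFF. Hence x * k == x * u - (x << 16) modulo 2^32, so a one-extended multiply can reuse the zero-extended 16 x 32 multiply and fix up only the upper 16 bits. The sketch below checks that identity in C. The helper names are hypothetical, and the assumption that this is the role of __mulohisi3 on top of __muluhisi3 is inferred from the names only; the actual libgcc code may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32 = 16 x 32 multiply with a zero-extended
   16-bit operand (the role assumed for __muluhisi3). */
static uint32_t mul_u16_u32(uint16_t u, uint32_t x)
{
    return (uint32_t)u * x;         /* unsigned, wraps modulo 2^32 */
}

/* True if k fits in 17 signed bits: -65536 <= k <= 65535. */
static bool fits_s17(int32_t k)
{
    return k >= -65536 && k <= 65535;
}

/* Multiply x by a 17-bit signed constant k using only the zero-extended
   16-bit multiply.  For k < 0, k = u - 65536 with u = k & 0xFFFF, so
   x * k == x * u - (x << 16)  (mod 2^32). */
uint32_t mul_s17(int32_t k, uint32_t x)
{
    assert(fits_s17(k));
    uint16_t u = (uint16_t)((uint32_t)k & 0xFFFFu);
    uint32_t p = mul_u16_u32(u, x);
    if (k < 0)
        p -= x << 16;               /* changes only the upper 16 bits of p */
    return p;
}
```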
__umulhisi3 reads:
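DEFUN __umulhisi3
    mul     A0, B0           ; r1:r0 = A0 * B0
    movw    C0, r0           ; C1:C0 = low partial product
    mul     A1, B1           ; r1:r0 = A1 * B1
    movw    C2, r0           ; C3:C2 = high partial product
    mul     A0, B1           ; first cross product
    add     C1, r0           ; add it at byte offset 1 ...
    adc     C2, r1
    clr     __zero_reg__     ; mul clobbered r1; clr leaves the carry alone
    adc     C3, __zero_reg__ ; ... and propagate the carry into C3
    mul     A1, B0           ; second cross product, same treatment
    add     C1, r0
    adc     C2, r1
    clr     __zero_reg__
    adc     C3, __zero_reg__
    ret
ENDF __umulhisi3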
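The routine is the schoolbook 16 x 16 -> 32 multiply built from four 8 x 8 -> 16 products: the two "straight" products fill C1:C0 and C3:C2 directly, and the two cross products are added at byte offset 1 with the carry rippling up to C3. The C model below mirrors the listing instruction by instruction so the carry handling can be checked against a native multiply. It is an illustrative sketch, not libgcc code; umulhisi3_model is a hypothetical name, and A0..C3 follow the register names of the listing. AVR "mul" leaves its 16-bit product in r1:r0, and r1 is avr-gcc's __zero_reg__, which is why the listing has to clear it after each mul.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the __umulhisi3 listing: a = A1:A0 and b = B1:B0
   are unsigned 16-bit inputs; the result is C3:C2:C1:C0. */
uint32_t umulhisi3_model(uint16_t a, uint16_t b)
{
    uint8_t A0 = (uint8_t)a, A1 = (uint8_t)(a >> 8);
    uint8_t B0 = (uint8_t)b, B1 = (uint8_t)(b >> 8);
    uint8_t C0, C1, C2, C3;
    uint16_t r;                     /* plays the role of r1:r0 */
    unsigned s;                     /* 8-bit sum, carry lands in bit 8 */

    r = (uint16_t)(A0 * B0);                  /* mul  A0, B0 */
    C0 = (uint8_t)r; C1 = (uint8_t)(r >> 8);  /* movw C0, r0 */
    r = (uint16_t)(A1 * B1);                  /* mul  A1, B1 */
    C2 = (uint8_t)r; C3 = (uint8_t)(r >> 8);  /* movw C2, r0 */

    r = (uint16_t)(A0 * B1);                            /* mul A0, B1 */
    s = C1 + (r & 0xFFu);                C1 = (uint8_t)s; /* add C1, r0 */
    s = C2 + (unsigned)(r >> 8) + (s >> 8); C2 = (uint8_t)s; /* adc C2, r1 */
    C3 = (uint8_t)(C3 + (s >> 8));       /* adc C3, __zero_reg__ */

    r = (uint16_t)(A1 * B0);                            /* mul A1, B0 */
    s = C1 + (r & 0xFFu);                C1 = (uint8_t)s;
    s = C2 + (unsigned)(r >> 8) + (s >> 8); C2 = (uint8_t)s;
    C3 = (uint8_t)(C3 + (s >> 8));

    return (uint32_t)C0 | (uint32_t)C1 << 8
         | (uint32_t)C2 << 16 | (uint32_t)C3 << 24;
}
```

No carry can leave C3: the largest product, 0xFFFF * 0xFFFF = 0xFFFE0001, still fits in 32 bits, and every intermediate sum is bounded by the final product.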
It could be compressed to the following sequence, i.e.
24 bytes instead of 30, but I think that would be squeezing
the last byte out of the code a bit too far:
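DEFUN __umulhisi3
    mul     A0, B0
    movw    C0, r0
    mul     A1, B1
    movw    C2, r0
    mul     A0, B1
    rcall   1f               ; run the add-at-offset-1 tail once, return here
    mul     A1, B0
1:  add     C1, r0           ; reached twice: via rcall, then by fall-through
    adc     C2, r1
    clr     __zero_reg__
    adc     C3, __zero_reg__
    ret                      ; 1st time back after the rcall, 2nd time to caller
ENDF __umulhisi3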
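The saving comes from sharing the "add at byte offset 1" tail between the two cross products: every instruction in both listings is a single 16-bit word, so the byte counts are 15 x 2 = 30 and 12 x 2 = 24. The price is an extra call/return executed at run time and stack space for the return address. The C sketch below models that control flow with a shared helper, which is called once (the rcall) and then reached again (the fall-through); struct regs, tail_1 and umulhisi3_compact_model are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register state used by the compact listing. */
struct regs {
    uint8_t C0, C1, C2, C3;
    uint16_t r;                     /* r1:r0, the result of the last mul */
};

/* Code at label "1:": add r1:r0 into C at byte offset 1, carry into C3. */
static void tail_1(struct regs *x)
{
    unsigned s = x->C1 + (x->r & 0xFFu);
    x->C1 = (uint8_t)s;                                    /* add C1, r0 */
    s = x->C2 + (unsigned)(x->r >> 8) + (s >> 8);
    x->C2 = (uint8_t)s;                                    /* adc C2, r1 */
    x->C3 = (uint8_t)(x->C3 + (s >> 8));   /* adc C3, __zero_reg__ */
}

uint32_t umulhisi3_compact_model(uint16_t a, uint16_t b)
{
    uint8_t A0 = (uint8_t)a, A1 = (uint8_t)(a >> 8);
    uint8_t B0 = (uint8_t)b, B1 = (uint8_t)(b >> 8);
    struct regs x;

    x.r = (uint16_t)(A0 * B0);
    x.C0 = (uint8_t)x.r;  x.C1 = (uint8_t)(x.r >> 8);     /* movw C0, r0 */
    x.r = (uint16_t)(A1 * B1);
    x.C2 = (uint8_t)x.r;  x.C3 = (uint8_t)(x.r >> 8);     /* movw C2, r0 */
    x.r = (uint16_t)(A0 * B1);
    tail_1(&x);                     /* rcall 1f: run the tail, come back */
    x.r = (uint16_t)(A1 * B0);
    tail_1(&x);                     /* fall through into 1:, then ret */
    return (uint32_t)x.C0 | (uint32_t)x.C1 << 8
         | (uint32_t)x.C2 << 16 | (uint32_t)x.C3 << 24;
}
```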
In the absence of real-world code that uses 32-bit arithmetic, I trust
my intuition that code size will decrease in general ;-)
Tiny examples are sometimes misleading because of additional moves caused by
unfortunate register allocation, but that's a different story...
Johann