[Patch,AVR]: ad PR49313: 64-bit division
Denis Chertykov
chertykov@gmail.com
Mon Nov 21 10:31:00 GMT 2011
2011/11/20 Georg-Johann Lay <avr@gjlay.de>:
> This implements assembler drop-in replacement for 64-bit division/remainder.
>
> The original libgcc implementation is extremely resource gulping because it
> uses inline in several places and DImode is resource gulping, anyway.
>
> With the patch the sizes (accumulated over all modules with same name) are:
>
> Col #2 = Original libgcc (C-Code)
> Col #3 = New implementation in Asm
> Col #4 = Change (relative)
> Col #5 = Change (absolute)
>
> _udivmod64.o : 0 1362 100000.0% 1362
> _negdi2.o : 1296 288 -77.8% -1008
> _umoddi3.o : 22326 0 -100.0% -22326
> _udivdi3.o : 26878 246 -99.1% -26632
> _moddi3.o : 28986 0 -100.0% -28986
> _divdi3.o : 33076 840 -97.5% -32236
> :::::: Total ::::::: : 362188 252362 -30.3% -109826
>
> The detailed size report:
>
> avr3/libgcc.a!_udivmod64.o : 0 174 100000.0% 174
> avr31/libgcc.a!_udivmod64.o : 0 174 100000.0% 174
> avr6/libgcc.a!_udivmod64.o : 0 162 100000.0% 162
> avr5/libgcc.a!_udivmod64.o : 0 162 100000.0% 162
> avr35/libgcc.a!_udivmod64.o : 0 162 100000.0% 162
> avr51/libgcc.a!_udivmod64.o : 0 162 100000.0% 162
> avr25/libgcc.a!_udivmod64.o : 0 126 100000.0% 126
> avr4/libgcc.a!_udivmod64.o : 0 126 100000.0% 126
> libgcc.a!_udivmod64.o : 0 114 100000.0% 114
> avr4/libgcc.a!_negdi2.o : 140 32 -77.1% -108
> avr25/libgcc.a!_negdi2.o : 140 32 -77.1% -108
> avr5/libgcc.a!_negdi2.o : 144 32 -77.8% -112
> avr6/libgcc.a!_negdi2.o : 144 32 -77.8% -112
> libgcc.a!_negdi2.o : 144 32 -77.8% -112
> avr35/libgcc.a!_negdi2.o : 144 32 -77.8% -112
> avr51/libgcc.a!_negdi2.o : 144 32 -77.8% -112
> avr31/libgcc.a!_negdi2.o : 148 32 -78.4% -116
> avr3/libgcc.a!_negdi2.o : 148 32 -78.4% -116
> avr4/libgcc.a!_umoddi3.o : 2304 0 -100.0% -2304
> avr6/libgcc.a!_umoddi3.o : 2360 0 -100.0% -2360
> avr51/libgcc.a!_umoddi3.o : 2360 0 -100.0% -2360
> avr5/libgcc.a!_umoddi3.o : 2360 0 -100.0% -2360
> avr25/libgcc.a!_umoddi3.o : 2364 0 -100.0% -2364
> avr35/libgcc.a!_umoddi3.o : 2420 0 -100.0% -2420
> libgcc.a!_umoddi3.o : 2682 0 -100.0% -2682
> avr3/libgcc.a!_umoddi3.o : 2738 0 -100.0% -2738
> avr31/libgcc.a!_umoddi3.o : 2738 0 -100.0% -2738
> avr4/libgcc.a!_udivdi3.o : 2784 26 -99.1% -2758
> avr25/libgcc.a!_udivdi3.o : 2828 26 -99.1% -2802
> avr5/libgcc.a!_udivdi3.o : 2852 28 -99.0% -2824
> avr6/libgcc.a!_udivdi3.o : 2852 28 -99.0% -2824
> avr51/libgcc.a!_udivdi3.o : 2852 28 -99.0% -2824
> avr35/libgcc.a!_udivdi3.o : 2896 28 -99.0% -2868
> avr4/libgcc.a!_moddi3.o : 3016 0 -100.0% -3016
> avr5/libgcc.a!_moddi3.o : 3072 0 -100.0% -3072
> avr6/libgcc.a!_moddi3.o : 3072 0 -100.0% -3072
> avr51/libgcc.a!_moddi3.o : 3072 0 -100.0% -3072
> avr25/libgcc.a!_moddi3.o : 3124 0 -100.0% -3124
> avr35/libgcc.a!_moddi3.o : 3180 0 -100.0% -3180
> libgcc.a!_udivdi3.o : 3226 26 -99.2% -3200
> avr31/libgcc.a!_udivdi3.o : 3294 28 -99.1% -3266
> avr3/libgcc.a!_udivdi3.o : 3294 28 -99.1% -3266
> avr4/libgcc.a!_divdi3.o : 3396 86 -97.5% -3310
> avr5/libgcc.a!_divdi3.o : 3464 98 -97.2% -3366
> avr51/libgcc.a!_divdi3.o : 3464 98 -97.2% -3366
> avr6/libgcc.a!_divdi3.o : 3464 98 -97.2% -3366
> libgcc.a!_moddi3.o : 3446 0 -100.0% -3446
> avr25/libgcc.a!_divdi3.o : 3578 86 -97.6% -3492
> avr31/libgcc.a!_moddi3.o : 3502 0 -100.0% -3502
> avr3/libgcc.a!_moddi3.o : 3502 0 -100.0% -3502
> avr35/libgcc.a!_divdi3.o : 3646 98 -97.3% -3548
> libgcc.a!_divdi3.o : 3976 78 -98.0% -3898
> avr31/libgcc.a!_divdi3.o : 4044 100 -97.5% -3944
> avr3/libgcc.a!_divdi3.o : 4044 98 -97.6% -3946
> :::::: Total ::::::: : 362188 252362 -30.3% -109826
>
> The implementation is basically the same as the division/modulo already present
> in lib1funcs.S. However, the algorithm does not compute the complement by
> recycling the carry bit, instead it expands directly to the quotient. That way,
> n instructions can be saved when dealing with n-byte values.
>
> The implementation provides speed-up of the algorithm for the case when there
> is enough flash. The assumption is that speed for arithmetic matters.
>
> As you can see above, the size of __udivmod64 varies from 114 to 174 bytes:
>
> 114 = small devices without MOVW (no speed-up: SPEED_DIV = 0)
> 126 = small devices with MOVW (small speed-up: SPEED_DIV = 16)
> 162, 174 = devices >= 16k (best speed-up: SPEED_DIV = 8)
>
> Passed without regressions.
>
> Moreover, the algorithm is individually tested against the old implementation.
> The only difference I observed was for divisor = 0.
>
> Ok for trunk?
>
> Johann
>
> libgcc/
> PR target/49313
> * config/avr/t-avr (LIB2FUNCS_EXCLUDE): Add _moddi3, _umoddi3.
> (LIB1ASMFUNCS): Add _divdi3, _udivdi3, _udivmod64, _negdi2.
>
> * config/avr/lib1funcs.S (wmov): New assembler macro.
> (__umoddi3, __udivdi3, __udivdi3_umoddi3): New functions.
> (__moddi3, __divdi3, __divdi3_moddi3): New functions.
> (__udivmod64): New function.
> (__negdi2): New function.
>
Approved.
Denis.
More information about the Gcc-patches
mailing list