[PATCH v6 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Daniel Engel
gnu@danielengel.com
Mon Dec 27 19:04:56 GMT 2021
Hi Richard,
I am re-submitting my libgcc patch from last year:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
I clearly missed the stage1 window again. However, since the patch rebased
cleanly onto gcc-12 with no regressions, and it's not quite stage4 yet, I
figured it was worth submitting.
Regards,
Daniel
---
Changes since v5:
* Rebased and tested with gcc-12
Regression results for -march={armv4t,armv6s-m,armv7-m,armv7-a}:

                            Clean master   Patched master
# of expected passes        513596         513596
# of unexpected failures    38829          38829
# of unexpected successes   16             16
# of expected failures      3450           3450
# of unresolved testcases   1108           1108
# of unsupported tests      28224          28224
---
This patch series adds an assembly-language implementation of IEEE-754-compliant
single-precision functions designed for the Cortex M0 (v6m) architecture. There
are improvements to most of the EABI integer functions as well. This is the
libgcc component of a larger library project originally proposed in 2018:
https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs. 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain. That's a size reduction of about 91%.
I have extensive test vectors [2], and this patch passes all tests on an
STM32F051. These vectors were derived from UCB [3], TestFloat [4], and
IEEECC754 [5], plus many that I generated myself.
There may be some follow-on projects worth discussing:
* The library is currently integrated into the ARM v6s-m multilib only. It
is likely that some other architectures would benefit from these routines.
However, I have NOT profiled the existing implementations (ieee754-sf.S) to
estimate where improvements may be found.
* GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
There may be useful bits in [1] that can be integrated.
On Cortex M0, the library has (approximately) the following properties:
Function(s)                     Size (bytes)        Cycles            Stack  Accuracy
__clzsi2                        50                  20                0      exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                0      exact
__clzdi2                        8+__clzsi2          4+__clzsi2        0      exact
__clrsbsi2                      8+__clzsi2          6+__clzsi2        0      exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2  0      exact
__ctzsi2                        52                  21                0      exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                0      exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2        0      exact
__ffssi2                        8                   6..(5+__ctzsi2)   0      exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)   0      exact
__popcountsi2                   52                  25                0      exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201            0      exact
__popcountdi2                   34+__popcountsi2    46                0      exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401           0      exact
__paritysi2                     24                  14                0      exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                0      exact
__paritydi2                     2+__paritysi2       1+__paritysi2     0      exact
__umulsidi3                     44                  24                0      exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3    8      exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3     0      exact
__ashldi3 (__aeabi_llsl)        22                  13                0      exact
__lshrdi3 (__aeabi_llsr)        22                  13                0      exact
__ashrdi3 (__aeabi_lasr)        22                  13                0      exact
__aeabi_lcmp                    20                  13                0      exact
__aeabi_ulcmp                   16                  10                0      exact
__udivsi3 (__aeabi_uidiv)       56                  72..385           0      < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3      8      < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394         16     < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392         16     < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3      32     < 1 lsb
__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154
__addsf3 (__aeabi_fadd)         116+__shared_float  31..76            8      <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                8      <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3        8      <= 0.5 ulp
__aeabi_frsub                   8+__addsf3          6+__addsf3        8      <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float  73..97            8      <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                8      <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float  83..361           8      <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359          8      <= 0.5 ulp
__cmpsf2/__lesf2/__ltsf2        72                  33                0      exact
__eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2        0      exact
__gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2        0      exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2        0      exact
__floatundisf (__aeabi_ul2f)    14+__shared_float   40..81            8      <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237           8      <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf   8      <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf   8      <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf     8      <= 0.5 ulp
__fixsfdi (__aeabi_f2lz)        74                  27..33            0      exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi       0      exact
__fixsfsi (__aeabi_f2iz)        52                  19                0      exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi       0      exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi       0      exact
__extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                8      exact
__truncsfdf2 (__aeabi_f2d)      88                  34                8      exact
__aeabi_d2f                     56+__shared_float   54..58            8      <= 0.5 ulp
__aeabi_h2f                     34+__shared_float   34                8      exact
__aeabi_f2h                     84                  23..34            0      <= 0.5 ulp
Copyright assignment is on file with the FSF.
Thanks,
Daniel Engel
[1] // Test program for size comparison
extern int main (void)
{
  volatile int x = 1;
  volatile unsigned long long int y = 10;
  volatile long long int z = x / y;   // 64-bit division
  volatile float a = x;               // 32-bit casting
  volatile float b = y;               // 64-bit casting
  volatile float c = z / b;           // float division
  volatile float d = a + c;           // float addition
  volatile float e = c * b;           // float multiplication
  volatile float f = d - e - c;       // float subtraction

  if (f != c)                         // float comparison
    y -= (long long int)d;            // float casting
}
[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html