[PATCH] libgcc: Thumb-1 Floating-Point Library for Cortex M0

Daniel Engel libgcc@danielengel.com
Thu Nov 12 23:04:01 GMT 2020


Hi, 

This patch adds an efficient assembly-language implementation of IEEE-754-compliant floating-point routines for Cortex M0 EABI (v6m, thumb-1).  This is the libgcc portion of a larger library originally described in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

Since that time, I've separated the libm functions for submission to newlib.  The remaining libgcc functions in the attached patch have the following characteristics:

    Function(s)                     Size (bytes)        Cycles          Stack   Accuracy
    __clzsi2                        42                  23              0       exact
    __clzsi2 (OPTIMIZE_SIZE)        22                  55              0       exact
    __clzdi2                        8+__clzsi2          4+__clzsi2      0       exact
   
    __umulsidi3                     44                  24              0       exact
    __mulsidi3                      30+__umulsidi3      24+__umulsidi3  8       exact
    __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3   0       exact
    __ashldi3 (__aeabi_llsl)        22                  13              0       exact
    __lshrdi3 (__aeabi_llsr)        22                  13              0       exact
    __ashrdi3 (__aeabi_lasr)        22                  13              0       exact
   
    __aeabi_lcmp                    20                  13              0       exact
    __aeabi_ulcmp                   16                  10              0       exact
   
    __udivsi3 (__aeabi_uidiv)       56                  72 – 385        0       < 1 lsb
    __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3    8       < 1 lsb
    __udivdi3 (__aeabi_uldiv)       164                 103 – 1394      16      < 1 lsb
    __udivdi3 (OPTIMIZE_SIZE)       142                 120 – 1392      16      < 1 lsb
    __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3    32      < 1 lsb
   
    __shared_float                  178        
    __shared_float (OPTIMIZE_SIZE)  154   
        
    __addsf3 (__aeabi_fadd)         116+__shared_float  31 – 76         8       <= 0.5 ulp
    __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74              8       <= 0.5 ulp
    __subsf3 (__aeabi_fsub)         8+__addsf3          6+__addsf3      8       <= 0.5 ulp
    __aeabi_frsub                   8+__addsf3          6+__addsf3      8       <= 0.5 ulp
    __mulsf3 (__aeabi_fmul)         112+__shared_float  73 – 97         8       <= 0.5 ulp
    __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93              8       <= 0.5 ulp
    __divsf3 (__aeabi_fdiv)         132+__shared_float  83 – 361        8       <= 0.5 ulp
    __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263 – 359       8       <= 0.5 ulp
   
    __cmpsf2/__lesf2/__ltsf2        72                  33              0       exact
    __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2      0       exact
    __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2      0       exact
    __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2      0       exact
   
    __floatundisf (__aeabi_ul2f)    14+__shared_float   40 – 81         8       <= 0.5 ulp
    __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40 – 237        8       <= 0.5 ulp
    __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf 8       <= 0.5 ulp
    __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf 8       <= 0.5 ulp
    __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf   8       <= 0.5 ulp
   
    __fixsfdi (__aeabi_f2lz)        74                  27 – 33         0       exact
    __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi     0       exact
    __fixsfsi (__aeabi_f2iz)        52                  19              0       exact
    __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi     0       exact
    __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi     0       exact
     
    __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38              8       exact
    __aeabi_d2f                     56+__shared_float   54 – 58         8       <= 0.5 ulp
    __aeabi_h2f                     34+__shared_float   34              8       exact
    __aeabi_f2h                     84                  23 – 34         0       <= 0.5 ulp
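
The "N+__name" entries in the Size and Cycles columns read as thin wrappers layered on a shared core routine.  As a portable illustration only (C reference models, not the Thumb-1 assembly in the patch; every ref_* name below is hypothetical), this is one way the 64-bit helpers can reduce to 32-bit ones.  The 16-bit split in ref_umulsidi3 reflects the fact that the ARMv6-M MULS instruction keeps only the low 32 bits of a product:

    #include <stdint.h>

    /* Hypothetical reference model of __clzsi2: count leading zero bits by
       binary search.  __builtin_clz (0) is undefined in GCC; this model simply
       defines clz(0) as 32. */
    static int ref_clzsi2(uint32_t x)
    {
        int n = 0;

        if (x == 0)
            return 32;
        if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
        if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
        if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
        if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
        if ((x & 0x80000000u) == 0) { n += 1; }
        return n;
    }

    /* 64-bit clz as a short wrapper around the 32-bit routine. */
    static int ref_clzdi2(uint64_t x)
    {
        uint32_t hi = (uint32_t)(x >> 32);

        return hi ? ref_clzsi2(hi) : 32 + ref_clzsi2((uint32_t)x);
    }

    /* 32x32 -> 64 unsigned multiply from four 16x16 -> 32 partial products,
       because ARMv6-M MULS keeps only the low 32 bits of a product. */
    static uint64_t ref_umulsidi3(uint32_t a, uint32_t b)
    {
        uint32_t al = a & 0xFFFFu, ah = a >> 16;
        uint32_t bl = b & 0xFFFFu, bh = b >> 16;
        uint64_t lo  = al * bl;                          /* each product < 2^32 */
        uint64_t mid = (uint64_t)(al * bh) + (ah * bl);  /* may need 33 bits */
        uint64_t hi  = ah * bh;

        return lo + (mid << 16) + (hi << 32);
    }

    /* 64x64 -> 64 multiply: one widening multiply of the low words, plus the
       two cross products, which only reach the high word (mod 2^64). */
    static uint64_t ref_muldi3(uint64_t a, uint64_t b)
    {
        uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
        uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);
        uint32_t cross = al * bh + ah * bl;  /* wraps mod 2^32 on purpose */

        return ref_umulsidi3(al, bl) + ((uint64_t)cross << 32);
    }

Models like these can also serve as host-side oracles when checking the assembly routines against random inputs.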

Copyright assignment is on file with the FSF.  

I've built the gcc-arm-none-eabi cross-compiler using the 20201108 snapshot of GCC plus this patch, and successfully compiled a test program:

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit (unsigned) division

        volatile float a = x; // 32-bit int to float conversion
        volatile float b = y; // 64-bit int to float conversion
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float to 64-bit int conversion

        return 0;
    }

As one point of comparison, the test program links in 876 bytes of libgcc code from the patched toolchain, versus 10276 bytes from the latest released gcc-arm-none-eabi-9-2020-q2 toolchain.  That's a size reduction of more than 90%.

I have extensive test vectors, and the library passes these tests on an STM32F051.  These vectors were derived from UCB [1], TestFloat [2], and IEEECC754 [3] sources, plus some of my own creation.  Unfortunately, I'm not sure how "make check" should work for a cross-compiler runtime library.
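
Because IEEE-754 round-to-nearest-even fully determines each result of +, -, * and /, vectors like these can be checked bit-for-bit.  The sketch below assumes a hypothetical vector layout of raw single-precision bit patterns {a, b, expected}; it is not the format of the vectors described above.  Built with soft-float for a v6m target, the float '+' lowers to a call to __aeabi_fadd; built on a host, it simply exercises the host FPU:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical vector layout: raw IEEE-754 single-precision bit patterns. */
    struct fadd_vector {
        uint32_t a, b, expected;
    };

    static const struct fadd_vector fadd_vectors[] = {
        { 0x3F800000u, 0x40000000u, 0x40400000u }, /* 1.0 + 2.0 = 3.0 */
        { 0x3F800000u, 0x33800000u, 0x3F800000u }, /* 1 + 2^-24: tie, even is 1.0 */
        { 0x3F800001u, 0x33800000u, 0x3F800002u }, /* tie, rounds up to even */
        { 0x00000001u, 0x00000001u, 0x00000002u }, /* subnormal + subnormal */
        { 0x80000000u, 0x80000000u, 0x80000000u }, /* -0 + -0 = -0 */
        { 0x00000000u, 0x80000000u, 0x00000000u }, /* +0 + -0 = +0 */
        { 0x7F800000u, 0xFF800000u, 0x7FC00000u }, /* inf + -inf = NaN */
    };

    static float from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
    static uint32_t to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

    static int is_nan_bits(uint32_t u)
    {
        return (u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0;
    }

    /* Returns the number of failing vectors. */
    static int run_fadd_vectors(void)
    {
        int failures = 0;
        unsigned i;

        for (i = 0; i < sizeof fadd_vectors / sizeof fadd_vectors[0]; i++) {
            const struct fadd_vector *v = &fadd_vectors[i];
            volatile float a = from_bits(v->a);  /* volatile: no constant folding */
            volatile float b = from_bits(v->b);
            uint32_t got = to_bits(a + b);
            /* NaN sign and payload vary between implementations, so any NaN
               satisfies an expected NaN; everything else must match exactly. */
            int ok = is_nan_bits(v->expected) ? is_nan_bits(got)
                                              : got == v->expected;

            if (!ok) {
                printf("vector %u: %08lx + %08lx = %08lx, expected %08lx\n", i,
                       (unsigned long)v->a, (unsigned long)v->b,
                       (unsigned long)got, (unsigned long)v->expected);
                failures++;
            }
        }
        return failures;
    }

The two tie cases are the interesting ones: 1 + 2^-24 lies exactly halfway between two representable floats, so only correct ties-to-even rounding produces the expected bit patterns.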

Although I believe this patch can be incorporated as-is, there are at least two points that might bear discussion: 

* I would be happy to provide sources for my test vectors, although I'm not sure where or how they should be integrated.

* The library is currently built for the ARM v6m architecture only.  It is likely that some of the other Cortex variants would benefit from these routines.  However, I would need some guidance on this to proceed without introducing regressions.  I do not currently have a test strategy for architectures beyond Cortex M0, and I have NOT profiled the existing Thumb-2 implementations (ieee754-sf.S) for comparison.

I'm naturally hoping for some action on this patch before the Nov 16th deadline for GCC-11 stage 3.  Please review and advise.  

Thanks,
Daniel Engel

[1] http://www.netlib.org/fp/ucbtest.tgz
[2] http://www.jhauser.us/arithmetic/TestFloat.html
[3] http://win-www.uia.ac.be/u/cant/ieeecc754.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cortex-m0-fplib-20201112.patch
Type: application/octet-stream
Size: 133513 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20201112/c613f607/attachment-0001.obj>


More information about the Gcc-patches mailing list