[Bug target/100627] missing optimization

Sun May 16 19:44:06 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is a target issue: it comes down to how uint64_t -> float/double
conversions are done.
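
For reference, here is a minimal sketch of what the two functions discussed in this comment presumably look like. This is an assumption reconstructed from the symbol names in the listings; the actual test case is the one attached to the bug report and may be written differently (for example with a standard algorithm instead of a plain loop).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the test case, not the reporter's code.
// Element-wise uint64_t -> double conversion.
void cvt_f64_std(std::array<double, 16>& dst,
                 const std::array<std::uint64_t, 16>& src)
{
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<double>(src[i]);
}

// Element-wise uint64_t -> float conversion. The source elements are
// 64 bits wide and the results are 32 bits wide, so the two arrays have
// different element sizes.
void cvt_f32_std(std::array<float, 16>& dst,
                 const std::array<std::uint64_t, 16>& src)
{
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<float>(src[i]);
}
```

The only difference between the two loops is the destination element type, which is what makes the difference in generated code notable.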
On aarch64 for cvt_f64_std we get good code at -O3:
cvt_f64_std(std::array<double, 16ul>&, std::array<unsigned long, 16ul> const&):
        ldp     q7, q6, [x1]
        ldp     q5, q4, [x1, 32]
        ldp     q3, q2, [x1, 64]
        ldp     q1, q0, [x1, 96]
        ucvtf   v7.2d, v7.2d
        ucvtf   v6.2d, v6.2d
        ucvtf   v5.2d, v5.2d
        ucvtf   v4.2d, v4.2d
        ucvtf   v3.2d, v3.2d
        ucvtf   v2.2d, v2.2d
        stp     q7, q6, [x0]
        ucvtf   v1.2d, v1.2d
        stp     q5, q4, [x0, 32]
        ucvtf   v0.2d, v0.2d
        stp     q3, q2, [x0, 64]
        stp     q1, q0, [x0, 96]
        ret

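The other function, cvt_f32_std, is not vectorized: each element goes through
a scalar ucvtf and is then inserted into a vector lane: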
cvt_f32_std(std::array<float, 16ul>&, std::array<unsigned long, 16ul> const&):
        ldp     x3, x2, [x1]
        ucvtf   s7, x2
        ucvtf   s3, x3
        ldp     x3, x2, [x1, 32]
        ins     v3.s[1], v7.s[0]
        ucvtf   s6, x2
        ucvtf   s2, x3
        ldp     x3, x2, [x1, 64]
        ins     v2.s[1], v6.s[0]
        ucvtf   s5, x2
        ucvtf   s1, x3
        ldp     x3, x2, [x1, 96]
        ins     v1.s[1], v5.s[0]
        ucvtf   s4, x2
        ucvtf   s0, x3
        ldr     x2, [x1, 48]
        ldr     x3, [x1, 16]
        ucvtf   s17, x2
        ldr     x2, [x1, 112]
        ucvtf   s18, x3
        ldr     x3, [x1, 80]
        ins     v0.s[1], v4.s[0]
        ucvtf   s4, x2
        ucvtf   s16, x3
        ldr     x2, [x1, 24]
        ldr     x3, [x1, 56]
        ucvtf   s7, x2
        ldr     x2, [x1, 88]
        ucvtf   s6, x3
        ldr     x1, [x1, 120]
        ucvtf   s5, x2
        ins     v3.s[2], v18.s[0]
        ins     v2.s[2], v17.s[0]
        ins     v1.s[2], v16.s[0]
        ins     v0.s[2], v4.s[0]
        ucvtf   s4, x1
        ins     v3.s[3], v7.s[0]
        ins     v2.s[3], v6.s[0]
        ins     v1.s[3], v5.s[0]
        ins     v0.s[3], v4.s[0]
        stp     q3, q2, [x0]
        stp     q1, q0, [x0, 32]
        ret