[Bug tree-optimization/82074] New: [aarch64] vmlsq_f32 compiled into 2 instructions

gcc.account at lemaitre dot re gcc-bugzilla@gcc.gnu.org
Fri Sep 1 14:28:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82074

            Bug ID: 82074
           Summary: [aarch64] vmlsq_f32 compiled into 2 instructions
           Product: gcc
           Version: 7.2.0
               URL: https://godbolt.org/g/jWvmxS
            Status: UNCONFIRMED
          Keywords: TREE
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gcc.account at lemaitre dot re
  Target Milestone: ---

Created attachment 42100
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42100&action=edit
simplest example showing the bug

On aarch64, the Neon intrinsic "vmlsq_f32" is compiled into:
    fneg  v1.4s, v1.4s
    fmla  v0.4s, v1.4s, v2.4s

instead of:
    fmls  v0.4s, v1.4s, v2.4s
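
To make the comparison concrete, here is a rough per-lane scalar model of the two sequences. It is only a sketch: the function names are hypothetical, and rounding is ignored (the real FMLA/FMLS instructions are fused and round once, which plain C arithmetic does not guarantee).

```c
/* Per-lane scalar model of the two instruction sequences.
   Hypothetical helper names; rounding behaviour is not modelled. */
#define LANES 4

/* fmls vd, vn, vm  :  vd[i] = vd[i] - vn[i] * vm[i] */
void model_fmls(float d[LANES], const float n[LANES], const float m[LANES])
{
    for (int i = 0; i < LANES; i++)
        d[i] = d[i] - n[i] * m[i];
}

/* fneg vn, vn ; fmla vd, vn, vm  :  same value, but one extra
   instruction, and the vn register is overwritten along the way. */
void model_fneg_fmla(float d[LANES], float n[LANES], const float m[LANES])
{
    for (int i = 0; i < LANES; i++)
        n[i] = -n[i];
    for (int i = 0; i < LANES; i++)
        d[i] = d[i] + n[i] * m[i];
}
```

Both models produce the same values, so the generated code is correct; the issue is only the extra instruction.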

The same output is produced by all the following expressions:
    vmlsq_f32(a, b, c)
    a - b*c
    vsubq_f32(a, vmulq_f32(b, c))
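
For reference, the three expressions can be wrapped in functions as in the sketch below. This is not the attached test case; the function names are made up for illustration. The non-aarch64 branch is an assumption added only so the file also builds on other targets with GCC or Clang vector extensions; it is not part of arm_neon.h.

```c
/* Three spellings of a - b*c on four float lanes.
   Function names are hypothetical; this is not the bug's attachment. */
#if defined(__aarch64__)
#include <arm_neon.h>
#else
/* Assumed stand-ins (GCC/Clang vector extensions) so the sketch
   compiles on other targets. Not the real Neon intrinsics. */
typedef float float32x4_t __attribute__((vector_size(16)));
static inline float32x4_t vmulq_f32(float32x4_t a, float32x4_t b) { return a * b; }
static inline float32x4_t vsubq_f32(float32x4_t a, float32x4_t b) { return a - b; }
static inline float32x4_t vmlsq_f32(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return a - b * c;
}
#endif

/* Intrinsic form. */
float32x4_t mls_intrinsic(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return vmlsq_f32(a, b, c);
}

/* Operator form, relying on the compiler's vector extensions. */
float32x4_t mls_operators(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return a - b * c;
}

/* Separate multiply and subtract intrinsics. */
float32x4_t mls_sub_mul(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return vsubq_f32(a, vmulq_f32(b, c));
}
```

Compiling such a file for aarch64 with -O3 -S and inspecting each function body is one way to check whether a single fmls is emitted.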


The example has been compiled with gcc -O3.
I tested GCC 4.8.5, GCC 6.3.0 and GCC 7.2.0. All of them have the bug.
The bug is also present at -O1, but with a slightly different output:
    fmul    v1.4s, v1.4s, v2.4s
    fsub    v0.4s, v0.4s, v1.4s

In case it helps, here is a godbolt link that shows the bug:
https://godbolt.org/g/jWvmxS

Sometimes, depending on the surrounding code, the expression is successfully
converted into the FMLS instruction, but never in the attached example.

