[Bug tree-optimization/82074] New: [aarch64] vmlsq_f32 compiled into 2 instructions
gcc.account at lemaitre dot re
gcc-bugzilla@gcc.gnu.org
Fri Sep 1 14:28:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82074
Bug ID: 82074
Summary: [aarch64] vmlsq_f32 compiled into 2 instructions
Product: gcc
Version: 7.2.0
URL: https://godbolt.org/g/jWvmxS
Status: UNCONFIRMED
Keywords: TREE
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc.account at lemaitre dot re
Target Milestone: ---
Created attachment 42100
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42100&action=edit
simplest example showing the bug
On aarch64, the Neon intrinsic "vmlsq_f32" is compiled into:
fneg v1.4s, v1.4s
fmla v0.4s, v1.4s, v2.4s
instead of:
fmls v0.4s, v1.4s, v2.4s
The same output is produced by all of the following expressions:
vmlsq_f32(a, b, c)
a - b*c
vsubq_f32(a, vmulq_f32(b, c))
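The attachment itself is not reproduced here. A minimal reproducer along the same lines might look like the sketch below. The function names are hypothetical. The plain-operator variant uses GCC's generic vector extensions, with an assumed `v4sf` typedef standing in for `float32x4_t`, so that it also compiles on non-Arm hosts. The intrinsic variants are only built when `__aarch64__` is defined. Compiling the file for aarch64 with `gcc -O3 -S` and reading the assembly is one way to see which instructions are emitted.

```c
/* Hypothetical reproducer, not the attached file.
   v4sf: four 32-bit float lanes via GCC generic vector extensions
   (an assumed stand-in for float32x4_t). */
typedef float v4sf __attribute__((vector_size(16)));

/* Plain-operator form: lane i computes a[i] - b[i] * c[i]. */
v4sf mls_expr(v4sf a, v4sf b, v4sf c)
{
    return a - b * c;
}

#if defined(__aarch64__)
#include <arm_neon.h>

/* Intrinsic form from the report. */
float32x4_t mls_intrinsic(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return vmlsq_f32(a, b, c);
}

/* Separate multiply and subtract intrinsics. */
float32x4_t mls_sub_mul(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return vsubq_f32(a, vmulq_f32(b, c));
}
#endif
```

For example, with a = {10, 1, -2, 0.5}, b = {2, 3, 4, 0.25} and c = {3, -1, 0.5, 2}, `mls_expr` returns {4, 4, -4, 0} lane by lane.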
The attached example was compiled with gcc -O3.
I tested GCC 4.8.5, GCC 6.3.0 and GCC 7.2.0; all of them show the bug.
The bug is also present at -O1, but with a slightly different output:
fmul v1.4s, v1.4s, v2.4s
fsub v0.4s, v0.4s, v1.4s
In case it helps, here is a godbolt (Compiler Explorer) link that shows the bug:
https://godbolt.org/g/jWvmxS
Depending on the surrounding code, it is sometimes successfully converted into
the FMLS instruction, but never in the attached example.