This is the mail archive of the mailing list for the GCC project.

Re: [PATCH][ARM] Rewrite vc<cond> NEON patterns to use RTL operations rather than UNSPECs


btw, sorry if the diff looks hard to parse. Some patterns are deleted and replaced with similar-looking ones, which makes the diffs look weird. I've tried a few diff algorithms but this is the best I got.


On 04/02/15 12:12, Kyrill Tkachov wrote:
Hi all,

This patch reworks the vc<cond> patterns to use proper RTL
operations rather than UNSPECs.
It is done in a similar way to the analogous aarch64 operations, i.e.
vceq is expressed as
(neg (eq (...) (...)))
since we want to write all 1s to the result element when 'eq' holds and
0s otherwise.
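To make the (neg (eq ...)) encoding concrete, here is a minimal scalar C model of one vceq.i32 on a four-lane vector. It is an illustration only, not GCC code: `vceq_model` and the fixed lane count of 4 are assumptions. C's `==` yields 0 or 1, and negating that gives 0 or -1, which is the all-ones bit pattern the instruction writes to a lane.

```c
#include <stdint.h>

#define LANES 4 /* assumption: one 128-bit Q register of four 32-bit lanes */

/* Hypothetical helper (not GCC code): models vceq.i32 lane by lane
   as (neg (eq a b)). */
static void vceq_model(const int32_t a[LANES], const int32_t b[LANES],
                       uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int32_t eq = (a[i] == b[i]); /* (eq a b): 1 when equal, else 0 */
        out[i] = (uint32_t)-eq;      /* (neg ...): 1 -> 0xFFFFFFFF, 0 -> 0 */
    }
}
```

For example, with a = {1, 2, 3, 4} and b = {1, 0, 3, 5}, lanes 0 and 2 become 0xFFFFFFFF and lanes 1 and 3 become 0.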

The catch is that the floating-point comparisons can only be expanded to
the RTL codes when -funsafe-math-optimizations is given and they must
continue to use the UNSPECs otherwise.
For this I've created a define_expand that generates
the correct RTL depending on -funsafe-math-optimizations and two
define_insns to match the result: one using the RTL codes and one using
the UNSPECs.
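The choice the define_expand makes can be sketched as a tiny C decision function. This is a hypothetical model, not the expander itself: `choose_vcmp_form`, the `vcmp_form` enum and both parameters are invented for illustration. Integer modes always get the RTL-code form; floating-point modes get it only under -funsafe-math-optimizations and otherwise keep the UNSPEC form.

```c
#include <stdbool.h>

/* Hypothetical enum naming the two define_insn flavours. */
typedef enum { VCMP_RTL_CODE, VCMP_UNSPEC } vcmp_form;

/* Hypothetical model of the define_expand's choice.
   is_float_mode: the vector mode has floating-point elements.
   unsafe_math:   -funsafe-math-optimizations is in effect. */
static vcmp_form choose_vcmp_form(bool is_float_mode, bool unsafe_math)
{
    if (is_float_mode && !unsafe_math)
        return VCMP_UNSPEC;   /* keep the opaque UNSPEC pattern */
    return VCMP_RTL_CODE;     /* expose (neg (<cmp> ...)) to the optimizers */
}
```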
I've also merged several of the patterns into one using iterators for
the [eq gt ge le lt] cases.
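As a loose C analogy (not machine-description syntax), an iterator lets one template stand in for several near-identical patterns. The sketch below writes a single lane-comparison routine parameterised over the five codes [eq gt ge le lt]; `cmp_code` and `cmp_lane` are hypothetical names, and the signed 32-bit lane type is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for an iterator over [eq gt ge le lt]. */
typedef enum { CMP_EQ, CMP_GT, CMP_GE, CMP_LE, CMP_LT } cmp_code;

/* One "template" instantiated for every code: returns the lane value
   that (neg (<code> a b)) would produce, i.e. all ones or zero. */
static uint32_t cmp_lane(cmp_code code, int32_t a, int32_t b)
{
    int32_t holds;
    switch (code) {
    case CMP_EQ: holds = (a == b); break;
    case CMP_GT: holds = (a > b);  break;
    case CMP_GE: holds = (a >= b); break;
    case CMP_LE: holds = (a <= b); break;
    case CMP_LT: holds = (a < b);  break;
    default:     holds = 0;        break;
    }
    return (uint32_t)-holds;
}
```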
NOTE: before this patch, for le and lt we would never generate
'vclt.<type> dm, dn, dp' instructions, only 'vclt.<type> dm, dn, #0'.
With this patch we can now generate 'vclt.<type> dm, dn, dp' assembly.
According to the ARM ARM this is just a pseudo-instruction that maps to
vcgt with the operands swapped around.
I've confirmed that gas supports this code.
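The pseudo-instruction mapping rests on the identity that n < m holds exactly when m > n (and likewise n <= m exactly when m >= n, which is how register-form vcle relates to vcge). The check below is only a sanity sketch over a small range of signed values; `vcgt_lane`, `vcge_lane`, `vclt_lane`, `vcle_lane` and `swap_identity_holds` are hypothetical one-lane models, not assembler behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-lane models returning all ones or zero. */
static uint32_t vcgt_lane(int32_t n, int32_t m) { return (uint32_t)-(int32_t)(n > m); }
static uint32_t vcge_lane(int32_t n, int32_t m) { return (uint32_t)-(int32_t)(n >= m); }

/* Register-form vclt/vcle expressed as vcgt/vcge with swapped operands. */
static uint32_t vclt_lane(int32_t n, int32_t m) { return vcgt_lane(m, n); }
static uint32_t vcle_lane(int32_t n, int32_t m) { return vcge_lane(m, n); }

/* Sanity check: the swapped forms agree with a direct < and <= on [-8, 8]. */
static bool swap_identity_holds(void)
{
    for (int32_t n = -8; n <= 8; n++) {
        for (int32_t m = -8; m <= 8; m++) {
            if (vclt_lane(n, m) != (uint32_t)-(int32_t)(n < m))
                return false;
            if (vcle_lane(n, m) != (uint32_t)-(int32_t)(n <= m))
                return false;
        }
    }
    return true;
}
```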

The vcage and vcagt patterns are rewritten to use the form:
      (neg (ge|gt
             (abs (...))
             (abs (...))))

and condensed together using iterators as well.
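A minimal one-lane C model of that form for 32-bit floats is sketched below. `vcage_lane` and `vcagt_lane` are hypothetical helpers, not GCC or intrinsic APIs; the model uses the host's IEEE semantics and ignores any hardware-specific handling of denormals or NaNs.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical one-lane models of vcage.f32 / vcagt.f32 following
   (neg (ge (abs a) (abs b))) and (neg (gt (abs a) (abs b))). */
static uint32_t vcage_lane(float a, float b)
{
    return (uint32_t)-(int32_t)(fabsf(a) >= fabsf(b));
}

static uint32_t vcagt_lane(float a, float b)
{
    return (uint32_t)-(int32_t)(fabsf(a) > fabsf(b));
}
```

For instance, vcage_lane(-3.0f, 2.0f) yields all ones because |-3| >= |2|, while vcagt_lane(-2.0f, 2.0f) yields 0 because the magnitudes are equal.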

Bootstrapped and tested on arm-none-linux-gnueabihf, made sure that the
advanced-simd-intrinsics testsuite passes
(it did catch some bugs during development of this patch), and tried out
other NEON intrinsics codebases.

The test now generates 'vclt.<type> dn,
dm, #0' instructions where appropriate instead of the previous vmov of
#0 into a temp and then a 'vcgt.<type> dn, temp, dm'.
I think that is correct behaviour since the test was trying to make sure
that we didn't generate a .u<size>-typed comparison with #0, which is
what the PR was talking about (from what I can gather).

What do people think of this approach?
I'm proposing this for next stage1, of course.


2015-02-04  Kyrylo Tkachov  <>

      * config/arm/ (GTGE, GTUGEU, COMPARISONS): New code iterators.
      (cmp_op, cmp_type): New code attributes.
      (NEON_VCMP, NEON_VACMP): New int iterators.
      (cmp_op_unsp): New int attribute.
      * config/arm/ (neon_vc<cmp_op><mode>): New define_expand.
      (neon_vceq<mode>): Delete.
      (neon_vc<cmp_op><mode>_insn): New pattern.
      (neon_vc<cmp_op_unsp><mode>_insn_unspec): Likewise.
      (neon_vcgeu<mode>): Delete.
      (neon_vcle<mode>): Likewise.
      (neon_vclt<mode>): Likewise.
      (neon_vcage<mode>): Likewise.
      (neon_vcagt<mode>): Likewise.
      (neon_vca<cmp_op><mode>): New define_expand.
      (neon_vca<cmp_op><mode>_insn): New pattern.
      (neon_vca<cmp_op_unsp><mode>_insn_unspec): Likewise.

2015-02-04  Kyrylo Tkachov  <>

      * Update vcg* scan-assembly patterns
      to look for vcl* where appropriate.
