[PATCH v2 0/16][RFC][AArch64/Arm/SVE/SVE2/MVE]middle-end Add support for SLP vectorization of complex number instructions.

Mon Sep 28 11:55:21 GMT 2020

On Fri, 25 Sep 2020, Tamar Christina wrote:

> Hi All,
> 
> This patch series adds support for SLP vectorization of complex instructions [1].
> 
> These instructions exist only in their vector forms and require you to recognize
> two statements in parallel.  Complex operations usually require a permute because
> the real and imaginary parts are stored interleaved, but these vector instructions
> expect that layout, so the compiler no longer needs to generate a permute.
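
For illustration, the kind of scalar code in question might look like the
sketch below.  It assumes complex values stored as interleaved {re, im} pairs
of floats; the function name cmul_interleaved is hypothetical and not taken
from the series.  Each of the two stores reads both the real and the imaginary
lanes of its inputs, which is why the two statements have to be recognized
together.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch series.
   Multiplies n complex numbers stored interleaved as
   {re0, im0, re1, im1, ...}.  */
void
cmul_interleaved (float *restrict c, const float *restrict a,
                  const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = b[2 * i + 1];

      /* Both statements mix real and imaginary lanes of the inputs.  */
      c[2 * i]     = ar * br - ai * bi;   /* real part */
      c[2 * i + 1] = ar * bi + ai * br;   /* imaginary part */
    }
}
```
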
> 
> For this reason the pass also re-orders the loads in the SLP tree such that they
> become contiguous and no longer need the permutes.  The Basic Blocks are left
> untouched such that the scalar loop will still correctly issue permutes.
> 
> The instructions also support rotations in the Argand plane; as such, the operands
> have to be re-ordered to coincide with their load group.
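
As a sketch of what a rotation means at the scalar level: multiplying a
complex number by i rotates it by 90 degrees in the Argand plane, and
multiplying by -i rotates it by 270 degrees.  The helpers below are
hypothetical (the names cadd_rot90 and cadd_rot270 are made up here) and
assume the same interleaved {re, im} layout of floats.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of the patch series.  */

/* c = a + i*b, i.e. b is rotated by 90 degrees before the add.  */
void
cadd_rot90 (float *restrict c, const float *restrict a,
            const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1];  /* re = a.re - b.im */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];      /* im = a.im + b.re */
    }
}

/* c = a - i*b, i.e. b is rotated by 270 degrees before the add.  */
void
cadd_rot270 (float *restrict c, const float *restrict a,
             const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     + b[2 * i + 1];  /* re = a.re + b.im */
      c[2 * i + 1] = a[2 * i + 1] - b[2 * i];      /* im = a.im - b.re */
    }
}
```

Note how each result lane pairs a real lane of one operand with an imaginary
lane of the other, which is the cross-lane access that would otherwise need a
permute.
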
> 
> For now, this patch only adds support for:
> 
>   * Complex Addition with rotation of 90 and 270.
>   * Complex Multiplication and Multiplication where one operand is conjugated.
>   * Complex FMA and FMA where one operand is conjugated.
>   * Complex FMS and FMS where one operand is conjugated.
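
To make the conjugated and fused forms in the list above concrete, here is a
minimal scalar sketch.  The names cmul_conj and cfma are hypothetical, and a
complex value is assumed to be a {re, im} pair of floats; this is only the
arithmetic being matched, not how the pass itself is written.

```c
/* Hypothetical helpers, not part of the patch series.
   A complex value is a {re, im} pair.  */

/* c = a * conj (b) */
void
cmul_conj (float c[2], const float a[2], const float b[2])
{
  c[0] = a[0] * b[0] + a[1] * b[1];   /* real part */
  c[1] = a[1] * b[0] - a[0] * b[1];   /* imaginary part */
}

/* acc += a * b  (complex fused multiply-add, written out in scalar form) */
void
cfma (float acc[2], const float a[2], const float b[2])
{
  acc[0] += a[0] * b[0] - a[1] * b[1];   /* real part */
  acc[1] += a[0] * b[1] + a[1] * b[0];   /* imaginary part */
}
```
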
>   
> Complex dot-product is not currently supported in this patch set as build_slp fails
> for it.  This will be provided as a future patch.
>   
> These are supported for both integer and floating point, and as such the patterns
> don't look for real or imaginary pairs but instead rely on the early lowering of
> complex numbers by GCC and canonicalization of the operations, such that the pass
> just recognizes any instruction sequence matching the operations requested.
> 
> To be safe, when the pass is not sure it can support the operation, or if it finds
> something it does not understand, it backs off.
> 
> This patch is an RFC and I am looking for feedback on the approach.  In particular,
> this series has one problem, which arises when it is decided that SLP is not viable
> and that the normal loop vectorizer is to be used.
> 
> In this case I dissolve the changes but the compiler crashes because the use of
> the pattern matcher essentially undoes two_operands.  This means that the number
> of copies needed when using the patterns and when not are different.  When using
> the patterns the two operands become the same and so are treated as manually
> unrolled loops.  The problem is that nunits has already been decided along with
> the unroll factor, so when the dissolved statements are then analyzed they fail.
> This is also the reason why I cannot analyze both the pattern and original
> statements initially.

That's the same as with "regular" patterns btw.: if vectorizing the
pattern fails, vectorization fails; we never re-consider, and we also
have no way to choose from multiple patterns.

The way "regular" patterns make this a non-issue is that they try
to only convert things that are likely unhandled/suboptimal and
most likely vectorizable.

That said - the solution to the ICE is to _not_ dissolve the changes and
instead make vectorization fail.

Richard.

> The relevant places in the source code have comments describing the problem.
> 
> [1] https://developer.arm.com/documentation/ddi0487/fc/
> 
> Thanks,
> Tamar