[PATCH v2 0/16][RFC][AArch64/Arm/SVE/SVE2/MVE]middle-end Add support for SLP vectorization of complex number instructions.
Richard Biener
rguenther@suse.de
Mon Sep 28 11:55:21 GMT 2020
On Fri, 25 Sep 2020, Tamar Christina wrote:
> Hi All,
>
> This patch series adds support for SLP vectorization of complex instructions [1].
>
> These instructions exist only in their vector forms and require you to recognize
> two statements in parallel. Complex operations usually require a permute because
> the real and imaginary parts are stored interleaved, but these vector
> instructions expect this layout and no longer need the compiler to generate a permute.
>
> For this reason the pass also re-orders the loads in the SLP tree such that they
> become contiguous and no longer need the permutes. The Basic Blocks are left
> untouched such that the scalar loop will still correctly issue permutes.
>
> The instructions also support rotations along the Argand plane; as such, the operands
> have to be re-ordered to coincide with their load group.
>
> For now, this patch only adds support for:
>
> * Complex Addition with rotation of 0 and 180.
> * Complex Multiplication and Multiplication where one operand is conjugated.
> * Complex FMA and FMA where one operand is conjugated.
> * Complex FMS and FMS where one operand is conjugated.
>
> Complex dot-product is not currently supported in this patch set as build_slp fails
> for it. This will be provided as a future patch.
>
> These are supported for both integer and floating point, and as such they don't look
> for real or imaginary pairs but instead rely on the early lowering of complex
> numbers by GCC and canonicalization of the operations, such that it just recognizes any
> instruction sequence matching the operations requested.
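
For illustration (a sketch, not code from the patch series): after early
lowering, a complex multiply over interleaved data looks like the scalar loop
below. The helper name cmul_interleaved and the float element type are
assumptions made for this example. The real lane uses a subtraction and the
imaginary lane an addition, which is the kind of two-statement sequence that
has to be recognized in parallel.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Multiplies n complex numbers stored interleaved as
   { re0, im0, re1, im1, ... } and writes the products in the same layout. */
void cmul_interleaved(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];

        /* Real lane: subtraction of the two products. */
        out[2 * i] = ar * br - ai * bi;
        /* Imaginary lane: addition of the two cross products. */
        out[2 * i + 1] = ar * bi + ai * br;
    }
}
```

Without a complex-multiply vector instruction, vectorizing this loop needs
permutes to separate the real and imaginary elements before the arithmetic.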
>
> To be safe, when it is not sure it can support the operation, or if it finds something it
> does not understand, it backs off.
>
> This patch is an RFC and I am looking for feedback on the approach. In particular,
> this series has one problem, which arises when it is decided that SLP is not viable
> and that the normal loop vectorizer is to be used.
>
> In this case I dissolve the changes but the compiler crashes because the use of
> the pattern matcher essentially undoes two_operands. This means that the number of
> copies needed when using the patterns and when not are different. When using
> the patterns the two operands become the same and so are treated as manually
> unrolled loops. The problem is that nunits has already been decided
> along with the unroll factor, so when the dissolved statements are then analyzed
> they fail. This is also the reason why I cannot analyze both the pattern and
> original statements initially.
That's the same as with "regular" patterns btw.: if vectorizing the
pattern fails, vectorization fails; we never re-consider, and we also
have no way to choose from multiple patterns.
The way "regular" patterns make this a non-issue is that they try
to only convert things that are likely unhandled/suboptimal and
most likely vectorizable.
That said - the solution to the ICE is to _not_ dissolve the changes and
instead make vectorization fail.
Richard.
> The relevant places in the source code have comments describing the problem.
>
> [1] https://developer.arm.com/documentation/ddi0487/fc/
>
> Thanks,
> Tamar