[unified-autovect: Patch 2/N] Implementation of k-arity promotion/reduction

Sameera Deshpande Sameera.Deshpande@imgtec.com
Mon Feb 6 07:42:00 GMT 2017


Hi Richard,

Sorry for the delayed patch submission. I was on maternity leave and could not post earlier.
Here is the previous mail for your reference: https://gcc.gnu.org/ml/gcc/2016-06/msg00043.html

Please find attached the patch for stage 2: implementation of k-arity promotion/reduction in the series "Improving effectiveness and generality of autovectorization using unified representation".

The permute nodes within the primitive reorder tree (PRT) generated from the input program can have any arity, depending on the stride of the accesses. However, a target cannot provide instructions for every arity. Hence, we need to promote or reduce the arity of the PRT to enable successful tree tiling.

In classic autovectorization, if the vectorization stride is > 2, arity reduction is performed by generating cascaded extract and interleave instructions, as described in "Auto-vectorization of Interleaved Data for SIMD" by D. Nuzman, I. Rosen and A. Zaks.

Moreover, to enable SLP across loop iterations, "Loop-aware SLP in GCC" by D. Nuzman, I. Rosen and A. Zaks unrolls the loop until stride = vector size.

The k-arity reduction/promotion algorithm uses modulo arithmetic to generate a PRT of the desired arity for both of the above-mentioned cases.

A single ILV node of arity k can be reduced to cascaded ILV nodes: a single parent node of arity m whose children are ILV nodes of arity k/m (so m must divide k), such that the ith child of the original ILV node becomes the floor(i/m)th child of the (i%m)th child of the new parent.
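
The index mapping above can be checked with a small standalone sketch. It is not taken from the patch: it assumes that an ILV node of arity k produces out[j] = child[j % k][j / k], and the names ilv and reduce_ilv are hypothetical helpers used only for illustration.

```python
def ilv(children):
    """Interleave equal-length sequences (assumed ILV semantics):
    out[j] = children[j % k][j // k], where k = len(children)."""
    k = len(children)
    n = len(children[0])
    return [children[j % k][j // k] for j in range(k * n)]


def reduce_ilv(children, m):
    """Evaluate a k-ary ILV as one m-ary ILV over (k/m)-ary ILV nodes.

    Original child i becomes child floor(i/m) of child (i % m)
    of the new parent.
    """
    k = len(children)
    if k % m != 0:
        raise ValueError("m must divide k")
    groups = [[None] * (k // m) for _ in range(m)]
    for i, child in enumerate(children):
        groups[i % m][i // m] = child
    return ilv([ilv(group) for group in groups])


# Six streams of three elements; element (c, p) is position p of child c.
streams = [[(c, p) for p in range(3)] for c in range(6)]
flat = ilv(streams)
cascaded_2 = reduce_ilv(streams, 2)  # ILV_2 over two ILV_3 nodes
cascaded_3 = reduce_ilv(streams, 3)  # ILV_3 over three ILV_2 nodes
```

Under that assumption, both cascaded forms yield the same element order as the flat 6-ary interleave.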

A single EXTR node with k parts and selector i can be reduced to cascaded EXTR nodes such that the parent EXTR node has m parts and selector i/(k/m) (integer division), applied to a child EXTR node with k/m parts and selector i%(k/m).
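
A similar standalone sketch for the extract case follows. It assumes that an EXTR node with k parts and selector i picks elements i, i+k, i+2k, ... of its input; extr and reduce_extr are hypothetical names, not functions from the patch.

```python
def extr(seq, parts, sel):
    """Assumed EXTR semantics: every parts-th element, starting at sel."""
    return seq[sel::parts]


def reduce_extr(seq, k, i, m):
    """Evaluate EXTR(k parts, selector i) as two cascaded EXTR nodes.

    Child:  k/m parts, selector i % (k/m).
    Parent: m parts,   selector i // (k/m).
    """
    if k % m != 0:
        raise ValueError("m must divide k")
    inner_parts = k // m
    inner = extr(seq, inner_parts, i % inner_parts)
    return extr(inner, m, i // inner_parts)


data = list(range(24))
direct = extr(data, 6, 4)               # elements 4, 10, 16, 22
cascaded = reduce_extr(data, 6, 4, 2)   # EXTR(2, 1) of EXTR(3, 1)
```

The child selects positions i%(k/m) + t*(k/m); the parent then keeps t = i/(k/m) + s*m, which gives positions i + s*k, the same as the single EXTR node.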

Similarly, loop unrolling to get the desired arity m can be represented as arity promotion from k to m.

A single ILV node of arity k can be promoted to a single ILV node of arity m (m a multiple of k) by making the ith child of the new ILV node an extraction with m/k parts and selector i/k, applied to the (i%k)th child of the original node.
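
The promotion rule can be sketched the same way. The ILV and EXTR semantics below are assumptions (out[j] = child[j % k][j / k] for ILV, stride-k selection starting at i for EXTR), and promote_ilv is a hypothetical name used only for this illustration.

```python
def ilv(children):
    """Assumed ILV semantics: out[j] = children[j % k][j // k]."""
    k = len(children)
    n = len(children[0])
    return [children[j % k][j // k] for j in range(k * n)]


def extr(seq, parts, sel):
    """Assumed EXTR semantics: every parts-th element, starting at sel."""
    return seq[sel::parts]


def promote_ilv(children, m):
    """Evaluate a k-ary ILV as an m-ary ILV, where m is a multiple of k.

    New child i = EXTR(m/k parts, selector i // k) of original child i % k.
    Each original child's length must be a multiple of m/k.
    """
    k = len(children)
    if m % k != 0:
        raise ValueError("m must be a multiple of k")
    parts = m // k
    new_children = [extr(children[i % k], parts, i // k) for i in range(m)]
    return ilv(new_children)


# Two streams of eight elements; element (c, p) is position p of child c.
streams = [[(c, p) for p in range(8)] for c in range(2)]
flat = ilv(streams)
promoted_4 = promote_ilv(streams, 4)
promoted_8 = promote_ilv(streams, 8)
```

With these definitions the promoted trees reproduce the element order of the original 2-ary interleave, which is what unrolling by m/k is expected to preserve.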

To enable loop-aware SLP, we first promote the arity of the input PRT to the maximum vector size permissible on the architecture. This can affect vector code size, though performance will be the same. However, it is necessary in order to allow variable vector sizes, as in ARM SVE.

Later, we apply the arity promotion/reduction algorithm to the output tree to get a tree with the desired arity. For now, we support only target arity = 2, as most architectures support it. However, the code can be extended to support additional arities as well.
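
As a rough illustration of the two steps together, the sketch below promotes a k-ary ILV to a larger arity and then evaluates it using only 2-ary ILV nodes, by applying the reduction with m = 2 recursively. It is not the patch's implementation: the ILV/EXTR semantics, the helper names, and the value 8 used for the maximum arity are all assumptions, and it only handles power-of-two arities.

```python
def ilv(children):
    """Assumed ILV semantics: out[j] = children[j % k][j // k]."""
    k = len(children)
    n = len(children[0])
    return [children[j % k][j // k] for j in range(k * n)]


def extr(seq, parts, sel):
    """Assumed EXTR semantics: every parts-th element, starting at sel."""
    return seq[sel::parts]


def promote_children(children, m):
    """Children of the promoted m-ary ILV (m a multiple of k)."""
    k = len(children)
    if m % k != 0:
        raise ValueError("m must be a multiple of k")
    return [extr(children[i % k], m // k, i // k) for i in range(m)]


def ilv_binary(children):
    """Evaluate a k-ary ILV (k a power of two) with 2-ary ILV nodes only.

    Reduction with m = 2: child i goes to position i // 2 of group i % 2.
    """
    k = len(children)
    if k == 1:
        return list(children[0])
    if k % 2 != 0:
        raise ValueError("arity must be a power of two")
    evens = ilv_binary(children[0::2])
    odds = ilv_binary(children[1::2])
    return ilv([evens, odds])


MAX_ARITY = 8  # hypothetical stand-in for the maximum vector size

streams = [[(c, p) for p in range(8)] for c in range(2)]
reference = ilv(streams)
result = ilv_binary(promote_children(streams, MAX_ARITY))
```

Here result matches reference, so promotion followed by reduction to arity 2 leaves the element order unchanged under the assumed semantics.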

I have tested the code with handwritten testcases for correctness.
Do you spot any problem in the logic or arithmetic that I am performing for reduction/promotion? If not, I will push this patch to the branch that we have created, unified-autovect.

- Thanks and regards,
  Sameera D.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arity_promotion_reduction.patch
Type: text/x-patch
Size: 30022 bytes
Desc: arity_promotion_reduction.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20170206/268d6e74/attachment.bin>