[PATCH,rs6000] Add pass to optimize away xxpermdi's from vector computations

David Edelsohn dje.gcc@gmail.com
Sun Aug 17 18:52:00 GMT 2014


On Wed, Aug 13, 2014 at 7:14 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> This patch adds a PowerPC-specific pass just prior to the first cse RTL
> pass.  The pass runs only when generating little-endian code for Power8
> with VSX enabled, and for -O1 and up.  For this particular subtarget,
> the use of the big-endian-biased vector load and store instructions
> requires permutations to order vector elements for little endian.  To
> reduce the overhead of these permutations, this pass looks for
> computations for which the exact lanes in which computations are
> performed do not matter, so long as the results are returned to
> storage in the proper order.  For such computations we can remove the
> xxpermdi's associated with the vector loads and stores.
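
For readers following along, the idea in the paragraph above can be
illustrated with a small scalar model.  This is only a sketch under
stated assumptions, not code from the patch: the v2df struct and the
helpers raw_load, raw_store, swap_halves and add are hypothetical names
that model a two-doubleword vector register, a big-endian-biased
load/store as seen from little endian, and the doubleword swap done by
xxpermdi.  Because the addition is lane-wise, the result in memory is
the same with or without the swaps.

```c
/* Hypothetical model of a vector register holding two doubles.
   This is not a GCC type; it exists only for this illustration.  */
typedef struct { double lane[2]; } v2df;

/* Models the doubleword swap performed by xxpermdi.  */
static v2df swap_halves (v2df v)
{
  v2df r = { { v.lane[1], v.lane[0] } };
  return r;
}

/* Models a big-endian-biased vector load on little endian:
   the two doublewords arrive in reversed order.  */
static v2df raw_load (const double *p)
{
  v2df r = { { p[1], p[0] } };
  return r;
}

/* Models the matching store, which reverses the doublewords again.  */
static void raw_store (double *p, v2df v)
{
  p[0] = v.lane[1];
  p[1] = v.lane[0];
}

/* A lane-wise operation: each result lane depends only on the
   same lane of the inputs.  */
static v2df add (v2df a, v2df b)
{
  v2df r = { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
  return r;
}

/* Loads and store each paired with a swap, as in the unoptimized code.  */
void add_with_swaps (double *dst, const double *x, const double *y)
{
  v2df a = swap_halves (raw_load (x));
  v2df b = swap_halves (raw_load (y));
  raw_store (dst, swap_halves (add (a, b)));
}

/* Same computation with every swap removed.  The lanes are reversed
   while in the register, but the stored result is identical.  */
void add_without_swaps (double *dst, const double *x, const double *y)
{
  raw_store (dst, add (raw_load (x), raw_load (y)));
}
```

An operation that mixes or names specific lanes, such as extracting
element 0, would not commute with the swap in this way, which is
presumably why the pass has to classify each insn before removing
anything.
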
>
> This patch relies on another patch posted today that converts a struct
> used by the web pass into a base class that this patch can subclass.  If
> it's determined that the other patch isn't appropriate, then this patch
> will need modifications to duplicate the union-find logic.
>
> A complete description of the new pass appears in rs6000.c (search for
> "Analyze vector computations").  That description also identifies some
> remaining opportunities we can follow up with later.
>
> A number of new tests are added to verify that the pass works as
> expected for some vectorized code samples.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2014-08-13  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (context.h): New include.
>         (tree-pass.h): Likewise.
>         (make_pass_analyze_swaps): New decl.
>         (rs6000_option_override): Register pass_analyze_swaps.
>         (swap_web_entry): New subclass of web_entry_base (df.h).
>         (special_handling_values): New enum.
>         (union_defs): New function.
>         (union_uses): Likewise.
>         (insn_is_load_p): Likewise.
>         (insn_is_store_p): Likewise.
>         (insn_is_swap_p): Likewise.
>         (rtx_is_swappable_p): Likewise.
>         (insn_is_swappable_p): Likewise.
>         (chain_purpose): New enum.
>         (chain_contains_only_swaps): New function.
>         (mark_swaps_for_removal): Likewise.
>         (swap_const_vector_halves): Likewise.
>         (adjust_subreg_index): Likewise.
>         (permute_load): Likewise.
>         (permute_store): Likewise.
>         (handle_special_swappables): Likewise.
>         (replace_swap_with_copy): Likewise.
>         (dump_swap_insn_table): Likewise.
>         (rs6000_analyze_swaps): Likewise.
>         (pass_data_analyze_swaps): New pass_data.
>         (pass_analyze_swaps): New rtl_opt_pass.
>         (make_pass_analyze_swaps): New function.
>         * config/rs6000/rs6000.opt (moptimize-swaps): New option.
>
> [gcc/testsuite]
>
> 2014-08-13  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/swaps-p8-1.c: New test.
>         * gcc.target/powerpc/swaps-p8-2.c: New test.
>         * gcc.target/powerpc/swaps-p8-3.c: New test.
>         * gcc.target/powerpc/swaps-p8-4.c: New test.
>         * gcc.target/powerpc/swaps-p8-5.c: New test.
>         * gcc.target/powerpc/swaps-p8-6.c: New test.
>         * gcc.target/powerpc/swaps-p8-7.c: New test.
>         * gcc.target/powerpc/swaps-p8-8.c: New test.
>         * gcc.target/powerpc/swaps-p8-9.c: New test.
>         * gcc.target/powerpc/swaps-p8-10.c: New test.
>         * gcc.target/powerpc/swaps-p8-11.c: New test.
>         * gcc.target/powerpc/swaps-p8-12.c: New test.

This looks okay, although I was hoping that other developers with more
DF and web experience would double check.

Why are you specifically gating the pass on POWER8?

Thanks, David
