[PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops
Andre Vieira (lists)
andre.simoesdiasvieira@arm.com
Mon Oct 28 18:37:00 GMT 2019
Hi,
Reworked according to your comments, see inline for clarification.
Is this OK for trunk?
gcc/ChangeLog:
2019-10-28 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR 88915
* tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
and make the valueize function pointer also take a void pointer.
* gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
around vn_valueize, to call it without a context.
(process_bb): Use vn_valueize_wrapper instead of vn_valueize.
* tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
(~_loop_vec_info): Release epilogue_vinfos.
(vect_analyze_loop_costing): Use knowledge of main VF to estimate
number of iterations of epilogue.
(vect_analyze_loop_2): Adapt to analyse main loop for all supported
vector sizes when vect-epilogues-nomask=1. Also keep track of lowest
versioning threshold needed for main loop.
(vect_analyze_loop): Likewise.
(find_in_mapping): New helper function.
(update_epilogue_loop_vinfo): New function.
(vect_transform_loop): When vectorizing epilogues re-use analysis done
on main loop and call update_epilogue_loop_vinfo to update it.
* tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
stmts on loop preheader edge.
(vect_do_peeling): Enable skip-vectors when doing loop versioning if
we decided to vectorize epilogues. Update epilogues NITERS and
construct ADVANCE to update epilogues data references where needed.
* tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
(vect_do_peeling, vect_update_inits_of_drs,
determine_peel_for_niter, vect_analyze_loop): Add or update
declarations.
* tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
created loop_vec_info's for epilogues when available. Otherwise
analyse
epilogue separately.
Cheers,
Andre
On 28/10/2019 14:16, Richard Biener wrote:
> On Fri, 25 Oct 2019, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>> This is the reworked patch after your comments.
>>
>> I have moved the epilogue check into the analysis form disguised under
>> '!epilogue_vinfos.is_empty ()'. This because I realized that I am doing the
>> "lowest threshold" check there.
>>
>> The only place where we may reject an epilogue_vinfo is when we know the
>> number of scalar iterations and we realize the number of iterations left after
>> the main loop are not enough to enter the vectorized epilogue so we optimize
>> away that code-gen. The only way we know this to be true is if the number of
>> scalar iterations are known and the peeling for alignment is known. So we know
>> we will enter the main loop regardless, so whether the threshold we use is for
>> a lower VF or not it shouldn't matter as much, I would even like to think that
>> check isn't done, but I am not sure... Might be worth checking as an
>> optimization.
>>
>>
>> Is this OK for trunk?
>
> + for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]);
> + !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi))
> + {
> ..
> + if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
> + pattern_worklist.safe_push (stmt_vinfo);
> +
> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> + while (related_vinfo && related_vinfo != stmt_vinfo)
> + {
>
> I think PHIs cannot have patterns. You can assert
> that STMT_VINFO_RELATED_STMT is NULL I think.
Done.
>
> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
> + while (related_vinfo && related_vinfo != stmt_vinfo)
> + {
> + related_worklist.safe_push (related_vinfo);
> + /* Set BB such that the assert in
> + 'get_initial_def_for_reduction' is able to determine that
> + the BB of the related stmt is inside this loop. */
> + gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
> + gimple_bb (new_stmt));
> + related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
> + }
>
> do we really keep references to "nested" patterns? Thus, do you
> need this loop?
Changed and added asserts. They didn't trigger so I suppose you are
right, I didn't know at the time whether it was possible, so I just
operated on the side of caution. Can remove the asserts and so on if
you want.
>
> + /* The PATTERN_DEF_SEQs in the epilogue were constructed using the
> + original main loop and thus need to be updated to refer to the
> cloned
> + variables used in the epilogue. */
> + for (unsigned i = 0; i < pattern_worklist.length (); ++i)
> + {
> ...
> + op = simplify_replace_tree (op, NULL_TREE, NULL_TREE,
> + &find_in_mapping, &mapping);
> + gimple_set_op (seq, j, op);
>
> you do this for the pattern-def seq but not for the related one.
> I guess you ran into this for COND_EXPR conditions. I wondered
> to use a shared worklist for both the def-seq and the main pattern
> stmt or at least to split out the replacement so you can share it.
I think that was it yeah, reworked it now to use the same list. Less
code, thanks!
>
> + /* Data references for gather loads and scatter stores do not use
> the
> + updated offset we set using ADVANCE. Instead we have to make
> sure the
> + reference in the data references point to the corresponding copy
> of
> + the original in the epilogue. */
> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
> + {
> + int j;
> + if (TREE_CODE (DR_REF (dr)) == MEM_REF)
> + j = 0;
> + else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF)
> + j = 1;
> + else
> + gcc_unreachable ();
> +
> + if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j)))
> + {
> + DR_REF (dr) = unshare_expr (DR_REF (dr));
> + TREE_OPERAND (DR_REF (dr), j) = *new_op;
> + }
>
> huh, do you really only ever see MEM_REF or ARRAY_REF here?
> I would guess using simplify_replace_tree is safer.
> There's also DR_BASE_ADDRESS - we seem to leave the DRs partially
> updated, is that correct?
Yeah can use simplify_replace_tree indeed. And I have changed it so it
updates DR_BASE_ADDRESS. I think DR_BASE_ADDRESS never actually changed
in the way we use data_references... Either way, replacing them if they
do change is cleaner and more future proof.
>
> Otherwise looks OK to me.
>
> Thanks,
> Richard.
>
>
>> gcc/ChangeLog:
>> 2019-10-25 Andre Vieira <andre.simoesdiasvieira@arm.com>
>>
>> PR 88915
>> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
>> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
>> and make the valueize function pointer also take a void pointer.
>> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
>> around vn_valueize, to call it without a context.
>> (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
>> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
>> (~_loop_vec_info): Release epilogue_vinfos.
>> (vect_analyze_loop_costing): Use knowledge of main VF to estimate
>> number of iterations of epilogue.
>> (vect_analyze_loop_2): Adapt to analyse main loop for all supported
>> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest
>> versioning threshold needed for main loop.
>> (vect_analyze_loop): Likewise.
>> (find_in_mapping): New helper function.
>> (update_epilogue_loop_vinfo): New function.
>> (vect_transform_loop): When vectorizing epilogues re-use analysis done
>> on main loop and call update_epilogue_loop_vinfo to update it.
>> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
>> stmts on loop preheader edge.
>> (vect_do_peeling): Enable skip-vectors when doing loop versioning if
>> we decided to vectorize epilogues. Update epilogues NITERS and
>> construct ADVANCE to update epilogues data references where needed.
>> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
>> (vect_do_peeling, vect_update_inits_of_drs,
>> determine_peel_for_niter, vect_analyze_loop): Add or update declarations.
>> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
>> created loop_vec_info's for epilogues when available. Otherwise
>> analyse
>> epilogue separately.
>>
>>
>>
>> Cheers,
>> Andre
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: epilogues_1_7.patch
Type: text/x-patch
Size: 37977 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20191028/0751a32b/attachment.bin>
More information about the Gcc-patches
mailing list