[PATCH] Simple optimization for MASK_STORE.

Yuri Rumyantsev ysrumyan@gmail.com
Tue Jul 7 13:55:00 GMT 2015


Hi Richard,

Did you have a chance to look at this updated patch?

Thanks.
Yuri.

2015-06-18 17:32 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> Here is updated patch which does not include your proposal related to
> the target hook deletion.
> You wrote:
>> I still don't understand why you need the new target hook.  If we have a masked
>> load/store then the mask is computed by an assignment with a VEC_COND_EXPR
>> (in your example) and thus a test for a zero mask is expressible as just
>>
>  > if (vect__ifc__41.17_167 == { 0, 0, 0, 0... })
>>
>> or am I missing something?
> Such vector compare produces vector and does not set up cc flags
> required for conditional branch (GIMPLE_COND).
> If we use such comparison for GIMPLE_COND we got error message, so I
> used target hook which does set up cc flags aka ptest instruction and
> I left this part.
> I moved new routines to loop-vectorizer.c file and both they are static.
> I re-wrote is_valid_sink function to use def-use chain as you proposed.
> I also add new parameter to control such transformation.
> Few redundant tests have also been deleted.
> Any comments will be appreciated.
>
> Thanks.
> Yuri.
>
> 2015-06-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * config/i386/i386.c: Include files stringpool.h and tree-ssanames.h.
> (ix86_vectorize_build_zero_vector_test): New function.
> (TARGET_VECTORIZE_BUILD_ZERO_VECTOR_TEST): New target macro
> * doc/tm.texi.in: Add @hook TARGET_VECTORIZE_BUILD_ZERO_VECTOR_TEST.
> * doc/tm.texi: Updated.
> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM.
> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros.
> * target.def (build_zero_vector_test): New DEFHOOK.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c: Include files stringpool.h and tree-ssanames.h.
> (is_valid_sink): New function.
> (optimize_mask_stores): New function.
> (vectorize_loops): Invoke optimaze_mask_stores for loops having masked
> stores.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> correspondent macros.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>
>
>
> 2015-06-09 15:13 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, May 20, 2015 at 4:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is updated patch to optimize mask stores. The main goal of it is
>>> to avoid execution of mask store if its mask is zero vector since
>>> loads that follow it can be blocked.
>>> The following changes were done:
>>>   1.  A test on sink legality was added - it simply prohibits to cross
>>> statements with non-null vdef or vuse.
>>>   2. New phi node is created in join block for moved MASK_STORE statements.
>>>   3. Test was changed to check that 2 MASK_STORE statements are not
>>> moved to the same block.
>>>   4. New field was added to loop_vec_info structure to mark loops
>>> having MASK_STORE's.
>>>
>>> Any comments will be appreciated.
>>> Yuri.
>>
>> I still don't understand why you need the new target hook.  If we have a masked
>> load/store then the mask is computed by an assignment with a VEC_COND_EXPR
>> (in your example) and thus a test for a zero mask is expressible as just
>>
>>  if (vect__ifc__41.17_167 == { 0, 0, 0, 0... })
>>
>> or am I missing something?  As we dont' want this transform on all targets
>> (or always) we can control it by either a --param or a new flag which default
>> targets can adjust.  [Is the hazard still present with Skylake or other
>> AVX512 implementations for example?]
>>
>> +/* Helper for optimize_mask_stores: returns true if STMT sinking to end
>> +   of BB is valid and false otherwise.  */
>> +
>> +static bool
>> +is_valid_sink (gimple stmt)
>> +{
>>
>> so STMT is either a masked store or a masked load?  You shouldn't
>> walk all stmts here but instead rely on the factored use-def def-use
>> chains of virtual operands.
>>
>> +void
>> +optimize_mask_stores (struct loop *loop)
>> +{
>> +  basic_block bb = loop->header;
>> +  gimple_stmt_iterator gsi;
>> +  gimple stmt;
>> +  auto_vec<gimple> worklist;
>> +
>> +  if (loop->dont_vectorize
>> +      || loop->num_nodes > 2)
>> +    return;
>>
>> looks like no longer needed given the place this is called from now
>> (or does it guard against outer loop vectorization as well?)
>> Also put it into tree-vect-loop.c and make it static.
>>
>> +      /* Loop was not vectorized if mask does not have vector type.  */
>> +      if (!VECTOR_TYPE_P (TREE_TYPE (mask)))
>> +       return;
>>
>> this should be always false
>>
>> +      store_bb->frequency = bb->frequency / 2;
>> +      efalse->probability = REG_BR_PROB_BASE / 2;
>>
>> I think the == 0 case should end up as unlikely.
>>
>> +      if (dom_info_available_p (CDI_POST_DOMINATORS))
>> +       set_immediate_dominator (CDI_POST_DOMINATORS, bb, store_bb);
>>
>> post dominators are not available in the vectorizer.
>>
>> +         /* Put definition statement of stored value in STORE_BB
>> +            if possible.  */
>> +         arg3 = gimple_call_arg (last, 3);
>> +         if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
>> +           {
>> +             def_stmt = SSA_NAME_DEF_STMT (arg3);
>> +             /* Move def_stmt to STORE_BB if it is in the same bb and
>> +                it is legal.  */
>> +             if (gimple_bb (def_stmt) == bb
>> +                 && is_valid_sink (def_stmt))
>>
>> ok, so you move arbitrary statements.  You can always move non-memory
>> statements and if you keep track of the last store you moved you
>> can check if gimple_vuse () is the same as on that last store and be
>> done with that, or if you sink another store (under the same condition)
>> then just update the last store.
>>
>> Otherwise this looks better now.
>>
>> Thus - get rid of the target hook and of is_valid_sink.
>>
>> Thanks,
>> Richard.
>>
>>> 2015-05-20  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * config/i386/i386.c: Include files stringpool.h and tree-ssanames.h.
>>> (ix86_vectorize_is_zero_vector): New function.
>>> (TARGET_VECTORIZE_IS_ZERO_VECTOR): New target macro
>>> * doc/tm.texi.in: Add @hook TARGET_VECTORIZE_IS_ZERO_VECTOR.
>>> * doc/tm.texi: Updated.
>>> * target.def (is_zero_vector): New DEFHOOK.
>>> * tree-vect-stmts.c : Include tree-into-ssa.h.
>>> (vectorizable_mask_load_store): Initialize has_mask_store field.
>>> (is_valid_sink): New function.
>>> (optimize_mask_stores): New function.
>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>> loops having masked stores.
>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>> correspondent macros.
>>> (optimize_mask_stores): Update prototype.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.



More information about the Gcc-patches mailing list