This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH GCC]More conservative interchanging small loops with const initialized simple reduction


On Fri, Dec 8, 2017 at 2:40 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 1:43 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Dec 8, 2017 at 12:17 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Dec 8, 2017 at 12:46 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> This simple patch makes interchange even more conservative for small loops with a constant-initialized simple reduction.
>>>> The reason is that undoing such a reduction introduces a new data reference and a cond_expr, which could cost too much
>>>> in a small loop.
>>>> Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Is it OK if the test passes?
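
For illustration, a rough C-level sketch of such a loop and of what undoing
the reduction amounts to (array names are made up here, and this is not the
exact GIMPLE the pass emits):

  /* Constant-initialized simple reduction in a small loop.  */
  for (i = 0; i < N; i++)
    {
      int s = 0;                          /* constant init */
      for (k = 0; k < N; k++)
        s += a[k] * b[k][i];
      c[i] = s;
    }

  /* After undoing the reduction so that the loops can be interchanged:
     a load of c[i], a ?: injecting the constant on the first iteration,
     and a store on every inner iteration.  */
  for (k = 0; k < N; k++)
    for (i = 0; i < N; i++)
      c[i] = ((k == 0) ? 0 : c[i]) + a[k] * b[k][i];
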
>>>
>>> Shouldn't we do this even for non-constant-initialized simple
>>> reductions?  Because for any simple reduction we add two DRs that are
>>> not innermost, and for constant-initialized ones we add an additional
>>> cond-expr.  So ...
>>>
>>> +  /* Conservatively skip interchange in cases that have only a few data
>>> +     references and a constant-initialized simple reduction, since undoing
>>> +     the reduction introduces a new data reference as well as a ?: operation.  */
>>> +  if (num_old_inv_drs + num_const_init_simple_reduc * 2 >= datarefs.length ())
>>> +    return false;
>>> +
>>>
>>> Can you, instead of carrying num_const_init_simple_reduc, simply loop
>>> over m_reductions and classify them in this function accordingly?  I
>>> think we want to cost non-constant-init reductions as well.  The ?: can
>>> eventually count as another DR for cost purposes.
>> Couldn't the number of non-constant-init reductions still be carried in
>> struct loop_cand?  I am not very sure what the advantage is of an
>> additional loop over m_reductions gathering the same information.
>> Perhaps the increase in stmts should be counted like:
>>   num_old_inv_drs + num_const_init_simple_reduc * 2 - num_new_inv_drs
>> The question is which number this should be compared against.  (We may
>> need to shift num_new_inv_drs to the other side because of the wrapping issue.)
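
To illustrate the wrapping point: if the counters are unsigned, the
subtraction can wrap whenever num_new_inv_drs is the largest term, and
shifting it to the other side gives an equivalent, wrap-free comparison.
The comparand below is just carried over from the hunk quoted above; what
to compare against is still the open question:

  if (num_old_inv_drs + num_const_init_simple_reduc * 2 - num_new_inv_drs
      >= datarefs.length ())                       /* may wrap */
    return false;

  if (num_old_inv_drs + num_const_init_simple_reduc * 2
      >= datarefs.length () + num_new_inv_drs)     /* no subtraction, no wrap */
    return false;
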
>>
>>>
>>> It looks like we do count the existing DRs for the reduction?  Is that
>>> why you arrive
>>> at the num_const_init_simple_reduc * 2 figure? (one extra load plus one ?:)
>> Yes.
>>> But we don't really know whether the DR was invariant in the outer
>>> loop (well, I suppose
>> Hmm, I might misunderstand here.  num_old_inv_drs tracks the number of
>> references that are invariant with regard to the inner loop, rather than
>> the outer loop.  The same goes for num_new_inv_drs, which counts
>> references that become invariant with regard to the (new) inner loop
>> after loop interchange.  This invariance information is always known
>> from the data reference, right?
>> As for the DRs of a reduction, we know they are invariant because we set
>> their inner loop stride to zero.
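
To illustrate the counters on a kernel of roughly this shape:

  for (i = 0; i < N; i++)          /* outer loop */
    for (k = 0; k < N; k++)        /* inner loop */
      c[i] = c[i] + a[k] * b[k][i];

the c[i] references have zero stride in the k-loop, so they are invariant
with regard to the inner loop and counted in num_old_inv_drs; after
interchanging i and k, a[k] has zero stride in the new inner i-loop and
would be counted in num_new_inv_drs, while c[i] is no longer invariant.
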
>>
>>> we could remember the DR in m_reductions).
>>>
>>> Note that the good thing is that the ?: has an invariant condition, and
>>> thus vectorization can hoist the mask generation out of the vectorized
>>> loop, which means it boils down to cheap operations.  My gut feeling is
>>> that just looking at the number of memory references isn't a good
>>> indicator of profitability, as the regular stmt workload has a big
>>> impact on the profitability of vectorization.
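
Continuing the sketch from above, the interchanged form is

  for (k = 0; k < N; k++)
    for (i = 0; i < N; i++)                 /* loop to be vectorized */
      c[i] = ((k == 0) ? 0 : c[i]) + a[k] * b[k][i];

and the condition (k == 0) does not depend on i, so the mask feeding the
vector selects can be computed once per k iteration, outside the
vectorized i-loop.
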
>> It's not specific to vectorization.  The newly generated code also costs
>> too much in small loops without vectorization.  But yes, the # of
>> mem_refs may be too inaccurate; maybe we should check against num_stmts.
>
> Not specific to vectorization, but the interchange may pay off only when
> vectorizing a loop.  Would the loop in loop-interchange-5.c still be
> interchanged if we remove the multiplication and just keep
> c[i][j] = c[i][j] + b[k][j];
> ?  That is, why is the constant init so special?  Even for non-constant
> init we're changing two outer-loop DRs to two non-consecutive inner-loop DRs.
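
For contrast, an illustrative sketch (not taken from the testcase) of the
loop after interchanging j and k, in both cases:

  /* Non-constant init: the running value already lives in c[i][j], so
     undoing the reduction needs only the load and store.  */
  for (i = 0; i < M; i++)
    for (k = 0; k < K; k++)
      for (j = 0; j < N; j++)
        c[i][j] = c[i][j] + b[k][j];

  /* Constant init: the first iteration must inject the constant instead
     of the stale memory value, so a ?: appears on top of the load/store
     pair.  */
  for (i = 0; i < M; i++)
    for (k = 0; k < K; k++)
      for (j = 0; j < N; j++)
        c[i][j] = ((k == 0) ? 0 : c[i][j]) + b[k][j];
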
Hi Richard,
This is the updated patch, taking stmt cost into consideration.

First, the stmt cost (based on the # of stmts) of each loop is recorded.
Then the stmt cost of the outer loop is adjusted: decreased by the number
of IVs and increased by the number of constant-initialized simple
reductions.
Lastly, we compare the stmt cost of the inner and outer loops and give up
on interchange if the outer loop has too many stmts.
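
For readers without the attachment, a rough sketch of the check just
described; the function name, parameter list, ratio value and exact shape
of the comparison below are illustrative assumptions, the real code is in
the attached cost-model-20171212.txt:

  /* Illustrative sketch only; STMT_COST_RATIO's value is assumed.  */
  #define STMT_COST_RATIO 3

  static bool
  stmt_cost_allows_interchange_p (unsigned inner_stmt_cost,
                                  unsigned outer_stmt_cost,
                                  unsigned outer_num_ivs,
                                  unsigned num_const_init_simple_reduc)
  {
    /* Adjust the outer loop's stmt cost as described: subtract the number
       of IVs and add the number of constant-initialized simple reductions
       that would be undone.  */
    if (outer_stmt_cost > outer_num_ivs)
      outer_stmt_cost -= outer_num_ivs;
    else
      outer_stmt_cost = 0;
    outer_stmt_cost += num_const_init_simple_reduc;

    /* Give up on interchange if the outer loop carries too many stmts
       relative to the inner loop (one plausible shape of the check).  */
    return outer_stmt_cost * STMT_COST_RATIO <= inner_stmt_cost;
  }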

Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Bootstrapped
and tested on x86_64 and AArch64.  Any comments?

Thanks,
bin
2017-12-12  Bin Cheng  <bin.cheng@arm.com>

    * gimple-loop-interchange.cc (STMT_COST_RATIO): New macro.
    (loop_cand::m_num_stmts, loop_cand::m_const_init_reduc): New members.
    (loop_cand::loop_cand): Initialize above members.
    (loop_cand::supported_operations): Delete.
    (loop_cand::can_interchange_p): Inline above function.
    (loop_cand::classify_simple_reduction): Record number of constant
    initialized simple reductions.
    (should_interchange_loops): New parameters.  Check stmt cost of loops
    to be interchanged.
    (tree_loop_interchange::interchange): Prepare stmt cost of outer loop.
    Update call to should_interchange_loops.
    (should_interchange_loop_nest): Update call to
    should_interchange_loops.

Attachment: cost-model-20171212.txt
Description: Text document

