This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][gimple-linterchange] Rewrite compute_access_stride


On Fri, Dec 1, 2017 at 2:26 PM, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 1 Dec 2017, Bin.Cheng wrote:
>
>> On Fri, Dec 1, 2017 at 12:31 PM, Richard Biener <rguenther@suse.de> wrote:
>> >
> > > This is the access stride computation change.  Apart from the
> > > stride extraction, I adjusted the cost model to handle non-constant
> > > strides by checking whether either stride is a multiple of the
> > > other, and to simply fail interchanging if it's the wrong way
> > > around for one ref or if the simple method using multiple_of_p
> > > fails to determine either case.
>> >
>> > This still handles the bwaves case.
>> >
>> > I think we want additional testcases with variable strides for each
>> > case we add - I believe this is the most conservative way to treat
>> > variable strides.
>> >
> > > It may be inconsistent with the constant stride handling, where you
> > > allow many OK DRs to outweigh a few not-OK DRs, but as it
> > > worked for bwaves it must be good enough ;)
>> >
>> > Tested on x86_64-unknown-linux-gnu (just the interchange testcases).
>> >
>> > Currently running a bootstrap with -O3 -g -floop-interchange.
>> >
>> > Ok for the branch?
>> Ok.  This actually is closer to the motivation: a simple/conservative
>> cost model that only transforms code when it's known to be good.
>> I will check the impact on the number of interchanges in SPEC.
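>>
>> For reference, a minimal sketch of such a conservative variable-stride
>> check (variable names are hypothetical; multiple_of_p is GCC's
>> existing fold-const.c predicate):
>>
>>   /* Sketch only: keep the interchange for this ref only when the
>>      inner loop's stride is provably a multiple of (and thus at
>>      least as large as) the outer one.  */
>>   if (multiple_of_p (TREE_TYPE (iloop_stride), iloop_stride,
>>                      oloop_stride))
>>     ;  /* OK: interchange moves the larger stride outward.  */
>>   else if (multiple_of_p (TREE_TYPE (oloop_stride), oloop_stride,
>>                           iloop_stride))
>>     return false;  /* Wrong way around for this ref.  */
>>   else
>>     return false;  /* Can't determine either case: conservatively
>>                       fail.  */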
>
> A few high-level observations.
>
> In tree_loop_interchange::interchange we try interchanging adjacent
> loops, starting with the innermost loop and its immediate outer
> loop.  This way the innermost loop will bubble up as much as
> possible.  But we don't seem to handle bubbling multiple loops, as for
>
>  for (i=0; i<n; ++i)
>   for (j=0; j<n; ++j)
>     for (k=0; k<n; ++k)
>       a[j][k][i] = 1.;
>
> because the innermost two loops are correctly ordered, so we then
> try interchanging the k and the i loop, which succeeds, but then
> we stop.  So there's something wrong in the iteration scheme.
> I would have expected it to be quadratic, basically iterating
> the ::interchange loop until we didn't perform any interchange
> (or doing something more clever).
Yes, I restricted it to a single-pass process over the loop nest.
Ideally we could create a vector of loop_cand for the whole loop nest,
then sort/permute the loops with respect to the computed strides.
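
Something like the following could serve as the quadratic fixed-point
scheme (sketch only; try_interchange_pair is a hypothetical helper
wrapping the existing validity/profitability checks, and cands[0] is
the outermost loop_cand):

  /* Bubble each loop towards its preferred depth by repeating
     adjacent interchanges until no pair changes anymore.  */
  bool changed;
  do
    {
      changed = false;
      for (unsigned i = cands.length () - 1; i > 0; --i)
        if (try_interchange_pair (cands[i - 1], cands[i]))
          {
            std::swap (cands[i - 1], cands[i]);
            changed = true;
          }
    }
  while (changed);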

>
> loop_cand::can_interchange_p seems to perform per-BB checks
> (supported_operations, num_stmts) that, with the way we interchange,
> should disallow any such BB in a loop that we interchange or
> interchange across.  That means it looks like something we should
> perform early, like during data dependence gathering, for example
> by inlining find_data_references_in_bb and doing those per-stmt
> checks there?
Yes.  The only problem is the check on the reduction.  We can build up
all loop_cands earlier, or simply move the non-reduction checks earlier.
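
A rough sketch of how those per-stmt checks could be folded into DR
gathering (stmt_supported_for_interchange_p is a hypothetical helper;
find_data_references_in_stmt is the existing tree-data-ref.c routine):

  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
       gsi_next (&gsi))
    {
      gimple *stmt = gsi_stmt (gsi);
      /* Do the per-stmt validity checks here instead of later in
         loop_cand::can_interchange_p.  */
      if (!stmt_supported_for_interchange_p (stmt))
        return false;
      if (!find_data_references_in_stmt (loop, stmt, &datarefs))
        return false;
    }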

>
> In prepare_perfect_loop_nest we seem to be somewhat quadratic
> in the way we re-compute dependences when doing so failed
> (we also always just strip the outermost loop, while the failing
> DDR could involve a DR that is in an inner loop).  I think it
> should be possible to re-structure this to compute dependences
> from the inner loop body outward to the outer loop bodies (the
> ddrs vector is, as opposed to the DR vector, unsorted I think).
Even more, we would want an interface in tree-data-ref.c so that data
dependences can be computed/assembled level by level in the loop nest.
Loop distribution could benefit from this as well.
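
Such an interface might look roughly like this (names invented purely
for illustration):

  /* Hypothetical: extend the dependences already computed for the
     loops inside OUTER by the DRs/DDRs that OUTER's own body adds,
     instead of recomputing everything from scratch.  */
  bool extend_data_dependences_to_loop (struct loop *outer,
                                        vec<data_reference_p> *datarefs,
                                        vec<ddr_p> *ddrs);

  /* A caller would then walk outward from the innermost loop:  */
  for (struct loop *l = innermost; l != NULL; l = loop_outer (l))
    if (!extend_data_dependences_to_loop (l, &datarefs, &ddrs))
      break;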

> I haven't fully thought this out yet, though - a similar
> iteration scheme could improve DR gathering, although that's not
> so costly.
I can change this one now.

>
> Overall we should try to improve the function names: we have
> valid_data_dependences, can_interchange_loops, should_interchange_loops
> and can_interchange_p, which are all related but do slightly different
> things.  My usual approach is to inline all single-use functions
> to improve things (and make the program flow more visible).  But
> I guess that's too much implementation detail.
>
> Didn't get to the IV re-mapping stuff yet, but you (of course)
> end up with quite a few dead IVs when iterating the interchange.
> You seem to add a new canonical IV just to avoid rewriting the
> existing exit test, right?  Deferring that to the "final"
> interchange on a nest should avoid those dead IVs.
Hmm, with the help of the newly created DCE interface, all dead
IVs/reductions will be deleted after this pass.  Note that in the
current implementation, dead code can be generated from the mapped
IVs, the canonical IV and the reduction.  Deferring the addition of
the canonical IV looks impractical with the current level-by-level
interchange, because the inner loop's IV is needed for interchange?
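
For illustration (simplified to source level, hypothetical names):
after one interchange a new canonical IV niter_1 drives the exit test,
while the remapped original IV k_2 is kept alive only by its own
increment - exactly the kind of dead code the DCE interface removes:

  for (niter_1 = 0, k_2 = 0; niter_1 < n; ++niter_1, ++k_2)
    a[j][niter_1][i] = 1.;  /* all uses rewritten to niter_1;
                               k_2 is dead */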

>
> Will now leave for the weekend.
Have a nice weekend!

Thanks,
bin
>
> Thanks,
> Richard.

