Move pass_oacc_device_lower after pass_graphite

Richard Biener richard.guenther@gmail.com
Fri Nov 6 12:45:31 GMT 2020


On Fri, Nov 6, 2020 at 12:18 PM Frederik Harwath
<frederik@codesourcery.com> wrote:
>
>
> Hi Richard,
>
> Richard Biener <richard.guenther@gmail.com> writes:
>
> > On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
>
> > What's on my TODO list (or on the list of things to explore) is to make
> > the dump file names/suffixes explicit in passes.def, for example via
> >
> >   NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
> >
> > and we'd get a dump named .ccp_oacc or so.
>
> That would be very helpful for avoiding the drudgery of adapting those
> pass numbers!
>
> > Now, what does oacc_device_lower actually do that you need to
> > re-run complex lowering?  What does cunrolli do at this point that
> > the complete_unroll pass later does not do?
> >
>
> Good spot, "cunrolli" seems to be unnecessary.  The complex lowering is
> necessary to handle the code that gets created by the OpenACC reduction
> lowering during oaccdevlow.  I have attached a test case (a reduced
> version of
> libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
> shows that the complex instructions are created by
> pass_oacc_device_lower and which leads to an ICE if compiled without the
> new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").
> The problem is an unlowered addition. This is from a diff of the dump of
> the pass following oaccdevlow1 (ccp4), with tree-cplxlower2 disabled
> and enabled:
>
> <   _91 = VIEW_CONVERT_EXPR<complex float>(_1);
> <   _92 = reduction_var_2 + _91;
> ---
> >   _104 = REALPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
> >   _105 = IMAGPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
> >   _91 = COMPLEX_EXPR <_104, _105>;
> >   _106 = reduction_var$real_100 + _104;
> >   _107 = reduction_var$imag_101 + _105;
> >   _92 = COMPLEX_EXPR <_106, _107>;

I wonder if oacc device lowering could handle this itself rather than
requiring another cplxlower pass for presumably just complex add?

> > What's special about oacc_device lower that doesn't also apply
> > to omp_device_lower?
>
> The passes do different things. The goal is to optimize OpenACC
> loops using Graphite. The relevant lowering of the internal OpenACC
> function calls happens in pass_oacc_device_lower.
>
> > Is all this targeted at code compiled exclusively for the offload
> > target?  Thus we're in lto1 here?
>
> The OpenACC outlined functions also get compiled for the host.
>
> > Would it eventually make more sense to have a completely custom pass
> > pipeline for the offload compilation?  Maybe even per offload target?
> > See how we have a custom pipeline for -Og (pass_all_optimizations_g).
>
> What would be the main benefits of a separate pipeline? Avoiding
> (re-)running passes unnecessarily, fewer unwanted interactions
> in the test suite (but your suggestion above regarding the fixed
> pass names would also solve this)?

Mainly to avoid (re-)running passes unnecessarily and to more
easily tune towards offload targets without affecting non-offload
code too much.

Can I somehow make you work on that dump-file idea? ;)

Richard.

> >> Ok to include the patch in master?
>
> Best regards,
> Frederik
>

