Move pass_oacc_device_lower after pass_graphite

Frederik Harwath frederik@codesourcery.com
Fri Nov 6 11:18:35 GMT 2020


Hi Richard,

Richard Biener <richard.guenther@gmail.com> writes:

> On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath

> What's on my TODO list (or on the list of things to explore) is to make
> the dump file names/suffixes explicit in passes.def like via
>
>   NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
>
> and we'd get a dump named .ccp_oacc or so.

That would be very helpful for avoiding the drudgery of adapting those
pass numbers!

> Now, what does oacc_device_lower actually do that you need to
> re-run complex lowering?  What does cunrolli do at this point that
> the complete_unroll pass later does not do?
>

Good spot, "cunrolli" seems to be unnecessary.  The complex lowering is
necessary to handle the code that gets created by the OpenACC reduction
lowering during oaccdevlow.  I have attached a test case (a reduced
version of
libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
shows that the complex instructions are created by
pass_oacc_device_lower and which leads to an ICE if compiled without the
new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").
The problem is an unlowered addition. This is from a diff of the dump of
the pass following oaccdevlow1 (ccp4) with disabled and with enabled
tree-cplxlower2:

<   _91 = VIEW_CONVERT_EXPR<complex float>(_1);
<   _92 = reduction_var_2 + _91;
---
>   _104 = REALPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
>   _105 = IMAGPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
>   _91 = COMPLEX_EXPR <_104, _105>;
>   _106 = reduction_var$real_100 + _104;
>   _107 = reduction_var$imag_101 + _105;
>   _92 = COMPLEX_EXPR <_106, _107>;
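
As a rough illustration of the diff above (a sketch in plain C, not the attached test case and not GCC's implementation), the single complex addition on the "<" side can be written as two scalar additions on the real and imaginary parts, as on the ">" side. The function names add_unlowered, add_lowered, sum_complex and check_lowering are hypothetical and chosen for this example only. The loop in sum_complex is merely assumed to resemble the kind of OpenACC reduction that reduction-cplx-flt.c exercises; when compiled without -fopenacc the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <complex.h>

/* Unlowered form: one addition on the complex type, comparable to
   "_92 = reduction_var_2 + _91" in the dump.  */
static float complex add_unlowered(float complex acc, float complex x)
{
    return acc + x;
}

/* Lowered form: split both operands into real and imaginary parts,
   add them separately, and rebuild the complex value, comparable to
   the REALPART_EXPR / IMAGPART_EXPR / COMPLEX_EXPR sequence.  */
static float complex add_lowered(float complex acc, float complex x)
{
    float re = crealf(acc) + crealf(x);
    float im = cimagf(acc) + cimagf(x);
    return CMPLXF(re, im);
}

/* Hypothetical OpenACC reduction over complex floats.  This is an
   assumption about the shape of such code, not the reduced test.  */
static float complex sum_complex(const float complex *x, int n)
{
    float complex acc = 0.0f;
#pragma acc parallel loop reduction(+:acc) copyin(x[0:n])
    for (int i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Both forms must agree component by component.  */
static void check_lowering(float complex a, float complex b)
{
    assert(crealf(add_lowered(a, b)) == crealf(add_unlowered(a, b)));
    assert(cimagf(add_lowered(a, b)) == cimagf(add_unlowered(a, b)));
}
```

The point of the sketch is only that both forms compute the same value; the ICE arises because the unlowered form survives past the point where later passes expect complex operations to have been split.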

> What's special about oacc_device lower that doesn't also apply
> to omp_device_lower?

The passes do different things. The goal is to optimize OpenACC
loops using Graphite. The relevant lowering of the internal OpenACC
function calls happens in pass_oacc_device_lower.

> Is all this targeted at code compiled exclusively for the offload
> target?  Thus we're in lto1 here?

The OpenACC outlined functions also get compiled for the host.

> Does it make eventually more sense to have a completely custom pass
> pipeline for the  offload compilation?  Maybe even per offload target?
> See how we have a custom pipeline for -Og (pass_all_optimizations_g).

What would be the main benefits of a separate pipeline? Avoiding
(re-)running passes unnecessarily, and fewer unwanted interactions
in the test suite (but your suggestion above regarding the fixed
pass names would also solve this)?

>> Ok to include the patch in master?

Best regards,
Frederik

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cplxlower-test.patch
Type: text/x-diff
Size: 1775 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20201106/2afe8881/attachment.bin>