[PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

Tom de Vries Tom_deVries@mentor.com
Wed Dec 16 16:02:00 GMT 2015


On 10/12/15 14:14, Tom de Vries wrote:
> [ copy-pasting-with-quote from
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some
> reason I didn't get this email ]
>
>> On Thu, 3 Dec 2015, Tom de Vries wrote:
>>> The flag is set here in expand_omp_target:
>>> ...
>>>       /* Prevent IPA from removing child_fn as unreachable, since there
>>>          are no refs from the parent function to child_fn in offload
>>>          LTO mode.  */
>>>       if (ENABLE_OFFLOADING)
>>>         cgraph_node::get (child_fn)->mark_force_output ();
>>> ...
>>>
>>
>> How are there no refs from the "parent"?  Are there not refs from
>> some kind of descriptor that maps fallback CPU and offloaded variants?
>
> That descriptor is the offload table, which is emitted in
> omp_finish_file. The function iterates over vectors offload_vars and
> offload_funcs.
>
> [ I would guess there's a one-to-one correspondence between
> symtab_node::offloadable and membership of either offload_vars or
> offload_funcs. ]
>
>> I think the above needs sorting out in some way, making the refs
>> explicit rather than implicit via force_output.
>
> I've tried an approach where I add a test for node->offloadable next to
> each test for node->force_output, except for the test in the nonlocal_p
> def in ipa_pta_execute. But I didn't (yet) manage to make that work.
>
>>> I guess setting forced_by_abi instead would also mean child_fn is not
>>> removed as unreachable, while still allowing optimizations:
>>> ...
>>>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>>>      to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>>>      symbols promoted to static and it does not inhibit
>>>      optimization.  */
>>>   unsigned forced_by_abi : 1;
>>> ...
>>>
>>> But I suspect that other optimizations (than ipa-pta) might break
>>> things.
>>
>> How so?
>
> Probably it's more accurate to say that I do not understand very well
> the difference between force_output and forced_by_abi, and what class
> of optimizations is enabled by using forced_by_abi instead of
> force_output.
>
>>> Essentially we have two situations:
>>> - in the host compiler, there is no need for the force_output flag,
>>>   and it inhibits optimization
>>> - in the accelerator compiler, it (or some equivalent) is needed
>
> Actually, things are slightly more complicated, I realize now. There's
> also the distinction between:
> - symbols declared as offloadable in the source code, and
> - symbols created by the compiler and marked offloadable
>
>>> I wonder if setting the force_output flag only when streaming the
>>> bytecode for offloading would work. That way, it wouldn't be set in
>>> the host compiler, while being set in the accelerator compiler.
>>
>> Yeah, that was my original thinking btw.
>
> FTR, I've tried that approach, as attached. It fixed the
> goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also
> using an accelerator configuration) without any regressions.

How about this patch?

We remove the setting of force_output when:
- encountering offloadable symbols in the frontend, or
- creating offloadable symbols in expand-omp.

Instead, we set force_output in input_offload_tables.

This is an improvement because:
- it moves the force_output setting to a single location
- it sets force_output as late as possible (ALAP)
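
To illustrate the idea outside of GCC, here is a minimal toy sketch in C++.
It is not the attached patch and does not use GCC's data structures: the
Symbol struct, mark_offloadable and input_offload_table_model are
hypothetical names made up for this example. The sketch only shows the
ordering: marking a symbol offloadable records it in a table without
pinning it, and force_output is set in exactly one place, when the table is
read back.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a symbol table entry. This is NOT GCC's symtab_node;
// it only carries the two flags discussed in this thread.
struct Symbol {
  std::string name;
  bool offloadable = false;
  bool force_output = false;
};

// Hypothetical helper: models the frontend / expand-omp side after the
// change. The symbol is recorded in the offload table, but force_output
// is deliberately left unset.
void mark_offloadable(Symbol &sym, std::vector<Symbol *> &offload_table) {
  sym.offloadable = true;
  offload_table.push_back(&sym);
}

// Hypothetical helper: models the single late point where the offload
// table is read and every symbol listed in it gets force_output.
void input_offload_table_model(const std::vector<Symbol *> &offload_table) {
  for (Symbol *sym : offload_table)
    sym->force_output = true;
}
```

In this model, a symbol that was marked offloadable has force_output unset
until input_offload_table_model runs, and symbols that are not in the table
are never touched.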

Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0008-Mark-symbols-in-offload-tables-with-force_output-in-read_offload_tables.patch
Type: text/x-patch
Size: 4197 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151216/babfe86a/attachment.bin>

