This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Tom de Vries <Tom_deVries at mentor dot com>
- Cc: Richard Biener <rguenther at suse dot de>, Thomas Schwinge <thomas at codesourcery dot com>, "gcc-patches at gnu dot org" <gcc-patches at gnu dot org>
- Date: Thu, 3 Dec 2015 12:13:24 +0100
- Subject: Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Dec 03, 2015 at 12:09:04PM +0100, Tom de Vries wrote:
> The flag is set here in expand_omp_target:
> ...
>   /* Prevent IPA from removing child_fn as unreachable, since there are no
>      refs from the parent function to child_fn in offload LTO mode.  */
>   if (ENABLE_OFFLOADING)
>     cgraph_node::get (child_fn)->mark_force_output ();
> ...
>
> I guess setting forced_by_abi instead would also mean child_fn is not
> removed as unreachable, while still allowing optimizations:
> ...
> /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
> to be exported. Unlike FORCE_OUTPUT this flag gets cleared to
> symbols promoted to static and it does not inhibit
> optimization. */
> unsigned forced_by_abi : 1;
> ...
>
> But I suspect that other optimizations (than ipa-pta) might break things.
>
> Essentially we have two situations:
> - in the host compiler, there is no need for the forced_output flag,
> and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
>
> I wonder if setting the force_output flag only when streaming the bytecode
> for offloading would work. That way, it wouldn't be set in the host
> compiler, while being set in the accelerator compiler.
I believe that the host and offload func (and var) tables need to be in
sync, so there needs to be something in both the host and accel compilers
that prevents the functions and variables that have an accel or host
counterpart in the tables from being optimized away or, say, replaced by
a clone with different arguments.
Jakub