GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

Tobias Burnus tburnus@baylibre.com
Thu Mar 7 14:07:32 GMT 2024


Hi Thomas,

first, I have the feeling that we are talking about (more or less) the 
same code region and using the same words – but are talking about rather 
different things. Thus, you confuse me (and possibly Andrew) – and my 
reply confuses you.

Thomas Schwinge wrote:
> On 2024-03-07T12:43:07+0100, Tobias Burnus<tburnus@baylibre.com>  wrote:
>> Thomas Schwinge wrote:
>>> First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
>>> is also not really desirable.
> External users probably don't, but certainly all our internal testing is
> setting it,

First, I doubt it – secondly, if it were true, it has been broken for the 
last 5 years or so, as we definitely did not notice failures due to 
non-working offload devices, neither for AMD GCN nor ...

> and also implicitly all nvptx offloading testing: simply by
> means of having such knob in the libgomp nvptx plugin.

I did see it set in some places for AMD, but I do not see any 
nvptx-specific environment variable that permits doing the same.

However:
>   That is, the
> libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
> (the original meaning of) that flag

I think that's one of the problems here – you talk about 
suppress_host_fallback (implicit, original meaning), while I talk about 
the GCN_SUPPRESS_HOST_FALLBACK environment variable.

Besides all the talk about suppress_host_fallback, the "failure to 
'init_hsa_runtime_functions' is not fatal" part of the subject line seems 
to be something to be considered (beyond the patches you already suggested).


>> If I run on my Linux system the system compiler with nvptx + gcn support
>> installed, I get (with a nvptx permission problem):
>>
>> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out
>>
>> libgomp: GCN host fallback has been suppressed
>>
>> And exit code = 1. The same result with '-foffload=disable' or with
>> '-foffload=nvptx-none'.
> I can't tell if that's what you expect to see there, or not?

Well, obviously I do not expect to get this error by default. As your 
wording indicated that the internal variable will always be true, and 
not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicitly set, I 
worry that I would get the error every time.

> (For avoidance of doubt: I'm expecting silent host-fallback execution in
> case that libgomp GCN and/or nvptx plugins are available, but no
> corresponding devices.  That's what my patch achieves.)

I concur that the silent host fallback should happen by default (unless 
env vars tell otherwise) – at least when no code was generated for the 
device (e.g. -foffload=disable), when the vendor runtime library is not 
available, or when no device is usable (be it for lack of hardware or 
lack of permission).

That's the current behavior and if that remains, my main concern evaporates.

* * *

>> If we want to remove it, we can make it always false - but I am strongly
>> against making it always true.
> I'm confused.  So you want the GCN and nvptx plugins to behave
> differently in that regard?

No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set.

>> Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to
>> prevent the host fallback, but don't break somewhat common systems.
> That's an orthogonal concept?

No – it is the same concept as the main use of the 
GCN_SUPPRESS_HOST_FALLBACK environment variable: you get a run-time 
error instead of a silent host fallback.

But throughout the whole thread I have had the feeling that – while 
talking about the same code region and throwing in the same words – we 
are actually talking about completely different things.

Tobias