Bug 85463 - [nvptx] "exit" in offloaded region doesn't terminate process
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: libgomp
Version: 8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: openacc, openmp
Depends on:
Blocks:
 
Reported: 2018-04-19 08:31 UTC by Thomas Schwinge
Modified: 2023-01-20 09:04 UTC
CC List: 2 users

See Also:
Host:
Target: nvptx
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Thomas Schwinge 2018-04-19 08:31:05 UTC
Consider:

    #include <stdlib.h>
    
    int main(int argc, char *argv[])
    {
    #pragma acc parallel
      {
        if (argc != 1)
          exit(35);
      }
    
      return 0;
    }

No matter the argc value, this program always terminates normally (exit code zero).

Is the right solution to have GOACC_parallel etc. detect that exit etc. have been called inside offloaded regions, and then have them terminate the process in the appropriate way?

Can we make use of PTX trap in some way for that, to avoid adding the overhead of retval checking to (the majority of) kernel launches that terminate normally?  That is, when C exit is called, set some retval (as is currently done only in stand-alone nvptx target testing), and then call PTX trap instead of PTX exit, which will cause cuStreamSynchronize to indicate a launch error; then we can check the retval and do the appropriate thing?

Or, similarly, a scheme where the low-overhead PTX exit is only ever used for normal (exit code zero) C exit(0), and everything else uses the PTX trap variant?
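
To make the trap-based idea concrete, here is a rough host-side sketch in plain C, with no CUDA calls. It only models the decision taken after stream synchronization. The names `launch_result`, `sync_failed`, `retval_valid`, `retval` and `status_after_launch` are hypothetical and do not correspond to anything in libgomp; the fallback status 1 is an arbitrary placeholder.

```c
#include <stdbool.h>

/* Hypothetical summary of one finished kernel launch.  */
struct launch_result
{
  bool sync_failed;   /* Stream synchronization reported a launch error.  */
  bool retval_valid;  /* Device code stored an exit status before trapping.  */
  int retval;         /* Exit status stored by the device-side exit.  */
};

/* Return -1 if the host should continue normally, otherwise the
   status with which the host process should terminate.  */
int
status_after_launch (const struct launch_result *r)
{
  /* Fast path: no launch error, so the retval is never inspected.  */
  if (!r->sync_failed)
    return -1;

  /* The device trapped after exit (N) had stored N.  */
  if (r->retval_valid)
    return r->retval;

  /* Trap without a stored status: some other device failure.
     The value 1 is an assumption made for this sketch.  */
  return 1;
}
```

With this shape, kernels that finish normally pay nothing beyond the launch-error check that is already needed; the retval is only read on the failure path.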

This relates to PR85166 "[nvptx, libgfortran] Libgomp fortran tests using stop in offloaded fns fail to compile", where a Fortran STOP still does not actually terminate the process with the appropriate error code.  (The "abort" testcases in the libgomp testsuite might appear to do the right thing nevertheless; see the (unresolved) discussion in <http://mid.mail-archive.com/87inlx3o1a.fsf@hertz.schwinge.homeip.net>.)
Comment 1 Thomas Schwinge 2018-04-19 08:54:09 UTC
Author: tschwinge
Date: Thu Apr 19 08:53:38 2018
New Revision: 259491

URL: https://gcc.gnu.org/viewcvs?rev=259491&root=gcc&view=rev
Log:
PR85463 '[nvptx] "exit" in offloaded region doesn't terminate process'

	libgomp/
	PR libfortran/85166
	* testsuite/libgomp.oacc-fortran/abort-1.f90: Switch back to "call
	abort".
	* testsuite/libgomp.oacc-fortran/abort-2.f90: Likewise.

	libgfortran/
	PR libfortran/85166
	PR libgomp/85463
	* runtime/minimal.c (stop_numeric): Reimplement.
	(stop_string, error_stop_string, error_stop_numeric): New
	functions.
	libgomp/
	PR libgomp/85463
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: New file.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.

Added:
    trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
    trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-2.f
    trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-3.f
    trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-1.f
    trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-2.f
    trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-3.f
Modified:
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/runtime/minimal.c
    trunk/libgomp/ChangeLog
    trunk/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
    trunk/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
Comment 2 Martin Liška 2018-11-20 08:30:10 UTC
Tom: Can the bug be marked as resolved?
Comment 3 Tobias Burnus 2019-12-04 15:14:52 UTC
(In reply to Martin Liška from comment #2)
> Tom: Can the bug be marked as resolved?

The test cases from comment 1 contain:
! { dg-output "libgomp: cuStreamSynchronize error.*" { target openacc_nvidia_accel_selected } }

That's for the output:
----
  ERROR STOP (possibly followed by a number or message string)

  libgomp: cuStreamSynchronize error: an illegal instruction was encountered
----
That is not very nice. Also, the error code is not propagated. (The expected exit code is the value of the integer used with 'error stop', or one when nothing or a string is given.)
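
The rule in the parenthetical above can be written as a small C helper. This is only an illustration of the expected status, not code from libgfortran; `error_stop_status` and its parameters are hypothetical names.

```c
#include <stdbool.h>

/* has_code: ERROR STOP was given an integer stop code.
   code: that integer (ignored when has_code is false).
   Returns the exit status the process is expected to report.  */
int
error_stop_status (bool has_code, int code)
{
  /* With no stop code, or with a string, the expected status is one.  */
  return has_code ? code : 1;
}
```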

I assume the illegal instruction was a way to cause terminations with non-zero exit code …

In my opinion, that is the only issue with the current implementation; if one regards this part as WONTFIX, then the commit in comment 1 has FIXED it. Otherwise, one can leave this bug OPEN until a better solution is found.
Comment 4 Tobias Burnus 2023-01-20 09:04:08 UTC
Crossref: Thomas posted the following newlib patch:

  [PATCH] nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]
  https://sourceware.org/pipermail/newlib/2023/020140.html