92305 – [10 regression] libgomp.fortran/use_device_addr-1.f90 fails starting with r277606

Bug 92305 - [10 regression] libgomp.fortran/use_device_addr-1.f90 fails starting with r277606

Summary: [10 regression] libgomp.fortran/use_device_addr-1.f90 fails starting with r27...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	fortran (show other bugs)
Version:	10.0

Importance:	P3 normal
Target Milestone:	10.0
Assignee:	Jakub Jelinek

URL:
Keywords:	openmp

Depends on:
Blocks:

Reported:	2019-10-31 15:08 UTC by seurer
Modified:	2020-03-02 15:05 UTC (History)
CC List:	5 users (show)

See Also:
Host:	powerpc64*-linux-gnu
Target:	powerpc64*-linux-gnu
Build:	powerpc64*-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:	2019-11-11 00:00:00

Attachments
-fdump-rtl-expand for test case in comment 9, compiled on powerpc64le-unknown-linux-gnu using -O0 (it doesn't fail with -O1) (2.30 KB, text/plain) 2019-11-12 18:03 UTC, Tobias Burnus	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description seurer 2019-10-31 15:08:57 UTC

spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test2/gcc/xgcc -B/home/seurer/gcc/build/gcc-test2/gcc/ /home/seurer/gcc/gcc-test2/libgomp/testsuite/libgomp.fortran/use_device_addr-1.f90 -B/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/ -B/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/.libs -I/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp -I/home/seurer/gcc/gcc-test2/libgomp/testsuite/../../include -I/home/seurer/gcc/gcc-test2/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -Wno-hsa -fdiagnostics-color=never -fopenmp -B/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libquadmath/.libs/ -O0 -B/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libgfortran/.libs -fintrinsic-modules-path=/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp -L/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/.libs -L/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libquadmath/.libs/ -L/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libgfortran/.libs -lgfortran -foffload=-lgfortran -lquadmath -lm -o ./use_device_addr-1.exe
PASS: libgomp.fortran/use_device_addr-1.f90   -O0  (test for excess errors)
Setting LD_LIBRARY_PATH to .:/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/.libs:/home/seurer/gcc/build/gcc-test2/gcc:/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libgfortran/.libs:/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libquadmath/.libs:.:/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/.libs:/home/seurer/gcc/build/gcc-test2/gcc:/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libgfortran/.libs:/home/seurer/gcc/build/gcc-test2/powerpc64le-unknown-linux-gnu/./libgomp/../libquadmath/.libs:/home/seurer/gcc/build/gcc-test2/./gmp/.libs:/home/seurer/gcc/build/gcc-test2/./prev-gmp/.libs:/home/seurer/gcc/build/gcc-test2/./mpfr/src/.libs:/home/seurer/gcc/build/gcc-test2/./prev-mpfr/src/.libs:/home/seurer/gcc/build/gcc-test2/./mpc/src/.libs:/home/seurer/gcc/build/gcc-test2/./prev-mpc/src/.libs:/home/seurer/gcc/build/gcc-test2/./isl/.libs:/home/seurer/gcc/build/gcc-test2/./prev-isl/.libs:/home/seurer/gcc/install/gcc-7.4.0/lib64
Execution timeout is: 300
spawn [open ...]
STOP 1
FAIL: libgomp.fortran/use_device_addr-1.f90   -O0  execution test

Comment 1 Tobias Burnus 2019-10-31 15:26:45 UTC

Technically, this patch only adds '{ dg-do run }' which has the effect that the code is not only run once but multiple times with different compiler options (-O1, -O2 etc.).

Your code fails to execute with -O0 and a timeout of 300 (seconds, I guess, which is then 5min). – I think your system does not do any offloading to a GPU. Hence, it would be a host-only run.

It runs here with -O0 in 0.620s and compiles in 2.194s (cold run, re-run: 1.062s). And that's while bootstapping GCC in parallel on this laptop.

As you could nail it down to a single commit, I assume, you could reproduce the problem – still, I am completely lost why it fails for you at -O0. Can you try to debug it a bit more?

Comment 2 Tobias Burnus 2019-10-31 15:32:17 UTC

(In reply to Tobias Burnus from comment #1)
> As you could nail it down to a single commit, I assume, you could reproduce
> the problem – still, I am completely lost why it fails for you at -O0. Can
> you try to debug it a bit more?

Was mislead by the time out message. When updating the patch (changing abort to stop), I missed to enumerate the stops through, hence, there is more than one 'stop 1'. Can you pin-point it to a certain stop/fail, e.g. by running it in the debugger? [I will update the test to have disjunct stop numbers.]

[Still, as the patch only added 'dg-do run', it still does not explain why it worked before and fails now.]

Comment 3 seurer 2019-10-31 18:30:55 UTC

There are 222 stops in there.  Is there an easy way I can catch any of them that fire?  Just running in gdb shows this spawns a bunch of threads and it looks like one of them is what is stopping.

Comment 4 Tobias Burnus 2019-11-04 10:01:54 UTC

Author: burnus
Date: Mon Nov  4 10:01:22 2019
New Revision: 277769

URL: https://gcc.gnu.org/viewcvs?rev=277769&root=gcc&view=rev
Log:
libgomp/testsuite - use unique numbers with Fortran's 'stop'

        PR fortran/92305
        * testsuite/libgomp.fortran/allocatable2.f90: Use
        unique numbers with 'stop'.
        * testsuite/libgomp.fortran/use_device_addr-1.f90: Ditto.
        * testsuite/libgomp.fortran/use_device_addr-2.f90: Ditto.
        * testsuite/libgomp.fortran/use_device_ptr-1.f90: Ditto.
        * testsuite/libgomp.oacc-fortran/lib-15.f90: Ditto.
        * testsuite/libgomp.oacc-fortran/pset-1.f90: Ditto.


Modified:
    trunk/libgomp/ChangeLog
    trunk/libgomp/testsuite/libgomp.fortran/allocatable2.f90
    trunk/libgomp/testsuite/libgomp.fortran/use_device_addr-1.f90
    trunk/libgomp/testsuite/libgomp.fortran/use_device_addr-2.f90
    trunk/libgomp/testsuite/libgomp.fortran/use_device_ptr-1.f90
    trunk/libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90
    trunk/libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90

Comment 5 Tobias Burnus 2019-11-04 10:09:36 UTC

(In reply to seurer from comment #3)
> Is there an easy way I can catch any of them that fire?

Now fixed by using unique numbers in libgomp/testsuite.

But replacing 'stop' by 'error stop' is one option - the exit code and output remains, but additionally a backtrace is shown. (Missed that option before; this just came into my mind right now.)

Comment 6 Thomas Schwinge 2019-11-11 12:09:58 UTC

Same for 'libgomp.fortran/use_device_addr-2.f90', per several <gcc-testresults@gcc.gnu.org> posts.


And indeed I'm seeing that, too, as already reported two weeks ago in <http://mid.mail-archive.com/87mudlrox7.fsf@euler.schwinge.homeip.net>:

| --- a/libgomp/testsuite/libgomp.fortran/use_device_addr-1.f90
| +++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-1.f90
| @@ -0,0 +1 @@
| +! { dg-do run }
| 
| --- a/libgomp/testsuite/libgomp.fortran/use_device_addr-2.f90
| +++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-2.f90
| @@ -0,0 +1 @@
| +! { dg-do run }
| 
| On powerpc64le-unknown-linux-gnu without offloading, I'm seeing (only) the '-O0' execution tests FAIL for both these, with 'STOP 1' message.


Tobias, please have a look at some point.

Comment 7 Tobias Burnus 2019-11-11 16:46:27 UTC

(In reply to Thomas Schwinge from comment #6)
> And indeed I'm seeing that, too, 

Cool. Can you then debug it? Here, it works!

As mentioned several times in the PR, I cannot reproduce it neither locally (without true offloading) nor with nvptx offloading.

Comment 8 Tobias Burnus 2019-11-12 15:51:27 UTC

(In reply to Thomas Schwinge from comment #6)
> On powerpc64le-unknown-linux-gnu

I can reproduce it on a powerpc64le-unknown-linux-gnu w/o real offloading. It fails here for subroutine test_dummy_opt_val_callee_2 for
  if (.not.present(c_aptr) .or. .not.present(c_bptr)) stop 150
as both are marked as not present – that's completely unrelated to OpenMP (and happens with and without -fopenmp).


The callee and caller are:

test_dummy_opt_val_callee_2 (real(kind=8) aa, real(kind=8) bb,
            void * c_aptr, void * c_bptr, real(kind=8) * * aptr, real(kind=8) * * bptr,
            logical(kind=1) _aa, logical(kind=1) _bb, logical(kind=1) _c_aptr, logical(kind=1) _c_bptr)
…
  if ((logical(kind=4)) !_c_aptr || (logical(kind=4)) !_c_bptr)
      _gfortran_stop_numeric (150, 0);

And the call is:
  test_dummy_opt_val_callee_2 (aa, bb, c_aptr, c_bptr, &aptr, &bptr, 1, 1, 1, 1);

At a glance, that looks fine – and a minimal test case also passes. My feeling is that something with passing the arguments goes wrong – which is hidden in the gimple and not visible in the tree dump.

Comment 9 Tobias Burnus 2019-11-12 16:04:39 UTC

(In reply to Tobias Burnus from comment #8)
In gdb [GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1], which is really not the newest, I get:

(gdb) pt c_aptr
type = <data variable, no debug info>

and stepping in, gives (all variables should be unitialized, except for those with leading underscore, which should be 1 alias .TRUE.):

test_dummies_opt_value::test_dummy_opt_val_callee_2 (aa=0, bb=1.3262473693532952e-315, c_aptr=<error reading variable: Attempt to dereference a generic pointer.>, 
    c_bptr=<error reading variable: Attempt to dereference a generic pointer.>, aptr=0x3ffffffff648, bptr=0x1, _aa=.TRUE., _bb=.TRUE., _c_aptr=.FALSE., _c_bptr=96)

Reduced testcase is:

module test_dummies_opt_value
  use iso_c_binding
contains
  subroutine test_dummy_opt_val_call_2()
     real(c_double), target :: aa, bb
     type(c_ptr) :: c_aptr, c_bptr
     real(c_double), pointer :: aptr, bptr
     call test_dummy_opt_val_callee_2(aa, bb, c_aptr, c_bptr, aptr, bptr)
  end subroutine test_dummy_opt_val_call_2
  subroutine test_dummy_opt_val_callee_2(aa, bb, c_aptr, c_bptr, aptr, bptr)
     real(c_double), optional, value, target :: aa, bb
     type(c_ptr), optional, value :: c_aptr, c_bptr
     real(c_double), optional, pointer :: aptr, bptr
     if (.not.present(c_aptr) .or. .not.present(c_bptr)) stop 150
  end subroutine test_dummy_opt_val_callee_2
end module test_dummies_opt_value
program omp_device_addr
  use test_dummies_opt_value
  call test_dummy_opt_val_call_2()
end program omp_device_addr

Comment 10 Tobias Burnus 2019-11-12 16:27:48 UTC

The callee is:
 <var_decl 0x3fffb7f80630 c_aptr
    type <pointer_type 0x3fffb5ac0fc0
        type <void_type 0x3fffb5ac0f18 void VOID
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x3fffb5ac0f18
            pointer_to_this <pointer_type 0x3fffb5ac0fc0>>

and the hidden argument (_c_aptr) is:

 <integer_cst 0x3fffb5a01218 type <boolean_type 0x3fffb5ac0b28 logical(kind=1)> constant 1>

which both look fine.

Comment 11 Tobias Burnus 2019-11-12 16:51:42 UTC

Optimized dump is:

  void * c_bptr;
  void * c_aptr;
  real(kind=8) * bptr;
  real(kind=8) bb;
  real(kind=8) * aptr;
  real(kind=8) aa;
  real(kind=8) aa.1_1;
  real(kind=8) bb.2_2;
<bb 2> :
  aa.1_1 = aa;
  bb.2_2 = bb;
  test_dummy_opt_val_callee_2 (aa.1_1, bb.2_2, c_aptr_4(D), c_bptr_5(D), &aptr, &bptr, 1, 1, 1, 1);

And assembler:

        .cfi_startproc
.LCF1:
0:      addis 2,12,.TOC.-.LCF1@ha
        addi 2,2,.TOC.-.LCF1@l
        .localentry     __test_dummies_opt_value_MOD_test_dummy_opt_val_call_2,.-__test_dummies_opt_value_MOD_test_dummy_opt_val_call_2
        mflr 0           #,
        std 0,16(1)      #,
        std 31,-8(1)     #,
        stdu 1,-112(1)   #,,
        .cfi_def_cfa_offset 112
        .cfi_offset 65, 16
        .cfi_offset 31, -8
        mr 31,1          #,
        .cfi_def_cfa_register 31
 # foo.f90:8:      call test_dummy_opt_val_callee_2(aa, bb, c_aptr, c_bptr, aptr, bptr)
        lfd 0,64(31)     # aa, aa.1_1
        lfd 12,80(31)    # bb, bb.2_2
        addi 8,31,88     # tmp119,,
        addi 7,31,72     # tmp120,,
        li 9,1           # tmp121,
        std 9,40(1)      #, tmp121
        li 9,1           # tmp122,
        std 9,32(1)      #, tmp122
        li 10,1          #,
        li 9,1           #,
        ld 6,56(31)      # c_bptr,
        ld 5,48(31)      # c_aptr,
        fmr 2,12         #, bb.2_2
        fmr 1,0  #, aa.1_1
        bl __test_dummies_opt_value_MOD_test_dummy_opt_val_callee_2

And on the callee side:

test_dummy_opt_val_callee_2 (real(kind=8) aa, real(kind=8) bb, void * c_aptr, void * c_bptr, real(kind=8) * * aptr, real(kind=8) * * bptr, logical(kind=1) _aa, logical(kind=1) _bb, logical(kind=1) _c_aptr, logical(kind=1) _c_bptr)
{
  logical(kind=1) _1;
  logical(kind=4) _2;
  logical(kind=1) _3;
  logical(kind=4) _4;
  logical(kind=4) _5;
  <bb 2> :
  _1 = ~_c_aptr_6(D);
  _2 = (logical(kind=4)) _1;
  _3 = ~_c_bptr_7(D);
  _4 = (logical(kind=4)) _3;
  _5 = _2 | _4;
  if (_5 != 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]
  <bb 3> :
  _gfortran_stop_numeric (150, 0);

Which in assembler is:

        .cfi_startproc
.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .localentry     __test_dummies_opt_value_MOD_test_dummy_opt_val_callee_2,.-__test_dummies_opt_value_MOD_test_dummy_opt_val_callee_2
        mflr 0           #,
        std 0,16(1)      #,
        std 31,-8(1)     #,
        stdu 1,-48(1)    #,,
        .cfi_def_cfa_offset 48
        .cfi_offset 65, 16
        .cfi_offset 31, -8
        mr 31,1          #,
        .cfi_def_cfa_register 31
        stfd 1,80(31)    # aa, aa
        stfd 2,88(31)    # bb, bb
        std 5,96(31)     # c_aptr, c_aptr
        std 6,104(31)    # c_bptr, c_bptr
        std 7,112(31)    # aptr, aptr
        std 8,120(31)    # bptr, bptr
        stb 9,128(31)    # _aa, tmp123
        mr 9,10          # tmp125, tmp124
        stb 9,136(31)    # _bb, tmp125
 # foo.f90:16:      if (.not.present(c_aptr) .or. .not.present(c_bptr)) stop 150
        lbz 9,144(31)    # _c_aptr, tmp126
        xori 9,9,0x1     #,, tmp127, tmp126
        rlwinm 9,9,0,0xff        # _1, tmp127
        rlwinm 9,9,0,31,31       # tmp128, _1,,
        rldicl 10,9,0,32         # _2, tmp128
        lbz 9,152(31)    # _c_bptr, tmp129
        xori 9,9,0x1     #,, tmp130, tmp129
        rlwinm 9,9,0,0xff        # _3, tmp130
        rlwinm 9,9,0,31,31       # tmp131, _3,,
        rldicl 9,9,0,32  # _4, tmp131
        or 9,10,9        #, tmp132, _2, _4
        rldicl 9,9,0,32  # _5, tmp132
        cmpdi 0,9,0      #, tmp133, _5
        beq 0,.L3        #
 # foo.f90:16:      if (.not.present(c_aptr) .or. .not.present(c_bptr)) stop 150
        li 4,0           #,
        li 3,150                 #,
        bl _gfortran_stop_numeric        #
        nop

Comment 12 Tobias Burnus 2019-11-12 18:03:42 UTC

Created attachment 47223 [details]
-fdump-rtl-expand for test case in comment 9, compiled on powerpc64le-unknown-linux-gnu using -O0 (it doesn't fail with -O1)

Comment 13 Tobias Burnus 2019-11-12 18:16:24 UTC

19:03 < segher> so this is BE, right?
19:03 < segher> the caller does   li 9,1 ; std 9,32(1)
19:04 < segher> so it stores the bool as a 64-bit number
19:04 < segher> but the callee reads it like   lbz 9,144(31)
19:05 < segher> that's the right address, but the wrong end of that 8-byte thing

Comment 14 Tobias Burnus 2019-11-12 22:31:06 UTC

If the actual argument is itself optional but without value attribute, gfc_conv_expr_present returns a 'logical_type_node' (default-integer size, typically 4 bytes type) instead of a boolean_type_node (1 byte), which might causes problems.

However, for the test case of comment 9, this does not apply as there the boolean_true_node is used (which is 1 byte) – which matches the declaration in trans-decl.c (= boolean_type_node).

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 63559384c1e..267536ddf2f 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -5753 +5753,3 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
-			    vec_safe_push (optionalargs, tmp);
+			    vec_safe_push (optionalargs,
+					   fold_convert (boolean_type_node,
+							 tmp));

Comment 15 Tobias Burnus 2019-11-13 08:14:22 UTC

The mentioned patch (for a related case but not relevant for this bug) is now committed as r278114, see https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00990.html

Regarding the issue in this bug: Janne pointed out that powerpc64le-unknown-linux-gnu is LE and (segher:)
"it is obviously ELFv2 code" / "so, it's LE". Which means the endian issue is not the problem.

The data type matches caller and callee as far as I debugged (looking at the source code and debug_tree). (Namely, all Boolean arguments are one-byte types.) New hypothesis:

00:28 < segher> there are only 8 integer argument registers; everything after that goes in memory
00:28 < segher> in your testcase, two things are passed via memory
00:29 < segher> those are of course suspicious, but it could be something else entirely that is going wrong

I created a C test case, which is very similar (esp. on -O0 -fdump-tree-optimized level, Fortran has more of insn in the -fdump-rtl-expand) and the C one works. Maybe it helps someone to understand the issue better; it didn't help me so far. – Fortran example is in Comment 9, C example is here:

#include <stdbool.h>
#include <stdlib.h>
void callee(double aa, double bb, void *c_aptr, void *c_bptr, double **aptr, double **bptr, bool _aa, bool _bb, bool _c_aptr, bool _c_bptr) {
  if (!_c_aptr | !_c_bptr) abort();
}
void caller() {
  double aa, bb;
  void *c_aptr, *c_bptr;
  double *aptr, *bptr;
  callee (aa, bb, c_aptr, c_bptr, &aptr, &bptr, true, true, true, true);
}
int main() {
  caller();
  return 0;
}

Comment 16 Tobias Burnus 2019-11-22 10:58:02 UTC

Recap (see comment 11): Callee's arguments are
 (aa, bb, c_aptr, c_bptr, aptr, bptr, _aa, _bb, _c_aptr, _c_bptr)
i.e. 2 real(kind=4)/float variables, 4 pointers and 4 Booleans.

And the call is
 (aa.1_1, bb.2_2, c_aptr_4(D), c_bptr_5(D), &aptr, &bptr, 1, 1, 1, 1)
According to the debugger and run-time print debugging, the last two Booleans, i.e. _c_aptr and _c_bptr, are mishandled.

Andrew Jenner did some early analysis for me. (Thanks!) See comment 11 for the assembler code referred below.

On the caller side, the Booleans (tmp121 and tmp122) are stored at 40(1) and 32(1), respectively.

In the callee, the frame size is 48 bytes (stdu 1,-48(1)). So after the update we would expect the booleans to be at 88(1) and 80(1).

But the callee appears to be loading them from 144(31) and 152(31); (r31 is copied from r1).

And the compiler is using 80(31) and 88(31) to stash aa and bb
        stfd 1,80(31)    # aa, aa
        stfd 2,88(31)    # bb, bb

Remains the question why the backend is confused where those parameters live on the stack.

Comment 17 Jakub Jelinek 2020-01-31 18:02:45 UTC

So, to sum up, in #c9 we are (or want to do) in Fortran roughly what in C we would do with:
void foo (double aa, double bb, void *c_aptr, void *c_bptr, double **aptr, double **bptr, _Bool _aa, _Bool _bb, _Bool _c_aptr, _Bool _c_bptr)
{
  if (!_c_aptr | !_c_bptr)
    __builtin_abort ();
}
void bar (double aa, double bb, void *c_aptr, void *c_bptr, double **aptr, double **bptr, unsigned char _aa, unsigned char _bb, unsigned char _c_aptr, unsigned char _c_bptr)
{
  if (!_c_aptr | !_c_bptr)
    __builtin_abort ();
}
int
main ()
{
  double *aptr, *bptr;
  foo (0.0, 0.0, 0, 0, &aptr, &bptr, 1, 1, 1, 1);
  bar (0.0, 0.0, 0, 0, &aptr, &bptr, 1, 1, 1, 1);
  return 0;
}
and that works just fine in C, even at -O0.
Looking at expand dump of foo vs. test_dummy_opt_val_callee_2
I don't see any relevant differences in the arg passing:
-;; _1 = ~_c_aptr_4(D);
+;; _1 = ~_c_aptr_6(D);
 
-(insn 17 16 18 (set (reg:QI 124)
+(insn 17 16 18 (set (reg:QI 126)
         (mem/c:QI (plus:DI (reg/f:DI 111 virtual-incoming-args)
-                (const_int 64 [0x40])) [4 _c_aptr+0 S1 A64])) "pr92305.c":3:7 -1
+                (const_int 64 [0x40])) [4 _c_aptr+0 S1 A64])) "pr92305.f90":14:0 -1
      (nil))
...
-;; _2 = ~_c_bptr_5(D);
+;; _3 = ~_c_bptr_7(D);
 
-(insn 20 19 21 (set (reg:QI 126)
+(insn 22 21 23 (set (reg:QI 129)
         (mem/c:QI (plus:DI (reg/f:DI 111 virtual-incoming-args)
-                (const_int 72 [0x48])) [4 _c_bptr+0 S1 A64])) "pr92305.c":3:18 -1
+                (const_int 72 [0x48])) [4 _c_bptr+0 S1 A64])) "pr92305.f90":14:0 -1
      (nil))

In the caller, in Fortran I see:
    (expr_list (use (reg:DI 2 2))
        (expr_list:DF (use (reg:DF 33 1))
            (expr_list:DF (use (reg:DF 34 2))
                (expr_list:DI (use (reg:DI 5 5))
                    (expr_list:DI (use (reg:DI 6 6))
                        (expr_list:DI (use (reg:DI 7 7))
                            (expr_list:DI (use (reg:DI 8 8))
                                (expr_list:QI (use (reg:DI 9 9))
                                    (expr_list:QI (use (reg:DI 10 10))
                                        (expr_list:QI (use (mem:DI (reg/f:DI 114 virtual-outgoing-args) [0  S1 A64]))
                                            (expr_list:QI (use (mem:DI (plus:DI (reg/f:DI 114 virtual-outgoing-args)
                                                            (const_int 8 [0x8])) [0  S1 A64]))
                                                (nil)))))))))))))
and both DImode slot at virtual-outgoing-args and at +8 are initialized with full DImode store of 1 in there, while in C:
    (expr_list (use (reg:DI 2 2))
        (expr_list:DF (use (reg:DF 33 1))
            (expr_list:DF (use (reg:DF 34 2))
                (expr_list:DI (use (reg:DI 5 5))
                    (expr_list:DI (use (reg:DI 6 6))
                        (expr_list:DI (use (reg:DI 7 7))
                            (expr_list:DI (use (reg:DI 8 8))
                                (expr_list:QI (use (reg:DI 9 9))
                                    (expr_list:QI (use (reg:DI 10 10))
                                        (expr_list:QI (use (mem:DI (plus:DI (reg/f:DI 114 virtual-outgoing-args)
                                                        (const_int 64 [0x40])) [0  S1 A64]))
                                            (expr_list:QI (use (mem:DI (plus:DI (reg/f:DI 114 virtual-outgoing-args)
                                                            (const_int 72 [0x48])) [0  S1 A64]))
                                                (nil)))))))))))))
and again, the +64 and +72 slots are initialized to full DImode values of 1.

Comment 18 Jakub Jelinek 2020-01-31 20:34:10 UTC

I believe this is a FE bug, for these functions TYPE_ARG_TYPES of the FUNCTION_TYPE show just 6 arguments and don't include those 4 extra boolean args for the scalar optional dummy vars, and as the function isn't varargs either, it just confuses the calls.c completely, because it has more arguments than really allowed.

--- gcc/fortran/trans-types.c.jj	2020-01-12 11:54:36.000000000 +0100
+++ gcc/fortran/trans-types.c	2020-01-31 21:26:34.199188677 +0100
@@ -3098,6 +3098,16 @@ gfc_get_function_type (gfc_symbol * sym,
 
 	  vec_safe_push (typelist, type);
 	}
+      /* For noncharacter scalar intrinsic types, VALUE passes the value,
+	 hence, the optional status cannot be transferred via a NULL pointer.
+	 Thus, we will use a hidden argument in that case.  */
+      else if (arg
+	       && arg->attr.optional
+	       && arg->attr.value
+	       && !arg->attr.dimension
+	       && arg->ts.type != BT_CLASS
+	       && !gfc_bt_struct (arg->ts.type))
+	vec_safe_push (typelist, boolean_type_node);
     }
 
   if (!vec_safe_is_empty (typelist)
--- gcc/fortran/trans-decl.c.jj	2020-01-30 09:34:43.207088430 +0100
+++ gcc/fortran/trans-decl.c	2020-01-31 21:28:49.272197084 +0100
@@ -2645,8 +2645,8 @@ create_function_arglist (gfc_symbol * sy
 	      || f->sym->ts.u.cl->backend_decl == length)
 	    {
 	      if (POINTER_TYPE_P (len_type))
-		f->sym->ts.u.cl->backend_decl =
-			build_fold_indirect_ref_loc (input_location, length);
+		f->sym->ts.u.cl->backend_decl
+		  = build_fold_indirect_ref_loc (input_location, length);
 	      else if (f->sym->ts.u.cl->backend_decl == NULL)
 		gfc_create_string_length (f->sym);
 
@@ -2677,6 +2677,8 @@ create_function_arglist (gfc_symbol * sy
           DECL_ARG_TYPE (tmp) = boolean_type_node;
           TREE_READONLY (tmp) = 1;
           gfc_finish_decl (tmp);
+
+	  hidden_typelist = TREE_CHAIN (hidden_typelist);
 	}
 
       /* For non-constant length array arguments, make sure they use

seems to fix this, will see how far I get with that.

Comment 19 GCC Commits 2020-02-01 09:03:39 UTC

The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:add31061ec23e07fdf749dc335308efc81151a3d

commit r10-6393-gadd31061ec23e07fdf749dc335308efc81151a3d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Feb 1 10:02:20 2020 +0100

    fortran: Fix up TYPE_ARG_TYPES of procs with scalar VALUE optional args [PR92305]
    
    The following patch fixes
    -FAIL: libgomp.fortran/use_device_addr-1.f90   -O0  execution test
    -FAIL: libgomp.fortran/use_device_addr-2.f90   -O0  execution test
    that has been FAILing for several months on powerpc64le-linux.
    The problem is in the Fortran FE, which adds the artificial arguments
    for scalar VALUE OPTIONAL dummy args only to DECL_ARGUMENTS where the
    current function can see them, but not to TYPE_ARG_TYPES; if those functions
    aren't varargs, this confuses calls.c to pass the remaining arguments
    (which aren't named (== not covered by TYPE_ARG_TYPES) and aren't varargs
    either) in a different spot from what the callee (which has proper
    DECL_ARGUMENTS for all args) expects.  For the artificial length arguments
    for character dummy args we already put them in both DECL_ARGUMENTS and
    TYPE_ARG_TYPES.
    
    2020-02-01  Jakub Jelinek  <jakub@redhat.com>
    
    	PR fortran/92305
    	* trans-types.c (gfc_get_function_type): Also push boolean_type_node
    	types for non-character scalar VALUE optional dummy arguments.
    	* trans-decl.c (create_function_arglist): Skip those in
    	hidden_typelist.  Formatting fix.

Comment 20 Jakub Jelinek 2020-02-01 09:14:51 UTC

Fixed.

Comment 21 Thomas Schwinge 2020-03-02 15:05:19 UTC

For posterity:

(In reply to CVS Commits from comment #19)
>     The following patch fixes
>     -FAIL: libgomp.fortran/use_device_addr-1.f90   -O0  execution test
>     -FAIL: libgomp.fortran/use_device_addr-2.f90   -O0  execution test
>     that has been FAILing for several months on powerpc64le-linux.

Likewise for the 'libgomp.fortran/use_device_ptr-optional-3.f90' execution test for '-O0', given 'dg-do run' added (I'll commit that later).


And, for nvptx target, this also fixes 'gfortran.dg/optional_absent_3.f90' for '-O0', which was:

    ptxas /tmp/cc5Ae8i1.o, line 72; error   : Arguments mismatch for instruction 'mov'
    [...]
    ptxas /tmp/cc5Ae8i1.o, line 615; error   : Arguments mismatch for instruction 'mov'
    ptxas /tmp/cc5Ae8i1.o, line 627; error   : State space mismatch between instruction and address in instruction 'st'
    [...]
    ptxas /tmp/cc5Ae8i1.o, line 1046; error   : State space mismatch between instruction and address in instruction 'st'
    ptxas /tmp/cc5Ae8i1.o, line 72; error   : Unknown symbol '%ar3'
    ptxas /tmp/cc5Ae8i1.o, line 72; error   : Label expected for forward reference of '%ar3'
    [...]
    ptxas /tmp/cc5Ae8i1.o, line 615; error   : Unknown symbol '%ar12'
    ptxas /tmp/cc5Ae8i1.o, line 615; error   : Label expected for forward reference of '%ar12'
    ptxas /tmp/cc5Ae8i1.o, line 627; error   : Unknown symbol '%stack'
    ptxas /tmp/cc5Ae8i1.o, line 627; error   : Label expected for forward reference of '%stack'
    [...]
    ptxas /tmp/cc5Ae8i1.o, line 1046; error   : Unknown symbol '%stack'
    ptxas /tmp/cc5Ae8i1.o, line 1046; error   : Label expected for forward reference of '%stack'
    ptxas fatal   : Ptx assembly aborted due to errors
    nvptx-as: ptxas returned 255 exit status
    compiler exited with status 1