Bug 90329 - Incompatibility between gfortran and C lapack calls
Summary: Incompatibility between gfortran and C lapack calls
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 9.1.0
: P3 normal
Target Milestone: 7.5
Assignee: Not yet assigned to anyone
URL:
Keywords: ABI
Depends on:
Blocks:
 
Reported: 2019-05-03 14:42 UTC by Thomas Koenig
Modified: 2019-09-16 10:08 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-05-04 00:00:00


Attachments
gcc10-pr90329.patch (2.08 KB, patch)
2019-05-15 07:16 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Koenig 2019-05-03 14:42:25 UTC
The fix for PR 87689 (ABI violation on POWER) has led to problems
with what some C programs expect as the Fortran calling convention
to be.

It seems that many C codes assume that no hidden string length is
passed for a one-character string.  This is wrong (and has been wrong
since the days of the original Fortran compiler for Unix), but
the code exists, and we should recommend a solution for it.

This is important because C interfaces to LAPACK, as central a piece
of software as you can find, is impacted by this.

This has been quite extensively debugged by the R people, see
https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html . On x86_64,
Tomas Kalibera found that, while things "mostly" worked, a tail
call was causing problems because the stack space for the length
was not allocated.  -fno-optimize-sibling-calls seems to cure it,
but that may not be the only thing affected by this.

Note that this applies to gcc 7, gcc 8, gcc 9 and gcc 10, since the
patch was applied to all open branches at the time (an ABI violation
on a primary platform was deemed important enough).
Comment 1 Thomas Koenig 2019-05-03 19:00:29 UTC
(A reply to https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html ;
I want to have this in the PR's audit trail).

Toon, I have added you to this because of your experience going back
to g77, and also because you're a member of the steering committee.

> I've debugged one of the packages and I 
> confirm that the breakage is related to passing of strings from C to 
> Fortran. Indeed, BLAS and LAPACK define a large number of subroutines 
> that take one or more explicit single-character strings as arguments. 
> Other than that, BLAS has only one function (xerbla), which takes a 
> string of unspecified length, LAPACK only has four (ilaenv, 
> ilaenv2stage, lsamen, xerbla). The C interfaces to BLAS/LAPACK from 
> Netlib depend on the historic behavior that explicit single-character 
> strings are interoperable, concretely CBLAS and LAPACKE provide C 
> interfaces/code that calls into Fortran BLAS/LAPACK without passing the 
> 1s as lengths of the character strings (functions taking a string of 
> unspecified length are trivial and re-implemented in C).

OUCH. So, basically, people have been depending on C undefined
behavior for ages, and this includes recent developments like
LAPACKE.  Only an accident of calling conventions has kept
this "working".  Oh my...

If we are looking at the System V Application Binary Interface
AMD64 Architecture Processor Supplement, I can understand how
this worked.

I suppose this never worked on Windows stdcall, where the
callee cleans up the stack, so it must know the exact number
of bytes, and the functions are therefore decorated with the
number of bytes in the argument list.

(By the way, interoperability has a special meaning in Fortran,
where it is reserved for BIND(C) procedures and variables).

> This has been 
> working fine for very many years as the Fortran code never needed to 
> access the length it knew was 1. R has been using the same practice, 
> which long predates ISO_C_BINDING/BIND(C), and I've seen online 
> discussions where people assumed interoperability of length 1 strings, 
> once mentioning also a citation from Fortran 2003 Handbook that says "A 
> Fortran character string with a length other than 1 is not 
> interoperable" (which invites interpretation that length 1 strings were 
> ).

They are interoperable in the sense that they can be an argument
to a BIND(C) procedure. This is the terminology used by the
Fortran standard.

> I am not an expert to say whether the current Fortran standard 
> requires that interoperability and I assume that it does not given this 
> gfortran change.

What we are discussing here is outside the Fortran standard's scope.

> This gfortran change breaks this interoperability: if a C function calls 
> a Fortran function, passing it a single-character string for a parameter 
> taking explicit single-character Fortran string, it may crash. I've 
> debugged one case with R package BDgraph, this example 
> "library(BDgraph); data.sim <- bdgraph.sim( n = 70, p = 5, size = 7, vis 
> = TRUE )" crashes due to corruption of C stack by Fortran function 
> DPOSV, when compiled with the new gfortran and with -O2. To see the 
> problem, one can just look at the disassembly of DPOSV (LAPACK), neither 
> the package nor R is not necessary:
> 
> SUBROUTINE DPOSV( UPLO, N, NRHS, A, LDA, B, LDB, INFO )
> CHARACTER          UPLO
> 
> In one case, DPOSV calls DPOTRS before returning. The new gfortran with 
> -O2 performs tail-call optimization, jumping to DPOTRS. In the annotated 
> disassembly snippet, at 11747f1, DPOSV tries to ensure that there is 
> constant 1 as string length of UPLO when tail-calling into DPOTRS, so it 
> writes it to stack where there already should have been 1 as length of 
> UPLO passed to DPOSV. But this argument has not been passed to DPOSV, so 
> this causes stack corruption.

[assembly elided]

> Note that DPOSV never uses the length of the string (UPLO) from the 
> hidden argument, the compiler clearly knows that its length is 1. In 
> calls where the length is passed in registers, this does not cause 
> trouble (like LSAME) and indeed is needed as the registers have 
> different values
> 
>        IF( .NOT.LSAME( UPLO, 'U' ) .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
> 
>    117448:       b9 01 00 00 00          mov    $0x1,%ecx
> 
>    11744d:       ba 01 00 00 00          mov    $0x1,%edx
> 
>    117452:       48 8d 35 bb 12 09 00    lea    
> 0x912bb(%rip),%rsi        # 1a8714 <ipivot.4261+0xd14>
> 
>    117459:       4c 89 f7                mov    %r14,%rdi
> 
>    11745c:       e8 1f 3d ef ff          callq  b180 <lsame_@plt>
> 
> but it seems to me that the compiler could just refrain from setting the 
> length to be 1 on the stack at 1174f1, since it knows it should have 
> already been there.

That would be a nice feature of tail-call optimiation, which would also
"fix" this bug at the same time.

> It would be a nice property if Fortran code that 
> never accesses the hidden arguments with the lengths of the strings, 
> because it knows what those lengths are, would also never write to those 
> hidden arguments on the stack when it knows what they are (should be).

But it would not be clear if the callee also does not access the
memory.  Hm... I think it might make sense to make a separate
optimzation PR out of this.

> Before the gfortran change, DPOSV would call to DPOTRS normally (no 
> tail-call optimization), so this problem would not occur (I tested with 
> 268974). By disabling tail call optimization via 
> -fno-optimize-sibling-calls, the problem goes away also for other 
> packages my colleagues have identified as crashing with the new 
> gfortran.

OK, I can see that.

> Did you know of any other optimization that could break this 
> interoperability of 1-length strings? It would be really nice to users 
> if this interoperability could be preserved, and if not by default than 
> at least with some option.

LTO is likely to have issues, both positive and negative ones. On the
one hand, you will get warnings about LTO mismatches. On the other
hand, LTO has more information, so it will probably silently correct
the mismatch.

It is certainly worth a _LOT_ to be LTO-clean, especially in a project
like R where people write a lot of add-on code.

(At the moment, you put your version of LAPACK into a shared library,
so LTO cannot be used for this.).

The tail-call issue is a good catch.  I don't know of any other option
which might be affected, but I am also not an expert here. Especially,
gcc and gfortran support very many platforms with (I presume) many
calling conventions, so I am not sure that this will solve all issues.

> Traditionally, BLAS/LAPACK implementations are interchangeable at 
> dynamic linking time, using the Fortran interface that is however also 
> used from C, without passing lengths for fixed 1-character strings. R 
> supports this too, at least on some Linux distributions including 
> Debian/Ubuntu it is packaged so that it runs with the BLAS/LAPACK 
> implementation installed on the system. Even though this is probably not 
> correct wrt to the todays Fortran standard (I don't know for sure), this 
> is the common practice, and fixing this would not be easy - one would 
> have to create a new interface to be used from C, separate from the 
> Fortran one, and all software would have to start using that interface 
> from C. In the current situation when the Fortran interface is used, 
> confusion will arise with this gfortran change as different BLAS/LAPACK 
> implementations are built by different Fortran compilers and use a 
> different mix of Fortran/C for different computational subroutines. Note 
> CBLAS could not be readily used as it itself breaks with the current 
> gfortran change as well.

OK, I see this could be a major problem.

One of the issues is that the old way, calling a non-variadic
function in a variadic way, is also not guaranteed to be free
of trouble.

So, what to do?

It might be a possibility to invent some new flag, let's call it
-fomit-string-length, which simply does not pass the string
lengths. This would be binary compatible with existing calls,
but a major ABI change, and would also probably break libgfortran.

Another possibility would be to add an option like -fexternal-varargs,
which would call procedures without explicit interface with a variadic call.
Not ideal (see above), but it would restore the old behavior.

Somewhat orthogonal to this, it would also be possible to emit
C prototypes for all external subroutines.  I wrote this kind of
thing for BIND(C) procedures and variables, and it is eminently
doable. So any project could just generate the correct prototypes
during compilation.

Hm...
Comment 2 Paul Thomas 2019-05-04 07:56:08 UTC
Hi Thomas and Tomas,

I am investigating a problem with the ISO_Fortran_binding implementation and discovered that the hidden string length disappears when the procedure is declared as bind(C). This obviously correct.

Would this problem be fixed if all external procedure calls without an explicit interface, where a single character is being passed, were treated likewise?

Cheers

Paul
Comment 3 Janne Blomqvist 2019-05-04 10:20:24 UTC
Ooof!

(Just for the record, I don't think we should revert to the previous behavior. Whatever we do should be robust in the face of LTO etc.)

I'd rather not see extra command-line arguments. For something which modifies the ABI and somehow works by accident part of the time, this is just too difficult to keep track of for end users, IMHO.

My suggestions:

1) When compiling an external procedure, for character(len=1) arguments we don't generate the hidden string length argument. And similarly when calling an external procedure, if a len=1 character is passed, we omit the hidden string length argument. This, I believe, is what Paul is suggesting in the previous comment?

2) External procedures with character arguments are compiled and called as varargs functions. This is what Thomas is suggesting, except unconditionally and not controlled by an option.


I'm not really happy with either of these, but the third option, of fixing all Fortran-calling code out there isn't realistic either.
Comment 4 Thomas Koenig 2019-05-04 11:23:32 UTC
(In reply to Janne Blomqvist from comment #3)
> Ooof!
> 
> (Just for the record, I don't think we should revert to the previous
> behavior. Whatever we do should be robust in the face of LTO etc.)

I concur.

> I'd rather not see extra command-line arguments. For something which
> modifies the ABI and somehow works by accident part of the time, this is
> just too difficult to keep track of for end users, IMHO.
> 
> My suggestions:
> 
> 1) When compiling an external procedure, for character(len=1) arguments we
> don't generate the hidden string length argument. And similarly when calling
> an external procedure, if a len=1 character is passed, we omit the hidden
> string length argument. This, I believe, is what Paul is suggesting in the
> previous comment?

Unfortunately, this would not work :-(

While looking at the R source, I have also found some occurences
where 'F' was spelled out 'Full'.

Also, what could be done with CALL FOO ('a', 'bcd') ?
 
> 2) External procedures with character arguments are compiled and called as
> varargs functions. This is what Thomas is suggesting, except unconditionally
> and not controlled by an option.

Not an option, unfortunately :-(

This would bring back PR 87689, and this was an ABI breakage on
a primary GCC platform (POWER) with perfectly standard-compliant
code.

> I'm not really happy with either of these, but the third option, of fixing
> all Fortran-calling code out there isn't realistic either.

Currently, I am leaning towards using an option with a mandatory
warning (no way to turn it off) which does not push any string lenghts
onto the stack at all, something like -fbroken-character-abi,
which we would use for external procedures without an explicit interface.

The warning would have to be mandatory to a) point people towards what's
wrong with their code, and b) avoid questions of "why does my code
no longer work".

I can also extend -fc-prototypes so that it also issues prototypes
for any global procedures outside of BIND(C), or add a new option.
Comment 5 Janne Blomqvist 2019-05-04 12:42:16 UTC
(In reply to Thomas Koenig from comment #4)
> (In reply to Janne Blomqvist from comment #3)
> > 1) When compiling an external procedure, for character(len=1) arguments we
> > don't generate the hidden string length argument. And similarly when calling
> > an external procedure, if a len=1 character is passed, we omit the hidden
> > string length argument. This, I believe, is what Paul is suggesting in the
> > previous comment?
> 
> Unfortunately, this would not work :-(
> 
> While looking at the R source, I have also found some occurences
> where 'F' was spelled out 'Full'.

Ah, indeed. 
 
> Also, what could be done with CALL FOO ('a', 'bcd') ?

My idea would be to not pass the hidden string length for the 'a' argument, but pass it for the 'bcd' argument.

But as you mention above, there are indeed several ways in which his heuristic can be broken.

I have a vague memory that other Fortran compilers have special-cased length-1 string arguments, though.

> > 2) External procedures with character arguments are compiled and called as
> > varargs functions. This is what Thomas is suggesting, except unconditionally
> > and not controlled by an option.
> 
> Not an option, unfortunately :-(
> 
> This would bring back PR 87689, and this was an ABI breakage on
> a primary GCC platform (POWER) with perfectly standard-compliant
> code.

Wasn't the problem here that the caller was using varargs, while the callee was compiled as a normal function, thus causing the mismatch?

So my suggestion was that external procedures be compiled as varargs as well, which would make the caller and callee ABI's to match.

But now that I think about it, it wouldn't help this LAPACK case, as here the C side has a non-varargs prototype so the C caller will in any case not use the varargs convention. And I've never seen any C code defining prototypes to Fortran procedures as varargs either, so I guess this option is a non-starter.

Bluh.

> > I'm not really happy with either of these, but the third option, of fixing
> > all Fortran-calling code out there isn't realistic either.
> 
> Currently, I am leaning towards using an option with a mandatory
> warning (no way to turn it off) which does not push any string lenghts
> onto the stack at all, something like -fbroken-character-abi,
> which we would use for external procedures without an explicit interface.

Hmm, wouldn't this completely break procedures with "character(len=*)" dummy arguments? Or did mean that it would be used only for length=1 strings?

> I can also extend -fc-prototypes so that it also issues prototypes
> for any global procedures outside of BIND(C), or add a new option.

This might be useful.
Comment 6 Thomas Koenig 2019-05-04 16:46:30 UTC
(In reply to Janne Blomqvist from comment #5)
> (In reply to Thomas Koenig from comment #4)
> > Currently, I am leaning towards using an option with a mandatory
> > warning (no way to turn it off) which does not push any string lenghts
> > onto the stack at all, something like -fbroken-character-abi,
> > which we would use for external procedures without an explicit interface.
> 
> Hmm, wouldn't this completely break procedures with "character(len=*)" dummy
> arguments?

Yes :-(

This would be more or less equivalent to a -fbind-c flag.

>Or did mean that it would be used only for length=1 strings?

Unfortunately, the callee does not know the string length
passed by the caller.
 
> > I can also extend -fc-prototypes so that it also issues prototypes
> > for any global procedures outside of BIND(C), or add a new option.
> 
> This might be useful.

I just noticed that -fc-prototypes does not issue a prototype
for a global BIND(C) procedure. I'll probably fix that one soon.
Comment 7 Paul Thomas 2019-05-05 08:50:02 UTC
(In reply to Thomas Koenig from comment #6)
> (In reply to Janne Blomqvist from comment #5)
> > (In reply to Thomas Koenig from comment #4)
> > > Currently, I am leaning towards using an option with a mandatory
> > > warning (no way to turn it off) which does not push any string lenghts
> > > onto the stack at all, something like -fbroken-character-abi,
> > > which we would use for external procedures without an explicit interface.
> > 
> > Hmm, wouldn't this completely break procedures with "character(len=*)" dummy
> > arguments?
> 
> Yes :-(
> 
> This would be more or less equivalent to a -fbind-c flag.
> 
> >Or did mean that it would be used only for length=1 strings?
> 
> Unfortunately, the callee does not know the string length
> passed by the caller.
>  
> > > I can also extend -fc-prototypes so that it also issues prototypes
> > > for any global procedures outside of BIND(C), or add a new option.
> > 
> > This might be useful.
> 
> I just noticed that -fc-prototypes does not issue a prototype
> for a global BIND(C) procedure. I'll probably fix that one soon.

Just for the record, I have long since thought that passing the string length as a hidden argument is a design error just as awkward of the poor array descriptor design. We should be passing the data field of a descriptor as the address of the string, so that there is a known offset to the string length. This could be a generalised improvement, since it could also be used in general for f77 argument passing and would allow us to get rid in the initialisation and hidden descriptors nonsense in dummy handling. Storing all entities as descriptors would eliminate huge quantities of code in trans-*.c :-)

Dream on, Paul.... Maybe for 11-branch?

Paul
Comment 8 Janne Blomqvist 2019-05-05 12:52:34 UTC
(In reply to Janne Blomqvist from comment #3)
> 1) When compiling an external procedure, for character(len=1) arguments we
> don't generate the hidden string length argument. And similarly when calling
> an external procedure, if a len=1 character is passed, we omit the hidden
> string length argument. This, I believe, is what Paul is suggesting in the
> previous comment?

Having already shot down my (2) proposal, I thought a bit more about this one, that is omitting the hidden length argument for dummy arguments with len=1, and when calling a procedure with a len=1 argument.

Unfortunately, beyond being a somewhat fragile heuristic, it's also fundamentally broken. Consider

subroutine foo(a)
    character(len=1) :: a
    ...
end subroutine foo

So with this suggestion, this procedure would be generated without the hidden length argument. The brokenness comes from

call foo("ab")

which would generate a call to a procedure WITH the hidden string length argument. However, this code is perfectly standard conforming, F2018 15.5.2.4 says that a character dummy argument must have a length less than or equal to the actual argument, which is fulfilled by the above (1 <= 2). That is, we can't special case len=1 procedures and calls in any way.
Comment 9 Toon Moene 2019-05-05 18:48:45 UTC
> So with this suggestion, this procedure would be generated without the hidden 
> length argument. The brokenness comes from

> call foo("ab")

> which would generate a call to a procedure WITH the hidden string length 
> argument. However, this code is perfectly standard conforming, F2018 15.5.2.4 
> says that a character dummy argument must have a length less than or equal to 
> the actual argument, which is fulfilled by the above (1 <= 2). That is, we 
> can't special case len=1 procedures and calls in any way.

And this happens *a lot* in the LAPACK code itself.

Just do a grep -i sgemm on it and you will see it in action.

Perhaps this is the reason why this occurs in the R -> LAPACK boundary code ...
Comment 10 Richard Biener 2019-05-06 07:21:57 UTC
I think the only option is to go forward without any change to gfortran and simply teach people about this "ABI change" and how to fix the C side.  Everything else sounds like a hack that will create its own problems when used.

So - RESOLVED INVALID?

The recommended solution is to fix the C code.
Comment 11 Tomas Kalibera 2019-05-06 11:23:20 UTC
Even restoring the state that LAPACK/BLAS works but without providing guarantees would help short-term, and I think this could be in line with the goal of best performance within the standard.

At least in the case I debugged, I think gfortran could do better by not writing the 1 as string length to the place on the stack where the compiler knows there is already 1 as string length.
Comment 12 Janne Blomqvist 2019-05-06 11:46:19 UTC
(In reply to Tomas Kalibera from comment #11)
> At least in the case I debugged, I think gfortran could do better by not
> writing the 1 as string length to the place on the stack where the compiler
> knows there is already 1 as string length.

The problem is not that the compiler gets the wrong string length, in this case when the procedure argument is defined to have a length of 1, the compiler never needs to access the hidden string length argument [1].

The problem is that both the caller and the callee need to agree on the number (and type) of arguments. And because in Fortran implicit interfaces ("F77"-style interfaces, without modules, interface blocks or any other newfangled "modern Fortran" stuff) the only information about the interface of a procedure that the caller knows is the name of the procedure. So for an example compiling the code

call foo("abc")

the compiler has no knowledge of the procedure beyond the name "foo". From this code the compiler can then deduce that it must be a subroutine, and that it takes one argument of type character.  But the compiler cannot verify this, it must assume the programmer knows what he's doing. Now, the problem is that the definition of the procedure can be

subroutine foo(a)
    character(len=1) :: a
    ....
end subroutine foo

or

subroutine foo(a)
    character(len=3) :: a ! Or len=2
    ....
end subroutine foo

or

subroutine foo(a)
    character(len=*) :: a
    ....
end subroutine foo


In the first two the hidden string length argument is not needed, but because the procedure ABI must match for all the possibilities, and for the third case the hidden length IS needed, the hidden string length argument is always needed, although it's unused when the length is fixed.


[1] It would be possible to have some optional debugging option which inserts code to verify that the actual argument length >= the dummy argument length. But that's not really related to this issue.
Comment 13 Tomas Kalibera 2019-05-06 12:05:40 UTC
I understand the compiler may not know and typically does not whether the called function accepts "character(len=1)" or "character(len=*)", so it may have to pass the 1. But the situation where I suggest not writing the 1 is more subtle (see my original post - https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html).

There DPOSV is tail-calling (jumping) into DPOTRS). DPOSV wants to pass on the length of UPLO (1) to DPOTRS. DPOSV knows it was called with this hidden length argument of 1, and indeed at the same location on the stack as it is needed for DPOTRS, as the length of the same variable UPLO, but it still writes a compile-time constant 1 to that location on the stack that already has (should have) 1.
Comment 14 rguenther@suse.de 2019-05-06 12:46:06 UTC
On Mon, 6 May 2019, tomas.kalibera at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90329
> 
> --- Comment #13 from Tomas Kalibera <tomas.kalibera at gmail dot com> ---
> I understand the compiler may not know and typically does not whether the
> called function accepts "character(len=1)" or "character(len=*)", so it may
> have to pass the 1. But the situation where I suggest not writing the 1 is more
> subtle (see my original post -
> https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html).
> 
> There DPOSV is tail-calling (jumping) into DPOTRS). DPOSV wants to pass on the
> length of UPLO (1) to DPOTRS. DPOSV knows it was called with this hidden length
> argument of 1, and indeed at the same location on the stack as it is needed for
> DPOTRS, as the length of the same variable UPLO, but it still writes a
> compile-time constant 1 to that location on the stack that already has (should
> have) 1.

I think that's called sibling-calling, not tailcalling (tail-call with
same arguments).
Comment 15 Thomas Koenig 2019-05-06 13:40:14 UTC
Hi Tomas,

> I understand the compiler may not know and typically does not whether the
> called function accepts "character(len=1)" or "character(len=*)", so it may
> have to pass the 1. But the situation where I suggest not writing the 1 is
> more subtle (see my original post -
> https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html).

I have given this some thought, and I don't think this can be done
in the general case, unfortunately.

Consider

      program main
      call foo("ab")
      end

      subroutine foo(c)
      character*1 c
      call bar(c)
      end

      subroutine bar(c)
      character*(*) c
      print *,len(c)
      end

This is legal Fortran going back to F77, and it should print 1.

If your proposal was implemented, this would print 2, which would
be a wrong-code bug :-(

So, what can we do?  Previously, the way of interfacing C
with Fortran was fragile, non-conforming, and worked.  Now
it is fragile, non-conforming, and does not work.

In your (excellent, by the way) debugging work, you have
identified an option, -fno-optimize-sibling-calls, which
restores the status quo because things would go back to
being fragile, nonconforming, and they would work again.

Since we applied the fix for PR 87689 to gcc 7, gcc 8 and gcc 9,
I would suggest that we make -fno-optimize-sibling-calls
the default on these branches.  Maintaining binary compatibility
(even if it is bug compatibility) with existing packages is
something we should strive for, especially with such
important software packages as BLAS and LAPACK. These packages
are one important reason why people still use Fortran, and
I would hate to push them towards flang with this.

For current trunk, I would recommend keeping the current
hehavior and contact the LAPACK maintainers to a) give them
a heads-up for this problem, and b) a year to work out
the problem.

Would this be something that people could agree on?
Comment 16 kargl 2019-05-06 16:40:08 UTC
(In reply to Thomas Koenig from comment #15)
> Hi Tomas,
> 
> > I understand the compiler may not know and typically does not whether the
> > called function accepts "character(len=1)" or "character(len=*)", so it may
> > have to pass the 1. But the situation where I suggest not writing the 1 is
> > more subtle (see my original post -
> > https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html).
> 
> I have given this some thought, and I don't think this can be done
> in the general case, unfortunately.
> 
> Consider
> 
>       program main
>       call foo("ab")
>       end
> 
>       subroutine foo(c)
>       character*1 c
>       call bar(c)
>       end
> 
>       subroutine bar(c)
>       character*(*) c
>       print *,len(c)
>       end
> 
> This is legal Fortran going back to F77, and it should print 1.
> 
> If your proposal was implemented, this would print 2, which would
> be a wrong-code bug :-(
> 
> So, what can we do?  Previously, the way of interfacing C
> with Fortran was fragile, non-conforming, and worked.  Now
> it is fragile, non-conforming, and does not work.
> 
> In your (excellent, by the way) debugging work, you have
> identified an option, -fno-optimize-sibling-calls, which
> restores the status quo because things would go back to
> being fragile, nonconforming, and they would work again.
> 
> Since we applied the fix for PR 87689 to gcc 7, gcc 8 and gcc 9,
> I would suggest that we make -fno-optimize-sibling-calls
> the default on these branches.  Maintaining binary compatibility
> (even if it is bug compatibility) with existing packages is
> something we should strive for, especially with such
> important software packages as BLAS and LAPACK. These packages
> are one important reason why people still use Fortran, and
> I would hate to push them towards flang with this.
> 
> For current trunk, I would recommend keeping the current
> hehavior and contact the LAPACK maintainers to a) give them
> a heads-up for this problem, and b) a year to work out
> the problem.
> 
> Would this be something that people could agree on?

Does -fno-optimizing-sibling-calls effect performance?
A 1% (or less) degradation may be considered negligible,
and an acceptible compromise. 10% would be unacceptable.

Guess I'll need to dust off my old copy of the Polyhedron
Benchmarks and run a few tests.
Comment 17 Steve Kargl 2019-05-06 23:50:30 UTC
On Mon, May 06, 2019 at 04:40:08PM +0000, kargl at gcc dot gnu.org wrote:
> > Since we applied the fix for PR 87689 to gcc 7, gcc 8 and gcc 9,
> > I would suggest that we make -fno-optimize-sibling-calls
> > the default on these branches.  Maintaining binary compatibility
> > (even if it is bug compatibility) with existing packages is
> > something we should strive for, especially with such
> > important software packages as BLAS and LAPACK. These packages
> > are one important reason why people still use Fortran, and
> > I would hate to push them towards flang with this.
> > 
> > For current trunk, I would recommend keeping the current
> > hehavior and contact the LAPACK maintainers to a) give them
> > a heads-up for this problem, and b) a year to work out
> > the problem.
> > 
> > Would this be something that people could agree on?
> 
> Does -fno-optimizing-sibling-calls effect performance?
> A 1% (or less) degradation may be considered negligible,
> and an acceptible compromise. 10% would be unacceptable.
> 
> Guess I'll need to dust off my old copy of the Polyhedron
> Benchmarks and run a few tests.
> 

So, I dusted off my old PB code and ran some tests.
The system is x86_64-*-freebsd.  I saw nothing to
suggest that this option would have a negative 
impact on performance.

================================================================================
Date & Time     :  6 May 2019 13:29:24
Test Name       : gfcx
Compile Command : gfcx -static -ffpe-summary=none -O3 -pipe -mtune=native
   -march=native -ffast-math -ftree-vectorize -funroll-loops
   --param max-unroll-times=4 %n.f90 -o %n
Benchmarks      : ac aermod air capacita channel doduc fatigue gas_dyn induct
    linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times   :      200.0
Target Error %  :      0.100
Minimum Repeats :     5
Maximum Repeats :    10

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      1.60     5511576      8.00       6  0.0893
      aermod     49.38     6822120     19.18       5  0.0054
         air      7.49     5632568      4.57       5  0.0340
    capacita      4.97     5639072     40.91       5  0.0384
     channel      1.31     5520904      1.86      10  0.2693
       doduc      7.53     5655488     19.33       6  0.0978
     fatigue      3.42     5618928      4.41       5  0.0170
     gas_dyn      3.80     5604440      1.91       5  0.0214
      induct      7.59     5771912      6.18       5  0.0209
       linpk      1.23     5486240      7.82       5  0.0096
        mdbx      2.97     5553200      7.33       5  0.0388
          nf      2.73     5533744      8.57       5  0.0095
     protein      4.89     5762320     25.13       5  0.0494
      rnflow      7.87     5965568     34.26       5  0.0810
    test_fpu      5.90     5705216      6.03      10  0.1206
        tfft      1.64     5519096      1.81       5  0.0138

Geometric Mean Execution Time =       7.94 seconds

================================================================================

================================================================================
Date & Time     :  6 May 2019 16:11:36
Test Name       : gfcx
Compile Command : gfcx -fno-optimize-sibling-calls -static -ffpe-summary=none
    -O3 -pipe -mtune=native -march=native -ffast-math -ftree-vectorize
    -funroll-loops --param max-unroll-times=4 %n.f90 -o %n
Benchmarks      : ac aermod air capacita channel doduc fatigue gas_dyn induct
    linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times   :      200.0
Target Error %  :      0.100
Minimum Repeats :     5
Maximum Repeats :    10

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      1.58     5511576      7.99       5  0.0139
      aermod     49.16     6822120     19.17       5  0.0216
         air      7.48     5632568      4.57       5  0.0274
    capacita      4.95     5639072     41.24       5  0.3593
     channel      1.31     5520904      1.86      10  0.2607
       doduc      8.00     5655488     19.29      10  0.0947
     fatigue      3.43     5618928      4.40       5  0.0545
     gas_dyn      3.81     5604440      1.97       5  0.0328
      induct      7.58     5771912      6.18       5  0.0121
       linpk      1.23     5486240      7.85       5  0.0699
        mdbx      2.97     5553200      7.28       5  0.0703
          nf      2.72     5533744      8.62       5  0.0998
     protein      4.87     5762320     25.30       8  0.1194
      rnflow      7.86     5965568     34.28       6  0.2875
    test_fpu      5.88     5705216      6.03       8  0.0980
        tfft      1.64     5519096      1.74       6  0.0841

Geometric Mean Execution Time =       7.94 seconds

================================================================================
Comment 18 Janne Blomqvist 2019-05-07 06:10:48 UTC
(In reply to Thomas Koenig from comment #15)
> Since we applied the fix for PR 87689 to gcc 7, gcc 8 and gcc 9,
> I would suggest that we make -fno-optimize-sibling-calls
> the default on these branches.  Maintaining binary compatibility
> (even if it is bug compatibility) with existing packages is
> something we should strive for, especially with such
> important software packages as BLAS and LAPACK.

+1. Especially considering Steve's benchmark suggesting there's practically no difference, although there may of course be other code where sibling call optimization makes a difference.

> For current trunk, I would recommend keeping the current
> hehavior and contact the LAPACK maintainers to a) give them
> a heads-up for this problem, and b) a year to work out
> the problem.

Yes. Closer to GCC 10, we can revisit this. I suspect we'll have to make -fno-optimize-sibling-calls the default for GCC 10 as well; while we might be able to help LAPACK maintainers fix LAPACKE there's in this timeframe there's certainly a lot of other code out there with custom C-Fortran interfaces which might be affected.
Comment 19 rguenther@suse.de 2019-05-07 06:50:41 UTC
On Tue, 7 May 2019, jb at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90329
> 
> --- Comment #18 from Janne Blomqvist <jb at gcc dot gnu.org> ---
> (In reply to Thomas Koenig from comment #15)
> > Since we applied the fix for PR 87689 to gcc 7, gcc 8 and gcc 9,
> > I would suggest that we make -fno-optimize-sibling-calls
> > the default on these branches.  Maintaining binary compatibility
> > (even if it is bug compatibility) with existing packages is
> > something we should strive for, especially with such
> > important software packages as BLAS and LAPACK.
> 
> +1. Especially considering Steve's benchmark suggesting there's practically no
> difference, although there may of course be other code where sibling call
> optimization makes a difference.
> 
> > For current trunk, I would recommend keeping the current
> > hehavior and contact the LAPACK maintainers to a) give them
> > a heads-up for this problem, and b) a year to work out
> > the problem.
> 
> Yes. Closer to GCC 10, we can revisit this. I suspect we'll have to make
> -fno-optimize-sibling-calls the default for GCC 10 as well; while we might be
> able to help LAPACK maintainers fix LAPACKE there's in this timeframe there's
> certainly a lot of other code out there with custom C-Fortran interfaces which
> might be affected.

I don't see how -fno-optimize-sibling-calls helps in a systematic way.
It might obfuscate a specific example enough to make it work, but...?
Comment 20 Tomas Kalibera 2019-05-07 08:09:17 UTC
(In reply to rguenther@suse.de from comment #19)

> I don't see how -fno-optimize-sibling-calls helps in a systematic way.
> It might obfuscate a specific example enough to make it work, but...?

There is only empirical data, but more than one case. I've debugged only one case, but all the other R packages that were newly failing with that gfortran change also started working again with -fno-optimize-sibling-calls (~15 CRAN packages). That -fno-optimize-sibling-calls helps has also been confirmed by Kurt Hornik from the CRAN team.
Comment 21 Janne Blomqvist 2019-05-08 08:15:37 UTC
I filed https://github.com/Reference-LAPACK/lapack/issues/339 to start a discussion about fixing CBLAS and LAPACKE in upstream LAPACK.
Comment 22 Thomas Koenig 2019-05-08 21:55:44 UTC
Author: tkoenig
Date: Wed May  8 21:55:13 2019
New Revision: 271018

URL: https://gcc.gnu.org/viewcvs?rev=271018&root=gcc&view=rev
Log:
2019-05-08  Thomas Koenig  <tkoenig@gcc.gnu.org>

    PR fortran/90351
    PR fortran/90329
    * gfortran.dg/dump-parse-tree.c: Include version.h.
    (gfc_dump_external_c_prototypes): New function.
    (get_c_type_name): Select "char" as a name for a simple char.
    Adjust to handling external functions. Also handle complex.
    (write_decl): Add argument bind_c. Adjust for dumping of external
    procedures.
    (write_proc): Likewise.
    (write_interop_decl): Add bind_c argument to call of write_proc.
    * gfortran.h: Add prototype for gfc_dump_external_c_prototypes.
    * lang.opt: Add -fc-prototypes-external flag.
    * parse.c (gfc_parse_file): Move dumping of BIND(C) prototypes.
    Call gfc_dump_external_c_prototypes if option is set.
    * invoke.texi: Document -fc-prototypes-external.


Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/dump-parse-tree.c
    trunk/gcc/fortran/gfortran.h
    trunk/gcc/fortran/invoke.texi
    trunk/gcc/fortran/lang.opt
    trunk/gcc/fortran/parse.c
Comment 23 Thomas Koenig 2019-05-08 22:04:36 UTC
On trunk, it is now possible to generate C prototypes from
Fortran source code for external procedures.  Applying
this to dlapack.f in the R source tree yields something like
this (just grabbing a few lines)

$ gfortran -fsyntax-only -fc-prototypes-external dlapack.f 

[...]

void dtpqrt2_ (int *m, int *n, int *l, double *a, int *lda, double *b, int *ldb, double *t, int *ldt, int *info);
void dlarfg_ (int *n, double *alpha, double *x, int *incx, double *tau);
void dlatps_ (char *uplo, char *trans, char *diag, char *normin, int *n, double *ap, double *x, double *scale, double *cnorm, int *info, size_t uplo_len, size_t trans_len, size_t diag_len, size_t normin_len);
double dlantp_ (char *norm, char *uplo, char *diag, int *n, double *ap, double *work, size_t norm_len, size_t uplo_len, size_t diag_len);
void dlatbs_ (char *uplo, char *trans, char *diag, char *normin, int *n, int *kd, double *ab, int *ldab, double *x, double *scale, double *cnorm, int *info, size_t uplo_len, size_t trans_len, size_t diag_len, size_t normin_len);
double dlantb_ (char *norm, char *uplo, char *diag, int *n, int *k, double *ab, int *ldab, double *work, size_t norm_len, size_t uplo_len, size_t diag_len);

Looking for _len yields 236 lines, that is probably the number of
subroutines and functions which would need to be adapted.
Comment 24 Thomas Koenig 2019-05-09 17:41:01 UTC
Author: tkoenig
Date: Thu May  9 17:40:30 2019
New Revision: 271038

URL: https://gcc.gnu.org/viewcvs?rev=271038&root=gcc&view=rev
Log:
2019-05-09  Thomas Koenig  <tkoenig@gcc.gnu.org>

    Backport from trunk
    PR fortran/90351
    PR fortran/90329
    * gfortran.dg/dump-parse-tree.c: Include version.h.
    (gfc_dump_external_c_prototypes): New function.
    (get_c_type_name): Select "char" as a name for a simple char.
    Adjust to handling external functions. Also handle complex.
    (write_decl): Add argument bind_c. Adjust for dumping of external
    procedures.
    (write_proc): Likewise.
    (write_interop_decl): Add bind_c argument to call of write_proc.
    * gfortran.h: Add prototype for gfc_dump_external_c_prototypes.
    * lang.opt: Add -fc-prototypes-external flag.
    * parse.c (gfc_parse_file): Move dumping of BIND(C) prototypes.
    Call gfc_dump_external_c_prototypes if option is set.
    * invoke.texi: Document -fc-prototypes-external.


Modified:
    branches/gcc-9-branch/gcc/fortran/ChangeLog
    branches/gcc-9-branch/gcc/fortran/dump-parse-tree.c
    branches/gcc-9-branch/gcc/fortran/gfortran.h
    branches/gcc-9-branch/gcc/fortran/invoke.texi
    branches/gcc-9-branch/gcc/fortran/lang.opt
    branches/gcc-9-branch/gcc/fortran/parse.c
Comment 25 Jakub Jelinek 2019-05-14 19:16:25 UTC
(In reply to Thomas Koenig from comment #15)
> Hi Tomas,
> 
> > I understand the compiler may not know and typically does not whether the
> > called function accepts "character(len=1)" or "character(len=*)", so it may
> > have to pass the 1. But the situation where I suggest not writing the 1 is
> > more subtle (see my original post -
> > https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html).
> 
> I have given this some thought, and I don't think this can be done
> in the general case, unfortunately.
> 
> Consider
> 
>       program main
>       call foo("ab")
>       end
> 
>       subroutine foo(c)
>       character*1 c
>       call bar(c)
>       end
> 
>       subroutine bar(c)
>       character*(*) c
>       print *,len(c)
>       end
> 
> This is legal Fortran going back to F77, and it should print 1.
> 
> If your proposal was implemented, this would print 2, which would
> be a wrong-code bug :-(
> 
> So, what can we do?  Previously, the way of interfacing C
> with Fortran was fragile, non-conforming, and worked.  Now
> it is fragile, non-conforming, and does not work.
> 
> In your (excellent, by the way) debugging work, you have
> identified an option, -fno-optimize-sibling-calls, which
> restores the status quo because things would go back to
> being fragile, nonconforming, and they would work again.

You don't want to do that, that is way big hammer and in some cases, tail call optimizations are essential for correct program behavior (without the optimizations for deep recursions you won't fit on stack, while when using them you will).

If you want some hack to workaround these buggy wrappers (but will their maintainers then actually fix anything rather than ignoring the problem?),
then I could see tail-calls.c or calls.c punt if it sees DECL_ARTIFICIAL PARM_DECLs passed on the stack at the end of argument list and it is the fortran length (it would e.g. help if those arguments would be named with an initial dot rather than underscore (I believe that used to be the case in the past)), or we need some way to propagate the info that the argument is hidden string length through LTO.
Comment 26 Jakub Jelinek 2019-05-14 19:32:01 UTC
It could even allow tailcalls in the character(len=*) cases because in those cases if the caller omits the string length hidden argument, I see no reason to try to workaround that, it will simply never work properly.
The only reason without tailcalls things appear to work is that the argument is TREE_READONLY and nothing touches it, so there is some hope it won't be modified either, when doing a tailcall if the tail callee needs any of those stack slots for other arguments, it will be written into.  We could still allow tail calls
if the callee needs fewer arguments than the caller and doesn't need to overwrite any of the hidden string length arguments the buggy callers don't provide.
Comment 27 Jakub Jelinek 2019-05-14 20:02:45 UTC
Note that both ifort and xlc fortran have the hidden string length arguments as well.
Comment 28 Conrad S 2019-05-15 02:28:50 UTC
More fallout, this time with C++ code:
https://bugzilla.redhat.com/show_bug.cgi?id=1709538

+1 for "Maintaining binary compatibility (even if it is bug compatibility) with existing packages"
Comment 29 Jakub Jelinek 2019-05-15 07:16:30 UTC
Created attachment 46359 [details]
gcc10-pr90329.patch

Untested workaround that isn't a too big hammer.  This should just avoid tail calls in functions where the hidden string arguments for character(len=constant) dummy arguments are passed partially or fully on the stack and the callee needs to pass anything on the stack and thus might clobber that.
Comment 30 Thomas Koenig 2019-05-15 12:57:09 UTC
Hi Jakub,

> Untested workaround that isn't a too big hammer.  This should just avoid
> tail calls in functions where the hidden string arguments for
> character(len=constant) dummy arguments are passed partially or fully on the
> stack and the callee needs to pass anything on the stack and thus might
> clobber that.

Thanks for taking this on!

Two remarks: It should be possible to have this workaround in place
only when there is no explicit interface for the called routine.
No need to penalize modern, correct Fortran for this.

I can look at this, but I will only be able to find time on the
weekend.

And I would like to have the Fortran part behind an option, so
people could disable this pessimization if they start using the
correct prototypes.

Ideally, the compiler could warn about this. It already does so
if you use -flto, but without LTO, the warning would have to
be emitted from the C compiler, and I see no even halfway sane
way of doing it.
Comment 31 Jakub Jelinek 2019-05-15 13:21:35 UTC
(In reply to Thomas Koenig from comment #30)
> Hi Jakub,
> 
> > Untested workaround that isn't a too big hammer.  This should just avoid
> > tail calls in functions where the hidden string arguments for
> > character(len=constant) dummy arguments are passed partially or fully on the
> > stack and the callee needs to pass anything on the stack and thus might
> > clobber that.
> 
> Thanks for taking this on!
> 
> Two remarks: It should be possible to have this workaround in place
> only when there is no explicit interface for the called routine.
> No need to penalize modern, correct Fortran for this.

First of all, the thing is that we don't have just caller and callee to consider in this case, but 3 functions, some C/C++ function (C caller below) calls incorrectly prototyped (or for C unprototyped) Fortran function (Fortran caller bellow) and that function calls some other function in a position where tail call is considered (callee bellow).
I don't see how having a properly interfaced callee (and it doesn't matter if that one is implemented in Fortran or C or C++ or any other language) matters to whether we need to do this workaround or not.
The bug is in the incorrect prototype visible in the C caller and the incorrect (insufficient number) of arguments in that call.
Even if the callee function has proper interface, if we tail call it, we will clobber the argument area of the Fortran caller in the C caller.
And it doesn't really matter what arguments the callee has, it could very well be:
subroutine foo (a, b, c, d, e, f)
  character(len=1) :: a
  integer(kind=8) :: b, c, d, e, f
  interface
    subroutine bar (g, h, i, j, k, l, m)
      integer(kind=8) :: g, h, i, j, k, l, m
    end subroutine bar
  end interface
  call bar (b, c, d, e, f, 2, 3)
end subroutine foo
Here if we tail call bar from foo, which we generally on x86_64 can, because the caller has 6 normal arguments + 1 artificial hidden string length, and
the callee has 7 normal arguments, in both cases 6 go in registers, the 7th in a stack slot, then the argument stack area is equal size (for tail call it is enough if the callee stack area is smaller or equal than caller's).

> And I would like to have the Fortran part behind an option, so
> people could disable this pessimization if they start using the
> correct prototypes.

Agreed, just no idea how to call that option and how much to express in the option name that it is a workaround for undefined behavior.

> Ideally, the compiler could warn about this. It already does so
> if you use -flto, but without LTO, the warning would have to
> be emitted from the C compiler, and I see no even halfway sane
> way of doing it.

This can be diagnosed only with LTO (but it already is I think) or perhaps with some debug info parsing and analysis tool that would inspect debug info of the whole app and all libraries and complain.  It is the same thing as
extern int baz (int x, int y);
int foo (int x, int y)
{
  return baz (x + y, 0);
}
in one C TU and
extern int foo (int x);
int bar (void) { return foo (10); }
in another one, we don't diagnose that without LTO either and can't.
Comment 32 Jakub Jelinek 2019-05-16 08:13:47 UTC
BTW, as can be seen on https://fortran.godbolt.org/z/ckMt1t
it is a pure luck that these bogus C/C++ wrappers seem to work, if one tries simple
subroutine foo (a, b, c, d, e, f, g, h)
  integer :: b, c, d, e, f, g, h
  character(len=1) :: a
  call bar (a, b, c, d, e, f, g, h)
end subroutine foo
then at -O2 both older gfortran and ifort emit a tail call, which will overwrite the hidden string length argument, so if one prototypes this as
void foo_ (char *a, blas_int b, blas_int c, blas_int d, blas_int e, blas_int f, blas_int g, blas_int h);
and calls it with say foo_ ("1", 2, 3, 4, 5, 6, 7, 8); then it will overwrite some variables in the caller both with older gfortran and with ifort.
Comment 33 Jakub Jelinek 2019-05-16 09:38:14 UTC
Author: jakub
Date: Thu May 16 09:37:43 2019
New Revision: 271285

URL: https://gcc.gnu.org/viewcvs?rev=271285&root=gcc&view=rev
Log:
	PR fortran/90329
	* tree-core.h (struct tree_decl_common): Document
	decl_nonshareable_flag for PARM_DECLs.
	* tree.h (DECL_HIDDEN_STRING_LENGTH): Define.
	* calls.c (expand_call): Don't try tail call if caller
	has any DECL_HIDDEN_STRING_LENGTH PARM_DECLs that are or might be
	passed on the stack and callee needs to pass any arguments on the
	stack.
	* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
	else if instead of series of mutually exclusive ifs.  Handle
	DECL_HIDDEN_STRING_LENGTH for PARM_DECLs.
	* tree-streamer-out.c (pack_ts_decl_common_value_fields): Likewise.

	* trans-decl.c (create_function_arglist): Set
	DECL_HIDDEN_STRING_LENGTH on hidden string length PARM_DECLs if
	len is constant.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/calls.c
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-decl.c
    trunk/gcc/tree-core.h
    trunk/gcc/tree-streamer-in.c
    trunk/gcc/tree-streamer-out.c
    trunk/gcc/tree.h
Comment 34 Jakub Jelinek 2019-05-16 10:25:49 UTC
Even the
subroutine foo (a, b, c, d, e, f)
  integer :: c, e, f
  character(len=1) :: a, b
  double precision :: d (e, *)
  call bar (a, b, c, d, e, f)
end subroutine foo
is tail call optimized by both older gfortran and ifort, and this one has the same prototype as the lapack routine and so will be broken by both old gfortran and ifort when called incorrectly without the hidden string length arguments.
Comment 35 Jakub Jelinek 2019-05-16 11:29:45 UTC
One more thing, if I add to start of the DTRTRI function
      INTERFACE
      SUBROUTINE DTRTI2( UPLO, DIAG, N, A, LDA, INFO )
      CHARACTER          DIAG, UPLO
      INTEGER            INFO, LDA, N
      DOUBLE PRECISION   A( LDA, * )
      END SUBROUTINE
      END INTERFACE
      INTERFACE
      SUBROUTINE XERBLA( FOO, BAR )
      CHARACTER          FOO
      INTEGER            BAR
      END SUBROUTINE
      END INTERFACE
      INTERFACE
      SUBROUTINE DTRMM ( S1, S2, S3, S4, I1, I2, D1, A, I3, D2, I4 )
      CHARACTER          S1, S2, S3, S4
      INTEGER            I1, I2, I3, I4
      DOUBLE PRECISION   D1, D2
      DOUBLE PRECISION   A( I3, * )
      END SUBROUTINE
      END INTERFACE
      INTERFACE
      SUBROUTINE DTRSM ( S1, S2, S3, S4, I1, I2, D1, D2, I3, D3, I4 )
      CHARACTER          S1, S2, S3, S4
      INTEGER            I1, I2, I3, I4
      DOUBLE PRECISION   D1, D2, D3
      END SUBROUTINE
      END INTERFACE
      INTERFACE
      FUNCTION LSAME ( S1, S2 )
      CHARACTER          S1, S2
      LOGICAL            LSAME
      END FUNCTION
      END INTERFACE
      INTERFACE
      FUNCTION ILAENV ( I1, S1, S2, I2, I3, I4, I5 )
      INTEGER            I1, I2, I3, I4, I5, ILAENV
      CHARACTER          S1, S2
      END FUNCTION
      END INTERFACE
and remove
      LOGICAL            LSAME
      INTEGER            ILAENV
      EXTERNAL           LSAME, ILAENV
      EXTERNAL           DTRMM, DTRSM, DTRTI2, XERBLA
then it is tail call optimized back to somewhere in between r164328 (still doesn't tail call it) and r164450 (already tail calls it).  What matters is not whether the exact function/subroutine we are calling in the tail call position has interface or not, tail call optimization doesn't care about that, but the presence or absence of TYPE_ARG_TYPES on the function types that are called matters for the alias analysis (r268991 vs. r268992):
-ESCAPED = uplo
-ESCAPED = &STRING
+callarg(30) = uplo
+callarg(30) = callarg(30) + UNKNOWN
+callarg(30) = *callarg(30) + UNKNOWN
+CALLUSED(28) = callarg(30)
+CALLCLOBBERED(29) = callarg(30)
+*callarg(30) = NONLOCAL
+callarg(31) = &STRING
+callarg(31) = callarg(31) + UNKNOWN
+callarg(31) = *callarg(31) + UNKNOWN
+CALLUSED(28) = callarg(31)
+CALLCLOBBERED(29) = callarg(31)
+*callarg(31) = NONLOCAL
and similarly for many other arguments.  So, if we wanted to narrow the workaround more, we could do the DECL_HIDDEN_STRING_LENGTH setting conditional on whether the function has any calls to Fortran implicitly prototyped function.
Or if we add a command line switch, have different modes, one which would assume sane callers, another one that would do it only for functions that call implicitly prototyped functions and another that would do it always like the committed workaround does.
On the other side, those unprototyped calls could come from inlined calls too, though as a heuristics it could work, usually the codebase either is modern fortran, or is not, and given that the workaround is for undefined behavior, we are always free to do anything we want.
Comment 36 Thomas Koenig 2019-05-19 08:23:13 UTC
Author: tkoenig
Date: Sun May 19 08:22:41 2019
New Revision: 271376

URL: https://gcc.gnu.org/viewcvs?rev=271376&root=gcc&view=rev
Log:
2019-05-19  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/90329
	* invoke.texi: Document -fbroken-callers.
	* lang.opt: Add -fbroken-callers.
	* trans-decl.c (create_function_arglist): Only set
	DECL_HIDDEN_STRING_LENGTH if flag_broken_callers is set.


Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/invoke.texi
    trunk/gcc/fortran/lang.opt
    trunk/gcc/fortran/trans-decl.c
Comment 37 Conrad S 2019-05-22 04:38:40 UTC
Thanks for the workaround.
Will the patches be backported to gcc 8.x and 9.x ?
Comment 38 Steve Kargl 2019-05-22 05:02:18 UTC
On Wed, May 22, 2019 at 04:38:40AM +0000, conradsand.arma at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90329
> 
> --- Comment #37 from Conrad S <conradsand.arma at gmail dot com> ---
> Thanks for the workaround.
> Will the patches be backported to gcc 8.x and 9.x ?
> 

A better question might be: Are you going to fix your code?
Comment 39 Conrad S 2019-05-22 05:39:49 UTC
> A better question might be: Are you going to fix your code?

Yes [1], but that's besides the point here. I can certainly fix my code, but that leaves 99% of other software.

Backports to gcc 8.x and 9.x would be very beneficial.

[1] in progress: https://gitlab.com/conradsnicta/armadillo-code/issues/123
Comment 40 Jakub Jelinek 2019-05-23 09:40:54 UTC
I must say I don't like -fbroken-callers option name too much, can we use instead something like -ftail-call-workaround={0,1,2} / -f{,no-}tail-call-workaround
where -ftail-call-workaround == -ftail-call-workaround=1 would be the default (for now) and enabled only if resolve_symbol finds FL_PROCEDURE if_source == IFSRC_UNKNOWN symbols (remembered in some gfc_ns member and set on containing procedure, not say containing BLOCK), -fno-tail-call-workaround == -ftail-call-workaround=0 would disable this and -ftail-call-workaround=2 would be enabled no matter whether there are any FL_PROCEDURE IFSRC_UNKNOWN symbols or not?
Comment 41 Jakub Jelinek 2019-05-29 14:09:29 UTC
Author: jakub
Date: Wed May 29 14:08:57 2019
New Revision: 271738

URL: https://gcc.gnu.org/viewcvs?rev=271738&root=gcc&view=rev
Log:
	PR fortran/90329
	* lang.opt (fbroken-callers): Remove.
	(ftail-call-workaround, ftail-call-workaround=): New options.
	* gfortran.h (struct gfc_namespace): Add implicit_interface_calls.
	* interface.c (gfc_procedure_use): Set implicit_interface_calls
	for calls to implicit interface procedures.
	* trans-decl.c (create_function_arglist): Use flag_tail_call_workaround
	instead of flag_broken_callers.  If it is not 2, also require
	sym->ns->implicit_interface_calls.
	* invoke.texi (fbroken-callers): Remove documentation.
	(ftail-call-workaround, ftail-call-workaround=): Document.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/gfortran.h
    trunk/gcc/fortran/interface.c
    trunk/gcc/fortran/invoke.texi
    trunk/gcc/fortran/lang.opt
    trunk/gcc/fortran/trans-decl.c
Comment 42 Jakub Jelinek 2019-05-29 15:55:43 UTC
Author: jakub
Date: Wed May 29 15:55:12 2019
New Revision: 271743

URL: https://gcc.gnu.org/viewcvs?rev=271743&root=gcc&view=rev
Log:
	PR fortran/90329
	* lto-streamer.h (LTO_minor_version): Bump to 1.

	Backported from mainline
	2019-05-29  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/90329
	* lang.opt (fbroken-callers): Remove.
	(ftail-call-workaround, ftail-call-workaround=): New options.
	* gfortran.h (struct gfc_namespace): Add implicit_interface_calls.
	* interface.c (gfc_procedure_use): Set implicit_interface_calls
	for calls to implicit interface procedures.
	* trans-decl.c (create_function_arglist): Use flag_tail_call_workaround
	instead of flag_broken_callers.  If it is not 2, also require
	sym->ns->implicit_interface_calls.
	* invoke.texi (fbroken-callers): Remove documentation.
	(ftail-call-workaround, ftail-call-workaround=): Document.

	2019-05-19  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/90329
	* invoke.texi: Document -fbroken-callers.
	* lang.opt: Add -fbroken-callers.
	* trans-decl.c (create_function_arglist): Only set
	DECL_HIDDEN_STRING_LENGTH if flag_broken_callers is set.

	2019-05-16  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/90329
	* tree-core.h (struct tree_decl_common): Document
	decl_nonshareable_flag for PARM_DECLs.
	* tree.h (DECL_HIDDEN_STRING_LENGTH): Define.
	* calls.c (expand_call): Don't try tail call if caller
	has any DECL_HIDDEN_STRING_LENGTH PARM_DECLs that are or might be
	passed on the stack and callee needs to pass any arguments on the
	stack.
	* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
	else if instead of series of mutually exclusive ifs.  Handle
	DECL_HIDDEN_STRING_LENGTH for PARM_DECLs.
	* tree-streamer-out.c (pack_ts_decl_common_value_fields): Likewise.

	* trans-decl.c (create_function_arglist): Set
	DECL_HIDDEN_STRING_LENGTH on hidden string length PARM_DECLs if
	len is constant.

Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/calls.c
    branches/gcc-9-branch/gcc/fortran/ChangeLog
    branches/gcc-9-branch/gcc/fortran/gfortran.h
    branches/gcc-9-branch/gcc/fortran/interface.c
    branches/gcc-9-branch/gcc/fortran/invoke.texi
    branches/gcc-9-branch/gcc/fortran/lang.opt
    branches/gcc-9-branch/gcc/fortran/trans-decl.c
    branches/gcc-9-branch/gcc/lto-streamer.h
    branches/gcc-9-branch/gcc/tree-core.h
    branches/gcc-9-branch/gcc/tree-streamer-in.c
    branches/gcc-9-branch/gcc/tree-streamer-out.c
    branches/gcc-9-branch/gcc/tree.h
Comment 43 Jakub Jelinek 2019-05-29 16:03:31 UTC
Author: jakub
Date: Wed May 29 16:02:56 2019
New Revision: 271744

URL: https://gcc.gnu.org/viewcvs?rev=271744&root=gcc&view=rev
Log:
	PR fortran/90329
	* lto-streamer.h (LTO_minor_version): Bump to 2.

	Backported from mainline
	2019-05-29  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/90329
	* lang.opt (fbroken-callers): Remove.
	(ftail-call-workaround, ftail-call-workaround=): New options.
	* gfortran.h (struct gfc_namespace): Add implicit_interface_calls.
	* interface.c (gfc_procedure_use): Set implicit_interface_calls
	for calls to implicit interface procedures.
	* trans-decl.c (create_function_arglist): Use flag_tail_call_workaround
	instead of flag_broken_callers.  If it is not 2, also require
	sym->ns->implicit_interface_calls.
	* invoke.texi (fbroken-callers): Remove documentation.
	(ftail-call-workaround, ftail-call-workaround=): Document.

	2019-05-19  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/90329
	* invoke.texi: Document -fbroken-callers.
	* lang.opt: Add -fbroken-callers.
	* trans-decl.c (create_function_arglist): Only set
	DECL_HIDDEN_STRING_LENGTH if flag_broken_callers is set.

	2019-05-16  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/90329
	* tree-core.h (struct tree_decl_common): Document
	decl_nonshareable_flag for PARM_DECLs.
	* tree.h (DECL_HIDDEN_STRING_LENGTH): Define.
	* calls.c (expand_call): Don't try tail call if caller
	has any DECL_HIDDEN_STRING_LENGTH PARM_DECLs that are or might be
	passed on the stack and callee needs to pass any arguments on the
	stack.
	* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
	else if instead of series of mutually exclusive ifs.  Handle
	DECL_HIDDEN_STRING_LENGTH for PARM_DECLs.
	* tree-streamer-out.c (pack_ts_decl_common_value_fields): Likewise.

	* trans-decl.c (create_function_arglist): Set
	DECL_HIDDEN_STRING_LENGTH on hidden string length PARM_DECLs if
	len is constant.

Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/calls.c
    branches/gcc-8-branch/gcc/fortran/ChangeLog
    branches/gcc-8-branch/gcc/fortran/gfortran.h
    branches/gcc-8-branch/gcc/fortran/interface.c
    branches/gcc-8-branch/gcc/fortran/invoke.texi
    branches/gcc-8-branch/gcc/fortran/lang.opt
    branches/gcc-8-branch/gcc/fortran/trans-decl.c
    branches/gcc-8-branch/gcc/lto-streamer.h
    branches/gcc-8-branch/gcc/tree-core.h
    branches/gcc-8-branch/gcc/tree-streamer-in.c
    branches/gcc-8-branch/gcc/tree-streamer-out.c
    branches/gcc-8-branch/gcc/tree.h
Comment 44 Jakub Jelinek 2019-05-29 16:07:07 UTC
Workaround added for 8.4+, 9.2+ and 10.1+ so far.
Comment 45 Kaz Kylheku 2019-06-22 06:36:13 UTC
Hi everyone.

Pardon my ignorance of C-Fortran bridging matters, but does any of the following make sense?

The Fortran subroutine has no idea whether the word where the argument is supposed to be is really the argument, or just some unrelated part of the stack (because the caller didn't pass the argument).

Can't the Fortran routine check whether that word has the expected value 1? Then in that case, just pretend it's the argument (whether that is the case, or whether that 1 is just a spurious value that looks like the right argument). In this case, just do all the tail calling to sibling routines happily, who will get the right string length.

If the word does not have value 1, then, uh oh: treat it suspiciously! The function can then *call itself* with with a *copy* of the arguments, including the correct length 1. When that call returns, it returns.   The recursively invoked activation will of course the value of 1, and then act cool.

The downside is that there is a performance penalty with this check and extra recursive call. When the callers are fixed to pass the argument, the penalty is reduced, but the check is still being made.
Comment 46 Kaz Kylheku 2019-06-22 06:51:48 UTC
C pseudocode in light of previous comment:

   double abused_fortran_fn(double x, double y, char str[1], int len)
   {
      if (len != 1)
        return abused_fortran_fn(x, y, str, 1); /* full call, not tail! */

      return tail_call_sibling(x, y, str, len);  
   }
Comment 47 Thomas Koenig 2019-06-22 10:27:01 UTC
(In reply to Kaz Kylheku from comment #45)
> Hi everyone.
> 
> Pardon my ignorance of C-Fortran bridging matters, but does any of the
> following make sense?
> 
> The Fortran subroutine has no idea whether the word where the argument is
> supposed to be is really the argument, or just some unrelated part of the
> stack (because the caller didn't pass the argument).
> 
> Can't the Fortran routine check whether that word has the expected value 1?
> Then in that case, just pretend it's the argument (whether that is the case,
> or whether that 1 is just a spurious value that looks like the right
> argument). In this case, just do all the tail calling to sibling routines
> happily, who will get the right string length.

I see two problems with this suggestion, one minor and one major.

First, there may well be a value > 1 on the stack for a regular
call, see comment #15.

Second, the value 1 may end up being there by accident, then this
method would cause a crash, and this would be even harder to debug
than the original case.
Comment 48 Kaz Kylheku 2019-06-22 17:10:08 UTC
(In reply to Thomas Koenig from comment #47)
> I see two problems with this suggestion, one minor and one major.
> 
> First, there may well be a value > 1 on the stack for a regular
> call, see comment #15.

Thus it adds overhead to cases like when "ab" is passed, where the callee treats that as a length 1 string. It has to be suspicious of the value 2 and go through the self-calling hoop.

(Oh well; obviously this is a workaround that would have to go through deprecation and removal.)

> Second, the value 1 may end up being there by accident, then this

That I clearly understand: a value 1 may be there spuriously even if no argument was passed.

> method would cause a crash, and this would be even harder to debug
> than the original case.

That's the thing I'm keenly curious about: if that value is fortuitously the right one, in the right place on the stack, where is the crash?

(Of course, we must not mutate that word; it belongs to the C caller's frame.)
Comment 49 Jakub Jelinek 2019-06-24 09:38:22 UTC
(In reply to Kaz Kylheku from comment #48)
> That's the thing I'm keenly curious about: if that value is fortuitously the
> right one, in the right place on the stack, where is the crash?
> 
> (Of course, we must not mutate that word; it belongs to the C caller's
> frame.)

In most ABIs, the argument area is owned by the callee, not caller, and this PR is exactly about mutating that area when performing a tail call.
Comment 50 Janne Blomqvist 2019-07-02 05:59:56 UTC
Jakub, should your fix be backported to the gcc-7 branch as well, considering that the PR 87689 fix was applied to that branch as well?
Comment 51 Jakub Jelinek 2019-07-02 06:08:39 UTC
Yes, I just didn't have time for that yet.
Comment 52 Jakub Jelinek 2019-08-30 12:42:18 UTC
Author: jakub
Date: Fri Aug 30 12:41:43 2019
New Revision: 275153

URL: https://gcc.gnu.org/viewcvs?rev=275153&root=gcc&view=rev
Log:
	Backported from mainline
	2019-05-29  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/90329
	* lto-streamer.h (LTO_minor_version): Bump to 1.

	Backported from mainline
	2019-05-16  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/90329
	* tree-core.h (struct tree_decl_common): Document
	decl_nonshareable_flag for PARM_DECLs.
	* tree.h (DECL_HIDDEN_STRING_LENGTH): Define.
	* calls.c (expand_call): Don't try tail call if caller
	has any DECL_HIDDEN_STRING_LENGTH PARM_DECLs that are or might be
	passed on the stack and callee needs to pass any arguments on the
	stack.
	* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
	else if instead of series of mutually exclusive ifs.  Handle
	DECL_HIDDEN_STRING_LENGTH for PARM_DECLs.
	* tree-streamer-out.c (pack_ts_decl_common_value_fields): Likewise.

	* lang.opt (fbroken-callers): Remove.
	(ftail-call-workaround, ftail-call-workaround=): New options.
	* gfortran.h (struct gfc_namespace): Add implicit_interface_calls.
	* interface.c (gfc_procedure_use): Set implicit_interface_calls
	for calls to implicit interface procedures.
	* trans-decl.c (create_function_arglist): Use flag_tail_call_workaround
	instead of flag_broken_callers.  If it is not 2, also require
	sym->ns->implicit_interface_calls.
	* invoke.texi (fbroken-callers): Remove documentation.
	(ftail-call-workaround, ftail-call-workaround=): Document.

	2019-05-19  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/90329
	* invoke.texi: Document -fbroken-callers.
	* lang.opt: Add -fbroken-callers.
	* trans-decl.c (create_function_arglist): Only set
	DECL_HIDDEN_STRING_LENGTH if flag_broken_callers is set.

	2019-05-16  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/90329
	* trans-decl.c (create_function_arglist): Set
	DECL_HIDDEN_STRING_LENGTH on hidden string length PARM_DECLs if
	len is constant.

Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/calls.c
    branches/gcc-7-branch/gcc/fortran/ChangeLog
    branches/gcc-7-branch/gcc/fortran/gfortran.h
    branches/gcc-7-branch/gcc/fortran/interface.c
    branches/gcc-7-branch/gcc/fortran/invoke.texi
    branches/gcc-7-branch/gcc/fortran/lang.opt
    branches/gcc-7-branch/gcc/fortran/trans-decl.c
    branches/gcc-7-branch/gcc/lto-streamer.h
    branches/gcc-7-branch/gcc/tree-core.h
    branches/gcc-7-branch/gcc/tree-streamer-in.c
    branches/gcc-7-branch/gcc/tree-streamer-out.c
    branches/gcc-7-branch/gcc/tree.h
Comment 53 Jakub Jelinek 2019-08-30 13:32:52 UTC
Backported to 7.x, please test it.
Comment 54 Tomas Kalibera 2019-09-16 10:08:24 UTC
(In reply to Jakub Jelinek from comment #53)
> Backported to 7.x, please test it.

Thanks! I tested the default behavior with DPOSV and reference LAPACK, where it worked fine. Also with all of CRAN+BIOC R packages - there is one newly failing package with 275369 compared to gfortran 7 from Ubuntu 7.4.0-8ubuntu1 with -fno-optimize-sibling-calls, but I don't think it is related to the change.