Bug 91778 - gfortran GCC9 optimizer bug
Summary: gfortran GCC9 optimizer bug
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran
Version: 9.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2019-09-16 13:17 UTC by Mark Wieczorek
Modified: 2019-09-16 13:57 UTC

Description Mark Wieczorek 2019-09-16 13:17:18 UTC
I am writing about a possible bug in the gfortran GCC9 optimizer on macOS (installed via brew).

Before going into the details, I note that my code (SHTOOLS/pyshtools) is widely used on many platforms and compilers. My code works with GCC8 compiled with optimizations "-O" or "-O3", and it works fine with GCC9 when compiled _without_ optimizations. I was able to "fix" my code to work with GCC9, but I feel that what I am doing is avoiding a bug in the GCC9 optimizer, and that I am not in fact "fixing" my code (perhaps I am wrong...).

The problem is related to using the FFTW3 library, which is the most widely used FFT library for scientific computing. If this is a bug, then others will probably encounter similar problems. As my code is somewhat long (and given the lack of time I have now), I will just give you a summary of two problems. If necessary, I could try to write a "small" example that reproduces these problems when I have more free time later.

I start by describing how FFTW routines are used. First, you initialize the FFT operation and get pointers to all the input and output arrays, which are stored in the variable "plan":

    call dfftw_plan_dft_c2r_1d(plan, nlong, coef, grid)

Then you perform the FFT simply by calling

    call dfftw_execute(plan)

The first problem boils down to this:

    call dfftw_plan_dft_c2r_1d(plan, nlong, coef, grid)

    coef(1) = dcmplx(coef0,0.0d0) ! A
    coef(2:lmax_comp+1) = coef(2:lmax_comp+1) / 2.0d0

    call dfftw_execute(plan) ! AA
    gridglq(i,1:nlong) = grid(1:nlong)

    coef(1) = dcmplx(coef0s,0.0d0) ! B
    coef(2:lmax_comp+1) = coefs(2:lmax_comp+1)/2.0d0

    call dfftw_execute(plan) ! BB
    gridglq(i_s,1:nlong) = grid(1:nlong)


The problem is that the optimizer thinks the line A is redundant with line B (the same variable is being defined twice). Thus, the optimizer sets line A to that of line B and deletes line B. I have verified this by doing so in my code. However, line A is necessary to execute line AA, and line B is necessary to execute line BB. The optimizer probably doesn't realize this because the variable "coef" is not explicitly included when calling the function dfftw_execute(plan).

The second problem I encountered is a little more mysterious. These are the _last_ 4 lines of the subroutine:

    coef(lmax_comp+1) = coef(lmax_comp+1) + cilm(1,lmax_comp+1,lmax_comp+1)
    coef(nlong-(lmax_comp-1)) = coef(nlong-(lmax_comp-1)) &
                                + cilm(2,lmax_comp+1,lmax_comp+1)

    call dfftw_execute(plan)

    griddh(i_eq,1:nlong) = grid(1:nlong)

The problem is that the optimizer ignores the first two lines. This is probably because (1) the variable coef is not explicitly passed in the fftw call, and (2) the variable coef is not an output of the subroutine. Thus, the optimizer probably thinks that it doesn't need to compute the first two lines.

So, in summary, I believe that the GCC9 optimizer is not working correctly because it doesn't realize that the function call dfftw_execute(plan) actually depends on the variables coef and grid. Given that my code has worked well with all other versions of GCC, I suspect that there has been a change in how the optimizer works.
Comment 1 Andrew Pinski 2019-09-16 13:24:42 UTC
Are you using c bindings to bind to fftw functions?
Comment 2 kargls 2019-09-16 13:28:31 UTC
Need a reproducer.

It would also be beneficial to know what happens when
your code is compiled with -Wall -Werror -fcheck=all
-ffpe-trap=invalid,zero
Comment 3 Thomas Koenig 2019-09-16 13:30:45 UTC
(In reply to Mark Wieczorek from comment #0)
> I am writing about a possible bug in the gfortran GCC9 optimizer on macOS
> (installed via brew).
> 
> Before going into the details, I note that my code (SHTOOLS/pyshtools) is
> widely used on many platforms and compilers. My code works with GCC8
> compiled with optimizations "-O" or "-O3", and it works fine with GCC9 when
> compiled _without_ optimizations. I was able to "fix" my code to work with
> GCC9, but I feel that what I am doing is avoiding a bug in the GCC9
> optimizer, and that I am not in fact "fixing" my code (perhaps I am
> wrong...).
> 
> The problem is related to using the FFTW3 library, which is the most widely
> used FFT library for scientific computing. If this is a bug, then others
> will probably encounter similar problems. As my code is somewhat long (and
> given the lack of time I have now), I will just give you a summary of two
> problems. If necessary, I could try to write a "small" example that
> reproduces these problems when I have more free time later.

If it turns out that this is needed, please do.  However...

> I start by describing how FFTW routines are used. First, you initialize the
> FFT operation and get pointers to all the input and output arrays, which are
> stored in the variable "plan":
> 
>     call dfftw_plan_dft_c2r_1d(plan, nlong, coef, grid)

This sounds very suspicious. According to the Fortran standard, you
cannot stash away a pointer to a Fortran array unless that array
is marked as TARGET. Well, you can, but it's liable to break any time,
and apparently it did.

Can you show the declaration of dfftw_plan_dft_c2r_1d?

> Then you perform the FFT simply by calling
> 
>     call dfftw_execute(plan)
> 
> The first problem boils down to this:
> 
>     call dfftw_plan_dft_c2r_1d(plan, nlong, coef, grid)
> 
>     coef(1) = dcmplx(coef0,0.0d0) ! A
>     coef(2:lmax_comp+1) = coef(2:lmax_comp+1) / 2.0d0
> 
>     call dfftw_execute(plan) ! AA
>     gridglq(i,1:nlong) = grid(1:nlong)
> 
>     coef(1) = dcmplx(coef0s,0.0d0) ! B
>     coef(2:lmax_comp+1) = coefs(2:lmax_comp+1)/2.0d0
> 
>     call dfftw_execute(plan) ! BB
>     gridglq(i_s,1:nlong) = grid(1:nlong)
> 
> 
> The problem is that the optimizer thinks the line A is redundant with line B
> (the same variable is being defined twice).

And that is correct behavior.

Try marking coef as TARGET or VOLATILE, this should inhibit this
optimization.
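
A minimal sketch of the VOLATILE variant of this suggestion, not the SHTOOLS code. The array sizes, the program name, and the locally defined FFTW_ESTIMATE constant are assumptions made for illustration; the planner's trailing flags argument is part of FFTW's legacy Fortran interface even though the snippets quoted in this report leave it out.

```fortran
program volatile_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: nlong = 8              ! hypothetical grid size
    integer, parameter :: FFTW_ESTIMATE = 64     ! assumed to match the value in FFTW's fftw3.f
    integer(kind=8) :: plan                      ! the legacy interface keeps the plan in an 8-byte integer
    ! VOLATILE tells the compiler that these arrays may be read or written
    ! by means it cannot see, so stores to coef should not be dropped or merged.
    complex(dp), volatile :: coef(nlong/2 + 1)
    real(dp), volatile :: grid(nlong)

    ! The plan remembers the addresses of coef and grid.
    call dfftw_plan_dft_c2r_1d(plan, nlong, coef, grid, FFTW_ESTIMATE)

    coef = (0.0d0, 0.0d0)
    coef(1) = cmplx(1.0d0, 0.0d0, kind=dp)       ! plays the role of line A
    call dfftw_execute(plan)                     ! line AA
    print *, grid

    coef = (0.0d0, 0.0d0)                        ! a c2r transform may overwrite its input
    coef(1) = cmplx(2.0d0, 0.0d0, kind=dp)       ! plays the role of line B
    call dfftw_execute(plan)                     ! line BB
    print *, grid

    call dfftw_destroy_plan(plan)
end program volatile_sketch
```

Building it would need FFTW to be installed, for example with something like `gfortran -O3 sketch.f90 -lfftw3` (the file name is hypothetical). With only the zero-frequency coefficient set, each print should show a constant grid.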


> The second problem I encountered is a little more mysterious. These are the
> _last_ 4 lines of the subroutine:
> 
>     coef(lmax_comp+1) = coef(lmax_comp+1) + cilm(1,lmax_comp+1,lmax_comp+1)
>     coef(nlong-(lmax_comp-1)) = coef(nlong-(lmax_comp-1)) &
>                                 + cilm(2,lmax_comp+1,lmax_comp+1)
> 
>     call dfftw_execute(plan)
> 
>     griddh(i_eq,1:nlong) = grid(1:nlong)
> 
> The problem is that the optimizer ignores the first two lines. The reason
> for this is probably because (1) the variable coef is not explicitly noted
> in the fftw call, and (2) the variable coef is not output in the subroutine.
> Thus, the optimizer probably thinks that it doesn't need to compute the
> first two lines 

Sounds reasonable.

> So, in summary, I believe that the GCC9 optimizer is not working correctly
> because it doesn't realize that the function call dfftw_execute(plan)
> actually depends on the variables coef and grid. Given that my code has
> worked well with all other versions of GCC, I suspect that there has been a
> change in how the optimizer works.

I assume that your program was always non-conforming, and that gcc
has simply gotten better at finding optimization opportunities.
Comment 4 Mark Wieczorek 2019-09-16 13:57:12 UTC
Thanks for the help. After realizing that the fftw_execute call was in fact suspicious I went to their web site and found that it had been updated recently. They state that 

"we have had reports that this causes problems with some recent optimizing Fortran compilers. The problem is, because the input/output arrays are not passed as explicit arguments to dfftw_execute, the semantics of Fortran (unlike C) allow the compiler to assume that the input/output arrays are not changed by dfftw_execute. As a consequence, certain compilers end up optimizing out or repositioning the call to dfftw_execute, assuming incorrectly that it does nothing."

They then suggest using new convenience functions that are like

    call fftw_execute(plan, coef, grid)

where the coef and grid variables are just placeholders so that the optimizer understands the dependencies.
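
For reference, a minimal sketch of what this could look like with a complex-to-real plan. In FFTW's legacy Fortran interface, the new-array execute routine for this plan type is named dfftw_execute_dft_c2r. The array sizes, the program name, and the locally defined FFTW_ESTIMATE constant are assumptions made for illustration and are not taken from SHTOOLS.

```fortran
program new_array_execute_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: nlong = 8              ! hypothetical grid size
    integer, parameter :: FFTW_ESTIMATE = 64     ! assumed to match the value in FFTW's fftw3.f
    integer(kind=8) :: plan                      ! the legacy interface keeps the plan in an 8-byte integer
    complex(dp) :: coef(nlong/2 + 1)             ! c2r input: nlong/2 + 1 complex coefficients
    real(dp) :: grid(nlong)                      ! c2r output: nlong real samples

    call dfftw_plan_dft_c2r_1d(plan, nlong, coef, grid, FFTW_ESTIMATE)

    coef = (0.0d0, 0.0d0)
    coef(1) = cmplx(1.0d0, 0.0d0, kind=dp)
    ! coef and grid are explicit arguments, so the compiler can see that
    ! the call reads coef and writes grid.
    call dfftw_execute_dft_c2r(plan, coef, grid)
    print *, grid

    coef = (0.0d0, 0.0d0)                        ! a c2r transform may overwrite its input
    coef(1) = cmplx(2.0d0, 0.0d0, kind=dp)
    call dfftw_execute_dft_c2r(plan, coef, grid)
    print *, grid

    call dfftw_destroy_plan(plan)
end program new_array_execute_sketch
```

Because the arrays now appear in every execute call, no TARGET or VOLATILE attribute should be needed for the compiler to keep the assignments to coef in place.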

I am going to consider this closed. Thanks again!