Bug 110311 - [14 Regression] regression in tree-optimizer
Summary: [14 Regression] regression in tree-optimizer
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 14.0
Importance: P3 normal
Target Milestone: 14.0
Assignee: Not yet assigned to anyone
Duplicates: 110326
 
Reported: 2023-06-19 12:45 UTC by Jürgen Reuter
Modified: 2024-03-27 13:41 UTC

Attachments
Smaller stand-alone reproducer (41.12 KB, application/x-gzip)
2023-07-06 19:40 UTC, Jürgen Reuter

Description Jürgen Reuter 2023-06-19 12:45:49 UTC
Hi,
let me open an issue already. I believe a regression was introduced in gfortran between June 11 and June 19: our CI, which builds gcc/gfortran from a git clone, worked last week and fails this week. Two of our ca. 340 functional tests fail because they return zero results. I will try to boil this down to a smaller reproducer (fingers crossed), but if you want to play around already, check out our code from here:
https://gitlab.tp.nt.uni-siegen.de/whizard/public
Note that you need noweb and OCaml besides gcc/gfortran. In the main directory, run ./build_master.sh and autoreconf; then, in a build directory _build, run ../configure, make -j4, and make -C tests/functional_tests -j4 check.
The failing tests are nlo_9.run and nlo_10.run in case you want to run them already now.
Cheers,
 Juergen
Comment 1 Jürgen Reuter 2023-06-20 06:29:50 UTC
It looks like there were no specific changes in the fortran backend or the libgfortran but a lot of optimization in the middle-end. Maybe that is responsible for this issue. Need to see what is going on.
Comment 2 Jürgen Reuter 2023-06-20 07:48:33 UTC
Actually, it could have been this commit here:
2023-06-13  Harald Anlauf  <anlauf@gmx.de>
            Mikael Morin  <mikael@gcc.gnu.org>

        PR fortran/86277
        * trans-array.cc (gfc_trans_allocate_array_storage): When passing a
        zero-sized array with fixed (= non-dynamic) size, allocate temporary
        by the caller, not by the callee.
Comment 3 Jürgen Reuter 2023-06-20 10:13:56 UTC
I reverted this change here:
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index e1c75e9fe0266d760b635f0dc7869a00ce53bf48..e7c51bae052b1e0e3a60dee35484c093d28d4653 100644 (file)
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -1117,7 +1117,7 @@ gfc_trans_allocate_array_storage (stmtblock_t * pre, stmtblock_t * post,
 
   desc = info->descriptor;
   info->offset = gfc_index_zero_node;
-  if (size == NULL_TREE || integer_zerop (size))
+  if (size == NULL_TREE || (dynamic && integer_zerop (size)))
     {
       /* A callee allocated array.  */
       gfc_conv_descriptor_data_set (pre, desc, null_pointer_node);

and it seems this is not the cause of the problem :(
Comment 4 anlauf 2023-06-20 17:35:40 UTC
Jürgen,

I'm afraid we need a reproducer.  Or can you bisect the regression further?
Comment 5 Jürgen Reuter 2023-06-20 17:38:11 UTC
(In reply to anlauf from comment #4)
> Jürgen,
> 
> I'm afraid we need a reproducer.  Or can you bisect the regression further?

In principle, I could. But I just undid your commit, which changes a single line in trans-array.cc, and that didn't solve the problem. As far as I can tell, there have not been any other commits to gcc/fortran or libgfortran between last Monday (June 12) and this week (June 19). So this may be a problem with tree-optimization.
Comment 6 Manolis Tsamis 2023-06-21 06:43:06 UTC
Hi,

Due to the time frame mentioned (June 12-19), could you please test if the offending commit is r14-1873-g6a2e8dcbbd4bab374b27abea375bf7a921047800 ? This commit is now known to cause general issues, as also described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110308.

Thanks,
Manolis
Comment 7 Jürgen Reuter 2023-06-21 12:53:50 UTC
The problem seems really connected to optimization, if I compile our code with -g -O0 or -g -O1, everything works ok. Next, I will try to check why it is actually failing (my guess, unconfirmed yet, is that some data structures are optimized away such that the program runs then on inconsistent data). Then I will check that specific commit. We are sure that it was introduced within this time frame, because we have a weekly CI that clones gcc, and then builds and runs our code and testsuite. That was working on the morning of June 12, but failed on the morning of June 19.
Comment 8 Andrew Pinski 2023-06-21 23:37:41 UTC
(In reply to Jürgen Reuter from comment #7)
> The problem seems really connected to optimization, if I compile our code
> with -g -O0 or -g -O1, everything works ok. Next, I will try to check why it
> is actually failing (my guess, unconfirmed yet, is that some data structures
> are optimized away such that the program runs then on inconsistent data).
> Then I will check that specific commit. We are sure that it was introduced
> within this time frame, because we have a weekly CI that clones gcc, and
> then builds and runs our code and testsuite. That was working on the morning
> of June 12, but failed on the morning of June 19.

Do you know if -fno-tree-vectorize causes the issue to go away?
Comment 9 Jürgen Reuter 2023-06-22 11:48:30 UTC
(In reply to Andrew Pinski from comment #8)
> (In reply to Jürgen Reuter from comment #7)
> > The problem seems really connected to optimization, if I compile our code
> > with -g -O0 or -g -O1, everything works ok. Next, I will try to check why it
> > is actually failing (my guess, unconfirmed yet, is that some data structures
> > are optimized away such that the program runs then on inconsistent data).
> > Then I will check that specific commit. We are sure that it was introduced
> > within this time frame, because we have a weekly CI that clones gcc, and
> > then builds and runs our code and testsuite. That was working on the morning
> > of June 12, but failed on the morning of June 19.
> 
> Do you know if -fno-tree-vectorize causes the issue to go away?

Hi Andrew,
you were right. Compiling and running with -fno-tree-vectorize does not show any issues. All our checks work without problems.
Cheers,
   Juergen
Comment 10 Jürgen Reuter 2023-06-22 11:54:30 UTC
*** Bug 110326 has been marked as a duplicate of this bug. ***
Comment 11 Jürgen Reuter 2023-06-23 22:01:52 UTC
(In reply to manolis.tsamis from comment #6)
> Hi,
> 
> Due to the time frame mentioned (June 12-19), could you please test if the
> offending commit is r14-1873-g6a2e8dcbbd4bab374b27abea375bf7a921047800 ?
> This commit is now known to cause general issues, as also described in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110308.
> 
> Thanks,
> Manolis

Unfortunately, this is not the problematic commit; the problem is still there after reverting it.
Comment 12 Jürgen Reuter 2023-06-23 23:23:17 UTC
Any idea which commit could cause such an issue? At least I now understand that in our program the random number object gets undefined and produces NaNs.
Comment 13 Jürgen Reuter 2023-06-24 09:21:39 UTC
I changed the component from fortran to tree-optimization, as the problematic commit during that week was in that component. The only commit in the fortran component turns out to be unproblematic.
Comment 14 Jürgen Reuter 2023-06-24 10:33:24 UTC
Did anybody manage to reproduce this? 
Download https://whizard.hepforge.org/downloads/?f=whizard-3.1.2.tar.gz
You need OCaml as a prerequisite, though.
Then configure, make, 
cd tests/functional_tests
make check TESTS=nlo_9.run
This will fail, as there are NaNs produced in our RNG module which are presumably caused by this regression in the tree-optimizer. At the moment I am deeply struggling with generating a reproducer but I don't know how tbh.
Comment 15 anlauf 2023-06-24 11:47:17 UTC
(In reply to Jürgen Reuter from comment #14)
> Did anybody manage to reproduce this? 
> Download https://whizard.hepforge.org/downloads/?f=whizard-3.1.2.tar.gz
> You need OCaml as a prerequisite, though.
> Then configure, make, 
> cd tests/functional_tests
> make check TESTS=nlo_9.run
> This will fail, as there are NaNs produced in our RNG module which are
> presumably caused by this regression in the tree-optimizer. At the moment I
> am deeply struggling with generating a reproducer but I don't know how tbh.

I may be telling you the obvious, but here's what I do in cases where changes
in optimization in new compilers cause failures and recompiling is expensive:

- create standalone-version of Fortran code and testcase
- have two build trees in parallel, (a) working and (b) failing
- relink by successively replacing objects in (a) by those from (b)
- run each binary until the failure occurs

In your case you are lucky in that you get a crash.

If testing is expensive, it may be worth bisecting on sets of objects.

I avoid building of shared libs for the project to ease testing.

Note: there might be multiple bad objects.

This works for me even with $$$$ compilers on $$$$ platforms, even if that
takes a day or two.
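
The relinking procedure above can be sketched as a small bisection driver. This is only an illustration: `find_bad_object` and the `fails` callback are hypothetical names, and the callback is assumed to link the program with the listed objects taken from the failing build tree (all others from the working tree), run the test, and report whether the failure appears. The sketch assumes that a single bad object is enough to trigger the failure; if several objects must be bad together, the final check raises instead of returning a wrong answer.

```python
from typing import Callable, List, Sequence


def find_bad_object(objects: Sequence[str],
                    fails: Callable[[List[str]], bool]) -> str:
    """Bisect a list of object files down to one that triggers the failure.

    `fails(from_bad)` is a hypothetical callback: it links the program with
    the objects in `from_bad` taken from the failing tree and everything
    else from the working tree, runs the test, and returns True on failure.
    """
    candidates = list(objects)
    if not fails(candidates):
        raise ValueError("using all objects from the failing tree does not fail")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the failure on its own.
        if fails(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    if not fails(candidates):
        # Happens when the failure needs several bad objects at once.
        raise RuntimeError("no single object reproduces the failure")
    return candidates[0]


# Simulated run: pretend that only "bad.o" carries the miscompiled code.
demo_objects = ["a.o", "b.o", "bad.o", "c.o", "d.o"]
print(find_bad_object(demo_objects, lambda chosen: "bad.o" in chosen))
```

With n objects this needs roughly log2(n) link-and-run cycles, which is the point of bisecting on sets of objects when each test run is expensive.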
Comment 16 Jürgen Reuter 2023-06-24 13:04:35 UTC
It seems that it is this function where the NaNs appear:
  function mult_mod (a, b, c, m) result (v)
    real(default), intent(in) :: a
    real(default), intent(in) :: b
    real(default), intent(in) :: c
    real(default), intent(in) :: m
    real(default) :: v
    integer :: a1
    real(default) :: a2
    v = a * b + c
    if (v >= two53 .or. v <= -two53) then
       a1 = int (a / two17)
       a2 = a - a1 * two17
       v = mod (a1 * b, m)
       v = v * two17 + a2 * b + c
    end if
    v = mod (v, m)
    if (v < 0.0_default) v = v + m
  end function mult_mod

In particular, mod (v, m) gets evaluated to NaN, even if I replace it with
v = mod (v0, m) to avoid potential aliasing problems. It appears only in a very complex setup, not in a 100 line program.
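
For reference, here is a line-by-line Python transcription of mult_mod that can be used to cross-check values outside the Fortran build. It is a sketch under assumptions: the constants are assumed to be 2**17 and 2**53 as their names suggest (their definitions are not shown in this report), and real(default) is assumed to be double precision. math.fmod plays the role of the Fortran mod intrinsic, since both keep the sign of the first argument.

```python
import math

TWO17 = 2.0 ** 17   # assumed value of `two17`
TWO53 = 2.0 ** 53   # assumed value of `two53`


def mult_mod(a: float, b: float, c: float, m: float) -> float:
    """Return (a * b + c) mod m in double precision, in [0, m) for m > 0."""
    v = a * b + c
    if v >= TWO53 or v <= -TWO53:
        # The product no longer fits exactly into a double,
        # so split a = a1 * 2**17 + a2 and reduce in two steps.
        a1 = int(a / TWO17)          # truncates toward zero like Fortran int()
        a2 = a - a1 * TWO17
        v = math.fmod(a1 * b, m)
        v = v * TWO17 + a2 * b + c
    v = math.fmod(v, m)              # result carries the sign of v
    if v < 0.0:
        v += m
    return v


print(mult_mod(3.0, 4.0, 5.0, 7.0))    # prints 3.0
print(mult_mod(-2.0, 4.0, 0.0, 7.0))   # prints 6.0
```

None of these steps can produce a NaN from finite inputs with a nonzero m, which fits the observation that the function misbehaves only inside the full program.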
Comment 17 Jürgen Reuter 2023-06-24 13:09:01 UTC
How would I set up such a bisection for the n git commits between June 12 to June 19? Unfortunately, I cannot really get a small reproducer ....
Comment 18 anlauf 2023-06-24 13:41:18 UTC
(In reply to Jürgen Reuter from comment #17)
> How would I set up such a bisection for the n git commits between June 12 to
> June 19? Unfortunately, I cannot really get a small reproducer ....

I didn't mean that.  I meant doing a bisection on the .o files of your code.

But given that you have isolated a procedure, that is not necessary.

You could try to defeat optimization by using a temporary v0 for v and
declare it as volatile.  Would be interesting to see if that makes a
difference.
Comment 19 Jürgen Reuter 2023-06-24 13:43:42 UTC
(In reply to anlauf from comment #18)
> (In reply to Jürgen Reuter from comment #17)
> > How would I set up such a bisection for the n git commits between June 12 to
> > June 19? Unfortunately, I cannot really get a small reproducer ....
> 
> I didn't mean that.  I meant doing a bisection on the .o files of your code.
> 
> But given that you have isolated a procedure, that is not necessary.
> 
> You could try to defeat optimization by using a temporary v0 for v and
> declare it as volatile.  Would be interesting to see if that makes a
> difference.

I tried both things, or at least partially; it didn't help. It also is a problem only when called in a very complicated setup in our program; in simpler setups, it works. I fear we will sadly have to change the functionality in our program if we are not to be stuck forever with gcc versions < 14.
Comment 20 anlauf 2023-06-24 13:46:49 UTC
If that doesn't help: there appear to be recent optimizations for divmod.
Try declaring a1, a2 as volatile.
Comment 21 anlauf 2023-06-24 18:21:56 UTC
I forgot to mention that you need to check that the location where a symptom
is seen sometimes may not be the location of the cause.
Comment 22 Jürgen Reuter 2023-06-24 18:54:14 UTC
(In reply to anlauf from comment #21)
> I forgot to mention that you need to check that the location where a symptom
> is seen sometimes may not be the location of the cause.

Indeed, I think you are right and the problem is elsewhere. I don't really know where to continue.
Comment 23 anlauf 2023-06-24 19:16:50 UTC
You could check the input arguments for validity, e.g. using ieee_is_finite
from the intrinsic ieee_arithmetic module.

  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite

...

  if (.not. ieee_is_finite (a)) then
     print *, "bad: a=", a
     stop 1
  end if

As last resort I still recommend what I wrote in comment#15: build (=link)
your executable from *.o from your project build tree with known-good objects
but replacing one candidate.o by the one from the build tree showing the
problem.

And I really mean: link only and run.
Comment 24 Jürgen Reuter 2023-06-29 14:39:36 UTC
Here is a first reproducer without the need for OCaml, unfortunately a bit too big to be uploaded, here is the link:
https://www.desy.de/~reuter/downloads/repro001.tar.xz
the tarball contains Fortran files that compile to two binaries, ./whizard and ./whizard_check.
After compilation, run ./whizard r1.sin
to start the program. NaNs will be generated in our RNG stream random number generator. They originate from an erroneous optimization by the gcc/gfortran tree-optimizer. This code resides in rng_stream_sub.f90, in the function mult_mod. Eliminating the intrinsic function mod and explicitly doing the calculation makes the problem go away.
  function mult_mod (a, b, c, m) result (v)
    real(default), intent(in) :: a
    real(default), intent(in) :: b
    real(default), intent(in) :: c
    real(default), intent(in) :: m
    real(default) :: v
    integer :: a1
    real(default) :: a2
    v = a * b + c
    if (v >= two53 .or. v <= -two53) then
       a1 = int (a / two17)
       a2 = a - a1 * two17
       v = mmm_mod (a1 * b, m)
       v = v * two17 + a2 * b + c
    end if
    v = mmm_mod (v, m)
    if (v < 0.0_default) v = v + m
  contains
    elemental function mmm_mod (x1, x2) result (res)
      real(default), intent(in) :: x1, x2
      real(default) :: res
      res = x1 - int(x1/x2) * x2
    end function mmm_mod
  end function mult_mod
Comment 25 anlauf 2023-06-29 17:29:13 UTC
(In reply to Jürgen Reuter from comment #24)
> Here is a first reproducer without the need for OCaml, unfortunately a bit
> too big to be uploaded, here is the link:
> https://www.desy.de/~reuter/downloads/repro001.tar.xz
> the tarball contains Fortran files that compile to two binaries, ./whizard
> and ./whizard_check.

Unfortunately, there is no main.f90, which is needed to build whizard.

The Makefile needs to be modified to take into account that pythia.f
needs preprocessing, e.g.:

%.o: %.f
	$(FC) $(FCFLAGS) -c $< -cpp

Furthermore, one needs to compile serially; parallel make does not seem to
be supported.

Can you please provide the missing file?
Comment 26 Jürgen Reuter 2023-06-29 17:47:20 UTC
(In reply to anlauf from comment #25)

> Unfortunately, there is no main.f90, which is needed to build whizard.
>

Indeed, sorry, cf. below
 
> The Makefile needs to be modified to take into account that pythia.f
> needs preprocessing, e.g.:
> 
> %.o: %.f
> 	$(FC) $(FCFLAGS) -c $< -cpp
> 
> Furthermore, one needs to compile serially; parallel make does not seem to
> be supported.

I changed the pythia.f to make the preprocessing unnecessary.

> 
> Can you please provide the missing file?

It is included here:
https://www.desy.de/~reuter/downloads/repro002.tar.xz
I am working on a smaller example right now.
Comment 27 anlauf 2023-06-29 18:59:34 UTC
(In reply to Jürgen Reuter from comment #26)
> It is included here:
> https://www.desy.de/~reuter/downloads/repro002.tar.xz
> I am working on a smaller example right now.

Good.  I can reproduce the failure, but here's what others need to know:

- I have to

rm -f nlo_9_p2.i1.phs nlo_9_p2.m1.vg2

  each time *before* running the test. ???

- I am using the modification to rng_stream_sub.f90 from comment#24 with the
  printout added

- I am switching between

      res = mod (x1, x2)

and
      res = x1 - int(x1/x2) * x2

- I am disabling optimization completely for this file and added to Makefile:

rng_stream_sub.o: rng_stream_sub.f90
	$(FC) $(FCFLAGS) -c $< -O0 -fdump-tree-original -fdump-tree-optimized


which gives (v1 is with intrinsic mod, v2 is with explicitly coded mod):

--- rng_stream_sub.f90.005t.original.v1 2023-06-29 20:44:58.148284991 +0200
+++ rng_stream_sub.f90.005t.original.v2 2023-06-29 20:45:45.408160849 +0200
@@ -3,7 +3,7 @@
 {
   real(kind=8) res;
 
-  res = __builtin_fmod (*x1, *x2);
+  res = *x1 - (real(kind=8)) (integer(kind=4)) (*x1 / *x2) * *x2;
   return res;
 }

as expected.  The dump-tree-optimized looks unsuspicious to me.
Comment 28 anlauf 2023-06-29 19:33:56 UTC
Update: recompiling that file with 13-branch fails for me, too.
Playing with the one-line patch for pr86277 makes no difference, fortunately.

Compiling the file with gfortran-12 seems to work ok.

So is this really a 14-only regression, or is 13-branch already suspicious?
Comment 29 Jürgen Reuter 2023-06-29 19:35:48 UTC
(In reply to anlauf from comment #28)
> Update: recompiling that file with 13-branch fails for me, too.
> Playing with the one-line patch for pr86277 makes no difference, fortunately.
> 
> Compiling the file with gfortran-12 seems to work ok.
> 
> So is this really a 14-only regression, or is 13-branch already suspicious?

We have gcc 13.1 in our CI, everything works fine there. I am still working on a smaller test, but have very bad connection rn.
Comment 30 anlauf 2023-06-29 20:07:54 UTC
BTW: you can get a traceback on FP exceptions by adding to the linker options:

 -ffpe-trap=zero,overflow,invalid
Comment 31 anlauf 2023-06-30 17:45:12 UTC
Looking at rng_stream_sub.o with objdump, I see fprem generated for 13 & 14,
but not for 12.

I haven't yet found an option to suppress its generation and fall back to
the behavior of 12-branch.
Comment 32 Jakub Jelinek 2023-06-30 17:50:34 UTC
Then maybe r13-6361-g8020c9c42349f51f75239b
is the commit that changed it?
Would be good to put a breakpoint at that instruction and see in which iteration it results in NaN and what operands it had...
Comment 33 anlauf 2023-06-30 18:32:46 UTC
(In reply to Jakub Jelinek from comment #32)
> Then maybe r13-6361-g8020c9c42349f51f75239b
> is the commit that changed it?
> Would be good to put a breakpoint at that instruction and see in which
> iteration it results in NaN and what operands it had...

Program received signal SIGFPE, Arithmetic exception.
0x0000000000678f1a in rng_stream.rng_stream_s::mmm_mod (x1=330289839997, x2=4294967087) at rng_stream_sub.f90:336
336         res = mod (x1, x2)
(gdb) p x1
$1 = 330289839997
(gdb) p x2
$2 = 4294967087

Strangely enough, a small testcase with these arguments does not fail...
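
A quick double-precision check of these two operands shows why an isolated testcase behaves: both are exactly representable integers, and the intrinsic-style remainder and the hand-written formula from comment #24 agree. This is a Python sketch, with math.fmod standing in for the Fortran mod intrinsic; the variable names are chosen here for illustration.

```python
import math

x1 = 330289839997.0   # dividend shown by gdb
x2 = 4294967087.0     # modulus shown by gdb

intrinsic = math.fmod(x1, x2)        # counterpart of Fortran mod (x1, x2)
explicit = x1 - int(x1 / x2) * x2    # the explicit replacement formula

print(intrinsic, explicit)           # prints 3872341385.0 3872341385.0
print(math.isfinite(intrinsic))      # prints True
```

So the arguments as seen by the debugger are harmless; whatever triggers the exception has to come from state that is not visible in x1 and x2.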
Comment 34 anlauf 2023-06-30 18:57:52 UTC
A few more data points:

reverting r13-6361-g8020c9c42349f51f75239b on 13-branch fixes the issue:
no fprem generated, no FPE.

Adding -ffinite-math-only to the modified 13-branch restores the FPE.

Compiling the affected module (only) with 12-branch and linking everything
with 14-mainline shows the same: fprem is used only with -ffinite-math-only,
and I get an FPE even with 12-branch in that case.

Same with 11-branch.

I am still not sure why it cannot be reproduced with a smaller example,
thus I hope that Jürgen can provide a significantly smaller reproducer.
Comment 35 Uroš Bizjak 2023-06-30 19:25:27 UTC
(In reply to anlauf from comment #33)
> (In reply to Jakub Jelinek from comment #32)
> > Then maybe r13-6361-g8020c9c42349f51f75239b
> > is the commit that changed it?
> > Would be good to put a breakpoint at that instruction and see in which
> > iteration it results in NaN and what operands it had...
> 
> Program received signal SIGFPE, Arithmetic exception.
> 0x0000000000678f1a in rng_stream.rng_stream_s::mmm_mod (x1=330289839997,
> x2=4294967087) at rng_stream_sub.f90:336
> 336         res = mod (x1, x2)
> (gdb) p x1
> $1 = 330289839997
> (gdb) p x2
> $2 = 4294967087
> 
> Strangely enough, a small testcase with these arguments does not fail...

Please show the FP registers (and coprocessor state, 'info float') just before the FPREM instruction. These two values (as shown) are nothing special, but perhaps FP register value contains something that FPREM does not like. Also, please show the state after FPREM is executed. Please note that FPREM is performed in the loop, so perhaps a couple of trips through the loop will be needed.
Comment 36 anlauf 2023-06-30 19:30:05 UTC
Breakpoint 2, rng_stream.rng_stream_s::mmm_mod (x1=330289839997, x2=4294967087) at rng_stream_sub.f90:336
336         res = mod (x1, x2)
(gdb) info float
  R7: Valid   0x401be51fb57800000000 +480507567                 
  R6: Valid   0x401be51fb57800000000 +480507567                 
  R5: Zero    0x00000000000000000000 +0                         
  R4: Zero    0x00000000000000000000 +0                         
  R3: Zero    0x00000000000000000000 +0                         
  R2: Zero    0x00000000000000000000 +0                         
  R1: Zero    0x00000000000000000000 +0                         
=>R0: Special 0xffff0000000004f5dc90 Unsupported

Status Word:         0x0000                                            
                       TOP: 0
Control Word:        0x0372      DM       UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x0556
Instruction Pointer: 0x00:0x00000000
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000
(gdb) n

Program received signal SIGFPE, Arithmetic exception.
0x0000000000678e6a in rng_stream.rng_stream_s::mmm_mod (x1=330289839997, x2=4294967087) at rng_stream_sub.f90:336
336         res = mod (x1, x2)
Comment 37 anlauf 2023-06-30 19:32:00 UTC
After the FPE:

(gdb) info float
  R7: Valid   0x401be51fb57800000000 +480507567                 
  R6: Valid   0x401be51fb57800000000 +480507567                 
  R5: Zero    0x00000000000000000000 +0                         
  R4: Zero    0x00000000000000000000 +0                         
  R3: Zero    0x00000000000000000000 +0                         
  R2: Zero    0x00000000000000000000 +0                         
  R1: Zero    0x00000000000000000000 +0                         
=>R0: Special 0xffff0000000004f5dc90 Unsupported

Status Word:         0x82c1   IE                  ES   SF      C1      
                       TOP: 0
Control Word:        0x0372      DM       UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x0556
Instruction Pointer: 0x00:0x004031d2
Operand Pointer:     0x00:0x011b5708
Opcode:              0xdd45
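
The two dumps differ only in the status word and the instruction/operand pointers, so it helps to decode the words bit by bit. The sketch below uses the standard x87 layout (exception flags in bits 0-5 of the status word, TOP in bits 11-13, mask bits in bits 0-5 of the control word); the helper names are made up for illustration.

```python
STATUS_BITS = ["IE", "DE", "ZE", "OE", "UE", "PE", "SF", "ES",
               "C0", "C1", "C2"]                    # status word bits 0..10
MASK_BITS = ["IM", "DM", "ZM", "OM", "UM", "PM"]    # control word bits 0..5


def status_flags(sw: int) -> list:
    """Names of the flags that are set in an x87 status word."""
    flags = [name for bit, name in enumerate(STATUS_BITS) if (sw >> bit) & 1]
    if (sw >> 14) & 1:
        flags.append("C3")
    if (sw >> 15) & 1:
        flags.append("B")
    return flags


def stack_top(sw: int) -> int:
    """TOP field (bits 11..13) of the status word."""
    return (sw >> 11) & 0x7


def masked_exceptions(cw: int) -> list:
    """Names of the exception masks that are set in an x87 control word."""
    return [name for bit, name in enumerate(MASK_BITS) if (cw >> bit) & 1]


print(status_flags(0x82C1), stack_top(0x82C1))   # prints ['IE', 'SF', 'ES', 'C1', 'B'] 0
print(masked_exceptions(0x0372))                 # prints ['DM', 'UM', 'PM']
```

The control word masks only the denormal, underflow and precision exceptions, so invalid-operation, zero-divide and overflow are unmasked, consistent with running under -ffpe-trap=zero,overflow,invalid. With IM clear, the IE condition recorded in the status word is delivered as SIGFPE instead of quietly producing a NaN.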
Comment 38 Jürgen Reuter 2023-06-30 19:46:08 UTC
Unfortunately, I am too busy at the moment to provide a smaller reproducer (which also still has a small dependency on a dynamic library), but here is one more piece of information: replacing the intrinsic mod function with the explicit operations eliminates the NaNs with gfortran 14, but the results are still numerically different from those of previous gfortran versions. So it looks like this leads to a different random number sequence, which is really disturbing.
Comment 39 Uroš Bizjak 2023-06-30 19:47:54 UTC
(In reply to anlauf from comment #36)
> Breakpoint 2, rng_stream.rng_stream_s::mmm_mod (x1=330289839997,
> x2=4294967087) at rng_stream_sub.f90:336
> 336         res = mod (x1, x2)
> (gdb) info float
>   R7: Valid   0x401be51fb57800000000 +480507567                 
>   R6: Valid   0x401be51fb57800000000 +480507567                 
>   R5: Zero    0x00000000000000000000 +0                         
>   R4: Zero    0x00000000000000000000 +0                         
>   R3: Zero    0x00000000000000000000 +0                         
>   R2: Zero    0x00000000000000000000 +0                         
>   R1: Zero    0x00000000000000000000 +0                         
> =>R0: Special 0xffff0000000004f5dc90 Unsupported

Here is the problem. FPREM chokes on invalid input in R0.

[1] says that an IA (invalid arithmetic) exception is generated for an unsupported format, and this is what happened above:

#IA 	Source operand is an SNaN value, modulus is 0, dividend is ∞, or unsupported format.

[1] https://www.felixcloutier.com/x86/fprem
Comment 40 anlauf 2023-06-30 20:19:10 UTC
(In reply to Jürgen Reuter from comment #38)
> At the moment unfortunately too busy to provide a smaller reproducer (which
> also still has a small dependency on a dynamic library),

I have just commented out the references to dlopen, dlclose, dlsym, dlerror
in os_interface_sub.f90, removed the -ldl and can still reproduce the failure.
Comment 41 Jakub Jelinek 2023-06-30 21:09:48 UTC
(In reply to Uroš Bizjak from comment #39)
> (In reply to anlauf from comment #36)
> > Breakpoint 2, rng_stream.rng_stream_s::mmm_mod (x1=330289839997,
> > x2=4294967087) at rng_stream_sub.f90:336
> > 336         res = mod (x1, x2)
> > (gdb) info float
> >   R7: Valid   0x401be51fb57800000000 +480507567                 
> >   R6: Valid   0x401be51fb57800000000 +480507567                 
> >   R5: Zero    0x00000000000000000000 +0                         
> >   R4: Zero    0x00000000000000000000 +0                         
> >   R3: Zero    0x00000000000000000000 +0                         
> >   R2: Zero    0x00000000000000000000 +0                         
> >   R1: Zero    0x00000000000000000000 +0                         
> > =>R0: Special 0xffff0000000004f5dc90 Unsupported

0xffff0000000004f5dc90 is pseudo NaN:
Pseudo Not a Number. The sign bit is meaningless. The 8087 and 80287 treat this as a Signaling Not a Number. The 80387 and later treat this as an invalid operand.
So, if that comes from some random number generator, I'd say that random number generator should be fixed not to create the erroneous cases for
https://en.wikipedia.org/wiki/Extended_precision
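
The classification can be reproduced mechanically from the 80-bit register images shown by gdb. The following Python sketch splits such an image into sign, 15-bit exponent and 64-bit significand (with its explicit integer bit) and applies the usual x87 encoding rules; `decode_x87` is a name invented here, and only normal numbers are converted to a value.

```python
import math


def decode_x87(raw: int):
    """Classify an 80-bit x87 register image; return (kind, value or None)."""
    sign = (raw >> 79) & 1
    exponent = (raw >> 64) & 0x7FFF
    significand = raw & ((1 << 64) - 1)
    integer_bit = significand >> 63
    fraction = significand & ((1 << 63) - 1)

    if exponent == 0x7FFF:
        if integer_bit == 0:
            kind = "pseudo-infinity" if fraction == 0 else "pseudo-NaN"
        elif fraction == 0:
            kind = "infinity"
        else:
            kind = "quiet NaN" if (fraction >> 62) else "signaling NaN"
        return kind, None
    if exponent == 0:
        if significand == 0:
            return "zero", None
        return ("pseudo-denormal" if integer_bit else "denormal"), None
    if integer_bit == 0:
        return "unnormal", None
    # Normal number: significand is a 64-bit integer scaled by 2**-63.
    value = math.ldexp(float(significand), exponent - 16383 - 63)
    return "normal", (-value if sign else value)


print(decode_x87(0xFFFF0000000004F5DC90))   # prints ('pseudo-NaN', None)
print(decode_x87(0x401BE51FB57800000000))   # prints ('normal', 480507567.0)
```

The second call reproduces the +480507567 that gdb prints for R7 and R6, while the R0 image has an all-ones exponent with the integer bit clear, which is exactly the pseudo-NaN encoding that the 80387 and later reject.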
Comment 42 Jürgen Reuter 2023-06-30 21:48:33 UTC
(In reply to Jakub Jelinek from comment #41)
> 
> 0xffff0000000004f5dc90 is pseudo NaN:
> Pseudo Not a Number. The sign bit is meaningless. The 8087 and 80287 treat
> this as a Signaling Not a Number. The 80387 and later treat this as an
> invalid operand.
> So, if that comes from some random number generator, I'd say that random
> number generator should be fixed not to create the erroneous cases for
> https://en.wikipedia.org/wiki/Extended_precision

Hm, the example provided does not use extended precision.
Comment 43 anlauf 2023-07-01 07:55:16 UTC
Maybe the fprem issue was a red herring from the beginning, pointing to a
problem in a different place.

I recompiled each module in a loop with -O0 until the FPE went away.

instances_sub.f90 seems the file someone wants to look at.

Works at -O0, -O1, -Os, -O2 -fno-tree-vectorize
Fails at -O2, -O3

on x86_64-pc-linux-gnu.

Jürgen: can you reduce this even more with this information?
Comment 44 Jürgen Reuter 2023-07-01 08:09:39 UTC
(In reply to anlauf from comment #43)
> Maybe the fprem issue was a red herring from the beginning, pointing to a
> problem in a different place.
> 
> I recompiled each module in a loop with -O0 until the FPE went away.
> 
> instances_sub.f90 seems the file someone wants to look at.
> 
> Works at -O0, -O1, -Os, -O2 -fno-tree-vectorize
> Fails at -O2, -O3
> 
> on x86_64-pc-linux-gnu.
> 
> Jürgen: can you reduce this even more with this information?

Thanks, this info is helpful. So it is the setting up of the full process via the instances module, which is in agreement with the fact that the simple test with only the RNG did not fail. I will be busy for several days, but hopefully in a week from now, I'll know more.
Comment 45 Jürgen Reuter 2023-07-06 19:40:09 UTC
Created attachment 55492 [details]
Smaller stand-alone reproducer

I will give more information in a comment, this contains 3 files and a Makefile.
Comment 46 Jürgen Reuter 2023-07-06 19:44:58 UTC
(In reply to Jürgen Reuter from comment #45)
> Created attachment 55492 [details]
> Smaller stand-alone reproducer
> 
> I will give more information in a comment, this contains 3 files and a
> Makefile.

This is a standalone reproducer with a total of 8k lines. It needs to be split across three different files: fusing the 2nd and 3rd file eliminates the optimizer problem of this issue, while fusing the 1st and the 2nd leads to an ICE in trans-array.cc (reported separately), which is independent of the problem here.
The issue goes away with -O0, with -O1 and with -O2 -fno-tree-vectorize. 
I might try to find the offending commit in the week of June 12-19 in the tree-optimizer, but I don't know whether I will have time to do so. Hopefully, with this smaller reproducer you can figure out what happens (and help solve it).
Comment 47 anlauf 2023-07-06 21:18:29 UTC
(In reply to Jürgen Reuter from comment #46)
> The issue goes away with -O0, with -O1 and with -O2 -fno-tree-vectorize. 
> I might want to find the offending commit in the week of June 12-19 in the
> tree-optimizer, but I don't know whether I have time to do so. Hopefully,
> with this 
> smaller reproducer you can figure out what happens (and help solving it)

I recommend adding -ffpe-trap=zero,overflow,invalid to the flags.

It is code2.f90 that is sensitive to -ftree-vectorize; the two other files
can be compiled even with -O3.

However, when I use -O2 together with an -march= flag, the code works.
I've tested -march=sandybridge, -march=haswell, -march=skylake, -march=native.
It FPEs without.

Do you see the same?
Comment 48 anlauf 2023-07-06 21:38:45 UTC
(In reply to anlauf from comment #47)
> However, when I use -O2 together with an -march= flag, the code works.
> I've tested -march=sandybridge, -march=haswell, -march=skylake,
> -march=native.
> It FPEs without.

And it FPEs with core2,nehalem,westmere!

Next I tried:

-march=sandybridge -mno-avx  # FPE!
-march=sandybridge           # OK.
Comment 49 Jürgen Reuter 2023-07-07 12:44:43 UTC
(In reply to anlauf from comment #48)
> (In reply to anlauf from comment #47)
> > However, when I use -O2 together with an -march= flag, the code works.
> > I've tested -march=sandybridge, -march=haswell, -march=skylake,
> > -march=native.
> > It FPEs without.
> 
> And it FPEs with core2,nehalem,westmere!
> 
> Next I tried:
> 
> -march=sandybridge -mno-avx  # FPE!
> -march=sandybridge           # OK.

Yes, I can fully confirm your findings, also the ones from comment #47. I was looking at the commits in the period June 12-18 which could have caused this,
some which seem potential candidates are:
2023-06-18  Honza  <jh@ryzen3.suse.cz>
        PR tree-optimization/109849
2023-06-16  Jakub Jelinek  <jakub@redhat.com>
        PR tree-optimization/110271
        * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
        <case PLUS_EXPR>: Ignore return value from match_arith_overflow,
        instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.
(This one sounds pretty suspicious to me)
2023-06-16  Richard Biener  <rguenther@suse.de>
        PR tree-optimization/110269
        * fold-const.cc (fold_binary_loc): Merge x != 0 folding
2023-06-13  Alexandre Oliva  <oliva@adacore.com>
        * range-op-float.cc (frange_nextafter): Drop inline.
        (frelop_early_resolve): Add static.
        (frange_float): Likewise
2023-06-12  Andrew MacLeod  <amacleod@redhat.com>
        PR tree-optimization/110205
        * range-op-float.cc (range_operator::fold_range): Add default FII
        fold routine.
        * range-op-mixed.h (class operator_gt): Add missing final overrides.
        * range-op.cc (range_op_handler::fold_range): Add RO_FII case.
2023-06-12  Andrew MacLeod  <amacleod@redhat.com>
        * gimple-range-gori.cc (gori_compute::condexpr_adjust): Do not
        pass type.
        [...]
(there is a long list of commits by Andrew on June 12)
2023-06-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
        PR middle-end/110142
        * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Don't pass
        subtype to vect_widened_op_tree and remove subtype parameter, also
        remove superfluous overloaded function definition.
        (vect_recog_widen_plus_pattern): Remove subtype parameter and dont pass
        to call to vect_recog_widen_op_pattern.
        (vect_recog_widen_minus_pattern): Likewise.
(^^^ this one also looks suspicious to me)

Any ideas which of these could have caused the change?
Comment 50 Jürgen Reuter 2023-08-09 09:02:14 UTC
How to proceed here? For almost exactly a month, the current gcc git master has not shown this problem any more: from our CI I can deduce that the version from July 3rd still failed, while the version from July 10th worked again. The problem has not reappeared since. My guess is that something has changed in the optimizer again (maybe because of a different problem/regression). Is it worth finding the offending commit and seeing when and how it was fixed (maybe even accidentally), or shall we add a test to the gcc testsuite for regression testing and close this issue?
Comment 51 Jakub Jelinek 2023-08-09 09:24:03 UTC
The easiest would be to bisect gcc in the suspected ranges, that way you'd know for sure which git commit introduced the problem and which fixed/"fixed" it.
If it is about what the compiler emits, one doesn't have to build whole gcc from scratch each time, but can just --disable-bootstrap build it and during bisection
whenever git is updated just ./config.status --recheck; ./config.status; make -jN in libcpp, libiberty and gcc subdirectories and use f951/gfortran binaries from that instead of the ones from the initial build to build your project.
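
The recipe above can be wrapped into a step for git bisect run. This is only a sketch under stated assumptions: the build directory, the job count in JOBS and the test command are placeholders, and bisect_status and bisect_step are hypothetical helper names. One detail matters for this bug: git bisect run treats exit status 0 as good, 125 as skip, other values from 1 to 127 as bad, and aborts on anything above 127, so a test program killed by SIGFPE (status 136) has to be mapped to an ordinary failure.

```shell
#!/bin/sh
# Sketch of a per-commit step for `git bisect run`.
# Paths and the test command are assumptions to adapt locally.

# Fold statuses above 127 (death by signal, e.g. SIGFPE = 128+8 = 136)
# into plain "bad" (1); pass every other status through unchanged.
bisect_status() {
    if [ "$1" -gt 127 ]; then
        echo 1
    else
        echo "$1"
    fi
}

# Incrementally rebuild only libcpp, libiberty and gcc inside an existing
# --disable-bootstrap build tree, then run the given test command.
bisect_step() {
    build=$1
    shift
    for dir in libcpp libiberty gcc; do
        (cd "$build/$dir" &&
         ./config.status --recheck &&
         ./config.status &&
         make -j"${JOBS:-4}") || return 125   # cannot build: skip this commit
    done
    status=0
    "$@" || status=$?
    return "$(bisect_status "$status")"
}
```

A script that defines these functions and ends with a call such as bisect_step /path/to/build ./run-test.sh (both names hypothetical) can then be passed to git bisect run after git bisect start with the bad and good revisions. The test command has to build the project with the freshly rebuilt f951/gfortran rather than the binaries from the initial build.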
Comment 52 Jürgen Reuter 2023-08-25 21:17:37 UTC
(In reply to Jakub Jelinek from comment #51)
> The easiest would be to bisect gcc in the suspected ranges, that way you'd
> know for sure which git commit introduced the problem and which
> fixed/"fixed" it.
> If it is about what the compiler emits, one doesn't have to build whole gcc
> from scratch each time, but can just --disable-bootstrap build it and during
> bisection
> whenever git is updated just ./config.status --recheck; ./config.status;
> make -jN in libcpp, libiberty and gcc subdirectories and use f951/gfortran
> binaries from that instead of the ones from the initial build to build your
> project.

This was the offending commit by Roger Sayle, on Saturday June 17:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=96c3539f2a38134cb76d8ab2e924e0dc70b2ccbd

=================================
i386: Two minor tweaks to ix86_expand_move.

This patch splits out two (independent) minor changes to i386-expand.cc's
ix86_expand_move from a larger patch, given that it's better to review
and commit these independent pieces separately from a more complex patch.

The first change is to test for CONST_WIDE_INT_P before calling
ix86_convert_const_wide_int_to_broadcast.  Whilst stepping through
this function in gdb, I was surprised that the code was continually
jumping into this function with operands that obviously weren't
appropriate.

The second change is to generalize the optimization for efficiently
moving a TImode value to V1TImode (via V2DImode), to cover all 128-bit
vector modes.

Hence for the test case:

typedef unsigned long uv2di __attribute__ ((__vector_size__ (16)));
uv2di foo2(__int128 x) { return (uv2di)x; }

we'd previously move via memory with:

foo2:   movq    %rdi, -24(%rsp)
        movq    %rsi, -16(%rsp)
        movdqa  -24(%rsp), %xmm0
        ret

with this patch we now generate with -O2 (the same as V1TImode):

foo2:   movq    %rdi, %xmm0
        movq    %rsi, %xmm1
        punpcklqdq      %xmm1, %xmm0
        ret

and with -O2 -msse4 the even better:

foo2:   movq    %rdi, %xmm0
        pinsrq  $1, %rsi, %xmm0
        ret

The new test case is unimaginatively called sse2-v1ti-mov-2.c given
the original test case just for V1TI mode was called sse2-v1ti-mov-1.c.

2023-06-17  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Check that OP1 is
CONST_WIDE_INT_P before calling ix86_convert_wide_int_to_broadcast.
Generalize special case for converting TImode to V1TImode to handle
all 128-bit vector conversions.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-mov-2.c: New test case.
===========================================================

Now the question is, was this commit later reverted? Or changed in a different manner?
Comment 53 Jürgen Reuter 2023-08-25 21:18:31 UTC
Additional comment: the commit which fixed/"fixed" this offending commit came between July 3 and July 10.
Comment 54 Jürgen Reuter 2023-08-25 21:23:13 UTC
(In reply to Jürgen Reuter from comment #53)
> Additional comment: the commit which fixed/"fixed" this offending commit
> came between July 3 and July 10.

Wildly speculating, maybe it was this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bdf2737cda53a83332db1a1a021653447b05a7e7
???
Comment 55 Jürgen Reuter 2023-08-30 12:34:37 UTC
Actually, according to my testing, the last commit where gfortran produced failing code is
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c496d15954cdeab7f9039328f94a6f62cf893d5f
(Aldy Hernandez, A singleton irange etc.)
and the first one working again is
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1f7e5a7b91862b999aab88ee0319052aaf00f0f1
(Vladimir Makarov),
which seems to have fixed it.
The commit from Vladimir fixed an issue in RTL, but I am not sure what to conclude from this.
Comment 56 Jürgen Reuter 2023-09-22 14:56:30 UTC
What do we do now? We know the offending commit, and the commit that fixed (or "fixed") it. Should we close this? Do we understand what happened here, i.e. why it went wrong and why it got fixed?
Comment 57 Richard Biener 2023-10-17 11:07:50 UTC
It might not be ideal, but unless somebody finds the time to analyze what the "fix" changed and thereby identify the problem itself, closing the bug seems to be the most efficient way of dealing with it :/
Comment 58 Richard Biener 2024-03-27 13:41:19 UTC
Thus fixed.