Bug 53086 - [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled
Summary: [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: 4.8.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 66141 (view as bug list)
Depends on: 53073
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-23 13:57 UTC by H.J. Lu
Modified: 2021-11-29 02:32 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-04-23 00:00:00


Attachments
Fix for 416.gamess in SPEC CPU 2006 (41.48 KB, application/octet-stream)
2016-02-18 15:56 UTC, H.J. Lu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2012-04-23 13:57:10 UTC
+++ This bug was initially created as a clone of Bug #53073 +++

On Linux/x86-64, revision 186600 compiles 416.gamess in SPEC CPU 2006
into infinite loop.  Revision 186566 is OK.
Comment 1 Richard Biener 2012-04-23 15:05:49 UTC
      DO 210 I=1,NAT
         PM = X(I-1+IGAC)
         QM = ZAN(I) - PM
         PL = X(I-1+ILAC)
         QL = ZAN(I) - PL
         IF(DBG) WRITE(IW,9160) I,ANAM(I),BNAM(I),PM,PL
         IF(LAST .AND. MASWRK) WRITE(IW,9180)
     *      I,ANAM(I),BNAM(I),PM,QM,PL,QL
         IF(LAST .AND. MASWRK) WRITE(IP,9200)
     *      ANAM(I),BNAM(I),PM,QM,PL,QL
  210 CONTINUE

This loop does not make progress for me.  I don't see the out-of-bound
access immediately.

But if you build 416.gamess with -O -fbounds-check you immediately get

  Running 416.gamess ref base amd64-m64-gcc42-nn default
/abuild/rguenther/spec2k6/bin/specinvoke -d /abuild/rguenther/spec2k6/benchspec/CPU2006/416.gamess/run/run_base_ref_amd64-m64-gcc42-nn.0001 -e speccmds.err -o speccmds.stdout -f speccmds.cmd -C

416.gamess: copy #0 non-zero return code (rc=2, signal=0)


****************************************
Contents of h2ocu2+.gradient.err
****************************************
At line 768 of file inputb.fppized.f
Fortran runtime error: Index '2' of dimension 1 of array 'x' above upper bound of 1

which is not within the offending accesses. Manually just compiling
prppop.F with -fbounds-check reveals

  Running 416.gamess ref base amd64-m64-gcc42-nn default
/abuild/rguenther/spec2k6/bin/specinvoke -d /abuild/rguenther/spec2k6/benchspec/CPU2006/416.gamess/run/run_base_ref_amd64-m64-gcc42-nn.0003 -e speccmds.err -o speccmds.stdout -f speccmds.cmd -C

416.gamess: copy #0 non-zero return code (rc=2, signal=0)


****************************************
Contents of h2ocu2+.gradient.err
****************************************
At line 928 of file prppop.fppized.f
Fortran runtime error: Index '2' of dimension 1 of array 'x' above upper bound of 1

thus, similar.
Comment 2 Richard Biener 2012-04-23 15:08:45 UTC
In most units we have

      COMMON /FMCOM / X(1)

but in unport.F

      PARAMETER (MEMSIZ=   80 000 000)
      COMMON /FMCOM / X(MEMSIZ)

that can't work (?)
Comment 3 Richard Biener 2012-04-23 15:12:59 UTC
If it is valid to have such "trailing array" in a COMMON then we need to fix
array sizes for them (that's possible).

Is this valid fortran?  If so this is a Frontend bug as well - -fbounds-check
triggers after all.
Comment 4 kargl 2012-04-23 16:09:46 UTC
(In reply to comment #3)
> If it is valid to have such "trailing array" in a COMMON then we need to fix
> array sizes for them (that's possible).
> 
> Is this valid fortran?  If so this is a Frontend bug as well - -fbounds-check
> triggers after all.

From F2003 standard one finds on page 99:

   The form variable-name (explicit-shape-spec-list) declares variable-name
   to have the DIMENSION attribute and specifies the array properties that
   apply.

My interpretation is that

      COMMON /FMCOM / X(1)

declares X to have 1 element while

      PARAMETER (MEMSIZ=   80 000 000)
      COMMON /FMCOM / X(MEMSIZ)

has 80000000 elements.  The above appears to be a common
F77 idiom for "dynamic" memory management where the 
programmer is abusing the storage association of element
x(1).
Comment 5 H.J. Lu 2012-04-23 16:28:33 UTC
(In reply to comment #4)
> 
> From F2003 standard one finds on page 99:
> 
>    The form variable-name (explicit-shape-spec-list) declares variable-name
>    to have the DIMENSION attribute and specifies the array properties that
>    apply.
> 
> My interpretation is that
> 
>       COMMON /FMCOM / X(1)
> 
> declares X to have 1 element while
> 
>       PARAMETER (MEMSIZ=   80 000 000)
>       COMMON /FMCOM / X(MEMSIZ)
> 
> has 80000000 elements.  The above appears to be a common
> F77 idiom for "dynamic" memory management where the 
> programmer is abusing the storage association of element
> x(1).

I was told that it was allowed in F77.
Comment 6 Richard Biener 2012-04-24 07:57:43 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > From F2003 standard one finds on page 99:
> > 
> >    The form variable-name (explicit-shape-spec-list) declares variable-name
> >    to have the DIMENSION attribute and specifies the array properties that
> >    apply.
> > 
> > My interpretation is that
> > 
> >       COMMON /FMCOM / X(1)
> > 
> > declares X to have 1 element while
> > 
> >       PARAMETER (MEMSIZ=   80 000 000)
> >       COMMON /FMCOM / X(MEMSIZ)
> > 
> > has 80000000 elements.  The above appears to be a common
> > F77 idiom for "dynamic" memory management where the 
> > programmer is abusing the storage association of element
> > x(1).
> 
> I was told that it was allowed in F77.

If it is allowed in F77 then the GFortran frontend needs to be fixed.  First,
-fbounds-check may not fail, second, the array type in the COMMON aggregate
type shall not have a defined size (instead it would come from the COMMONs
DECL_SIZE).

Thus, this is a frontend bug which is pre-existing with -fbounds-check
and in 4.8 triggers wrong-code generation.
Comment 7 Andrew Pinski 2012-04-24 08:38:43 UTC
Found the discussion:
http://gcc.gnu.org/ml/fortran/2011-05/msg00079.html
Comment 8 Andrew Pinski 2012-04-24 09:14:57 UTC
http://gcc.gnu.org/ml/gcc-bugs/2003-09/msg02301.html
Comment 9 Andrew Pinski 2012-04-24 09:17:23 UTC
I think http://gcc.gnu.org/ml/fortran/2006-07/msg00307.html is also referencing it too.
Comment 10 Richard Biener 2012-04-24 09:29:10 UTC
A "fix" would be for example

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 186757)
+++ gcc/expr.c  (working copy)
@@ -6809,7 +6809,8 @@ array_at_struct_end_p (tree ref)
 
   /* If the reference is based on a declared entity, the size of the array
      is constrained by its given domain.  */
-  if (DECL_P (ref))
+  if (DECL_P (ref)
+      && !DECL_COMMON (ref))
     return false;
 
   return true;

of course other parts of the compiler will still be affected by this.
If the Frontend wants to reliably make this work then it has to use
an open-ended TYPE_DOMAIN for the array (and then lay out the underlying
decl manually).  Not sure if that's easy though (the incomplete type might
trigger ICEs in other places).
Comment 11 Steve Kargl 2012-04-24 14:49:18 UTC
On Tue, Apr 24, 2012 at 07:57:43AM +0000, rguenth at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53086
> 
> Richard Guenther <rguenth at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>           Component|middle-end                  |fortran
> 
> --- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-04-24 07:57:43 UTC ---
> (In reply to comment #5)
> > (In reply to comment #4)
> > > 
> > > From F2003 standard one finds on page 99:
> > > 
> > >    The form variable-name (explicit-shape-spec-list) declares variable-name
> > >    to have the DIMENSION attribute and specifies the array properties that
> > >    apply.
> > > 
> > > My interpretation is that
> > > 
> > >       COMMON /FMCOM / X(1)
> > > 
> > > declares X to have 1 element while
> > > 
> > >       PARAMETER (MEMSIZ=   80 000 000)
> > >       COMMON /FMCOM / X(MEMSIZ)
> > > 
> > > has 80000000 elements.  The above appears to be a common
> > > F77 idiom for "dynamic" memory management where the 
> > > programmer is abusing the storage association of element
> > > x(1).
> > 
> > I was told that it was allowed in F77.
> 
> If it is allowed in F77 then the GFortran frontend needs to be fixed. 

Fortran 2008 is backwards compatible with Fortran 2003.
Fortran 2003 is backwards compatible with Fortran 95.
Fortran 95 is backwards compatible with Fortran 90.
Fortran 90 is backwards compatible with Fortran 77.

There are some well-defined exceptions.  None involve COMMON.

F77 states:

  8.3.2 Common Block Storage Sequence

  For each common block, a common block storage sequence is formed
  as follows:

    (1) A storage sequence is formed consisting of the storage sequences
        of all entities in the lists nlist for the common block.  The order
        of the storage sequence is the same as the order of the appearance
        of the lists nlist in the program unit.

    (2) (Not relevant)

  8.3.3 Size of a Common Block

  The size of a common block is the size of its common block storage
  sequence including any extensions of the sequence resulting from
  equivalence association.

  Within an executable program, all named common blocks that have the
  same name must be the same size.


For the two lines

      COMMON /FMCOM / X(1)

      PARAMETER (MEMSIZ=   80 000 000)
      COMMON /FMCOM / X(MEMSIZ)

the named common block FMCOM have different sizes.  The code
is invalid and abuses the notion that X(1) are storage 
associated.

> First,
> -fbounds-check may not fail, second, the array type in the COMMON aggregate
> type shall not have a defined size (instead it would come from the COMMONs
> DECL_SIZE).
> 
> Thus, this is a frontend bug which is pre-existing with -fbounds-check
> and in 4.8 triggers wrong-code generation.

The bug is in the Fortran code.  Given the following code fragments
from two different subroutine in a program

      COMMON /FMCOM / X(1), A

      PARAMETER (MEMSIZ=   80 000 000)
      COMMON /FMCOM / X(MEMSIZ), B

is A storage associated with B or X(2)?
Comment 12 Tobias Burnus 2012-04-25 06:46:03 UTC
(In reply to comment #2)
> In most units we have
>       COMMON /FMCOM / X(1)
> but in unport.F
>       PARAMETER (MEMSIZ=   80 000 000)
>       COMMON /FMCOM / X(MEMSIZ)
> that can't work (?)

The Fortran standard requires that all named common blocks [with the same name] have the same size (which is violated here). However, in practice, several Fortran programs violate this rule.

For unnamed common blocks ("common // var1, var2"), different sizes are allowed. However, accessing elements beyond the last argument - like "X(2)" for the first definition - is invalid.

Thus, the -fcheck=bounds error seems to be appropriate. The question is what we do about x(2). While it is invalid, it seems to be used. As the Fortran FE knows that the last item in a given common block is an array element, it could change the middle-end decl to "[1:]".

(I have not yet checked the exact wording in the Fortran 66 to 2008 standards.)
Comment 13 Steve Kargl 2012-04-25 13:32:26 UTC
On Wed, Apr 25, 2012 at 06:46:03AM +0000, burnus at gcc dot gnu.org wrote:
> 
> Thus, the -fcheck=bounds error seems to be appropriate. The question is what we
> do about x(2). While it is invalid, it seems to be used. As the Fortran FE
> knows that the last item in a given common block is an array element, it could
> change the middle-end decl to "[1:]".

This is probably a candidate for -std=legacy.

> 
> (I have not yet checked the exact wording in the Fortran 66 to 2008 standards.)
> 

I've already quoted the F77 and F03 text.  Here's F66:

  The size of a common block in a program unit is the sum
  of the storage required for the the elements introduced
  through COMMON and EQUIVALENCE statements.  The sizes of
  labeled common blocks with the same label in the program
  units that comprise an executable program must be the same.
  The sizes of blank common in the various program units
  that are to be executed together need not be the same.
  Size is measured in terms of storage units (7.2.1.3.1).

The code is invalid.  Someone with access to SPEC 200humble
should complain loudly about invalid code in their benchmark.
Comment 14 Richard Biener 2012-04-25 13:57:06 UTC
Thus, invalid.  -std=legacy support would be an enhacement request (tracked
separately please)

416.gamess still works with LTO which merges the decls properly and papers
over the issue.
Comment 15 Pat Haugen 2013-02-01 16:10:09 UTC
The recently added switch, -fno-aggressive-loop-optimizations, can be used to prevent the transformation into an infinite loop. http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195610
Comment 16 Dominique d'Humieres 2015-05-14 10:03:50 UTC
*** Bug 66141 has been marked as a duplicate of this bug. ***
Comment 17 Richard Biener 2016-02-04 15:07:25 UTC
*** Bug 69368 has been marked as a duplicate of this bug. ***
Comment 18 H.J. Lu 2016-02-18 15:56:19 UTC
Created attachment 37733 [details]
Fix for 416.gamess in SPEC CPU 2006