Bug 94109 - [8/9/10/11 Regression] Memory leak introduced in 8.3.0->8.3.1
Summary: [8/9/10/11 Regression] Memory leak introduced in 8.3.0->8.3.1
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 8.3.1
: P3 normal
Target Milestone: 8.5
Assignee: Thomas Koenig
URL:
Keywords:
Depends on:
Blocks: Finalization 86754
  Show dependency treegraph
 
Reported: 2020-03-09 19:31 UTC by Antony Lewis
Modified: 2020-06-15 18:54 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-04-11 00:00:00


Attachments
Patch which ought to work (1.27 KB, patch)
2020-06-07 13:54 UTC, Thomas Koenig
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Antony Lewis 2020-03-09 19:31:14 UTC
I see an apparent memory leak introduced in running code between

ubuntu-toolchain-r/test gfortran 8.3.0 OK

Source 20200106 build gfortran 8.3.1 bad (also current 8 and 10 git heads).

I have not tracked down where it is coming from, but there is a complete example on git with travis reports:

8.3.1 issue
https://travis-ci.org/cmbant/CAMB/jobs/660297689

GCC 9 9.2.1 20191102 issue (with otherwise same config as 8.3.0 below)
https://travis-ci.org/cmbant/CAMB/jobs/660297688

8.3.0 OK
https://travis-ci.org/cmbant/CAMB/jobs/660297687

(see memory counts at bottom of trace as function of loop count, produced from python).

To produce output locally do
git clone --branch test https://github.com/cmbant/CAMB.git

and run setup.py and then setup.py test (with py3.6+).

I'm hoping this narrowish version window will enable someone to guess at the cause of the issue. I looked at this because someone reported a large memory leak on gfortran 9.2 OS X that cannot be reproduced with ifort, or gfortran versions 6-8.3.0 (on linux the leak seems much smaller).

The code uses multiple nested allocatable F2003 class types.
Comment 1 Thomas Koenig 2020-04-11 19:01:32 UTC
Is there a test case?
Comment 2 Antony Lewis 2020-04-11 19:04:47 UTC
This may be the test case, though I'm not 100% sure:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94361
Comment 3 Antony Lewis 2020-05-04 09:27:32 UTC
Although my reduced test in the other id case is one problem, it appears that is not the only memory leak. Someone tested else on various gcc versions and still found:

version    memory leak  
7.3.0           no
8.2.0           no
7.5.0           yes
8.4.0           yes
9.2.0           yes
9.3.0           yes

So it certainly looks like a regression.
Comment 4 Antony Lewis 2020-05-18 15:47:03 UTC
Not sure why no one has at least picked up on 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94361

since it is a reproducible regression with a simple test case, an a bug that effectively kills some previously-working code from 7.x updates onwards?
Comment 5 Thomas Koenig 2020-06-01 19:49:15 UTC
So, fixed with the patch for PR 94109?
Comment 6 Antony Lewis 2020-06-03 08:11:57 UTC
Thanks for looking in to it. I tried rebuilding my gcc8 docker and rerunning. It now reports GNU Fortran (GCC) 8.4.1 20200602, however the leak still seems to be there?

https://travis-ci.org/github/cmbant/CAMB/jobs/660297689
Comment 7 Antony Lewis 2020-06-03 11:17:58 UTC
However the reduced case of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94361 now seems to be OK.

However on trunk, the fix for 94361 seems to have introduced a leak that was not there before: https://travis-ci.org/github/cmbant/CAMB/jobs/692470383 (was fine from gcc source build from 5 days ago - I just reran it with the new docker image)
Comment 8 Thomas Koenig 2020-06-03 12:31:46 UTC
Unfortunately, without a somewhat reduced test case, there is
not a lot I can do :-(

Could you run this under valgrind, pinpoint the memory
leaks (somewhat) and then try to reduce this?
Comment 9 Dominique d'Humieres 2020-06-03 13:23:03 UTC
For the record:

% gfc pr94361.f90 -fanalyzer
pr94361.f90:24:0:

   24 |     end subroutine
      | 
Warning: leak of 'test.t.dat.data' [CWE-401] [-Wanalyzer-malloc-leak]
  'leaker': events 1-11
    |
    |   20 |     subroutine Leaker
    |      | 
    |......
    |   23 |     allocate(Test%T%Dat(100000000))
    |      | 
    |   24 |     end subroutine
    |      | 
    |
    +--> '__final_debug_Testtype2': events 12-30
           |
           |   30 |     end module
           |      |              ~
           |      |              |
           |      |              (18) state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |      |              (19) state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |      |              (20) state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |      |              (21) following 'true' branch...
           |      |              (24) state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |      |              (25) state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |      |              (26) state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |      |              (27) following 'true' branch...
           |
         '__final_debug_Testtype2': events 31-32
           |
f951:

           | (31): state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           | (32): state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |
         '__final_debug_Testtype2': event 33
           |
f951:

           | (33): state of 'test.t.dat.data': 'start' -> 'nonnull' (origin: NULL)
           |
         '__final_debug_Testtype2': events 34-35
           |
           |   30 |     end module
           |      | 
           |
    <------+
    |
  'leaker': events 36-37
    |
    |   24 |     end subroutine
    |      | 
    |

Is it a false positive or not?
Comment 10 Thomas Koenig 2020-06-03 14:55:12 UTC
(In reply to Dominique d'Humieres from comment #9)

> Is it a false positive or not?

You probably need a higher optimization level with -fanalyzer,
it doesn't show anything at -O.  Nor does valgrind show anything.

So, I'd say a false positive.
Comment 11 Antony Lewis 2020-06-04 09:27:22 UTC
It took ages to narrow this down, but here's a simple test case that still gives a leak with valgrind in gcc-8 HEAD, 8.4.0, 9.3.0 (OK with 7.4.0)


    module debug
    implicit none

    Type Tester
        real, dimension(:), allocatable :: Dat, Dat2
    end Type

    Type TestType2
        Type(Tester) :: T
    end type TestType2

    contains

    subroutine Leaker
    class(TestType2), pointer :: ActiveState
    Type(Tester) :: Temp

    allocate(Temp%Dat2(10000))

    allocate(TestType2::ActiveState)
    ActiveState%T = Temp
    deallocate(ActiveState)

    end subroutine

    end module


    program run
    use debug

    call Leaker()

    end program
Comment 12 Antony Lewis 2020-06-04 10:45:42 UTC
Valgrid report is

HEAP SUMMARY:
==23446==     in use at exit: 40,000 bytes in 1 blocks
==23446==   total heap usage: 26 allocs, 25 frees, 93,657 bytes allocated
==23446==
==23446== 40,000 bytes in 1 blocks are definitely lost in loss record 1 of 1
==23446==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==23446==    by 0x402052: __debug_MOD_leaker (bug.f90:21)
==23446==    by 0x4021DA: MAIN__ (bug.f90:32)
==23446==    by 0x402211: main (bug.f90:30)
==23446==
==23446== LEAK SUMMARY:
==23446==    definitely lost: 40,000 bytes in 1 blocks
==23446==    indirectly lost: 0 bytes in 0 blocks
==23446==      possibly lost: 0 bytes in 0 blocks
==23446==    still reachable: 0 bytes in 0 blocks
==23446==         suppressed: 0 bytes in 0 blocks
Comment 13 Thomas Koenig 2020-06-04 12:34:32 UTC
Confirmed, then. Thanks for reducing this!
Comment 14 Thomas Koenig 2020-06-07 13:54:44 UTC
Created attachment 48697 [details]
Patch which ought to work

This one checks for the specific combination of expression,
component and namespace.

I think this has a good chance of working (at least it
fixes the test case without regressing on the earlier one).
Comment 15 GCC Commits 2020-06-14 11:03:17 UTC
The master branch has been updated by Thomas Kथà¤nig <tkoenig@gcc.gnu.org>:

https://gcc.gnu.org/g:1af22e455584ef5fcad2b4474c1efc3fd26f6cb3

commit r11-1296-g1af22e455584ef5fcad2b4474c1efc3fd26f6cb3
Author: Thomas Koenig <tkoenig@gcc.gnu.org>
Date:   Sun Jun 14 13:01:24 2020 +0200

    When avoiding double deallocation, look at namespace, expression and component.
    
    Our finalization handling is a mess.  Really, we should get to try and get
    this fixed for gcc 11.
    
    In the meantime, here is a patch which fixes a regression I introduced
    when fixing a regression with a memory leak.  The important thing
    here is to realize that we do not need to finalize (and deallocate)
    multiple times for the same expression and the same component
    in the same namespace.  It might cause code size regressions, but
    better big code than wrong code...
    
    gcc/fortran/ChangeLog:
    
            PR fortran/94109
            * class.c (finalize_component): Return early if finalization has
            already happened for expression and component within namespace.
            * gfortran.h (gfc_was_finalized): New type.
            (gfc_namespace): Add member was_finalzed.
            (gfc_expr): Remove finalized.
            * symbol.c (gfc_free_namespace): Free was_finalized.
    
    gcc/testsuite/ChangeLog:
    
            PR fortran/94109
            * gfortran.dg/finalize_34.f90: Adjust free counts.
            * gfortran.dg/finalize_36.f90: New test.
Comment 16 GCC Commits 2020-06-14 11:45:25 UTC
The releases/gcc-10 branch has been updated by Thomas Kथà¤nig <tkoenig@gcc.gnu.org>:

https://gcc.gnu.org/g:0c39742d618bad5ab5b9dc8ae040612a61e92103

commit r10-8299-g0c39742d618bad5ab5b9dc8ae040612a61e92103
Author: Thomas Koenig <tkoenig@gcc.gnu.org>
Date:   Sun Jun 14 13:39:43 2020 +0200

    When avoiding double deallocation, look at namespace, expression and component.
    
    Our finalization handling is a mess.  Really, we should get to try and get
    this fixed for gcc 11.
    
    In the meantime, here is a patch which fixes a regression I introduced
    when fixing a regression with a memory leak.  The important thing
    here is to realize that we do not need to finalize (and deallocate)
    multiple times for the same expression and the same component
    in the same namespace.  It might cause code size regressions, but
    better big code than wrong code...
    
    Backported from r11-1296-g1af22e455584ef5fcad2b4474c1efc3fd26f6cb3 .
    
    gcc/fortran/ChangeLog:
    
            PR fortran/94109
            * class.c (finalize_component): Return early if finalization has
            already happened for expression and component within namespace.
            * gfortran.h (gfc_was_finalized): New type.
            (gfc_namespace): Add member was_finalzed.
            (gfc_expr): Remove finalized.
            * symbol.c (gfc_free_namespace): Free was_finalized.
    
    gcc/testsuite/ChangeLog:
    
            PR fortran/94109
            * gfortran.dg/finalize_34.f90: Adjust free counts.
            * gfortran.dg/finalize_36.f90: New test.
Comment 17 GCC Commits 2020-06-14 12:06:57 UTC
The releases/gcc-9 branch has been updated by Thomas Kथà¤nig <tkoenig@gcc.gnu.org>:

https://gcc.gnu.org/g:9224bcfd6a4edf61e371e2d83fb126c948182cba

commit r9-8676-g9224bcfd6a4edf61e371e2d83fb126c948182cba
Author: Thomas Koenig <tkoenig@gcc.gnu.org>
Date:   Sun Jun 14 13:50:48 2020 +0200

    When avoiding double deallocation, look at namespace, expression and component.
    
    Our finalization handling is a mess.  Really, we should get to try and get
    this fixed for gcc 11.
    
    In the meantime, here is a patch which fixes a regression I introduced
    when fixing a regression with a memory leak.  The important thing
    here is to realize that we do not need to finalize (and deallocate)
    multiple times for the same expression and the same component
    in the same namespace.  It might cause code size regressions, but
    better big code than wrong code...
    
    Backported from r11-1296-g1af22e455584ef5fcad2b4474c1efc3fd26f6cb3 .
    
    gcc/fortran/ChangeLog:
    
            PR fortran/94109
            * class.c (finalize_component): Return early if finalization has
            already happened for expression and component within namespace.
            * gfortran.h (gfc_was_finalized): New type.
            (gfc_namespace): Add member was_finalzed.
            (gfc_expr): Remove finalized.
            * symbol.c (gfc_free_namespace): Free was_finalized.
    
    gcc/testsuite/ChangeLog:
    
            PR fortran/94109
            * gfortran.dg/finalize_34.f90: Adjust free counts.
            * gfortran.dg/finalize_36.f90: New test.
Comment 18 GCC Commits 2020-06-14 12:07:39 UTC
The releases/gcc-8 branch has been updated by Thomas Kथà¤nig <tkoenig@gcc.gnu.org>:

https://gcc.gnu.org/g:a5cda02410222030c0af6679998a997bbcddb021

commit r8-10311-ga5cda02410222030c0af6679998a997bbcddb021
Author: Thomas Koenig <tkoenig@gcc.gnu.org>
Date:   Sun Jun 14 13:54:19 2020 +0200

    When avoiding double deallocation, look at namespace, expression and component.
    
    Our finalization handling is a mess.  Really, we should get to try and get
    this fixed for gcc 11.
    
    In the meantime, here is a patch which fixes a regression I introduced
    when fixing a regression with a memory leak.  The important thing
    here is to realize that we do not need to finalize (and deallocate)
    multiple times for the same expression and the same component
    in the same namespace.  It might cause code size regressions, but
    better big code than wrong code...
    
    Backported from r11-1296-g1af22e455584ef5fcad2b4474c1efc3fd26f6cb3 .
    
    gcc/fortran/ChangeLog:
    
            PR fortran/94109
            * class.c (finalize_component): Return early if finalization has
            already happened for expression and component within namespace.
            * gfortran.h (gfc_was_finalized): New type.
            (gfc_namespace): Add member was_finalzed.
            (gfc_expr): Remove finalized.
            * symbol.c (gfc_free_namespace): Free was_finalized.
    
    gcc/testsuite/ChangeLog:
    
            PR fortran/94109
            * gfortran.dg/finalize_34.f90: Adjust free counts.
            * gfortran.dg/finalize_36.f90: New test.
Comment 19 Thomas Koenig 2020-06-14 12:09:42 UTC
The test case from comment # 11 has been fixed.

Is there anything left to do to fix the regression in your original code?
Comment 20 Antony Lewis 2020-06-14 20:37:30 UTC
Tried trunk and gcc8, and both look good - many thanks for the fixes!

Is there no way to update 7.x? Since the buggy 7/8/9 version has also propagated quite widely, if you know any likely workaround that might be useful to note (the "final" bug was obvious how to work around, but not the last one).
Comment 21 Thomas Koenig 2020-06-14 21:27:24 UTC
Unfortunately, there isn't going to be any more release for gcc 7.
I could backport this to the gcc-7 branch, but the likelyhood
that distributions would pick it up from the branch is almost nil.

As for a workaround for memory leaks, you can always add

if (.not. allocated(a)) deallocate (a)

to your source code.  It's not what allocatables are about,
but it would work.

Thanks for reporting this, and for reducing the two test cases!

I'm closing this as fixed for now.
Comment 22 Thomas Koenig 2020-06-15 18:54:04 UTC
(In reply to Thomas Koenig from comment #21)

> As for a workaround for memory leaks, you can always add
> 
> if (.not. allocated(a)) deallocate (a)

Of course, that should be

  if (allocated(a)) deallocate(a)

Value propagation and dead code removal are good enough that a sequence
of such statements will be merged into a single one.