This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/80960] [5/6/7/8 Regression] Huge memory use when compiling a very large test case


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mliska at suse dot cz,
                   |                            |segher at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
The first "bisection" possibly points at r190594, which limited the work FRE
does (with the effect of leaving some things unoptimized).

More precise bisection would be appreciated.

I see with GCC 7.1 and -O1 (recommended for machine-generated code) a use of
3.7GB of RAM.

The code contains a very large basic-block.

I do remember compile-time/memory-hog PRs for this code style.

compile-time analysis using perf highlights:

Samples: 680K of event 'cycles:pp', Event count (approx.): 607299351135
Overhead  Command   Shared Object     Symbol
  26.58%  f951      f951              [.] refers_to_regno_p
   9.64%  f951      f951              [.] reg_overlap_mentioned_p
   7.08%  f951      f951              [.] find_hard_regno_for_1
   4.42%  f951      f951              [.] reg_used_between_p
   1.90%  f951      f951              [.] get_last_value_validate

which probably means we're doing some quadratic amount of work on use->def
chains inside the BB.  With call traces:

+   48.54%     1.49%  f951      f951              [.] try_combine
-   25.59%    25.55%  f951      f951              [.] refers_to_regno_p
   - refers_to_regno_p
      - 16.98% reg_overlap_mentioned_p
         - 16.00% reg_used_between_p
              can_combine_p
              try_combine
...
      - 4.69% refers_to_regno_p
         - 4.66% reg_overlap_mentioned_p
            - 4.36% reg_used_between_p
                 can_combine_p
                 try_combine


so it's combine (at least at -O1).  I can also imagine that combine is what
uses up the memory in its attempts to simplify & match things up, since IIRC
it uses GC memory for all the copying that involves.  Segher?

int
reg_used_between_p (const_rtx reg, const rtx_insn *from_insn,
                    const rtx_insn *to_insn)
{
  rtx_insn *insn;

  if (from_insn == to_insn)
    return 0;

  for (insn = NEXT_INSN (from_insn); insn != to_insn; insn = NEXT_INSN (insn))
    if (NONDEBUG_INSN_P (insn)
        && (reg_overlap_mentioned_p (reg, PATTERN (insn))
           || (CALL_P (insn) && find_reg_fusage (insn, USE, reg))))
      return 1;
  return 0;
}

so that just walks the BB instead of, say, using DF uses (if available during
combine), or somehow recording "distance" between two rtx_insns to be able
to cap the amount of work done (and conservatively return true).  After all
it's going to end up combining very "distant" instructions here (remember,
gigantic basic-block).
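
To illustrate the "cap the amount of work" idea, here is a minimal standalone
sketch.  It is not GCC code: struct toy_insn, its mentioned_regs bit mask and
the max_insns budget are hypothetical stand-ins for rtx_insn,
reg_overlap_mentioned_p and whatever limit (e.g. a --param) one might pick.
The walk gives up after a fixed number of insns and conservatively reports
"used", which would merely block the combination.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's rtx_insn: a singly linked instruction
   with a toy bit mask of the registers it mentions.  */
struct toy_insn
{
  struct toy_insn *next;
  unsigned mentioned_regs;	/* Bit N set => register N is mentioned.  */
};

/* Return 1 if register REG may be used strictly between FROM and TO,
   0 if it is definitely not.  After inspecting MAX_INSNS insns without
   reaching TO, give up and conservatively return 1.  */
int
toy_reg_used_between_capped_p (unsigned reg, const struct toy_insn *from,
			       const struct toy_insn *to, unsigned max_insns)
{
  const struct toy_insn *insn;
  unsigned walked = 0;

  assert (from != NULL && reg < 32);

  if (from == to)
    return 0;

  for (insn = from->next; insn != NULL && insn != to; insn = insn->next)
    {
      if (walked++ >= max_insns)
	return 1;		/* Budget exhausted: assume a use exists.  */
      if (insn->mentioned_regs & (1u << reg))
	return 1;
    }
  return 0;
}
```

With such a cap each query costs at most max_insns steps regardless of the
size of the basic block, at the price of missing some combinations between
distant insns.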
