This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/63191] [5/6/7 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Anyway, as far as memory consumption goes (compile time is still the same), the
following patch helps a lot:

--- gcc/config/i386/i386.c.jj   2017-03-07 20:04:52.000000000 +0100
+++ gcc/config/i386/i386.c      2017-03-10 13:46:12.482704787 +0100
@@ -17257,8 +17257,9 @@ ix86_delegitimize_tls_address (rtx orig_
    necessary to remove references to the PIC label from RTL stored by
    the DWARF output code.  */

-static rtx
-ix86_delegitimize_address (rtx x)
+template <bool base_term>
+static inline rtx
+ix86_delegitimize_address_1 (rtx x)
 {
   rtx orig_x = delegitimize_mem_from_attrs (x);
   /* addend is NULL or some rtx if x is something+GOTOFF where
@@ -17361,7 +17362,7 @@ ix86_delegitimize_address (rtx x)
   if (! result)
     return ix86_delegitimize_tls_address (orig_x);

-  if (const_addend)
+  if (const_addend && !base_term)
     result = gen_rtx_CONST (Pmode, gen_rtx_PLUS (Pmode, result, const_addend));
   if (reg_addend)
     result = gen_rtx_PLUS (Pmode, reg_addend, result);
@@ -17399,6 +17400,12 @@ ix86_delegitimize_address (rtx x)
   return result;
 }

+static rtx
+ix86_delegitimize_address (rtx x)
+{
+  return ix86_delegitimize_address_1<false> (x);
+}
+
 /* If X is a machine specific address (i.e. a symbol or label being
    referenced as a displacement from the GOT implemented using an
    UNSPEC), then return the base term.  Otherwise return X.  */
@@ -17424,7 +17431,7 @@ ix86_find_base_term (rtx x)
       return XVECEXP (term, 0, 0);
     }

-  return ix86_delegitimize_address (x);
+  return ix86_delegitimize_address_1<true> (x);
 }


 static void

Without the patch (just the major time or memory consumers):
 tree DSE                :  40.53 ( 9%) usr   0.00 ( 0%) sys  40.51 ( 9%) wall        0 kB ( 0%) ggc
 dead store elim1        : 244.65 (55%) usr   1.10 (46%) sys 245.75 (55%) wall  5879136 kB (47%) ggc
 dead store elim2        :   3.12 ( 1%) usr   0.01 ( 0%) sys   3.12 ( 1%) wall   252045 kB ( 2%) ggc
 reload CSE regs         : 106.15 (24%) usr   0.01 ( 0%) sys 106.15 (24%) wall  4496830 kB (36%) ggc
 TOTAL                 : 444.45             2.38           447.46           12477770 kB
and with the patch:
 tree DSE                :  40.52 (10%) usr   0.00 ( 0%) sys  40.51 (10%) wall        0 kB ( 0%) ggc
 dead store elim1        : 223.84 (55%) usr   0.00 ( 0%) sys 223.84 (55%) wall     4653 kB ( 0%) ggc
 dead store elim2        :   2.92 ( 1%) usr   0.00 ( 0%) sys   2.92 ( 1%) wall   175766 kB ( 7%) ggc
 reload CSE regs         :  98.58 (24%) usr   0.46 (53%) sys  99.04 (24%) wall  2130309 kB (83%) ggc
 TOTAL                 : 407.95             0.86           409.33            2558609 kB
(both completely unoptimized compilers with checking etc.).

The thing is that ix86_find_base_term calls ix86_delegitimize_address, which
often creates some RTL that the caller then immediately throws away.
ix86_find_base_term is called a lot on expressions like:
(plus:SI (value:SI 1:1 @0x2c60f50/0x2c50f40)
    (const:SI (plus:SI (unspec:SI [
                    (symbol_ref:SI ("_ZL2Zs") [flags 0x2] <var_decl 0x7fffefc19900 Zs>)
                ] UNSPEC_GOTOFF)
            (const_int 8 [0x8]))))
on which it returns
(const:SI (plus:SI (symbol_ref:SI ("_ZL2Zs") [flags 0x2] <var_decl 0x7fffefc19900 Zs>)
        (const_int 8 [0x8])))
but in reality the caller only cares about the SYMBOL_REF; find_base_term
ignores the CONST_INT operand of the PLUS.
The other option is to duplicate ix86_delegitimize_address into
ix86_find_base_term and adjust the copy.
With the above template we can share the code and have the two variants differ
only where needed (for now in one spot, but likely in more spots later).
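
To illustrate the idea outside of GCC, here is a minimal standalone sketch of
the same template <bool> trick. Everything in it (Node, make_const_plus,
strip_offset_1, strip_offset, find_base, wrappers_built) is a hypothetical
stand-in, not GCC code; it only models "skip building the CONST/PLUS wrapper
when the caller just wants the base symbol".

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-in for an RTL expression; all names here are hypothetical.
struct Node {
  enum Kind { SYMBOL, CONST_PLUS } kind;
  std::string name;            // used when kind == SYMBOL
  std::shared_ptr<Node> base;  // used when kind == CONST_PLUS
  long addend;                 // used when kind == CONST_PLUS
};
using NodePtr = std::shared_ptr<Node>;

// Counts wrapper nodes built; stands in for GC-allocated RTL.
static int wrappers_built = 0;

static NodePtr make_symbol(const std::string &name) {
  return std::make_shared<Node>(Node{Node::SYMBOL, name, nullptr, 0});
}

static NodePtr make_const_plus(const NodePtr &base, long addend) {
  ++wrappers_built;
  return std::make_shared<Node>(Node{Node::CONST_PLUS, "", base, addend});
}

// Shared worker.  With base_term == true the wrapper is never built,
// because the caller would discard it and keep only the symbol.
template <bool base_term>
static inline NodePtr strip_offset_1(const NodePtr &symbol, long addend) {
  NodePtr result = symbol;
  if (addend != 0 && !base_term)
    result = make_const_plus(result, addend);
  return result;
}

// Full form: symbol plus constant offset.
static NodePtr strip_offset(const NodePtr &symbol, long addend) {
  return strip_offset_1<false>(symbol, addend);
}

// Base-term form: just the symbol, no allocation.
static NodePtr find_base(const NodePtr &symbol, long addend) {
  return strip_offset_1<true>(symbol, addend);
}
```

Since base_term is a compile-time constant, each instantiation should fold the
test away, so the two entry points share one source body without paying for a
runtime flag.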

As for more spots later: both find_base_value and find_base_term (the only
users of ix86_find_base_term) only care about a MEM whose address is
arg_pointer_rtx or a PLUS of arg_pointer_rtx and something. So in other cases
it doesn't make sense to call replace_equiv_address_nv.  Thus I think
      if (GET_CODE (x) == CONST
          && GET_CODE (XEXP (x, 0)) == PLUS
          && GET_MODE (XEXP (x, 0)) == Pmode
          && CONST_INT_P (XEXP (XEXP (x, 0), 1))
          && GET_CODE (XEXP (XEXP (x, 0), 0)) == UNSPEC
          && XINT (XEXP (XEXP (x, 0), 0), 1) == UNSPEC_PCREL)
        {
          rtx x2 = XVECEXP (XEXP (XEXP (x, 0), 0), 0, 0);
          x = gen_rtx_PLUS (Pmode, XEXP (XEXP (x, 0), 1), x2);
          if (MEM_P (orig_x))
            x = replace_equiv_address_nv (orig_x, x);
          return x;
        }
isn't really useful if base_term && MEM_P (orig_x).
