This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/63191] [5/6/7 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC
- From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 10 Mar 2017 13:25:53 +0000
- Subject: [Bug rtl-optimization/63191] [5/6/7 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC
- Auto-submitted: auto-generated
- References: <bug-63191-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191
--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Anyway, as far as memory consumption goes (compile time is still the same), the
following patch helps a lot:
--- gcc/config/i386/i386.c.jj 2017-03-07 20:04:52.000000000 +0100
+++ gcc/config/i386/i386.c 2017-03-10 13:46:12.482704787 +0100
@@ -17257,8 +17257,9 @@ ix86_delegitimize_tls_address (rtx orig_
necessary to remove references to the PIC label from RTL stored by
the DWARF output code. */
-static rtx
-ix86_delegitimize_address (rtx x)
+template <bool base_term>
+static inline rtx
+ix86_delegitimize_address_1 (rtx x)
{
rtx orig_x = delegitimize_mem_from_attrs (x);
/* addend is NULL or some rtx if x is something+GOTOFF where
@@ -17361,7 +17362,7 @@ ix86_delegitimize_address (rtx x)
if (! result)
return ix86_delegitimize_tls_address (orig_x);
- if (const_addend)
+ if (const_addend && !base_term)
result = gen_rtx_CONST (Pmode, gen_rtx_PLUS (Pmode, result,
const_addend));
if (reg_addend)
result = gen_rtx_PLUS (Pmode, reg_addend, result);
@@ -17399,6 +17400,12 @@ ix86_delegitimize_address (rtx x)
return result;
}
+static rtx
+ix86_delegitimize_address (rtx x)
+{
+ return ix86_delegitimize_address_1<false> (x);
+}
+
/* If X is a machine specific address (i.e. a symbol or label being
referenced as a displacement from the GOT implemented using an
UNSPEC), then return the base term. Otherwise return X. */
@@ -17424,7 +17431,7 @@ ix86_find_base_term (rtx x)
return XVECEXP (term, 0, 0);
}
- return ix86_delegitimize_address (x);
+ return ix86_delegitimize_address_1<true> (x);
}
static void
Without the patch (just the major time or memory consumers):
tree DSE         :  40.53 ( 9%) usr   0.00 ( 0%) sys  40.51 ( 9%) wall         0 kB ( 0%) ggc
dead store elim1 : 244.65 (55%) usr   1.10 (46%) sys 245.75 (55%) wall   5879136 kB (47%) ggc
dead store elim2 :   3.12 ( 1%) usr   0.01 ( 0%) sys   3.12 ( 1%) wall    252045 kB ( 2%) ggc
reload CSE regs  : 106.15 (24%) usr   0.01 ( 0%) sys 106.15 (24%) wall   4496830 kB (36%) ggc
TOTAL            : 444.45             2.38           447.46             12477770 kB
and with the patch:
tree DSE         :  40.52 (10%) usr   0.00 ( 0%) sys  40.51 (10%) wall         0 kB ( 0%) ggc
dead store elim1 : 223.84 (55%) usr   0.00 ( 0%) sys 223.84 (55%) wall      4653 kB ( 0%) ggc
dead store elim2 :   2.92 ( 1%) usr   0.00 ( 0%) sys   2.92 ( 1%) wall    175766 kB ( 7%) ggc
reload CSE regs  :  98.58 (24%) usr   0.46 (53%) sys  99.04 (24%) wall   2130309 kB (83%) ggc
TOTAL            : 407.95             0.86           409.33              2558609 kB
(both numbers are from completely unoptimized compilers with checking enabled,
etc.).
The thing is that ix86_find_base_term calls ix86_delegitimize_address, which
often creates RTL that the caller then immediately throws away.
ix86_find_base_term is called a lot on expressions like:
(plus:SI (value:SI 1:1 @0x2c60f50/0x2c50f40)
(const:SI (plus:SI (unspec:SI [
(symbol_ref:SI ("_ZL2Zs") [flags 0x2] <var_decl
0x7fffefc19900 Zs>)
] UNSPEC_GOTOFF)
(const_int 8 [0x8]))))
on which it returns
(const:SI (plus:SI (symbol_ref:SI ("_ZL2Zs") [flags 0x2] <var_decl
0x7fffefc19900 Zs>)
(const_int 8 [0x8])))
but in reality the caller only cares about the SYMBOL_REF; the CONST_INT
operand of the PLUS is ignored by find_base_term.
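To illustrate why the rebuilt (const (plus ...)) is wasted work, here is a
minimal, self-contained sketch of a base-term search over a toy expression
type. Expr and base_of are hypothetical stand-ins, not GCC's rtx or
find_base_term; the sketch only assumes, as described above, that a constant
integer operand of a PLUS never affects which symbol is found.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy expression node; this is NOT GCC's rtx.
struct Expr {
  enum Kind { SYMBOL, INTEGER, PLUS, CONST_WRAP } kind;
  std::string name;       // used by SYMBOL
  long value;             // used by INTEGER
  const Expr *op0, *op1;  // PLUS uses both, CONST_WRAP uses op0
};

// Return the SYMBOL an address is based on, or nullptr if there is none.
static const Expr *base_of(const Expr *e) {
  switch (e->kind) {
  case Expr::SYMBOL:
    return e;
  case Expr::CONST_WRAP:
    return base_of(e->op0);
  case Expr::PLUS:
    // An integer operand never contributes a base, so skip it.
    if (e->op1->kind == Expr::INTEGER)
      return base_of(e->op0);
    if (e->op0->kind == Expr::INTEGER)
      return base_of(e->op1);
    // Otherwise take the first operand that yields a base.
    if (const Expr *b = base_of(e->op0))
      return b;
    return base_of(e->op1);
  default:
    return nullptr;  // a bare INTEGER has no base
  }
}
```

In this toy model, wrapping a symbol in PLUS with the constant 8 and then
asking for the base yields the same symbol node, so a caller that only wants
the base gains nothing from having the constant re-attached first.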
The other option is to duplicate and adjust ix86_delegitimize_address into
ix86_find_base_term.
With the above template, we can share the code and just skip the work the
base_term caller doesn't need (for now in one spot, but likely in more spots
later).
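The sharing pattern can be sketched outside GCC as follows. This is only an
illustration under stated assumptions: Node, Pool, unwrap_1, unwrap and
unwrap_base are hypothetical names, and the pool merely counts allocations so
that the saving is observable. The one thing taken from the patch is the shape
of the template: a bool parameter that lets the base_term instantiation skip
building a node its caller would discard.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node type; this is NOT GCC's rtx.
struct Node {
  enum Kind { SYMBOL, INTEGER, PLUS } kind;
  std::string name;       // used by SYMBOL
  long value;             // used by INTEGER
  const Node *op0, *op1;  // used by PLUS
};

// Owns every node and exposes how many have been created so far.
class Pool {
public:
  const Node *make(Node::Kind k, std::string name, long value,
                   const Node *op0 = nullptr, const Node *op1 = nullptr) {
    nodes_.push_back(std::unique_ptr<Node>(
        new Node{k, std::move(name), value, op0, op1}));
    return nodes_.back().get();
  }
  std::size_t size() const { return nodes_.size(); }

private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

// Shared body. With base_term == true the caller only wants the symbol,
// so the addend is not re-attached and nothing is allocated.
template <bool base_term>
static inline const Node *unwrap_1(Pool &pool, const Node *symbol,
                                   const Node *const_addend) {
  const Node *result = symbol;
  if (const_addend && !base_term)
    result = pool.make(Node::PLUS, "", 0, result, const_addend);
  return result;
}

// Full variant: returns symbol + addend as a freshly built PLUS node.
static const Node *unwrap(Pool &pool, const Node *symbol,
                          const Node *const_addend) {
  return unwrap_1<false>(pool, symbol, const_addend);
}

// Base-term variant: returns the bare symbol without allocating.
static const Node *unwrap_base(Pool &pool, const Node *symbol,
                               const Node *const_addend) {
  return unwrap_1<true>(pool, symbol, const_addend);
}
```

Because base_term is a compile-time constant, the condition folds away in each
instantiation, so the two entry points share one source body without a runtime
flag.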
As for more spots later: both find_base_value and find_base_term (the only
users of ix86_find_base_term) only care about a MEM with arg_pointer_rtx or
(plus arg_pointer_rtx something). So, in other cases it doesn't make sense to
call replace_equiv_address_nv. Thus I think
if (GET_CODE (x) == CONST
&& GET_CODE (XEXP (x, 0)) == PLUS
&& GET_MODE (XEXP (x, 0)) == Pmode
&& CONST_INT_P (XEXP (XEXP (x, 0), 1))
&& GET_CODE (XEXP (XEXP (x, 0), 0)) == UNSPEC
&& XINT (XEXP (XEXP (x, 0), 0), 1) == UNSPEC_PCREL)
{
rtx x2 = XVECEXP (XEXP (XEXP (x, 0), 0), 0, 0);
x = gen_rtx_PLUS (Pmode, XEXP (XEXP (x, 0), 1), x2);
if (MEM_P (orig_x))
x = replace_equiv_address_nv (orig_x, x);
return x;
}
isn't really useful if base_term && MEM_P (orig_x).