This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/63191] [4.8/4.9/5 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 08 Sep 2014 08:58:28 +0000
- Subject: [Bug rtl-optimization/63191] [4.8/4.9/5 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC
- Auto-submitted: auto-generated
- References: <bug-63191-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191
Richard Biener <rguenth at gcc dot gnu.org> changed:
               What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |memory-hog
   Last reconfirmed|                            |2014-09-08
                 CC|                            |rth at gcc dot gnu.org
             Blocks|                            |47344
     Ever confirmed|0                           |1
            Summary|32-bit gcc uses excessive   |[4.8/4.9/5 Regression]
                   |memory during dead store    |32-bit gcc uses excessive
                   |elimination with -fPIC      |memory during dead store
                   |                            |elimination with -fPIC
   Target Milestone|---                         |4.8.4
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. Possibly excessive value_rtx expansion from dse.c:canon_address.
The testcase is a function with a single basic block and about 30000 stores
(the static initializer function), following the pattern
D.94947 = (struct Z *) &Zs;
D.94947->x1_ = &Xs1[0];
D.94947->x2_ = 1;
D.94947->x3_ = 1;
temp.20397 = D.94947 + 12;
temp.20397->x1_ = &Xs90[0];
temp.20397->x2_ = 2;
temp.20397->x3_ = 1;
...
temp.30587 = temp.30586 + 12;
temp.30587->x1_ = &Xs611[0];
temp.30587->x2_ = 2;
temp.30587->x3_ = 1;
That is, groups of three stores separated by address adjustments. The above
is from a GCC 4.3 IL dump.
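The report does not include t.C itself. The following is only a hypothetical reduced shape of source that produces this kind of static initializer function. It assumes that Z has a user-provided, non-constexpr constructor, so the array is filled in at run time by the translation unit's initializer. Whether a given GCC version and optimization level actually keeps it as runtime initialization is not guaranteed. The names X, Z, Zs and XsN mirror the dumps, but their definitions are assumptions. On i?86, sizeof(Z) is 12, which matches the "+ 12" stride above.

```cpp
// Hypothetical reduced shape of the testcase (t.C is not shown in the
// report). Compile only; the bug is about compile-time memory, e.g.
//   g++ -m32 -fPIC -O2 -c t.C   (optimization level assumed)
#include <cstddef>

struct X { int v; };

struct Z {
    // User-provided, non-constexpr constructor: assumed here so that the
    // array below is initialized by the static initializer function.
    Z(X* x1, int x2, int x3) : x1_(x1), x2_(x2), x3_(x3) {}
    X*  x1_;  // pointer field: the "&XsN" stores
    int x2_;
    int x3_;
};

static X Xs1[1], Xs90[1], Xs91[1];

// The real function has on the order of 10000 such elements (about
// 30000 stores); three are enough to show the per-element store triple.
static Z Zs[] = {
    Z(Xs1, 1, 1),
    Z(Xs90, 2, 1),
    Z(Xs91, 2, 1),
};

// Keep Zs referenced so it is not discarded as unused.
Z* zs_begin() { return Zs; }
```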
The GCC 4.9 IL dump shows
MEM[(struct Z *)&Zs].x1_ = &Xs1;
MEM[(struct Z *)&Zs].x2_ = 1;
MEM[(struct Z *)&Zs].x3_ = 1;
MEM[(struct Z *)&Zs + 12B].x1_ = &Xs90;
MEM[(struct Z *)&Zs + 12B].x2_ = 2;
MEM[(struct Z *)&Zs + 12B].x3_ = 1;
MEM[(struct Z *)&Zs + 24B].x1_ = &Xs91;
MEM[(struct Z *)&Zs + 24B].x2_ = 2;
MEM[(struct Z *)&Zs + 24B].x3_ = 1;
...
MEM[(struct Z *)&Zs + 122292B].x1_ = &Xs611;
MEM[(struct Z *)&Zs + 122292B].x2_ = 2;
MEM[(struct Z *)&Zs + 122292B].x3_ = 1;
which causes each store to be expanded to something like
(insn 71298 71297 71299 2 (set (reg:SI 40822)
        (const:SI (unspec:SI [
                    (symbol_ref:SI ("_ZL2Zs") [flags 0x2] <var_decl 0x7ffff5c4a098 Zs>)
                ] UNSPEC_GOTOFF))) t.C:5 -1
     (nil))
(insn 71299 71298 71300 2 (set (mem/c:SI (plus:SI (plus:SI (reg:SI 3 bx)
                    (reg:SI 40822))
                (const_int 122216 [0x1dd68])) [4 MEM[(struct Z *)&Zs + 122208B].x3_+0 S4 A64])
        (const_int 1 [0x1])) t.C:5 -1
     (nil))
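The constants in the two dumps can be tied together with a little arithmetic. The sketch below assumes the i?86 layout of Z: a 4-byte pointer x1_ followed by two 4-byte ints, giving a 12-byte stride. It also assumes that the elided parts of the GIMPLE dump keep that stride. The helper names are illustrative only.

```cpp
// Worked arithmetic (assumption: i?86 layout of Z, i.e. a 4-byte
// pointer x1_ followed by two 4-byte ints, giving a 12-byte stride).
constexpr unsigned kSizeofZ = 12;
constexpr unsigned kOffX1 = 0, kOffX2 = 4, kOffX3 = 8;

// Byte offset from &Zs of the field at field_off in element `index`.
constexpr unsigned zs_field_offset(unsigned index, unsigned field_off) {
    return index * kSizeofZ + field_off;
}

// "MEM[(struct Z *)&Zs + 12B].x1_" is element 1, field x1_.
static_assert(zs_field_offset(1, kOffX1) == 12, "");
// "MEM[... &Zs + 122208B].x3_" is element 122208 / 12 = 10184; its x3_
// field is the (const_int 122216 [0x1dd68]) in insn 71299.
static_assert(122208 % kSizeofZ == 0, "");
static_assert(zs_field_offset(122208 / kSizeofZ, kOffX3) == 0x1dd68, "");

// If the elided part of the dump keeps the same stride, the last
// element (+122292B) has index 10191: 10192 elements, 3 stores each.
constexpr unsigned kElements = 122292 / kSizeofZ + 1;
constexpr unsigned kStores = 3 * kElements;  // roughly the "30000" stores
```

Under these assumptions, the count of 3 × 10192 = 30576 stores is consistent with the "30000 stores" stated above.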
I suppose "lowering" PIC addresses somewhere before RTL expansion (and
CSEing the addresses) would help here, where "lowering" means no longer
treating them as is_gimple_min_invariant.
With 4.3 we have a single address load for &Zs (though of course we still
load each stored address individually, so there are still very many PIC
addresses in this function).
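The difference between the two expansions can be illustrated with a toy count of PIC address materializations. This is a hypothetical model, not GCC code. Each store is reduced to a base symbol, an offset and an optional symbolic value. The "naive" count mimics the 4.9 RTL, where every store materializes &Zs again. The "commoned" count materializes each distinct symbol once, which corresponds to the 4.3 shape described above: one &Zs, plus one load per distinct stored &XsN. The example data are the first three elements of the 4.9 GIMPLE dump. The code requires C++17.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Toy model (hypothetical, not GCC code): one store of the form
// "*(base_sym + offset) = value". value_sym names a symbol when the
// stored value is itself an address (&XsN) and is empty for integers.
struct Store {
    std::string_view base_sym;
    unsigned offset;
    std::string_view value_sym;
};

// 4.9-style expansion: every store materializes its base address anew,
// plus one more materialization when the stored value is an address.
template <std::size_t N>
constexpr unsigned naive_address_loads(const std::array<Store, N>& stores) {
    unsigned n = 0;
    for (const Store& s : stores) {
        n += 1;                            // &Zs for this store
        if (!s.value_sym.empty()) n += 1;  // &XsN being stored
    }
    return n;
}

// Adds sym to seen[0..n) unless it is empty or already present;
// returns the updated number of distinct symbols.
template <std::size_t M>
constexpr unsigned note_symbol(std::array<std::string_view, M>& seen,
                               unsigned n, std::string_view sym) {
    if (sym.empty()) return n;
    for (unsigned i = 0; i < n; ++i)
        if (seen[i] == sym) return n;      // reuse the earlier load
    seen[n] = sym;
    return n + 1;
}

// Commoned expansion: each distinct symbol is materialized once and
// reused, i.e. a single &Zs plus one load per distinct &XsN.
template <std::size_t N>
constexpr unsigned commoned_address_loads(const std::array<Store, N>& stores) {
    std::array<std::string_view, 2 * N> seen{};
    unsigned n = 0;
    for (const Store& s : stores) {
        n = note_symbol(seen, n, s.base_sym);
        n = note_symbol(seen, n, s.value_sym);
    }
    return n;
}

// The first three elements of the 4.9 GIMPLE dump above.
constexpr std::array<Store, 9> kFirstThree{{
    {"Zs", 0, "Xs1"},   {"Zs", 4, ""},  {"Zs", 8, ""},
    {"Zs", 12, "Xs90"}, {"Zs", 16, ""}, {"Zs", 20, ""},
    {"Zs", 24, "Xs91"}, {"Zs", 28, ""}, {"Zs", 32, ""},
}};
```

For these nine stores, the naive count is 9 + 3 = 12 and the commoned count is 4 (Zs, Xs1, Xs90, Xs91). The stored &XsN symbols are all distinct, so they cannot be shared. That is why, even in the commoned form, "very many PIC addresses" remain.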
Why does CSE fail to common the UNSPEC_GOTOFF addresses? Is that because
of the (const:SI ...) wrapping (i.e., it considers it not profitable)? Or
is it confused by the other intermediate UNSPEC_GOTOFF uses?
That said, cse1 should be able to turn the RTL into something equivalent
to what 4.3 produced.