This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/63191] [4.8/4.9/5 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |memory-hog
   Last reconfirmed|                            |2014-09-08
                 CC|                            |rth at gcc dot gnu.org
             Blocks|                            |47344
     Ever confirmed|0                           |1
            Summary|32-bit gcc uses excessive   |[4.8/4.9/5 Regression]
                   |memory during dead store    |32-bit gcc uses excessive
                   |elimination with -fPIC      |memory during dead store
                   |                            |elimination with -fPIC
   Target Milestone|---                         |4.8.4

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  Possibly excessive value_rtx expansion from dse.c:canon_address.

The testcase is a single function (the static initializer function) that
consists of one basic block with about 30000 stores following the pattern

  D.94947 = (struct Z *) &Zs;
  D.94947->x1_ = &Xs1[0];
  D.94947->x2_ = 1;
  D.94947->x3_ = 1;
  temp.20397 = D.94947 + 12;
  temp.20397->x1_ = &Xs90[0];
  temp.20397->x2_ = 2;
  temp.20397->x3_ = 1;
...
  temp.30587 = temp.30586 + 12;
  temp.30587->x1_ = &Xs611[0];
  temp.30587->x2_ = 2;
  temp.30587->x3_ = 1;

that is, groups of three stores, each followed by an address adjustment.
The above is from a GCC 4.3 IL dump.
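
For orientation, here is a hypothetical reduction of source that could produce stores of this shape. It is not the testcase attached to the bug: the type X, the non-constexpr constructor, and the helper zs_count() are assumptions made for illustration, and whether a given GCC version actually emits run-time stores for it has not been checked here. The real testcase has roughly 10000 array elements.

```cpp
#include <cstddef>

// Hypothetical element type; only its address is stored.
struct X { int v; };

// Field names mirror the dump.  With a 4-byte pointer and 4-byte ints
// this is 12 bytes, matching the "+ 12" address adjustment.
struct Z {
  // User-provided, non-constexpr constructor.  Assumed to be the reason
  // the array initialization ends up in the static initializer function.
  Z(X *x1, int x2, int x3) : x1_(x1), x2_(x2), x3_(x3) {}
  X *x1_;
  int x2_;
  int x3_;
};

static X Xs1[1], Xs90[1], Xs91[1];

// Each element becomes three stores: x1_, x2_, x3_.
static Z Zs[] = {
  Z(&Xs1[0], 1, 1),
  Z(&Xs90[0], 2, 1),
  Z(&Xs91[0], 2, 1),
};

// Hypothetical helper: number of elements in Zs.
std::size_t zs_count() { return sizeof(Zs) / sizeof(Zs[0]); }
```

Building something like this with -m32 -fPIC and dumping the IL would be the way to check whether it reproduces the pattern.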

The GCC 4.9 IL dump shows

  MEM[(struct Z *)&Zs].x1_ = &Xs1;
  MEM[(struct Z *)&Zs].x2_ = 1;
  MEM[(struct Z *)&Zs].x3_ = 1;
  MEM[(struct Z *)&Zs + 12B].x1_ = &Xs90;
  MEM[(struct Z *)&Zs + 12B].x2_ = 2;
  MEM[(struct Z *)&Zs + 12B].x3_ = 1;
  MEM[(struct Z *)&Zs + 24B].x1_ = &Xs91;
  MEM[(struct Z *)&Zs + 24B].x2_ = 2;
  MEM[(struct Z *)&Zs + 24B].x3_ = 1;
...
  MEM[(struct Z *)&Zs + 122292B].x1_ = &Xs611;
  MEM[(struct Z *)&Zs + 122292B].x2_ = 2;
  MEM[(struct Z *)&Zs + 122292B].x3_ = 1;

which causes each store to be expanded into something like

(insn 71298 71297 71299 2 (set (reg:SI 40822)
        (const:SI (unspec:SI [
                    (symbol_ref:SI ("_ZL2Zs") [flags 0x2]  <var_decl 0x7ffff5c4a098 Zs>)
                ] UNSPEC_GOTOFF))) t.C:5 -1
     (nil))
(insn 71299 71298 71300 2 (set (mem/c:SI (plus:SI (plus:SI (reg:SI 3 bx)
                    (reg:SI 40822))
                (const_int 122216 [0x1dd68])) [4 MEM[(struct Z *)&Zs + 122208B].x3_+0 S4 A64])
        (const_int 1 [0x1])) t.C:5 -1
     (nil))
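
The numbers in the dumps can be cross-checked with a little arithmetic. The sketch below assumes a 12-byte element (from the "+ 12" step), three stores per element, and x3_ at byte offset 8 (a 4-byte pointer followed by two 4-byte ints). The constant and function names are made up for this illustration.

```cpp
#include <cstddef>

constexpr std::size_t kElemSize = 12;      // bytes per struct Z (assumed, 32-bit)
constexpr std::size_t kStoresPerElem = 3;  // x1_, x2_, x3_
constexpr std::size_t kX3Offset = 8;       // assumed offset of x3_ within Z

// Index of the element that starts at the given byte offset from &Zs.
constexpr std::size_t element_index(std::size_t byte_offset) {
  return byte_offset / kElemSize;
}

// Total stores when the last element starts at last_elem_offset.
constexpr std::size_t store_count(std::size_t last_elem_offset) {
  return (element_index(last_elem_offset) + 1) * kStoresPerElem;
}

// The last element in the 4.9 dump is at &Zs + 122292B.
static_assert(122292 % kElemSize == 0, "offset is a whole number of elements");
static_assert(element_index(122292) == 10191, "index of the last element");
static_assert(store_count(122292) == 30576, "about 30000 stores in total");

// The RTL store to x3_ of the element at 122208B uses const_int 122216.
static_assert(122208 + kX3Offset == 122216, "element offset plus field offset");
```

So the function holds 10192 elements and 30576 stores, and each of those stores gets its own UNSPEC_GOTOFF address computation in the expanded RTL.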

I suppose "lowering" PIC addresses somewhere before RTL expansion (and
CSEing the addresses) would help here.  Lowering as in not treating
them as is_gimple_min_invariant.

With 4.3 we have a single address load for &Zs (but of course we retain
the loads of the individual stored addresses, so there are still very many
PIC addresses in this function).

Why is CSE not able to CSE the UNSPEC_GOTOFF addresses?  Does it not do
it because of the (const:SI ...) wrapping (as in, not profitable)?  Or is
it confused about the other intermediate UNSPEC_GOTOFF uses?

That said, cse1 should be able to turn the RTL into something equivalent
to what 4.3 produced.

