This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [patch] Reduce memory overhead for large functions
- From: Richard Guenther <richard dot guenther at gmail dot com>
- To: Steven Bosscher <stevenb dot gcc at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Vladimir Makarov <vmakarov at redhat dot com>
- Date: Mon, 13 Aug 2012 10:49:14 +0200
- Subject: Re: [patch] Reduce memory overhead for large functions
- References: <CABu31nPCP7afJu9YopkmG-W-Z50JB+NZJmUAv7nOGvOT28zTSQ@mail.gmail.com>
On Sun, Aug 12, 2012 at 11:49 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
>
> This patch tries to use non-clearing memory allocation where possible.
> This is especially important for very large functions, when arrays of
> size on the order of n_basic_blocks or num_ssa_names are allocated to
> hold sparse data sets. For such cases the overhead of memset becomes
> measurable (and even dominant for the time spent in a pass in some
> cases, such as the one I recently fixed in ifcvt.c).
>
> This cuts off ~20% of the compile time for the test case of PR54146 at
> -O1. Not bad for a patch that basically only removes a bunch of
> memsets.
>
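
A minimal sketch of the idea in the paragraphs above, not code from the
patch: a scratch array with one slot per basic block may be left
uncleared as long as every slot that is read has been written first.
The function name, the worklist parameter and the `clear` flag are
hypothetical; in GCC sources the two allocation styles would presumably
correspond to XCNEWVEC (clearing) and XNEWVEC (non-clearing).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pass helper: give each block on a small worklist a number,
   then sum the numbers back up.  The scratch array has one slot per basic
   block, but only the worklist slots are ever written or read.  */
long
sum_block_numbers (size_t n_blocks, const size_t *work, size_t n_work,
                   int clear)
{
  /* clear != 0: calloc zeroes all n_blocks slots, an O(n_blocks) cost.
     clear == 0: malloc leaves the slots uninitialized, which is fine here
     because the first loop writes every slot the second loop reads.  */
  long *num = clear ? calloc (n_blocks, sizeof *num)
                    : malloc (n_blocks * sizeof *num);
  long sum = 0;

  assert (n_blocks > 0 && num != NULL);

  for (size_t i = 0; i < n_work; i++)
    {
      assert (work[i] < n_blocks);
      num[work[i]] = (long) i + 1;
    }

  for (size_t i = 0; i < n_work; i++)
    sum += num[work[i]];

  free (num);
  return sum;
}
```

Both variants return the same result; the non-clearing one only skips
the up-front pass over the whole array, which is where the savings for
very large functions would come from.
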
> I got another 5% for the changes in tree-ssa-loop-manip.c. A loop over
> an array with num_ssa_names there is expensive and unnecessary, and it
> helps to stuff all bitmaps together on a single obstack if you intend
> to blow them all away at the end (this could be done in a number of
> other places in the compiler). Clearing livein at the end of
> add_exit_phis_var also reduces peak memory by ~250MB at that point
> in the passes pipeline (only to blow up from ~1.5GB peak memory in the
> GIMPLE optimizers to ~3.6GB in expand, and to ~8.6GB in IRA, but hey,
> who's counting? :-)
>
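
The "all bitmaps on a single obstack" point above can be illustrated
with a small stand-alone arena. This is a sketch under assumptions, not
GCC's implementation: every type and function name below is
hypothetical, and the real code would presumably go through GCC's own
bitmap obstack interface (bitmap_obstack_initialize, BITMAP_ALLOC,
bitmap_obstack_release).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long word_t;
#define WORD_BITS (sizeof (word_t) * CHAR_BIT)

/* One malloc'ed chunk; bitmaps are carved out of WORDS back to back.  */
struct chunk
{
  struct chunk *next;
  size_t used, cap;          /* both counted in words */
  word_t words[];
};

struct bitmap_arena { struct chunk *chunks; };

/* Allocate a zeroed bitmap of N_BITS bits from arena A.  */
word_t *
arena_bitmap_alloc (struct bitmap_arena *a, size_t n_bits)
{
  size_t n = (n_bits + WORD_BITS - 1) / WORD_BITS;
  struct chunk *c = a->chunks;

  if (c == NULL || c->cap - c->used < n)
    {
      size_t cap = n > 1024 ? n : 1024;
      c = malloc (sizeof *c + cap * sizeof (word_t));
      assert (c != NULL);
      c->next = a->chunks;
      c->used = 0;
      c->cap = cap;
      a->chunks = c;
    }

  word_t *p = c->words + c->used;
  c->used += n;
  memset (p, 0, n * sizeof (word_t));
  return p;
}

void
bitmap_set (word_t *b, size_t i)
{
  b[i / WORD_BITS] |= (word_t) 1 << (i % WORD_BITS);
}

int
bitmap_test (const word_t *b, size_t i)
{
  return (int) ((b[i / WORD_BITS] >> (i % WORD_BITS)) & 1);
}

/* Free every bitmap in the arena at once; returns the number of chunks
   released.  No per-bitmap bookkeeping is needed.  */
size_t
arena_release (struct bitmap_arena *a)
{
  size_t freed = 0;
  while (a->chunks != NULL)
    {
      struct chunk *next = a->chunks->next;
      free (a->chunks);
      a->chunks = next;
      freed++;
    }
  return freed;
}
```

The design choice is the usual arena trade-off: individual bitmaps
cannot be freed early, but tearing everything down at the end of the
pass costs one free per chunk instead of one per bitmap.
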
> Actually, the worst cases are not fixed with this patch. That'd be IRA
> (which consumes ~5GB on the test case, out of 8GB total), and
> tree-PRE.
>
> The IRA case looks like it may be hard to fix: Allocating multiple
> arrays of size O(max_regno) for every loop in init_loop_tree_node.
>
> The tree-PRE case is one where the avail arrays are allocated and
> cleared for every PRE candidate. This looks like a place where a
> pointer_map should be used instead. I'll tackle that later, when I've
> addressed more pressing problems in the compilation of the PR54146
> test case.
Hmm, or easier, use a vector of size (num_bb_preds) and index it by
edge index.
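
A rough sketch of what indexing by edge index could look like, with
made-up stand-in types rather than the real tree-ssa-pre.c structures:
the scratch vector has one slot per predecessor edge of the block, so
its size is independent of the number of blocks in the function. The
struct names, `preds_agree` and `value_of` are hypothetical; the
`dest_idx` field is modeled on the field of the same name in GCC's edge
structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for an incoming edge: the source block number and the edge's
   position in the destination block's predecessor list.  */
struct pred_edge { unsigned src_block; unsigned dest_idx; };

struct block { unsigned n_preds; const struct pred_edge *preds; };

/* Return 1 if the value reaching BB is the same along every predecessor
   edge.  VALUE_OF maps a block number to the value available at its end.
   Assumes the dest_idx values of BB's edges are exactly 0..n_preds-1, so
   every slot of AVAIL is written before it is read and no clearing is
   needed.  */
int
preds_agree (const struct block *bb, const int *value_of)
{
  if (bb->n_preds == 0)
    return 1;

  /* One slot per incoming edge, not one per block in the function.  */
  int *avail = malloc (bb->n_preds * sizeof *avail);
  assert (avail != NULL);

  for (unsigned i = 0; i < bb->n_preds; i++)
    {
      const struct pred_edge *e = &bb->preds[i];
      assert (e->dest_idx < bb->n_preds);
      avail[e->dest_idx] = value_of[e->src_block];
    }

  int same = 1;
  for (unsigned i = 1; i < bb->n_preds; i++)
    if (avail[i] != avail[0])
      same = 0;

  free (avail);
  return same;
}
```
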
> This patch was bootstrapped and tested on powerpc64-unknown-linux-gnu. OK for trunk?
Ok with adjusting the PRE comments according to the above.
Thanks,
Richard.
> Kudos to the compile farm people, without them I couldn't even hope to
> get any of this work done!
> Ciao!
> Steven