Re: C++ compile-time regressions
- To: Gerald Pfeifer <pfeifer at dbai dot tuwien dot ac dot at>
- Subject: Re: C++ compile-time regressions
- From: Daniel Berlin <dan at cgsoftware dot com>
- Date: Thu, 02 Aug 2001 16:53:37 -0400
- Cc: Mark Mitchell <mark at codesourcery dot com>, Joe Buck <jbuck at synopsys dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, <gcc-patches at gcc dot gnu dot org>
- References: <Pine.BSF.4.33.0108011301040.3036-100000@taygeta.dbai.tuwien.ac.at>
Gerald Pfeifer <pfeifer@dbai.tuwien.ac.at> writes:
> [ Includes patch. Okay for branch and mainline? ]
>
> On Wed, 25 Jul 2001, Gerald Pfeifer wrote:
>> Yes, I've been working on this now. To be honest, I don't have time to
>> do this, but I'll try to have something in time for 3.0.1 -- somehow.
>
> So, here we go. (The first column is the value of PARAM_MAX_INLINE_INSNS.)
>
>                 time            size (bytes)
>              -O2     -O3       -O2       -O3
>    100      8:29    8:48    4000228   3990276
>    500      8:24    8:53    4136996   4126148
>    600      8:33    8:59    4158820   4156068
>    700      8:52    9:32    4169028   4222436
>    800      8:34?  10:27    4179652   4315396
>   1000      9:09   11:27    4239076   4425860
>   1500      9:49   14:05    4336260   4637060
>   2000     10:47   23:47    4435428   4758052
>
> To me, 600 seems like a definite and affordable improvement here; I'd
> be a bit hesitant to go over 700.
>
>>> Realistically, I think we have to be willing to compromise here; the 3.0.1
>>> compiler is going to be slower *and* probably generate slower code than
>>> 2.95, which is too bad, but that seems to be where we're at. If we could
>>> get to 10-25% on both figures that would be better than having one figure
>>> small and the other massive.
>> The problem is, on both ends of the scale (that is, either slower code
>> or slower generation) the *better* value is already around 25%, so a
>> compromise will be worse than that for *both* values.
>
> While I still see what I wrote as quoted above as a problem, here is the
> patch I had promised.
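For reference, the knob in the table above is the max-inline-insns
param. A sketch of setting it, assuming the --param spelling the
3.0-series driver uses:

    g++ -O3 --param max-inline-insns=600 foo.cc
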
BTW, I've gotten the performance problem down using a slightly
modified heuristic from integrate.c. On the last run, the compile
times were about the same as with 200 insns, but the runtime
performance was *much* better (we're down to about a 10% speed loss).
When your performance gets shot to hell, it's always caused by not
inlining things. For example, at 100 insns, *::begin and *::end take
>50% of the runtime because they aren't being inlined.
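To make that concrete, here's a toy loop (not from Gerald's app) of
the shape that gets hurt. begin() and end() are one-liners, but end()
is re-evaluated in the loop condition on every iteration, so when the
inline limit is too low to inline them, the call overhead dominates:

    #include <vector>

    /* Toy example.  With max-inline-insns too low, begin()/end()
       stay out of line and these calls eat the runtime.  */
    long
    sum (const std::vector<long> &v)
    {
      long total = 0;
      for (std::vector<long>::const_iterator it = v.begin ();
           it != v.end (); ++it)     /* end() called each iteration */
        total += *it;
      return total;
    }
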
With a fixed store motion, we can turn off cse-skip-blocks and
cse-follow-jumps. They buy us absolutely no gain, but cost a lot of
time (in compiling your app, Gerald, CSE accounts for >20% of the
compile time on the files that take the longest to compile).
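For the record, those are the -fcse-skip-blocks and -fcse-follow-jumps
flags, so turning them off would look something like this (assuming
the usual -fno- negation):

    g++ -O2 -fno-cse-skip-blocks -fno-cse-follow-jumps foo.cc
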
I've got statistics to back this up.
However, even with cse-skip-blocks and cse-follow-jumps turned off,
CSE is still >15% of the compile, mainly because it's trying to
eliminate memory loads and stores. PRE and store motion do that much
faster (when a store or load is killed, they just set a bit or two in
a bitvector rather than modifying the hash table), and they do it on
a global scale.
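A sketch of the difference (toy code, nothing like GCC's actual data
structures): a gcse-style pass records a kill by flipping one bit per
tracked expression, where CSE has to go dig the affected entries out
of its hash table:

    #include <vector>

    /* Toy sketch.  One bit per tracked memory expression; a store
       just marks everything it might clobber as killed.  No
       hash-table surgery required.  */
    struct block_info
    {
      std::vector<bool> kill;   /* kill set for one basic block */
    };

    void
    note_store (block_info &b, const std::vector<size_t> &clobbered)
    {
      for (size_t i = 0; i < clobbered.size (); ++i)
        {
          size_t id = clobbered[i];
          if (id >= b.kill.size ())
            b.kill.resize (id + 1, false);
          b.kill[id] = true;        /* just set a bit */
        }
    }
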
I'm just completing some benchmark runs to see if our performance
actually changes if I tell CSE to stop caring about memory (and run
store motion after reload). I sincerely doubt it will, now that load
and store motion should be working.
If it does, then PRE and store motion need to be improved.
--Dan
--
"When I was a kid, I went to the store and asked the guy, 'Do you
have any toy train schedules?'"
    -Steven Wright