This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RFC] Speed up invalid_mode_change_p


On 12/08/2010 08:39 AM, Richard Guenther wrote:
On Tue, 7 Dec 2010, Vladimir Makarov wrote:

On 12/07/2010 10:24 AM, Richard Guenther wrote:
When looking at PR18687 (again) I noticed that invalid_mode_change_p is
pretty high in the profiles, so this patch tries to rework what it does
based on the few existing callers.

This patch does two things, first, it changes the interface of
invalid_mode_change_p (only used by ira-cost.c) to not pass in
a source mode but assume the mode of the pseudo.  Second, it
caches and computes the outcome of invalid_mode_change_p at
record_subregs_of_mode time, in an array indexed by register class.

As there are usually less register classes than machine modes
this should save memory for the flags.  It also avoids the
loop over machine modes in invalid_mode_change_p, making the
predicate cheaper in favor of doing the equivalent of
N_REG_CLASSES checks for each subreg we encounter (we can
optimize this with some extra memory cost at record_subregs_of_mode
time to avoid the loop for mode pairs we've already seen).

The patch also caches the hashtable lookup (something which can
also be done independently).

A even more reworked interface would for example have a
sbitmap invalid_mode_change_classes (regno) interface
returning a bitmap indexed by reg_class, so the users can
use TEST_BIT on it directly.

The patch is a hack, just to make the following measurements
possible, I will rework it to be nice but expect some direction
from your IRA maintainer folks (maybe you've already done a
similar thing).

I like the idea, Richard.  So if you want to do it and have time, I really
appreciate this.

ira-improv branch has a code which decreases # of calls of
invalid_mode_change.  So the effect of reworking of the function will be
smaller when/if the branch is merged into the trunk.  Still implementation of
your idea will give considerable compilation time improvement (at least for
this test case) and is really worth to do.
Ok.  I have split the patch into two pieces, one obvious changing the
interface of invalid_mode_change_p (with no performance impact) and
one with reginfo.c local changes only.

Micha suggested to use a single bitmap for all information, this
exploits the sparseness of the information and with the current queries
which are iterate over all regnos should have O(1) bitmap operations.

I use a temporary bitmap to cache whether we have seen a mode change
of the same kind already to avoid the compute loop.
Even though CANNOT_CHANGE_MODE_CLASS itself should be cheap, caching
the case of a positive outcome with another bitmap test seems to be
worthwhile for the testcase.

Re-benchmarking the patch on the testcase shows a 2% improvement
and we also see slightly less number of minor pagefaults.

Nice.
I'll bootstrap and regtest this now.

Do you have any suggestions for further performance testing?
Not really. I never investigated the problem. So I think you know much better.
   Are
there any targets where we'd have a larger number of register
classes than modes?
I don't know such public targets.
   For optimized compiles I can't measure an
off-noise effect of the patches.

Ok if the bootstrap + regtest passes?

Ok for me, Richard. Thanks for finding and working on the problem.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]