This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: RFC -- CSE compile-time stupidity
- From: Kazu Hirata <kazu at cs dot umass dot edu>
- To: law at redhat dot com
- Cc: gcc at gcc dot gnu dot org
- Date: Mon, 21 Feb 2005 13:26:14 -0500 (EST)
- Subject: Re: RFC -- CSE compile-time stupidity
- References: <1109006553.21470.19.camel@localhost.localdomain>
Hi Jeff,
> Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> and SUBREG_TICKED saves about 1% compilation time for the components
> of cc1. Yes, that's a net 1% improvement by dropping the abstraction
> layer.
Yes, I've noticed the problem. In my defense, the code in question
was even worse before I touched it. :-) With the old code, every time
we access a cse_reg_info entry that is different from the last access,
we were generating a function call. Nowadays, we avoid calls to
get_cse_reg_info_1 95% of the time.
Of course, it's tough to beat the performance of your explicit
initialization approach, but here are a couple of things that I have
thought about that would keep some abstraction layer.
The first thought is to expose the timestamp update to the user of
those macros that you mentioned.

/* Find a cse_reg_info entry for REGNO.  */
static inline struct cse_reg_info *
get_cse_reg_info (unsigned int regno)
{
  struct cse_reg_info *p = &cse_reg_info_table[regno];
  /* If this entry has not been initialized, go ahead and initialize
     it.  */
  if (p->timestamp != cse_reg_info_timestamp)
    {
      get_cse_reg_info_1 (regno);
      p->timestamp = cse_reg_info_timestamp;  /* <- Look! */
    }
  return p;
}

This way, DOM may be able to do jump threading to some extent and
remove a lot of the timestamp checks. Of course, jump threading
opportunities are blocked when we have a non-pure/const function call
like so:

for (i = regno; i < endregno; i++)
  {
    if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i))
      remove_invalid_refs (i);  /* <- Look! */
    REG_IN_TABLE (i) = REG_TICK (i);
    SUBREG_TICKED (i) = -1;
  }

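To make the first idea concrete, here is a minimal, self-contained
sketch of the timestamp scheme. It is not the cse.c code: the names
reg_info, reg_table, current_timestamp, init_entry, start_new_block
and slow_path_calls, the table size, and the reset values are all
assumptions made for illustration.

```c
#define NREGS 8  /* hypothetical table size, for illustration only */

/* Simplified stand-in for struct cse_reg_info (assumed layout).  */
struct reg_info
{
  unsigned int timestamp;
  int reg_tick;
  int reg_in_table;
  int subreg_ticked;
};

static struct reg_info reg_table[NREGS];

/* Starts at 1 so that zero-initialized entries count as stale.  */
static unsigned int current_timestamp = 1;

/* Counts how often the out-of-line slow path runs.  */
static unsigned int slow_path_calls;

/* Out-of-line slow path; the reset values are assumed.  */
static void
init_entry (unsigned int regno)
{
  struct reg_info *p = &reg_table[regno];
  p->reg_tick = 1;
  p->reg_in_table = -1;
  p->subreg_ticked = -1;
  slow_path_calls++;
}

/* Inline fast path.  The timestamp store is visible here, so after
   inlining the compiler can see that a second check on the same
   entry must fail, provided nothing opaque runs in between.  */
static inline struct reg_info *
get_reg_info (unsigned int regno)
{
  struct reg_info *p = &reg_table[regno];
  if (p->timestamp != current_timestamp)
    {
      init_entry (regno);
      p->timestamp = current_timestamp;
    }
  return p;
}

/* Invalidate every entry at once by bumping the global timestamp.  */
static void
start_new_block (void)
{
  current_timestamp++;
}
```

In this sketch, two back-to-back calls to get_reg_info for the same
register take the slow path only once, and start_new_block makes every
entry stale with a single increment rather than a walk over the table.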
The second thought is to initialize all of the cse_reg_info entries at
the beginning of cse_main. Set aside a bitmap with as many bits as
max_regs. Whenever we use one of these accessor macros for register
k, set bit k to say "cse_reg_info_table[k] is in use." This way,
when we are done with a basic block, we can walk the bitmap and
reinitialize those that are used. Again, a good optimizer should be
able to eliminate most of these bit sets, but a non-pure/const
function call will block the cleanup opportunities. Of course, this
bitmap walk is far more expensive than cse_reg_info_timestamp++.
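The bitmap idea can be sketched the same way. Again this is only an
illustration under stated assumptions, not a patch against cse.c: the
names reg_info, reg_table, in_use, init_all, end_of_block and
reinit_count, the table size, and the reset values are hypothetical,
and a plain array of unsigned long stands in for GCC's own bitmap
types.

```c
#include <limits.h>
#include <string.h>

#define NREGS 64  /* hypothetical stand-in for max_regs */
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define NWORDS ((NREGS + WORD_BITS - 1) / WORD_BITS)

/* Simplified stand-in for struct cse_reg_info (assumed layout).  */
struct reg_info
{
  int reg_tick;
  int reg_in_table;
  int subreg_ticked;
};

static struct reg_info reg_table[NREGS];

/* Bit k is set when reg_table[k] was touched in the current block.  */
static unsigned long in_use[NWORDS];

/* Counts entries reinitialized by end_of_block.  */
static unsigned int reinit_count;

/* Reset one entry; the reset values are assumed.  */
static void
init_entry (unsigned int regno)
{
  reg_table[regno].reg_tick = 1;
  reg_table[regno].reg_in_table = -1;
  reg_table[regno].subreg_ticked = -1;
}

/* Done once up front: initialize every entry and clear the bitmap.  */
static void
init_all (void)
{
  unsigned int regno;
  for (regno = 0; regno < NREGS; regno++)
    init_entry (regno);
  memset (in_use, 0, sizeof in_use);
}

/* Accessor: no timestamp check, just an unconditional bit set.  */
static inline struct reg_info *
get_reg_info (unsigned int regno)
{
  in_use[regno / WORD_BITS] |= 1UL << (regno % WORD_BITS);
  return &reg_table[regno];
}

/* At the end of a basic block, reset only the entries that were used.  */
static void
end_of_block (void)
{
  unsigned int regno;
  for (regno = 0; regno < NREGS; regno++)
    if (in_use[regno / WORD_BITS] & (1UL << (regno % WORD_BITS)))
      {
        init_entry (regno);
        reinit_count++;
      }
  memset (in_use, 0, sizeof in_use);
}
```

The accessor here is branch-free, but end_of_block has to scan the
bitmap, which is the extra cost compared with a single timestamp
increment.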
Kazu Hirata