This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: RFC -- CSE compile-time stupidity


Hi Jeff,

> Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> and SUBREG_TICKED saves about 1% compilation time for the components
> of cc1.  Yes, that's a net 1% improvement by dropping the abstraction
> layer.

Yes, I've noticed the problem.  In my defense, the code in question
was even worse before I touched it. :-) With the old code, every time
we access a cse_reg_info entry that is different from the last access,
we were generating a function call.  Nowadays, we avoid calls to
get_cse_reg_info_1 95% of the time.

Of course, it's tough to beat the performance of your explicit
initialization approach, but here are a couple of things that I have
thought about that would keep some abstraction layer.

The first thought is to expose the timestamp update to the users of
the macros you mentioned, so that the store is visible in the inline
accessor:

/* Find a cse_reg_info entry for REGNO.  */

static inline struct cse_reg_info *
get_cse_reg_info (unsigned int regno)
{
  struct cse_reg_info *p = &cse_reg_info_table[regno];

  /* If this entry has not been initialized, go ahead and initialize
     it.  */
  if (p->timestamp != cse_reg_info_timestamp)
    {
      get_cse_reg_info_1 (regno);
      p->timestamp = cse_reg_info_timestamp;  /* <- Look! */
    }

  return p;
}

This way, DOM may be able to do jump threading to some extent and
remove a lot of the timestamp checks.  Of course, jump threading
opportunities are blocked when there is a call to a function that is
not pure/const between two accesses, like so:

      for (i = regno; i < endregno; i++)
        {
          if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i))
            remove_invalid_refs (i);   /* <- Look! */

          REG_IN_TABLE (i) = REG_TICK (i);
          SUBREG_TICKED (i) = -1;
        }
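
To make the redundancy concrete, here is a rough, self-contained
sketch of the same pattern outside of cse.c.  Everything in it is
made up for illustration: the struct layout, the table size, the
reset values, and the names reg_info_table, init_reg_info_1 and
sync_reg.  The slow path only resets the fields, and the inline
accessor does the timestamp store itself, so after inlining the
second and third checks in sync_reg compare against a value that was
just stored.

```c
#include <assert.h>

#define MAX_REGS 8   /* hypothetical table size */

/* Hypothetical stand-in for struct cse_reg_info.  */
struct reg_info
{
  unsigned int timestamp;
  int reg_tick;
  int reg_in_table;
  int subreg_ticked;
};

static struct reg_info reg_info_table[MAX_REGS];
static unsigned int reg_info_timestamp = 1;  /* zeroed entries start stale */
static unsigned int slow_path_calls;         /* instrumentation only */

/* Out-of-line slow path: reset the fields, but leave the timestamp
   to the caller.  The reset values are assumptions.  */
static void
init_reg_info_1 (unsigned int regno)
{
  struct reg_info *p = &reg_info_table[regno];

  p->reg_tick = 1;
  p->reg_in_table = -1;
  p->subreg_ticked = -1;
  slow_path_calls++;
}

/* Inline fast path with the timestamp store exposed.  */
static inline struct reg_info *
get_reg_info (unsigned int regno)
{
  assert (regno < MAX_REGS);
  struct reg_info *p = &reg_info_table[regno];

  if (p->timestamp != reg_info_timestamp)
    {
      init_reg_info_1 (regno);
      p->timestamp = reg_info_timestamp;
    }

  return p;
}

/* Invalidate every entry at once by bumping the global timestamp.  */
static void
new_basic_block (void)
{
  reg_info_timestamp++;
}

/* Three back-to-back accesses with no intervening call: only the
   first one can take the slow path.  */
static void
sync_reg (unsigned int regno)
{
  get_reg_info (regno)->reg_in_table = get_reg_info (regno)->reg_tick;
  get_reg_info (regno)->subreg_ticked = -1;
}
```

Calling sync_reg (3) once takes the slow path a single time; calling
it again after new_basic_block () takes it once more.  Whether the
optimizer actually folds the later checks away depends on what it can
prove about the intervening code, which is the limitation shown in
the loop above.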

The second thought is to initialize all of the cse_reg_info entries
at the beginning of cse_main.  Set aside a bitmap with as many bits as
max_regs.  Whenever we use one of these accessor macros for register
k, set bit k, meaning "cse_reg_info_table[k] is in use."  This way,
when we are done with a basic block, we can walk the bitmap and
reinitialize only the entries that were used.  Again, a good optimizer
should be able to eliminate most of these bit sets, but a call to a
function that is not pure/const will block the cleanup opportunities.
Of course, this bitmap walk is far more expensive than
cse_reg_info_timestamp++.
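
A minimal sketch of this second idea, again with made-up names and
sizes (table, in_use, use_reg_info, end_basic_block, and a plain
array of unsigned long rather than GCC's own bitmap types), might
look like this:

```c
#include <assert.h>
#include <limits.h>

#define MAX_REGS 64   /* hypothetical stand-in for max_regs */
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define NWORDS ((MAX_REGS + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical stand-in for struct cse_reg_info, without a timestamp.  */
struct reg_info
{
  int reg_tick;
  int reg_in_table;
  int subreg_ticked;
};

static struct reg_info table[MAX_REGS];
static unsigned long in_use[NWORDS];   /* bit k set: table[k] was touched */

/* Put one entry back into its initial state (values are assumptions).  */
static void
reset_entry (unsigned int regno)
{
  table[regno].reg_tick = 1;
  table[regno].reg_in_table = -1;
  table[regno].subreg_ticked = -1;
}

/* Done once up front, as at the beginning of cse_main.  */
static void
init_all (void)
{
  unsigned int i;

  for (i = 0; i < MAX_REGS; i++)
    reset_entry (i);
  for (i = 0; i < NWORDS; i++)
    in_use[i] = 0;
}

/* Accessor: no check at all, just mark the entry as used.  */
static inline struct reg_info *
use_reg_info (unsigned int regno)
{
  assert (regno < MAX_REGS);
  in_use[regno / WORD_BITS] |= 1UL << (regno % WORD_BITS);
  return &table[regno];
}

/* At the end of a basic block, walk the bitmap and reinitialize only
   the entries that were used.  Returns how many were reset.  */
static unsigned int
end_basic_block (void)
{
  unsigned int w, b, n = 0;

  for (w = 0; w < NWORDS; w++)
    {
      unsigned long bits = in_use[w];

      for (b = 0; bits != 0; b++, bits >>= 1)
        if (bits & 1)
          {
            reset_entry ((unsigned int) (w * WORD_BITS + b));
            n++;
          }
      in_use[w] = 0;
    }

  return n;
}
```

The accessor becomes branch-free, at the price of a store on every
access and a walk over NWORDS words per basic block.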

Kazu Hirata

