Repost: RFA [4.1]: improvement to if-conversion and cross-jumping (PR20070)

Tue Nov 29 13:26:00 GMT 2005

When I mentioned on the mailing list that I was going to help review the 
CFO branch since we (Analog Devices) are interested in code size 
optimizations, people on IRC pointed me at this patch as a potentially 
better starting point, so I'll try to review it first.  Here's my first 
pass of comments.  More to come, probably, once I've played with the 
patch some more and understand it better.

 >       * ifcvt.c (noce_try_complex_cmove): New function.
 >       (noce_process_if_block): Call it.
 >       For flag_expensive_optimizations, update cond_exec_changed_p on
 >       success.
 >       (rest_of_handle_if_conversion): For flag_expensive_optimizations,
 >       provide if_convert with register lifeness info.

Can we postpone the ifcvt changes?  They look quite straightforward, but 
it seems like they are independent of the rest.  Once we've done the 
heavy lifting to get the first part in, we can revisit this.

One general comment: You're matching entire pieces of RTL.  Wouldn't it 
be possible/simpler to match insn_codes, extract the operands, and then 
just look at those?  I have in mind something that looks a bit like 
parts of regrename.

I was wondering about the use of the find_reg_equal_equiv_note - lots of 
code seems to use that function, but do we really want to mess with 
REG_EQUIV notes?

Like Stephen B., I'd like to see this moved to a separate file; then you 
can also lose the struct_equiv_ prefix from most function names and make 
them more meaningful.

 > +/* In cfgcleanup.c */

Needs updating when moving the code.

 > +/* A struct equiv_info is used to pass information to struct_equiv and
 > +   to gather state while two basic blocks are checked for structural
 > +   equivalence.

Like Stephen, I'd like to see the rest of this comment distributed so 
that each structure member has its own comment in front of it.  See also 
below - I'd like to see a struct_equiv_checkpoint used inside this 
monster struct to keep track of current state.  Further improvements 
might be to split off those members which track state, and others which 
are just inputs set by the callers of the struct_equiv pass ("struct 
struct_equiv_params"?).  Such a split could either result in two 
structures, or just two clearly separated areas inside the same 
structure.  There should be some kind of logical grouping instead of 
just a collection of members.

 > +   X_BLOCK and Y_BLOCK are the two block being compared.

"blocks"

 > +   NEED_FULL_BLOCK is set when the entire basic blocks have to match, as
 > +   in if-conversion.  It is cleared for cross-jumping.

Might be better as an arg to struct_equiv_block_eq or part of a params 
struct, since it's only needed in the toplevel function.  Anything to 
get this struct here down in size...

 > +  bool local_rvalue[STRUCT_EQUIV_MAX_LOCAL];
 > +  rtx x_local[STRUCT_EQUIV_MAX_LOCAL], y_local[STRUCT_EQUIV_MAX_LOCAL];

Might want to turn this into an array of structs.  Then we could also 
make STRUCT_EQUIV_MAX_LOCAL a runtime param more easily.

 > +static int
 > +assign_reg_reg_set (regset set, rtx reg, int value)
 > +{
 > +  unsigned regno = REGNO (reg);
 > +  int nregs, i, old;
 > +
 > +  if (regno >= FIRST_PSEUDO_REGISTER)
 > +    {
 > +      if (reload_completed)
 > +     regno = reg_renumber[regno];
 > +      else
 > +     {
 > +       if ((value != 0) == REGNO_REG_SET_P (set, regno))
 > +         return 0;
 > +       if (value)
 > +         SET_REGNO_REG_SET (set, regno);
 > +       else
 > +         CLEAR_REGNO_REG_SET (set, regno);
 > +       return value ? 1 : -1;
 > +     }
 > +    }
 > +  old = 0;
 > +  for (i = nregs = hard_regno_nregs[regno][GET_MODE (reg)]; --i >= 
0; regno++)
 > +    {
 > +      old += REGNO_REG_SET_P (set, regno);
 > +      if (value)
 > +     SET_REGNO_REG_SET (set, regno);
 > +      else
 > +     CLEAR_REGNO_REG_SET (set, regno);
 > +    }
 > +  return value ? nregs - old : -old;
 > +}

Looks slightly obfuscated to me with the special case for pseudos. 
Please combine the two loops into one by setting a NREGS variable to 1 
for pseudos.

 > +   RVALUE == 0: destination
 > +   RVALUE == 1: source
 > +   RVALUE == -1: source, ignore SET_DEST of SET / clobber.  */

Better an enum.

 > +            it already registerd.  */

"registered"

 > +                   change *= IMPOSSIBLE_MOVE_FACTOR;

Just an assignment would be enough?  Wouldn't like this to overflow, and 
I don't think we need different grades of impossible.

 > +/* Do only the struct_equiv SET_DEST processing for SETs and 
CLOBBERs.  */
 > +static bool
 > +struct_equiv_set (rtx x, rtx y, struct equiv_info *info)

Not too happy with some of these function names.  "set_dest_equiv_p"?
Should mention return value in comment.  The rest of the description 
could also be enlarged to describe the necessary details of the 
algorithm to understand why we need this function.

 > +/* Record state about current inputs / local registers / lifeness
 > +   in CHECKPOINT[N] of INFO.  */
 > +static void
 > +struct_equiv_make_checkpoint (int n, struct equiv_info *info)
 > +{
 > +  struct struct_equiv_checkpoint *p = &info->checkpoint[n];
 > +
 > +  p->ninsns = info->ninsns;
 > +  p->version = info->version;
 > +  p->local_count = info->local_count;
 > +  p->input_count = info->input_count;
 > +  p->x_start = info->x_start;
 > +  p->y_start = info->y_start;
 > +  p->input_valid = info->input_valid;
 > +}

Seems like you could use a checkpoint structure inside struct equiv_info 
and do a struct copy here.  That would give struct equiv_info a bit more
structure.

 > +/* Restore state about current inputs / local registers / lifeness
 > +   from CHECKPOINT[N] of INFO.  */
 > +static void
 > +struct_equiv_restore_checkpoint (int n, struct equiv_info *info)
 > +{
 > +  struct struct_equiv_checkpoint *p = &info->checkpoint[n];
 > +
 > +  info->ninsns = p->ninsns;
 > +  info->x_start = p->x_start;
 > +  info->y_start = p->y_start;
 > +  info->input_count = p->input_count;
 > +  info->input_valid = p->input_valid;
 > +  while (info->local_count > p->local_count)
 > +    {
 > +      info->local_count--;
 > +      info->version--;
 > +      if (REGNO_REG_SET_P (info->x_local_live,
 > +                        REGNO (info->x_local[info->local_count])))
 > +     {
 > +       assign_reg_reg_set (info->x_local_live,
 > +                           info->x_local[info->local_count], 0);
 > +       assign_reg_reg_set (info->y_local_live,
 > +                           info->y_local[info->local_count], 0);
 > +       info->version--;
 > +     }
 > +    }

I'm not sure I understand what's going on here.  This goes back and gets 
reg numbers out of x_local and y_local, indexed by local_count.  For 
this to work, these arrays must always (except for in this function) 
grow and never shrink - correct?
But there's code after the comment
             /* This might be a register local to each block.  See if we 
have
                it already registerd.  */
which seems to shrink the arrays and overwrite entries.  How does this 
work together?

 > +  if (info->version != p->version)
 > +    info->need_rerun = true;
 > +}
 > +
 > +/* Check if the set of REG_DEAD notes attached to I1 and I2 allows us to
 > +   go ahead with merging I1 and I2, which otherwise look fine.
 > +   Inputs / local registers for the inputs of I1 and I2 have already 
been
 > +   set up.  */
 > +static bool
 > +struct_equiv_death (rtx i1 ATTRIBUTE_UNUSED, rtx i2 ATTRIBUTE_UNUSED,
 > +                 struct equiv_info *info ATTRIBUTE_UNUSED)

"death_notes_match_p"?

 > +{
 > +#ifdef STACK_REGS
 > +  /* If cross_jump_death_matters is not 0, the insn's mode
 > +     indicates whether or not the insn contains any stack-like regs.  */
 > +
 > +  if (INSN_P (i1)
 > +      && (info->mode & CLEANUP_POST_REGSTACK) && 
stack_regs_mentioned (i1))

Is this bit always set after regstack, or do we need an extra test such 
as reload_completed?

 > +/* Return number of matched insns.  */

Matched how?  I'd like a human-readable overview of the algorithm somewhere.

 > +/* This function must be called up to three times for a successful 
cross-jump
 > +   match:
 > +   first to find out which instructions do match.  While trying to match
 > +   another instruction that doesn't match, we destroy information in 
info
 > +   about the actual inputs.  So if there have been any before the last
 > +   match attempt, we need to callthis function again to recompute the

"call this"

 > +   actual inputs up to the actual start of the matching sequence.
 > +   When we are then satisfied that the cross-jump is worth while, we

"worthwhile"

 > +  if (info->x_start != x_stop) for (;;)

Formatting.

 > +       /* The following code helps take care of G++ cleanups.  */

Comment not helpful.  How does it help?  What's special about G++ cleanups?

Bernd