Initial shrink-wrapping patch

Bernd Schmidt bernds_cb1@t-online.de
Tue Sep 13 02:57:00 GMT 2011


On 09/01/11 15:57, Richard Sandiford wrote:
> add_to_hard_reg_set (pused, GET_MODE (x), REGNO (x));

Done.

> Strange line break, and comment doesn't match code (no "changed" variable).

Fixed.

>> +/* Return true if INSN requires the stack frame to be set up.
>> +   PROLOGUE_USED contains the hard registers used in the function
>> +   prologue.  */
>> +static bool
>> +requires_stack_frame_p (rtx insn, HARD_REG_SET prologue_used)
>> +{
>> [...]
>> +  if (for_each_rtx (&PATTERN (insn), frame_required_for_rtx, NULL))
>> +    return true;
>> +  CLEAR_HARD_REG_SET (hardregs);
>> +  note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs);
>> +  if (hard_reg_set_intersect_p (hardregs, prologue_used))
>> +    return true;
>> +  AND_COMPL_HARD_REG_SET (hardregs, call_used_reg_set);
>> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (TEST_HARD_REG_BIT (hardregs, regno)
>> +	&& df_regs_ever_live_p (regno))
>> +      return true;
> 
> ...I suppose this ought to use DF instead.

Here, I kind of disagree. Most of the new code works on unemitted
sequences, and df doesn't work for that, so we rely on note_stores and
such quite a lot. I've changed the one note_stores call in this function
to use df, but IMO it's not an improvement. As for
frame_required_for_rtx, I'd much rather test for equality with
hard_frame_pointer_rtx than muck about with register numbers and
frame_pointer_needed tests.

Also, it turns out I need to pretend that trap_if requires the prologue
due to the testcase you also ran across, so a for_each_rtx walk is
required anyway.

> Does something guarantee that non-local gotos are marked as
> needing a frame?

There are (use fp) (use sp) and such insns before nonlocal gotos. There
aren't any testsuite failures related to nonlocal gotos, so I assume
that's sufficient.

> Why do we need to check specifically for pic_offset_table_rtx?  Isn't it
> handled by the generic "live registers set by the prologue" test?

There is no such test, really - other than this code. There is a "live
registers clobbered by the prologue" test which disables the
optimization at a late stage. However, the prologue may set some
registers simply as scratch regs, and things would be more complicated
if we tried to identify which ones are scratch and which ones are really
set up for the rest of the function.

> Reason for asking is that pic_offset_table_rtx can be a global value,
> such as for mips*-elf.

Can we shelve this as a potential improvement for later?

>> +static rtx
>> +gen_return_pattern (bool simple_p)
> 
> Missing function comment.

Fixed.

> Pedantic, but the comment should reference SIMPLE_P.

Fixed.

> 
>> +#ifdef HAVE_simple_return
>> +  /* Try to perform a kind of shrink-wrapping, making sure the
>> +     prologue/epilogue is emitted only around those parts of the
>> +     function that require it.  */
>> +
>> +  if (flag_shrink_wrap && HAVE_simple_return && !flag_non_call_exceptions
>> +      && HAVE_prologue && !crtl->calls_eh_return)
>> +    {
> 
> Maybe it should be obvious, but could you add a comment explaining why
> flag_non_call_exceptions and crtl->calls_eh_return need to be checked
> explicitly?

Primarily because I think it's a waste of time to worry about them. I've
removed the flag_non_call_exceptions check, but eh_return is handled
elsewhere in this function, and there's really no reason to try to think
of the possible interactions. It's only used in one function in libgcc...

> Would checking prologue_seq != NULL be better than HAVE_prologue?

I was going to say no, but a look at the i386 expand_prologue suggests
we don't really want to try to duplicate the logic for HAVE_prologue.
"prologue_seq != NULL" isn't sufficient due to extra stuff being emitted
into the sequence, but I've added some code to check if the prologue has
nontrivial insns.

> Should this iterate over split_prologue_seq as well?

and

> Seems like we should be able to reuse the insn walk from the beginning
> of the enclosing if statement.

That seems to be a problem with merging multiple versions of this code.
Rearranged.

> Could we combine prologue_seq and split_prologue_seq into a single sequence?

Not in this patch, please.

> Excess { and } in for loop.

Fixed.

>     if (e->dest != EXIT_BLOCK_PTR
>         && bitmap_set_bit (&bb_flags, e->dest->index))
>       VEC_quick_push (basic_block, vec, e->dest);
> 
> should be a little more efficient.

Changed (three similar changes which you pointed out).

>> +#ifdef HAVE_simple_return
>> +	  simple_p = (entry_edge != orig_entry_edge
>> +		      ? !bitmap_bit_p (&bb_flags, bb->index) : false);
>> +#else
>> +	  simple_p = false;
>> +#endif
> 
> Why is this bb_flags rather than bb_antic_flags?

Doesn't really matter. (bb_antic_flags & ~bb_flags) only contains blocks
which don't need a prologue, but for which all subsequent paths lead to
something that requires a prologue. That's never going to be a block
that needs an epilogue or a return inserted.

> 	  simple_p = (entry_edge != orig_entry_edge
> 		      && !bitmap_bit_p (&bb_flags, bb->index));
> 
> seems more readable, but I suppose that's personal taste.

Fixed.

>> +#ifdef HAVE_simple_return
>> +	      if (simple_p)
> 
> Check is redundant given the gcc_assert.

Removed the if.

>> +	      /* If this block has only one successor, it both jumps
>> +		 and falls through to the fallthru block, so we can't
>> +		 delete the edge.  */
>> +	      if (single_succ_p (bb))
>> +		continue;
> 
> Seems odd that this could happen in optimised code, but if it did,
> wouldn't it invalidate the simple_p transformation?  Seems like
> the non-fallthrough edge would use a simple return but the
> fallthrough one wouldn't.

Whuh. For what it's worth, this code is discussed in this thread:
http://gcc.gnu.org/ml/gcc-patches/2000-02/msg00175.html

I've added a test for this so that blocks ending in such a block have
their bb_flags bit set.

>> -      start_sequence ();
>> -      emit_note (NOTE_INSN_EPILOGUE_BEG);
>> -      emit_insn (gen_sibcall_epilogue ());
>> -      seq = get_insns ();
>> -      end_sequence ();
>> +      ep_seq = gen_sibcall_epilogue ();
>> +      if (ep_seq)
>> +	{
> 
> Why the new null check?

This may have been more important with the old version of the dwarf2
code, but it seems cleaner this way - do we really want to insert
NOTE_INSN_EPILOGUE_BEG when there's no epilogue?

New version, bootstrapped and tested on i686-linux. Slightly earlier
versions tested with mips-elf (plus the various prologue/epilogue
patches for mips). Also added debug dump printout and a testcase (based
on the old 6/6 patch).


Bernd
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sw0912.diff
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20110913/7696a4d4/attachment.ksh>


More information about the Gcc-patches mailing list