RFC: Rewriting auto-inc-dec.c

Wed Aug 10 10:40:00 GMT 2011

Thanks for looking at this.

Bernd Schmidt <bernds@codesourcery.com> writes:
> On 07/21/11 17:42, Richard Sandiford wrote:
>>       /* When optimizing for speed, don't introduce dependencies between
>>          memory references in the chain and memory references outside of it,
>>          since doing so would limit scheduling.  If the chain is truly
>>          worthwhile, we can still try again using a new pseudo register.  */
>
> Now, it's good that there is at least some consideration for this, but I
> wonder whether it's possible to do better here. The basic principle is
> that if something is achievable without autoinc, in the same number of
> insns (or a factor 1.x thereof with x small), then it's preferrable not
> to use autoinc at all.

Yeah.  The pass tries to reject changes that have a net zero cost unless
there are specific reasons to believe that later passes would benefit.
(This is the max_cost_delta argument to test_replacements.)

> For example,
>
>>            ... = *r1                ... = *rN++
>>            ... = *(r1 + 4)          ... = *rN
>>      *(r1 + 4) = ...              *rN++ = ...
>>             r2 = r1 + 8              r2 = rN
>>            ... = *r2          --->  ... = rN++
>>             r3 = r2 + 4              r3 = rN
>>            ... = *r3                ... = rN++
>>             r4 = r1 + 16             r4 = rN
>>            ... = *r4                ... = rN
>
> On many targets, this would best be handled as a series of (reg +
> offset) addressing modes.

The idea is that we run after fwprop, so all addresses that are better
expressed as (reg + offset) should already have that form.  I.e. the
pass should be run at a stage where it doesn't have to process unoptimised
addresses (and thus have to "second guess" what later address optimisations
might do).  So the fact that the addresses above are not:

	... = *(r1 + 8)
	... = *(r1 + 12)
	... = *(r1 + 16)

or at least:

	... = *(r1 + 8)
	... = *(r2 + 4)
	... = *(r1 + 16)

should mean that those addresses aren't acceptable on the target,
or at least are more expensive than plain *rN.  This is the kind
of thing we'd see for NEON.

I should explain that in the comment -- thanks for bringing it up.

> If an update is required, on targets that
> support both (reg + offset) and {pre,post}_modify, it would be good to
> have an automodify in just one of the addresses; if a single add insn is
> required this may still be better if the number of memory references is
> large enough. Do you think this is something that can be done in the
> context of this pass?

So something like:

	 ... = *r1                ... = *rN, rN += 16
	 ... = *(r1 + 100)        ... = *(rN - 84)
	 ... = *(r1 + 80)         ... = *(rN - 64)
	 ... = *(r1 - 20)         ... = *(rN - 36)
	 r1 = r1 + 16

?  Yeah, that could be added.

> I'm not sure all the code is completely safe if a hardreg is used with a
> mix of single- and multi-word accesses.

The code was supposed to handle that.  Is there anything specifically
that worries you?  FWIW, the principles were:

  - in update_liveness, when a multi-register value (rN...rM) is used or
    defined, behave as though each individual register in the range [rN, rM]
    had been used or defined.  (The DF machinery handles this for us;
    there's a separate df_ref for each register in the range.)

  - If we're tracking (rN...rM) as a single register rtx, then any
    non-ref_point use of any register in the range [rN, rM] is recorded
    as an untracked use.

  - If we're tracking (rN...rM) as a single register rtx, then any
    definition of any register in the range [rN, rM] clobbers the
    tracked value, and brings its live range to an end.

> Maybe just mark a hardreg that occurs in a multi-word mode as
> unavailable for the optimization?

Well, I'd rather not if we can help it.  AVR in particular benefits from
handling wide hard regs.

> It would be nice not to have semi-identical pairs of functions like
> other_uses_before and other_uses_after or valid_mem_add_pair and
> valid_add_mem_pair. Can these be merged sensibly?

Although the functions are deliberately similar in interface, they're
not really that similar in implementation.  They test different live
ranges and apply different liveness checks.  So I'm not sure there's
much that can be merged.

> Just to make sure, my understanding is that this will also generate
> (automod (reg1) (plus (reg1) (reg2))). Right?

Yeah, that's right.

>> /* Represents an offset that is applied to a valno base.  */
>> union valno_offset {
>>   HOST_WIDE_INT constant;
>>   struct valno_def *valno;
>> };
>
> Should mention that this is used as splay tree key?

Well, it is, but it's also an important part of the valno record.
I mentioned that the offset was the key here:

  /* A splay tree of valnos that use this one as their base.  The splay
     key is the offset.  */

but I agree it's probably worth saying it above the "offset" field itself
as well.

> For some reason I find it slightly scary to use a union as a key; I
> see that things like valno_splay have a constant_p parameter - maybe
> you want two separate trees instead?

Yeah, I can certainly use two trees here.  I was probably overly
influenced by the fact that the current valno_def was a nice round
8 words on 64-bit hosts, so didn't want to add an extra field.

Even so, I think the splay function itself would remain largely
the same.  The offset would still be stored as a union, and we'd
still be using that as the key.  Even if we replace the constant_p
argument with a function callback. that callback would still need to
assert that the valno's constant_p field has the right value, then
extract the relevant field from the union.

>> /* Create a new entry with key KEY in ROOT's derived tree.  */
>> 
>> static struct valno_def *
>> new_splay_valno (bool constant_p, struct valno_def *base,
>>                  union valno_offset *offset)
>
> Comment doesn't match definition.

Doh!  Fixed, thanks.

Richard