Re: Re: setmemsi, movmemsi and post_inc
Stefan Franke
stefan@franke.ms
Thu Mar 25 20:16:51 GMT 2021
> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Thursday, 25 March 2021 16:04
> On 3/25/2021 8:50 AM, Stefan Franke wrote:
> >> On 3/25/2021 8:21 AM, Stefan Franke wrote:
> >>> Hi there,
> >>>
> >>> I consider implementing movmemsi/setmemsi for some arch using
> >>> post_inc. Is there a "best practice" for using auto increments in
> >>> such early stages to avoid hiccups in cse, gcse, cprop etc.?
> >> IIRC best practice is not to expose auto-inc until the auto-inc pass
> >> as earlier passes don't know how to deal with them. See "Incdec" in
> >> the developer manual.
> >>
> >>
> >> I'm also not aware of a target where an autoinc happens in contexts
> >> other than in a MEM. So you may run into problems with things that
> >> look like a simple reg->reg move -- insns 8 and 9 in your example.
> >>
> >>
> >> jeff
> > You looked closely! I added the reg note to the register move so that I
> > could enhance some passes to handle this correctly.
> > e.g. in cse.c
> >
> > if (find_reg_note (insn, REG_INC, dest))
> > continue;
> >
> > But these modifications aren't the standard way, thus I asked 😊
> >
> > => The routines should emit mems with offset and pray that auto-inc will
> > pick it up?
>
> Yes.
>
>
> > (Btw: auto-inc-dec does not work well for unrolled loops, so I'm
> > tempted to force the auto-inc stuff...)
>
> That's likely going to lead to a variety of problems. The (documented)
> restriction around auto-inc not being used early in the pipeline has been
> around at least 30 years and passes have been written with that
> assumption. Fixing all of them may be a substantial effort.
>
> It's probably a better use of your time to get a deep understanding of why
> you're not getting the code you want in the presence of unrolling -- there
> may be things we can do in the unroller, auto-inc or passes in the middle to
> improve that.
>
> Jeff
At least it seems possible to use auto_inc inside an emitted loop, since that yields a separate basic block...
Loop unrolling and auto_inc (post_inc) do not play well together, for two reasons. Consider these mem refs, with mode size 4:
a[0] = ...
a[4] = ...
a[8] = ...
a[12] = ...
Loop unrolling produces something like:
b = a
b[0] = ...
b[4] = ...
b[8] = ...
b[12] = ...
b = b + 16
b[0] = ...
...
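
For illustration only (this is plain C, not GCC code), the two shapes can be sketched side by side: the unrolled form stores through a copy with constant offsets and adds once per unrolled iteration, while the form a post_inc target would like bumps the pointer after every store. The function names and the unroll factor of 4 are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape produced by unrolling: constant offsets from b and one add per
   unrolled iteration.  Indices are in words, i.e. byte offsets
   0, 4, 8, 12 for a 4-byte mode.  */
void fill_offsets(uint32_t *a, size_t nwords, uint32_t v)
{
    uint32_t *b = a;
    for (size_t i = 0; i + 4 <= nwords; i += 4) {
        b[0] = v;
        b[1] = v;
        b[2] = v;
        b[3] = v;
        b += 4;             /* b = b + 16 bytes */
    }
}

/* Shape a post_inc target prefers: every store is "*b++ = ...".  */
void fill_postinc(uint32_t *a, size_t nwords, uint32_t v)
{
    uint32_t *b = a;
    for (size_t i = 0; i + 4 <= nwords; i += 4) {
        *b++ = v;
        *b++ = v;
        *b++ = v;
        *b++ = v;
    }
}
```

Both functions write the same memory; the question in this thread is only whether the RTL passes get from the first shape to the second.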
1. cse folds the memory refs from b to a, but not the one at [4]:
b = a
a[0] = ...
b[4] = ...
a[8] = ...
a[12] = ...
...
You end up with one post_inc at the beginning and none for the rest.
My workaround here is to look at DF_REG_USE_COUNT and DF_REG_DEF_COUNT to decide whether b[4] should be folded too:
if (DF_REG_USE_COUNT(REGNO(folded_arg0)) <= 2 || DF_REG_DEF_COUNT(REGNO(folded_arg0)) > 1)
  break;
2. auto-inc-dec does not yet handle mem refs with an offset that are followed by a matching add.
Since the above pattern looks like this as insns
b = a + x
*b = ...
a = a + x + 4
and is detected as PRE_ADD, I convert it into
a = a + x
*a = ...
a = a + 4
which is now a POST_INC. I then update the variables, and auto-inc-dec generates post increments all the way up to the top, where x becomes zero.
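
The rewrite in point 2 can be checked with a small standalone C model (again plain C, not GCC code; the function names and the byte-pointer model are assumptions for this sketch). Both functions store to the same address and return the same final value of a. Only the second one has the store-then-add-4 shape that matches a post increment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Original shape:  b = a + x;  *b = v;  a = a + x + 4  */
unsigned char *store_pre_add(unsigned char *a, size_t x, uint32_t v)
{
    unsigned char *b = a + x;
    memcpy(b, &v, sizeof v);    /* *b = ... */
    a = a + x + 4;
    return a;
}

/* Rewritten shape:  a = a + x;  *a = v;  a = a + 4  */
unsigned char *store_post_inc(unsigned char *a, size_t x, uint32_t v)
{
    a = a + x;
    memcpy(a, &v, sizeof v);    /* *a = ... */
    a = a + 4;                  /* store plus this add form the post_inc */
    return a;
}
```

When x is zero the leading add disappears and only the post-increment store remains, which is the case reached at the top of the unrolled sequence.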
=> there is room for improvement ^^
Stefan