AW: new ira optimization - adding a loop to ira
stefan@franke.ms
stefan@franke.ms
Fri Sep 13 12:58:00 GMT 2019
> -----Ursprüngliche Nachricht-----
> Von: stefan@franke.ms <stefan@franke.ms>
> Gesendet: Freitag, 13. September 2019 12:58
> An: 'Richard Sandiford' <richard.sandiford@arm.com>
> Cc: gcc-help@gcc.gnu.org
> Betreff: AW: new ira optimization - adding a loop to ira
>
> > -----Ursprüngliche Nachricht-----
> > Von: stefan@franke.ms <stefan@franke.ms>
> > Gesendet: Freitag, 13. September 2019 12:45
> > An: 'Richard Sandiford' <richard.sandiford@arm.com>
> > Cc: gcc-help@gcc.gnu.org
> > Betreff: AW: new ira optimization - adding a loop to ira
> >
> > > -----Ursprüngliche Nachricht-----
> > > Von: Richard Sandiford <richard.sandiford@arm.com>
> > > Gesendet: Freitag, 13. September 2019 12:16
> > > An: stefan@franke.ms
> > > Cc: gcc-help@gcc.gnu.org
> > > Betreff: Re: new ira optimization - adding a loop to ira
> > >
> > > <stefan@franke.ms> writes:
> > > > I'm working on a new optimization to get rid of spilled tmp
> > > > variables
> > (e.g.
> > > > introduced by pre) to use the source mem ref instead of a stack
> slot.
> > > >
> > > > To do this, I added a loop into ira.c:ira()
> > > >
> > > > init_prune_stack_vars ();
> > > > do
> > > > {
> > > > #ifndef IRA_NO_OBSTACK
> > > > gcc_obstack_init (&ira_obstack); #endif
> > > > bitmap_obstack_initialize (&ira_bitmap_obstack);
> > > >
> > > > ...
> > > >
> > > > ira_color ();
> > > >
> > > > }
> > > > while (flag_prune_stack_vars && prune_stack_vars ());
> > > >
> > > > To get it work, the prune_stack_vars function resets a couple of
> data.
> > > > This is mostly working - but on some source files, it fails due to
> > > > invalid reg_equivs.
> > > > Since this also happens, if the optimizer does nothing and just
> > > > loops
> > once.
> > > >
> > > > Currently I'm calling this, before looping again
> > > >
> > > > regstat_free_n_sets_and_refs ();
> > > > regstat_free_ri ();
> > > > loop_optimizer_finalize ();
> > > > free_dominance_info (CDI_DOMINATORS);
> > > >
> > > > Any hint, what I'm missing to reset?
> > >
> > > I can't see anything obviously missing. What kind of failure do you
> > see? E.g.
> > > do you get an internal compiler error or does the compiler generate
> > > incorrect code?
> > >
> > > Do you see the failure on an in-tree test case? FWIW, I just tried
> > looping like
> > > this locally and didn't see any failures for the tests I tried. But
> > > I
> > was obviously
> > > testing without the new optimisation, and so each loop iteration
> > > should
> > just
> > > repeat what the previous one did.
> > >
> > > Not related to the failure, but: do you do anything with the
> > > obstacks
> > when
> > > looping again? Including the initialisations in the loop as above
> > > would introduce a memory leak if you don't do anything to free the
> > contents.
> > > It'd probably be better to initialise outside the loop unless you're
> > really
> > > confident that the no data is carried across iterations.
> > >
> > > Thanks,
> > > Richard
> >
> > Thanks für the ira_obstack hint - I will take care of this, once the
> loop mode
> > is working - maybe I can start looping later or I'll free the memory.
> >
> > In reload: push_reload(...) this raises an error:
> >
> > gcc_assert (regno < FIRST_PSEUDO_REGISTER
> > || reg_renumber[regno] >= 0
> > || reg_equiv_constant (regno) == NULL_RTX);
> >
> > I already know that it's reg_equiv_constant and that this
> reg_equiv_constant
> > is also set in the unpatched code.
> >
> > So I am looking why these additional reloads occur. There are
> > additional reloads if I enable the loop, interestingly for uid like 2, 3, 4 ...
> >
> > Thanks,
> > Stefan
>
>
> The difference is the additional expr_list, which causes the reload:
>
> (insn 2 10 3 2 (set (reg/f:SI 9 a1 [orig:46 this ] [46])
> (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
> (const_int 16 [0x10])) [178 this+0 S4 A16]))
> engines/sci/engine/kpathing.cpp:758 40 {*movsi_m68k2}
> (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
> (const_int 16 [0x10])) [178 this+0 S4 A16])
> (nil)))
>
> => I'll add some code to drop the expr_list from all insns...
I took the wrong corner:
A normal ira pass is changing REG_EQUAL notes to REG_EQUIV notes. This sets the req_equiv and causes the failure during reload...
=> I added code to record all insn/REG_EQUAL-note pairs
=> and restore these if the loop is run again - dropping the REQ_EQUIV notes.
And this issue went aways.
Plus I moved the loop start further below, so the ira_obstack is only initialized once:
init_prune_stack_vars ();
do
{
init_reg_equiv ();
=> I can continue to work on the optimizer itself.
To provide an example:
void transformVector( double* restrict inputVector, double const transformMatrix[4][4],double* restrict outputVector)
{
for(int k = 0; k < 900; k++)
{
double x = *inputVector++;
double y = *inputVector++;
double z = *inputVector++;
for(int l = 0; l < 3; l++){
double res = transformMatrix[l][0] * x;
res += transformMatrix[l][1] * y;
res += transformMatrix[l][2] * z;
res += transformMatrix[l][3];
*outputVector++ = res;
}
}
}
m68k-amigaos-gcc -m68080 -O3 x.c -S
yields:
#NO_APP
.text
.align 2
.globl _transformVector
_transformVector:
link.w a5,#-88
move.l (16,a5),a0
move.l (8,a5),a1
fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
movem.l a4/a3/a2,-(sp)
move.l (12,a5),a2
move.l (a2)+,(-16,a5)
move.l (a2)+,(-12,a5)
lea (21600,a0),a4
fdmove.d (a2)+,fp7
move.l (a2)+,(-8,a5)
move.l (a2)+,(-4,a5)
move.l (a2)+,(-24,a5)
move.l (a2)+,(-20,a5)
move.l (a2)+,(-32,a5)
move.l (a2)+,(-28,a5)
move.l (a2)+,(-40,a5)
move.l (a2)+,(-36,a5)
move.l (a2)+,(-48,a5)
move.l (a2)+,(-44,a5)
move.l (a2)+,(-56,a5)
move.l (a2)+,(-52,a5)
move.l (a2)+,(-64,a5)
move.l (a2)+,(-60,a5)
move.l (a2)+,(-72,a5)
move.l (a2)+,(-68,a5)
move.l (a2)+,(-80,a5)
move.l (a2)+,(-76,a5)
move.l (a2),(-88,a5)
move.l (4,a2),(-84,a5)
.L2:
fdmove.d (8,a1),fp0
lea (24,a1),a3
lea (24,a0),a2
fdmove.d (a1),fp6
move.l a3,a1
fdmove.x fp0,fp4
fdmove.d (-16,a5),fp2
fdmul.x fp6,fp2
fdmul.x fp7,fp4
...
And with the new option:
m68k-amigaos-gcc -m68080 -O3 x.c -S -fprune-stack-vars
_transformVector:
link.w a5,#0
move.l (16,a5),a1
move.l (12,a5),a0
fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
movem.l a6/a4/a3/a2,-(sp)
move.l (8,a5),a2
lea (21600,a1),a6
.L2:
fdmove.d (a2),fp2
lea (24,a2),a4
lea (24,a1),a3
fdmove.d (8,a2),fp0
move.l a4,a2
fdmove.x fp2,fp3
fdmove.x fp0,fp5
fdmul.d (a0),fp3
fdmul.d (8,a0),fp5
...
Btw: the code is not platform specific -> guess it's generally useful
Thanks
Stefan
More information about the Gcc-help
mailing list