This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug middle-end/29256] [4.2 regression] loop performance regression



------- Comment #14 from rakdver at gcc dot gnu dot org  2006-09-28 14:40 -------
> > > for this loop instead of just one.
> > > Actually, unrolling is not needed to produce the bad code:
> > > .L2:
> > >         lwz 0,0(9)
> > >         stwx 0,11,9
> > >         addi 9,9,4
> > >         bdnz .L2
> > > I bet a beer that loop.c actually fixed this crap up before.
> > 
> > I am bad at reading ppc assembler; could you please explain what exactly is
> > wrong with the code you present?
> 
> First, there are still two adds there (though just one is implicit),
> so why not write the loop as:

There is only one add, as far as I can see.
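For readers who also find the PowerPC listing hard to follow: the quoted loop appears to be a plain word-by-word copy. The C below is a sketch under that assumption (the original testcase is not reproduced in this message, and the name `copy_one_iv` is hypothetical). It spells out the single-induction-variable shape of the loop: one pointer `p` advances one word per iteration (the `addi 9,9,4`), the destination is reached as `p + delta` (the `stwx 0,11,9`, with `r11` holding the constant distance), and the trip count lives in the count register, so `bdnz` needs no separate add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the quoted loop:
 *   lwz  0,0(9)    -> load *p
 *   stwx 0,11,9    -> store to p + delta (r11 holds the distance in bytes;
 *                     delta below counts ints)
 *   addi 9,9,4     -> advance p by one 4-byte word
 *   bdnz .L2       -> decrement the count register, loop while nonzero
 * In C, dst - src is only defined when both point into the same array,
 * so this sketch requires that; the machine code has no such limit. */
void copy_one_iv(int *dst, int *src, size_t n)
{
    ptrdiff_t delta = dst - src;   /* loop-invariant distance, computed once */

    /* The two regions must not overlap. */
    assert(delta >= (ptrdiff_t)n || delta <= -(ptrdiff_t)n);

    int *end = src + n;
    /* p is the only induction variable; the exit test plays the role
     * of the count-register decrement in bdnz. */
    for (int *p = src; p != end; ++p)
        *(p + delta) = *p;
}
```

Using two halves of one array as source and destination, e.g. `copy_one_iv(buf + 4, buf, 4)` on an `int buf[8]`, keeps `dst - src` well defined.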

>  .L2:
>          lwz r0,0(r9)
>          stw r0,0(r11)
>          addi r9,r9,4
>          addi r11,r11,4
>          bdnz .L2

On the other hand, this seems worse to me (one more add).
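The two-pointer variant proposed in the quote can be sketched in C the same way, again assuming the loop is a simple word copy (the name `copy_two_iv` is hypothetical). It shows where the extra add comes from: each array gets its own induction variable, and both must be advanced on every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-induction-variable form of the copy:
 *   lwz r0,0(r9) / stw r0,0(r11)     -> *d = *s
 *   addi r9,r9,4 / addi r11,r11,4    -> ++s, ++d (two adds per iteration)
 *   bdnz .L2                         -> trip count kept in the count register */
void copy_two_iv(int *dst, const int *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    const int *s = src;
    int *d = dst;
    for (size_t k = n; k != 0; --k) {   /* k mirrors the count register */
        *d = *s;
        ++s;   /* first add  */
        ++d;   /* second add */
    }
}
```

This form computes no pointer difference, so `dst` and `src` may be separate arrays; the price is the second increment in the loop body.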

> Or:
>  .L2:
>          lwzx r0,r9,r12
>          stwx r0,r11,r12
>          addi r12,r12,4
>          bdnz .L2

Yes, this would be about the same.  Still, ivopts chose one of the best
possible forms, so I do not see what you are complaining about so much.
The unrolled case is different: there we should of course use offset
(base plus displacement) addressing modes.
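Both remaining shapes can be sketched in C, assuming the same word copy and a 4-byte `int` as on 32-bit PowerPC. In the indexed form, one counter indexes both base registers (`lwzx`/`stwx` with `r12`), so it also needs a single add per iteration. In the unrolled case, offset addressing lets each copy in the unrolled body use a constant displacement from the same pointer, so the pointers are bumped once per unrolled iteration rather than once per element. The function names, the unroll factor of 4, and the requirement that `n` be a multiple of 4 are illustrative assumptions, not taken from the bug report.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: one index shared by both accesses, like
 * lwzx r0,r9,r12 / stwx r0,r11,r12 / addi r12,r12,4
 * (r12 holds the index scaled to bytes). */
void copy_indexed(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i != n; ++i)   /* i is the only induction variable */
        dst[i] = src[i];
}

/* Unrolled by 4 with constant offsets from the same pointers, e.g.
 * lwz r0,0(r9) ... lwz r0,12(r9); the pointers advance once per 4 words.
 * For brevity this sketch assumes n is a multiple of 4. */
void copy_unrolled4(int *dst, const int *src, size_t n)
{
    assert(n % 4 == 0);

    const int *s = src;
    int *d = dst;
    for (size_t k = n / 4; k != 0; --k) {
        d[0] = s[0];   /* displacement 0 bytes  */
        d[1] = s[1];   /* displacement 4 bytes  */
        d[2] = s[2];   /* displacement 8 bytes  */
        d[3] = s[3];   /* displacement 12 bytes */
        s += 4;        /* one add per pointer per 4 words */
        d += 4;
    }
}
```

Without the constant displacements, an unrolled body would need either an add per element or a separate index computation for each access, which is the code-quality gap the unrolled case exposes.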


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256

