This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/29256] [4.2 regression] loop performance regression
- From: "rakdver at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 28 Sep 2006 14:40:23 -0000
- Subject: [Bug middle-end/29256] [4.2 regression] loop performance regression
- References: <bug-29256-12262@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #14 from rakdver at gcc dot gnu dot org 2006-09-28 14:40 -------
> > > for this loop instead of just one.
> > > Actually, unrolling is not needed to produce the bad code:
> > > .L2:
> > > lwz 0,0(9)
> > > stwx 0,11,9
> > > addi 9,9,4
> > > bdnz .L2
> > > I bet a beer that loop.c actually fixed this crap up before.
> >
> > I am bad at reading ppc assembler; could you please explain what exactly is
> > wrong with the code you present?
>
> One, there are still two adds there (just one of them is implicit),
> so why not do the loop as:
There is only one add, as far as I can see.
> .L2:
> lwz r0,0(r9)
> stw r0,0(r11)
> addi r9,r9,4
> addi r11,r11,4
> bdnz .L2
OTOH, this seems worse to me (one more add).
> Or:
> .L2:
> lwzx r0,r9,r12
> stwx r0,r11,r12
> addi r12,r12,4
> bdnz .L2
Yes, this would be about the same. Still, ivopts chose one of the best
possible ways, so I do not see what you are complaining about so much.
The unrolled case is something different -- of course we should use offset
addressing modes there.
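
For readers who, like me, do not read ppc assembler fluently, the three loop
shapes discussed above correspond roughly to the C below. This is only a
sketch: the function names are made up for illustration, it assumes the loop
copies int-sized words, and the first variant assumes that both pointers point
into the same array so that their difference is defined. A compiler is free to
turn any of these into any of the others, so the source form does not
guarantee the generated code.

```c
#include <stddef.h>

/* Shape 1 (the code ivopts produced): one induction variable.
   Only the source pointer is incremented; the store address is that
   pointer plus a loop-invariant difference, as in "stwx 0,11,9".
   Assumption: dst and src point into the same array. */
void copy_one_pointer(int *dst, const int *src, size_t n)
{
    ptrdiff_t delta = dst - src;        /* loop invariant, plays the role of r11 */
    for (const int *p = src; p != src + n; ++p)
        *(int *)(p + delta) = *p;       /* the only explicit add is ++p */
}

/* Shape 2 (first suggestion): two induction variables, two adds. */
void copy_two_pointers(int *dst, const int *src, size_t n)
{
    const int *end = src + n;
    while (src != end)
        *dst++ = *src++;                /* both pointers are bumped */
}

/* Shape 3 (second suggestion): one index shared by both bases, one add.
   Both the load and the store use indexed addressing (lwzx/stwx). */
void copy_indexed(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i != n; ++i)
        dst[i] = src[i];
}
```

Shapes 1 and 3 each need a single addi per iteration, while shape 2 needs two;
in all three the trip count lives in the count register used by bdnz and
costs no extra add.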
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256