This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/17264] [hppa] Missing address increment optimization for fp load/stores



------- Comment #2 from dave at hiauly1 dot hia dot nrc dot ca  2006-09-24 22:15 -------
Subject: Re:  [hppa] Missing address increment optimization for fp load/stores

> For this test case:
> 
> void f(double *pds, double *pdd, unsigned long len) {
>   while (len >= 8*sizeof(double)) {
>     register double r1,r2,r3,r4;
>     r1 = *pds++;
>     r2 = *pds++;
>     r3 = *pds++;
>     r4 = *pds++;
>     *pdd++ = r1;
>     *pdd++ = r2;
>     *pdd++ = r3;
>     *pdd++ = r4;
>   }
> }
> 
> gcc starting from 4.0 produces this:
> 
> .L3:
>         fldds -16(%r26),%fr22
>         fldds -8(%r26),%fr23
>         fldds 0(%r26),%fr24
>         fldds 8(%r26),%fr25
>         ldo 32(%r26),%r26
>         fstds %fr22,-16(%r25)
>         fstds %fr23,-8(%r25)
>         fstds %fr24,0(%r25)
>         fstds %fr25,8(%r25)
>         b .L3
> 
> which I suspect is actually better, since it avoids dependencies between the
> loads. But I'm not familiar with hppa, can anybody comment?

It looks close to optimal to me.  The code is better than that generated
by 3.4.x or HP cc.  Using the auto-increment forms would allow elimination
of the two ldo instructions to increment r25 and r26.
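
To make the two addressing styles concrete, here is a minimal C sketch of the
same four-at-a-time copy written both ways. The function names and the
element-count parameter n are assumptions made for this illustration (the
quoted test case never updates len), and which instructions a given compiler
emits for either form is not guaranteed.

```c
#include <stddef.h>

/* Form 1: post-increment on every access, as in the quoted test case.
   Assumption: n counts doubles and is decremented so the loop terminates. */
void copy_postinc(const double *src, double *dst, size_t n)
{
    while (n >= 4) {
        double r1 = *src++;
        double r2 = *src++;
        double r3 = *src++;
        double r4 = *src++;
        *dst++ = r1;
        *dst++ = r2;
        *dst++ = r3;
        *dst++ = r4;
        n -= 4;
    }
}

/* Form 2: fixed offsets from each base pointer, then one explicit add per
   pointer per iteration. This resembles the shape of the quoted assembly:
   displacement loads/stores plus a separate address increment. */
void copy_offset(const double *src, double *dst, size_t n)
{
    while (n >= 4) {
        double r1 = src[0];
        double r2 = src[1];
        double r3 = src[2];
        double r4 = src[3];
        dst[0] = r1;
        dst[1] = r2;
        dst[2] = r3;
        dst[3] = r4;
        src += 4;   /* separate pointer update for the source */
        dst += 4;   /* separate pointer update for the destination */
        n -= 4;
    }
}
```

Both functions copy the same elements; they differ only in where the pointer
updates appear in the source. With auto-increment addressing, the updates in
the second form could in principle be folded into one of the memory accesses.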

Dave


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17264

