This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH], PR target/80510, Optimize offsettable memory references on power7/power8
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: Michael Meissner <meissner at linux dot vnet dot ibm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, David Edelsohn <dje dot gcc at gmail dot com>
- Date: Mon, 15 May 2017 16:26:47 -0500
- Subject: Re: [PATCH], PR target/80510, Optimize offsettable memory references on power7/power8
- References: <20170512213350.GA985@ibm-tiger.the-meissners.org>
Hi,
On Fri, May 12, 2017 at 05:33:50PM -0400, Michael Meissner wrote:
> The problem occurs when DImode, DFmode, and SFmode values are allowed in
> Altivec registers before ISA 3.0 and the compiler wants to do an offsettable
> store. The compiler generates a move from the Altivec register to a
> traditional floating point register, and then generates the STFD or STFS
> instruction.
>
> This patch adds peephole2s that notice a move from an Altivec register to an
> FPR followed by a store, and change the sequence to load the offset into a
> GPR and do an indexed store directly from the Altivec register. I also added
> code to do the reverse (notice a load into an FPR that is then copied to an
> Altivec register) and use an indexed load instead.
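To make the rewrite concrete, here is a toy model in C. It is not GCC code: the opcodes, `struct insn`, and `peephole_vsx_dform` are hypothetical, and it only models the DFmode case (SFmode would use stfs/lfs and stxsspx/lxsspx). Real peephole2 patterns would additionally need to obtain a free GPR and make sure the intermediate FPR is dead afterwards; the sketch simply assumes the caller guarantees both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes standing in for the RTL the peephole2s match. */
enum opcode {
  OP_MOVE_VSX_TO_FPR, /* copy: fpr <- altivec reg        */
  OP_STORE_FPR_DFORM, /* stfd   fpr, offset(base)        */
  OP_LOAD_FPR_DFORM,  /* lfd    fpr, offset(base)        */
  OP_MOVE_FPR_TO_VSX, /* copy: altivec reg <- fpr        */
  OP_LOAD_IMM_GPR,    /* li     gpr, offset              */
  OP_STORE_VSX_XFORM, /* stxsdx altivec, base, gpr       */
  OP_LOAD_VSX_XFORM   /* lxsdx  altivec, base, gpr       */
};

/* One toy instruction; unused register fields are -1. */
struct insn {
  enum opcode op;
  int dst;     /* register written (loads, moves, li)     */
  int src;     /* register read (moves, stores)           */
  int base;    /* base address register (memory ops)      */
  int index;   /* index register (X-form memory ops)      */
  long offset; /* displacement (D-form) or immediate (li) */
};

/* Rewrite "copy + D-form FPR access" pairs into "li + X-form Altivec
   access".  SCRATCH_GPR stands in for a free GPR; the caller must
   guarantee it is free and that the intermediate FPR is dead after the
   pair, since this toy does not track liveness.  Returns the number of
   pairs rewritten. */
int peephole_vsx_dform(struct insn *insns, size_t n, int scratch_gpr)
{
  int changed = 0;
  for (size_t i = 0; i + 1 < n; i++) {
    struct insn *a = &insns[i], *b = &insns[i + 1];

    if (a->op == OP_MOVE_VSX_TO_FPR && b->op == OP_STORE_FPR_DFORM
        && b->src == a->dst) {
      /* fpr <- vsx; stfd fpr,off(base)  =>  li gpr,off; stxsdx vsx,base,gpr */
      int vsx = a->src, base = b->base;
      long off = b->offset;
      assert(scratch_gpr != base);
      *a = (struct insn){ .op = OP_LOAD_IMM_GPR, .dst = scratch_gpr,
                          .src = -1, .base = -1, .index = -1, .offset = off };
      *b = (struct insn){ .op = OP_STORE_VSX_XFORM, .dst = -1, .src = vsx,
                          .base = base, .index = scratch_gpr, .offset = 0 };
      changed++;
      i++; /* skip the second insn of the rewritten pair */
    } else if (a->op == OP_LOAD_FPR_DFORM && b->op == OP_MOVE_FPR_TO_VSX
               && b->src == a->dst) {
      /* lfd fpr,off(base); vsx <- fpr  =>  li gpr,off; lxsdx vsx,base,gpr */
      int vsx = b->dst, base = a->base;
      long off = a->offset;
      assert(scratch_gpr != base);
      *a = (struct insn){ .op = OP_LOAD_IMM_GPR, .dst = scratch_gpr,
                          .src = -1, .base = -1, .index = -1, .offset = off };
      *b = (struct insn){ .op = OP_LOAD_VSX_XFORM, .dst = vsx, .src = -1,
                          .base = base, .index = scratch_gpr, .offset = 0 };
      changed++;
      i++; /* skip the second insn of the rewritten pair */
    }
  }
  return changed;
}
```

Each pair stays two instructions, but the FPR copy disappears and the Altivec register is accessed directly; the price is one GPR to hold the offset.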
Ok.
> I ran the Spec 2006 floating point suite with this patch, and the LBM benchmark
> shows a nearly 3% gain with this patch, and there were no significant
> regressions.
Nice :-)
> Note that using peepholes is a quick way to fix this particular problem.
> However, long term it would be nice to arrange things so the back end can
> tell the register allocator to load the offset into a register instead of
> doing the move/store. I tried various modifications to secondary reload,
> but I wasn't able to get it to change its behavior.
These peepholes are simple and look perfectly safe. It would of course
be great if we didn't need them, i.e. if LRA were a bit smarter.
> +;; Optimize cases where we want to do a D-form load (register+offset) on
> +;; ISA 2.06/2.07 to an Altivec register, and the register allocator
> +;; has generated:
> +;; load fpr
> +;; move fpr->altivec
Maybe show here the actual machine instructions before and after the
peephole has been applied?
> +/* { dg-final { scan-assembler {\xsadddp\M} } } */
> +/* { dg-final { scan-assembler {\stxsdx\M} } } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
You forgot the "m" (in "\m") in the first two of these. (I wonder
how this worked, especially the "\s" one.)
> +/* { dg-final { scan-assembler {\xsaddsp\M} } } */
> +/* { dg-final { scan-assembler {\stxsspx\M} } } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */
And again.
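In the Tcl regular expressions that scan-assembler uses, \m matches only at the start of a word and \M only at the end, so with the missing "m" added the directives would read:

```c
/* { dg-final { scan-assembler {\mxsadddp\M} } } */
/* { dg-final { scan-assembler {\mstxsdx\M} } } */
/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */

/* { dg-final { scan-assembler {\mxsaddsp\M} } } */
/* { dg-final { scan-assembler {\mstxsspx\M} } } */
/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */
```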
Okay for trunk with that fixed. Also okay for the GCC 7 branch (after a delay).
Thanks!
Segher