This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH], Fix PR 70131, disable (double)(int) optimization for power8
- From: David Edelsohn <dje dot gcc at gmail dot com>
- To: Michael Meissner <meissner at linux dot vnet dot ibm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 11 Mar 2016 18:51:57 -0500
- Subject: Re: [PATCH], Fix PR 70131, disable (double)(int) optimization for power8
- References: <20160311224148 dot GA31239 at ibm-tiger dot the-meissners dot org>
On Fri, Mar 11, 2016 at 5:41 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> As I was auditing rs6000.md for power9 changes, I noticed that changes I had
> made in 2010 for power7 weren't as effective with power8.
>
> The FCTIWZ/FCTIWUZ instructions convert the scalar floating point value to a
> 32-bit signed/unsigned integer in bits 32-63 of the floating point or vector
> register. Unfortunately, the hardware does not guarantee that bits 0-31 are
> copies of the sign bit, so the result cannot be used directly as a valid
> 64-bit integer. There is also no instruction that converts a 32-bit int to
> floating point. This meant in the power7
> days, if you wanted to round a floating point value to 32-bit integer, you
> would need to do:
>
> convert to 32-bit integer
> store 32-bit value on the stack
> load 32-bit value to a GPR
> sign/zero extend it
> store 32-bit value to the stack
> load 32-bit value to a FPR/vector register.
>
> The optimization does a store/load to sign/zero extend, rather than going
> through the GPRs.
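
For illustration, a minimal C sketch of the kind of source pattern the subject line calls "(double)(int)": a floating point value truncated to a 32-bit integer and converted straight back. The function names are hypothetical and are not taken from the patch or from its ppc-round2.c test; which instructions a compiler emits for them depends on the target and options.

```c
/* Hypothetical helpers, not from the patch or its test case. */

/* Truncate toward zero to a signed 32-bit int, then convert back to
   double.  The integer value has to be sign extended before the
   conversion back to floating point. */
double round_via_int(double x)
{
    return (double)(int)x;
}

/* Unsigned variant: the integer value has to be zero extended before
   the conversion back to floating point. */
double round_via_uint(double x)
{
    return (double)(unsigned int)x;
}
```

Calling round_via_int(-3.75) yields -3.0 and round_via_uint(3.75) yields 3.0. Inputs that do not fit in the 32-bit integer type are undefined behavior in C, so the sketch assumes in-range values.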
>
> On power8, we have a direct move instruction that copies the value between the
> register sets, and the compiler will generate this if the above optimization is
> turned off (which is what this patch does).
>
> There are other ways to sign/zero extend a value in the vector registers
> that avoid the move by using multiple instructions, but in practice direct
> move seems to be as fast as those instruction sequences.
>
> I bootstrapped the compiler and there were no regressions with this patch.
>
> I rebuilt the SPEC 2006 benchmark suite, and there were 7 benchmarks that
> used this sequence somewhere in the code. I ran those benchmarks with this
> patch, and compared them to the original benchmarks. In 6 of the benchmarks,
> the run time was almost precisely the same. The 416.gamess benchmark was about
> 2% faster, and there were no regressions.
>
> Is this patch ok to apply to the trunk? I would like to apply it to the gcc 5
> branch as well. Is this ok also?
>
> [gcc]
> 2016-03-11 Michael Meissner <meissner@linux.vnet.ibm.com>
>
> PR target/70131
> * config/rs6000/rs6000.md (round32<mode>2_fprs): Do not do the
> optimization if we have direct move.
> (roundu32<mode>2_fprs): Likewise.
>
> [gcc/testsuite]
> 2016-03-11 Michael Meissner <meissner@linux.vnet.ibm.com>
>
> PR target/70131
> * gcc.target/powerpc/ppc-round2.c: New test.
Okay for trunk and GCC 5.
Thanks, David