This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH], Fix PR 70131, disable (double)(int) optimization for power8


On Fri, Mar 11, 2016 at 5:41 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> As I was auditing rs6000.md for power9 changes, I noticed that changes I had
> made in 2010 for power7 weren't as effective with power8.
>
> The FCTIWZ/FCTIWUZ instructions convert the scalar floating point value to a
> 32-bit signed/unsigned integer in bits 32-63 of the floating point or vector
> register.  Unfortunately, the hardware does not guarantee that bits 0-31 are
> copies of the sign bit, so the result cannot be used directly as a valid 64-bit
> integer.  There is no conversion from 32-bit int to floating point.  This meant
> that in the power7 days, if you wanted to round a floating point value to a
> 32-bit integer and convert it back, you would need to do:
>
>         convert to 32-bit integer
>         store 32-bit value on the stack
>         load 32-bit value to a GPR
>         sign/zero extend it
>         store 64-bit value to the stack
>         load 64-bit value to a FPR/vector register.
>
> The optimization does a store/load to sign/zero extend, rather than going
> through the GPRs.
>
> On power8, we have a direct move instruction that copies the value between the
> register sets, and the compiler will generate this if the above optimization is
> turned off (which is what this patch does).
>
> There are other ways to sign/zero extend a value in the vector registers
> without doing a move using multiple instructions, but in practice direct move
> seems to be as fast as the other instructions.
>
> I bootstrapped the compiler and there were no regressions with this patch.
>
> I rebuilt the Spec 2006 benchmark suite, and there were 7 benchmarks that
> used this sequence somewhere in the code.  I ran those benchmarks with this
> patch, and compared them to the original benchmarks.  In 6 of the benchmarks,
> the run time was almost precisely the same.  The 416.gamess benchmark was about
> 2% faster, and there were no regressions.
>
> Is this patch ok to apply to the trunk?  I would like to apply it to the gcc 5
> branch as well.  Is this ok also?
>
> [gcc]
> 2016-03-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         PR target/70131
>         * config/rs6000/rs6000.md (round32<mode>2_fprs): Do not do the
>         optimization if we have direct move.
>         (roundu32<mode>2_fprs): Likewise.
>
> [gcc/testsuite]
> 2016-03-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         PR target/70131
>         * gcc.target/powerpc/ppc-round2.c: New test.

Okay for trunk and GCC 5.

Thanks, David
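
For readers following the thread, here is a minimal C sketch of the source-level pattern the patch is about, plus a small model of why the extension step is needed. It is not taken from the patch or from ppc-round2.c; the function names (trunc_via_int, trunc_via_uint, sign_extend_low32, zero_extend_low32) are made up for illustration. The two extend helpers only model the register contents described above, where the low 32 bits hold the converted integer and the upper 32 bits are not guaranteed.

```c
#include <stdint.h>

/* The (double)(int) pattern: truncate toward zero to a 32-bit signed
   integer, then convert the result back to double.  */
double trunc_via_int(double x)
{
    return (double)(int)x;
}

/* The unsigned variant of the same pattern.  */
double trunc_via_uint(double x)
{
    return (double)(unsigned int)x;
}

/* Hypothetical model of a 64-bit register whose low 32 bits hold the
   converted integer and whose upper 32 bits are undefined.  Before the
   value can be treated as a 64-bit integer, the low word has to be
   sign-extended (signed case)...  */
int64_t sign_extend_low32(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;   /* discard the undefined upper half */
    /* Portable sign extension: flip the sign bit, then subtract 2^31.  */
    return (int64_t)(low ^ 0x80000000u) - (int64_t)0x80000000;
}

/* ...or zero-extended (unsigned case).  */
uint64_t zero_extend_low32(uint64_t reg)
{
    return reg & 0xFFFFFFFFu;       /* keep only the defined low half */
}
```

Compiling the first two functions for power7 and for power8 and comparing the generated assembly should show the difference discussed in the patch description, though the exact instruction sequences depend on the compiler version and options.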

