This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #4
- From: David Edelsohn <dje dot gcc at gmail dot com>
- To: Michael Meissner <meissner at linux dot vnet dot ibm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 1 Oct 2013 19:15:39 -0400
- Subject: Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #4
- References: <20130822185658 dot GA30430 at ibm-tiger dot the-meissners dot org> <20130923200617 dot GA3900 at ibm-tiger dot the-meissners dot org> <20130924203310 dot GA25337 at ibm-tiger dot the-meissners dot org> <CAGWvnykGupvw8na+GcKUxkpbcn6CV6HVrXuAcHeQdaSf52STTg at mail dot gmail dot com> <20130926205137 dot GA28417 at ibm-tiger dot the-meissners dot org> <20131001175213 dot GB2708 at ibm-tiger dot the-meissners dot org>
On Tue, Oct 1, 2013 at 1:52 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch moves most of the VSX DFmode operations from vsx.md to rs6000.md to
> use the traditional floating point instructions (f*) instead of the VSX scalar
> instructions (xs*) if all of the registers come from the traditional floating
> point register set. The add, subtract, multiply, divide, reciprocal estimate,
> square root, absolute value, negate, round functions, and multiply/add
> instructions were changed. Some of the converts have not been changed with
> these patches. If the -mupper-regs-df switch is used, it will attempt to use
> the upper registers (those that overlay on the traditional Altivec register
> set).
>
> This patch also combines the scalar SFmode/DFmode support on non-SPE systems.
> It adds in ISA 2.07 (power8) single precision floating point support if the
> -mupper-regs-sf switch is used.
>
> At present, neither -mupper-regs-df nor -mupper-regs-sf is usable if reload has
> to do anything. A future patch will address this.
>
> I did need to adjust a few tests that were specifically testing VSX scalar code
> generation. In addition, I put in a simple test to make sure the initial
> -mupper-regs-df and -mupper-regs-sf support works correctly.
>
> I tested this, and except for power7 and power8, I could not find any changes
> in the code generated for power4, power5, power6, power6x, G4, G5, cell, e5500,
> e6500, xilinx (sp_full, sp_lite, dp_full, dp_lite, none), 8548/8540 (spe), or
> 750cl (paired floating point).
>
> For VSX systems there are code generation differences:
>
> 1) The traditional fp instruction is generated instead of VSX;
>
> 2) Because of #1, the code generator favors the 4-operand form of the
> multiply/add instructions, where the target register does not overlap
> with any of the inputs, over the VSX version that requires overlap.
>
> 3) A few of the vectorized tests on power8 now generate more direct move
> instructions than previously, instead of moving values through the
> stack. These tests are integer tests, where you are doing an
> operation between an integer vector and a scalar value. Previously in
> some cases, the register allocator would store the value from a GPR and
> reload it to the vector registers.
>
> 4) There is a slight scheduling difference in doing long double abs, that
> causes a different register to be used. The code for long double abs
> needs to be improved in any case (the early splitting is causing spills
> to the stack).
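
To make item 2 concrete, here is a minimal C sketch of source where all three inputs of a multiply/add are still needed afterwards. It is a hypothetical example, not code from the patch, and the function name is made up. Under the assumption that the compiler contracts `a * b + c` into a fused multiply/add, a form whose target may differ from every input avoids copying one input first.

```c
/* Hypothetical illustration, not taken from the patch.
   The inputs a, b and c are all used again after the multiply/add, so an
   instruction form whose target register must overlap one of the inputs
   would need an extra register copy to preserve that input.  */
double
fma_keep_inputs (double a, double b, double c)
{
  double t = a * b + c;   /* candidate for a fused multiply/add */
  return t + a + b + c;   /* a, b and c remain live past the multiply/add */
}
```

Whether a fused instruction is actually chosen depends on the target and the options in effect; the result for exactly representable inputs such as 2.0, 3.0 and 4.0 is the same either way.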
>
> I had no differences in doing bootstrap and make check (with the testsuite
> fixes applied).
>
> In addition, I am running SPEC 2006 floating point tests on a power7 box to
> compare the effects of going back to the traditional floating point
> instructions. For most tests, there is less than 2% difference between the
> runs. One benchmark
> (482.sphinx3) is slightly faster with these changes, and it is dominated by
> floating point multiply/add operations.
>
> Can I apply these patches?
>
> [gcc]
> 2013-09-30 Michael Meissner <meissner@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000-builtin.def (XSRDPIM): Use floordf2,
> ceildf2, btruncdf2, instead of vsx_* name.
>
> * config/rs6000/vsx.md (vsx_add<mode>3): Change arithmetic
> iterators to only do V2DF and V4SF here. Move the DF code to
> rs6000.md where it is combined with SF mode. Replace <VSv> with
> just 'v' since only vector operations are handled with these insns
> after moving the DF support to rs6000.md.
> (vsx_sub<mode>3): Likewise.
> (vsx_mul<mode>3): Likewise.
> (vsx_div<mode>3): Likewise.
> (vsx_fre<mode>2): Likewise.
> (vsx_neg<mode>2): Likewise.
> (vsx_abs<mode>2): Likewise.
> (vsx_nabs<mode>2): Likewise.
> (vsx_smax<mode>3): Likewise.
> (vsx_smin<mode>3): Likewise.
> (vsx_sqrt<mode>2): Likewise.
> (vsx_rsqrte<mode>2): Likewise.
> (vsx_fms<mode>4): Likewise.
> (vsx_nfma<mode>4): Likewise.
> (vsx_copysign<mode>3): Likewise.
> (vsx_btrunc<mode>2): Likewise.
> (vsx_floor<mode>2): Likewise.
> (vsx_ceil<mode>2): Likewise.
> (vsx_smaxsf3): Delete scalar ops that were moved to rs6000.md.
> (vsx_sminsf3): Likewise.
> (vsx_fmadf4): Likewise.
> (vsx_fmsdf4): Likewise.
> (vsx_nfmadf4): Likewise.
> (vsx_nfmsdf4): Likewise.
> (vsx_cmpdf_internal1): Likewise.
>
> * config/rs6000/rs6000.h (TARGET_SF_SPE): Define macros to make it
> simpler to select whether a target has SPE or traditional floating
> point support in iterators.
> (TARGET_DF_SPE): Likewise.
> (TARGET_SF_FPR): Likewise.
> (TARGET_DF_FPR): Likewise.
> (TARGET_SF_INSN): Macros to say whether floating point support
> exists for a given operation for expanders.
> (TARGET_DF_INSN): Likewise.
>
> * config/rs6000/rs6000.md (Ftrad): New mode attributes to allow
> combining of SF/DF mode operations, using both traditional and VSX
> registers.
> (Fvsx): Likewise.
> (Ff): Likewise.
> (Fv): Likewise.
> (Fs): Likewise.
> (Ffre): Likewise.
> (FFRE): Likewise.
> (abs<mode>2): Combine SF/DF modes using traditional floating point
> instructions. Add support for using the upper DF registers with
> VSX support, and SF registers with power8-vector support. Update
> expanders for operations supported by both the SPE and traditional
> floating point units.
> (abs<mode>2_fpr): Likewise.
> (nabs<mode>2): Likewise.
> (nabs<mode>2_fpr): Likewise.
> (neg<mode>2): Likewise.
> (neg<mode>2_fpr): Likewise.
> (add<mode>3): Likewise.
> (add<mode>3_fpr): Likewise.
> (sub<mode>3): Likewise.
> (sub<mode>3_fpr): Likewise.
> (mul<mode>3): Likewise.
> (mul<mode>3_fpr): Likewise.
> (div<mode>3): Likewise.
> (div<mode>3_fpr): Likewise.
> (sqrt<mode>2): Likewise.
> (sqrt<mode>2_fpr): Likewise.
> (fre<Fs>): Likewise.
> (rsqrt<mode>2): Likewise.
> (cmp<mode>_fpr): Likewise.
> (smax<mode>3): Likewise.
> (smin<mode>3): Likewise.
> (smax<mode>3_vsx): Likewise.
> (smin<mode>3_vsx): Likewise.
> (negsf2): Delete SF operations that are merged with DF.
> (abssf2): Likewise.
> (addsf3): Likewise.
> (subsf3): Likewise.
> (mulsf3): Likewise.
> (divsf3): Likewise.
> (fres): Likewise.
> (fmasf4_fpr): Likewise.
> (fmssf4_fpr): Likewise.
> (nfmasf4_fpr): Likewise.
> (nfmssf4_fpr): Likewise.
> (sqrtsf2): Likewise.
> (rsqrtsf_internal1): Likewise.
> (smaxsf3): Likewise.
> (sminsf3): Likewise.
> (cmpsf_internal1): Likewise.
> (copysign<mode>3_fcpsgn): Add VSX/power8-vector support.
> (negdf2): Delete DF operations that are merged with SF.
> (absdf2): Likewise.
> (nabsdf2): Likewise.
> (adddf3): Likewise.
> (subdf3): Likewise.
> (muldf3): Likewise.
> (divdf3): Likewise.
> (fred): Likewise.
> (rsqrtdf_internal1): Likewise.
> (fmadf4_fpr): Likewise.
> (fmsdf4_fpr): Likewise.
> (nfmadf4_fpr): Likewise.
> (nfmsdf4_fpr): Likewise.
> (sqrtdf2): Likewise.
> (smaxdf3): Likewise.
> (smindf3): Likewise.
> (cmpdf_internal1): Likewise.
> (lrint<mode>di2): Use TARGET_<MODE>_FPR macro.
> (btrunc<mode>2): Delete separate expander, and combine with the
> insn and add VSX instruction support. Use TARGET_<MODE>_FPR.
> (btrunc<mode>2_fpr): Likewise.
> (ceil<mode>2): Likewise.
> (ceil<mode>2_fpr): Likewise.
> (floor<mode>2): Likewise.
> (floor<mode>2_fpr): Likewise.
> (fma<mode>4_fpr): Combine SF and DF fused multiply/add support.
> Add support for using the upper registers with VSX and
> power8-vector. Move insns to be closer to the define_expands. On
> VSX systems, prefer the traditional form of FMA over the VSX
> version, since the traditional form allows the target not to
> overlap with the inputs.
> (fms<mode>4_fpr): Likewise.
> (nfma<mode>4_fpr): Likewise.
> (nfms<mode>4_fpr): Likewise.
>
> [gcc/testsuite]
> 2013-09-30 Michael Meissner <meissner@linux.vnet.ibm.com>
>
> * gcc.target/powerpc/p8vector-fp.c: New test for floating point
> scalar operations when using -mupper-regs-sf and -mupper-regs-df.
> * gcc.target/powerpc/ppc-target-1.c: Update tests to allow either
> VSX scalar operations or the traditional floating point form of
> the instruction.
> * gcc.target/powerpc/ppc-target-2.c: Likewise.
> * gcc.target/powerpc/recip-3.c: Likewise.
> * gcc.target/powerpc/recip-5.c: Likewise.
> * gcc.target/powerpc/pr42747.c: Likewise.
> * gcc.target/powerpc/vsx-builtin-3.c: Likewise.
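
For readers without the patch at hand, a test of scalar SFmode/DFmode operations of the kind the new testsuite entry describes might look roughly like the sketch below. This is an assumption about its general shape, not the contents of p8vector-fp.c: the function names are invented, and the DejaGnu directives and assembler scans a real test would carry are deliberately left out.

```c
/* Hypothetical sketch in the spirit of a scalar floating point test;
   this is NOT the p8vector-fp.c added by the patch.  On a PowerPC
   target it could be compiled with the switches named in the patch
   description, -mupper-regs-df and -mupper-regs-sf, to see which
   instruction forms are chosen for each operation.  */

/* Single precision (SFmode) operations.  */
float add_sf (float a, float b) { return a + b; }
float sub_sf (float a, float b) { return a - b; }
float mul_sf (float a, float b) { return a * b; }
float div_sf (float a, float b) { return a / b; }
float neg_sf (float a)          { return -a; }
float abs_sf (float a)          { return __builtin_fabsf (a); }

/* Double precision (DFmode) operations.  */
double add_df (double a, double b) { return a + b; }
double sub_df (double a, double b) { return a - b; }
double mul_df (double a, double b) { return a * b; }
double div_df (double a, double b) { return a / b; }
double neg_df (double a)           { return -a; }
double abs_df (double a)           { return __builtin_fabs (a); }

/* Multiply/add shapes that a compiler may turn into fused instructions.  */
double fma_df (double a, double b, double c) { return a * b + c; }
double fms_df (double a, double b, double c) { return a * b - c; }
```

Keeping one operation per function makes it easy to inspect the generated assembly for each pattern separately.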
Okay. Good cleanups.
Thanks, David