Bug 70568 - [5/6/7 regression] PowerPC64: union of floating and fixed doesn't use POWER8 GPR/VSR moves
Summary: [5/6/7 regression] PowerPC64: union of floating and fixed doesn't use POWER8 ...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.0
: P2 normal
Target Milestone: 5.5
Assignee: Michael Meissner
URL:
Keywords: missed-optimization, ra
: 78823 (view as bug list)
Depends on:
Blocks: 71977
  Show dependency treegraph
 
Reported: 2016-04-07 01:16 UTC by Anton Blanchard
Modified: 2017-01-10 17:57 UTC (History)
8 users (show)

See Also:
Host:
Target: powerpc64*
Build:
Known to work: 4.8.5
Known to fail: 4.9.4, 5.4.1, 6.2.1, 7.0
Last reconfirmed: 2016-04-07 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Anton Blanchard 2016-04-07 01:16:24 UTC
The following test case from both glibc and openlibm:

typedef union
{
  float value;
  /* FIXME: Assumes 32 bit int.  */
  unsigned int word;
} ieee_float_shape_type;

/* Get a 32 bit int from a float.  */

#define GET_FLOAT_WORD(i,d)                                     \
do {                                                            \
  ieee_float_shape_type gf_u;                                   \
  gf_u.value = (d);                                             \
  (i) = gf_u.word;                                              \
} while (0)

int foo(float d)
{
        int i;

        GET_FLOAT_WORD(i, d);

        return i;
}

Uses the stack to do the conversion:

foo:
	stfs 1,-16(1)
	ori 2,2,0
	lwa 3,-16(1)

LLVM does what I expect:

foo:
	xscvdpspn 0, 1
	xxsldwi 0, 0, 0, 3
	mfvsrwz 3, 0
	extsw 3, 3
Comment 1 Segher Boessenkool 2016-04-07 01:56:15 UTC
Confirmed.  Still better is

	xscvdpspn 0,1
	mfvsrd 3,0
	srdi 3,3,32
Comment 2 Segher Boessenkool 2016-04-07 02:02:20 UTC
sradi, that is.
Comment 3 acsawdey 2016-04-13 18:37:45 UTC
Tracked this back to 210824, and in particular this change:

@@ -860,10 +897,15 @@
                        }
                    }

-                 /* If the alternative actually allows memory, make
-                    things a bit cheaper since we won't need an extra
-                    insn to load it.  */
-                 if (op_class != NO_REGS)
+                 if (op_class == NO_REGS)
+                   /* Although we don't need insn to reload from
+                      memory, still accessing memory is usually more
+                      expensive than a register.  */
+                   pp->mem_cost = frequency;
+                 else
+                   /* If the alternative actually allows memory, make
+                      things a bit cheaper since we won't need an
+                      extra insn to load it.  */
                    pp->mem_cost
                      = ((out_p ? ira_memory_move_cost[mode][op_class][0] : 0)
                         + (in_p ? ira_memory_move_cost[mode][op_class][1] : 0)

Without this change, you get this from IRA costs:

a0 (r157,l0) best GENERAL_REGS, allocno GENERAL_REGS
a0(r157,l0) costs: BASE_REGS:4000,4000 GENERAL_REGS:4000,4000 LINK_REGS:16000,16000 CTR_REGS:16000,16000 LINK_OR_CTR_REGS:16000,16000 SPEC_OR_GEN_REGS:16000,16000 MEM:4000,4000

and we pick the pattern for xscvdpspn. 

With the change, you get

a0 (r157,l0) best NO_REGS, allocno NO_REGS
a0(r157,l0) costs: BASE_REGS:3000,3000 GENERAL_REGS:3000,3000 LINK_REGS:15000,15000 CTR_REGS:15000,15000 LINK_OR_CTR_REGS:15000,15000 SPEC_OR_GEN_REGS:15000,15000 MEM:2000,2000

And this gets done via memory instead of the register moves.
Comment 4 Pat Haugen 2016-07-19 22:14:46 UTC
(In reply to acsawdey from comment #3)
> Tracked this back to 210824, and in particular this change:
> 
> @@ -860,10 +897,15 @@
>                         }
>                     }
> 
> -                 /* If the alternative actually allows memory, make
> -                    things a bit cheaper since we won't need an extra
> -                    insn to load it.  */
> -                 if (op_class != NO_REGS)
> +                 if (op_class == NO_REGS)
> +                   /* Although we don't need insn to reload from
> +                      memory, still accessing memory is usually more
> +                      expensive than a register.  */
> +                   pp->mem_cost = frequency;
> +                 else
> +                   /* If the alternative actually allows memory, make
> +                      things a bit cheaper since we won't need an
> +                      extra insn to load it.  */
>                     pp->mem_cost
>                       = ((out_p ? ira_memory_move_cost[mode][op_class][0] :
> 0)
>                          + (in_p ? ira_memory_move_cost[mode][op_class][1] :
> 0)

So looking into things, there are the following 2 insns using pseudo 157:

(insn 2 4 3 2 (set (reg/v:SF 157 [ d ])
        (reg:SF 33 1 [ d ])) test.c:9 466 {movsf_hardfloat}
     (expr_list:REG_DEAD (reg:SF 33 1 [ d ])
        (expr_list:REG_EQUIV (mem/c:SF (plus:DI (reg/f:DI 67 ap)
                    (const_int 48 [0x30])) [1 d+0 S4 A64])
            (nil))))
(insn 11 6 12 2 (set (reg/i:DI 3 3)
        (sign_extend:DI (subreg:SI (reg/v:SF 157 [ d ]) 0))) test.c:16 40 {extendsidi2}
     (expr_list:REG_DEAD (reg/v:SF 157 [ d ])
        (nil)))

For the alternatives of the 2 insns that accept memory, the new code is now setting a mem_cost of 1000 (frequency), whereas before it wouldn't have been modified for those alternatives. This seems to ignore the fact that the given insn alternatives will be a store and load.

Vlad,
  Should the new code somehow be taking into account memory_move_cost (i.e. memory_move_cost * frequency)?
Comment 5 Richard Biener 2016-08-03 11:57:06 UTC
GCC 4.9 branch is being closed
Comment 6 Michael Meissner 2016-12-20 19:16:55 UTC
*** Bug 78823 has been marked as a duplicate of this bug. ***
Comment 7 Michael Meissner 2016-12-20 19:20:07 UTC
I noticed this bug and created PR 78823.  Since that's the same bug as this, I'll take it.
Comment 8 Michael Meissner 2017-01-05 00:44:25 UTC
Author: meissner
Date: Thu Jan  5 00:43:53 2017
New Revision: 244084

URL: https://gcc.gnu.org/viewcvs?rev=244084&root=gcc&view=rev
Log:
[gcc]
2017-01-04  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* config/rs6000/predicates.md (sf_subreg_operand): New predicate.
	(altivec_register_operand): Do not return true if the operand
	contains a SUBREG mixing SImode and SFmode.
	(vsx_register_operand): Likewise.
	(vsx_reg_sfsubreg_ok): New predicate.
	(vfloat_operand): Do not return true if the operand contains a
	SUBREG mixing SImode and SFmode.
	(vint_operand): Likewise.
	(vlogical_operand): Likewise.
	(gpc_reg_operand): Likewise.
	(int_reg_operand): Likewise.
	* config/rs6000/rs6000-protos.h (valid_sf_si_move): Add
	declaration.
	* config/rs6000/rs6000.c (valid_sf_si_move): New function to
	determine if a MOVSI or MOVSF operation contains SUBREGs that mix
	SImode and SFmode.
	(rs6000_emit_move_si_sf_subreg): New helper function.
	(rs6000_emit_move): Call rs6000_emit_move_si_sf_subreg to possbily
	fixup SUBREGs involving SImode and SFmode.
	* config/rs6000/vsx.md (SFBOOL_*): New constants that are operand
	numbers for the new peephole2 optimization.
	(peephole2 for SFmode unions): New peephole2 to optimize cases in
	the GLIBC math library that do AND/IOR/XOR operations on single
	precision floating point.
	* config/rs6000/rs6000.h (TARGET_NO_SF_SUBREG): New internal
	target macros to say whether we need to avoid SUBREGs mixing
	SImode and SFmode.
	(TARGET_ALLOW_SF_SUBREG): Likewise.
	* config/rs6000/rs6000.md (UNSPEC_SF_FROM_SI): New unspecs.
	(UNSPEC_SI_FROM_SF): Likewise.
	(iorxor): Change spacing.
	(and_ior_xor): New iterator for AND, IOR, and XOR.
	(movsi_from_sf): New insns for SImode/SFmode SUBREG support.
	(movdi_from_sf_zero_ext): Likewise.
	(mov<mode>_hardfloat, FMOVE32 iterator): Use register_operand
	instead of gpc_reg_operand.  Add SImode/SFmode SUBREG support.
	(movsf_from_si): New insn for SImode/SFmode SUBREG support.
	(fma<mode>4): Use gpc_reg_operand instead of register_operand.
	(fms<mode>4): Likewise.
	(fnma<mode>4): Likewise.
	(fnms<mode>4): Likewise.
	(nfma<mode>4): Likewise.
	(nfms<mode>4): Likewise.

[gcc/testsuite]
2017-01-04  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* gcc.target/powerpc/pr71977-1.c: New tests to check whether on
	64-bit VSX systems with direct move, whether we optimize common
	code sequences in the GLIBC math library for float math functions.
	* gcc.target/powerpc/pr71977-2.c: Likewise.


Added:
    trunk/gcc/testsuite/gcc.target/powerpc/pr71977-1.c
    trunk/gcc/testsuite/gcc.target/powerpc/pr71977-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/predicates.md
    trunk/gcc/config/rs6000/rs6000-protos.h
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/rs6000/rs6000.h
    trunk/gcc/config/rs6000/rs6000.md
    trunk/gcc/config/rs6000/vsx.md
    trunk/gcc/testsuite/ChangeLog
Comment 9 Michael Meissner 2017-01-10 17:44:49 UTC
Author: meissner
Date: Tue Jan 10 17:44:17 2017
New Revision: 244279

URL: https://gcc.gnu.org/viewcvs?rev=244279&root=gcc&view=rev
Log:
[gcc]
2017-01-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	Back port from trunk
	2017-01-04  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* config/rs6000/predicates.md (sf_subreg_operand): New predicate.
	(altivec_register_operand): Do not return true if the operand
	contains a SUBREG mixing SImode and SFmode.
	(vsx_register_operand): Likewise.
	(vsx_reg_sfsubreg_ok): New predicate.
	(vfloat_operand): Do not return true if the operand contains a
	SUBREG mixing SImode and SFmode.
	(vint_operand): Likewise.
	(vlogical_operand): Likewise.
	(gpc_reg_operand): Likewise.
	(int_reg_operand): Likewise.
	* config/rs6000/rs6000-protos.h (valid_sf_si_move): Add declaration.
	* config/rs6000/rs6000.c (valid_sf_si_move): New function to
	determine if a MOVSI or MOVSF operation contains SUBREGs that mix
	SImode and SFmode.
	(rs6000_emit_move_si_sf_subreg): New helper function.
	(rs6000_emit_move): Call rs6000_emit_move_si_sf_subreg to possbily
	fixup SUBREGs involving SImode and SFmode.
	* config/rs6000/vsx.md (SFBOOL_*): New constants that are operand
	numbers for the new peephole2 optimization.
	(peephole2 for SFmode unions): New peephole2 to optimize cases in
	the GLIBC math library that do AND/IOR/XOR operations on single
	precision floating point.
	* config/rs6000/rs6000.h (TARGET_NO_SF_SUBREG): New internal
	target macros to say whether we need to avoid SUBREGs mixing
	SImode and SFmode.
	(TARGET_ALLOW_SF_SUBREG): Likewise.
	* config/rs6000/rs6000.md (UNSPEC_SF_FROM_SI): New unspecs.
	(UNSPEC_SI_FROM_SF): Likewise.
	(iorxor): Change spacing.
	(and_ior_xor): New iterator for AND, IOR, and XOR.
	(movsi_from_sf): New insns for SImode/SFmode SUBREG support.
	(movdi_from_sf_zero_ext): Likewise.
	(mov<mode>_hardfloat, FMOVE32 iterator): Use register_operand
	instead of gpc_reg_operand.  Add SImode/SFmode SUBREG support.
	(movsf_from_si): New insn for SImode/SFmode SUBREG support.
	(fma<mode>4): Use gpc_reg_operand instead of register_operand.
	(fms<mode>4): Likewise.
	(fnma<mode>4): Likewise.
	(fnms<mode>4): Likewise.
	(nfma<mode>4): Likewise.
	(nfms<mode>4): Likewise.

	* config/rs6000/rs6000.h (TARGET_DIRECT_MOVE_64BIT): Define macro
	used by the above patch in a limited form.
	* config/rs6000/vsx.md (peephole2 for SFmode unions): Use DFmode
	for doing direct moves instead of DImode, since GCC 6.x does not
	support DImode in Altivec registers.

[gcc/testsuite]
2017-01-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	Back port from trunk
	2017-01-04  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* gcc.target/powerpc/pr71977-1.c: New tests to check whether on
	64-bit VSX systems with direct move, whether we optimize common
	code sequences in the GLIBC math library for float math functions.
	* gcc.target/powerpc/pr71977-2.c: Likewise.


Added:
    branches/ibm/gcc-6-branch/gcc/testsuite/gcc.target/powerpc/pr71977-1.c
      - copied unchanged from r244232, trunk/gcc/testsuite/gcc.target/powerpc/pr71977-1.c
    branches/ibm/gcc-6-branch/gcc/testsuite/gcc.target/powerpc/pr71977-2.c
      - copied unchanged from r244232, trunk/gcc/testsuite/gcc.target/powerpc/pr71977-2.c
Modified:
    branches/ibm/gcc-6-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-6-branch/gcc/config/rs6000/predicates.md
    branches/ibm/gcc-6-branch/gcc/config/rs6000/rs6000-protos.h
    branches/ibm/gcc-6-branch/gcc/config/rs6000/rs6000.c
    branches/ibm/gcc-6-branch/gcc/config/rs6000/rs6000.h
    branches/ibm/gcc-6-branch/gcc/config/rs6000/rs6000.md
    branches/ibm/gcc-6-branch/gcc/config/rs6000/vsx.md
    branches/ibm/gcc-6-branch/gcc/testsuite/ChangeLog.ibm
Comment 10 Michael Meissner 2017-01-10 17:57:18 UTC
Fixed in trunk (GCC 7) and the IBM Advance Toolchain branch for GCC 6.  See PR 71977 for more details.