This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[PATCH], PR target/80718, Improve PowerPC splat double word

From: Michael Meissner <meissner at linux dot vnet dot ibm dot com>
To: GCC Patches <gcc-patches at gcc dot gnu dot org>, Segher Boessenkool <segher at kernel dot crashing dot org>, David Edelsohn <dje dot gcc at gmail dot com>
Date: Mon, 22 May 2017 14:32:44 -0400
Subject: [PATCH], PR target/80718, Improve PowerPC splat double word
Authentication-results: sourceware.org; auth=none

When I was comparing SPEC 2006 numbers between GCC 6.3 and 7.1, one benchmark
(milc) was noticeably slower.  Looking at the generated code, the #1 hot
function (mult_adj_su3_mat_vec) had some cases where automatic vectorization
generated a splat of a double from memory.

The register allocator did not use the load with splat instruction (LXVDSX)
because all of the loads were register+offset.  For the scalar values that it
could load into the FPR registers, it used the normal register+offset load
(d-form).  For the other scalar values that would wind up in the traditional
Altivec registers, the register allocator decided to load up the value into a
GPR register and do a direct move.
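
For illustration only, here is a minimal sketch of the kind of source that can
lead to a splat of a double loaded from a register+offset address.  The names
(struct coeffs, scale_array) are hypothetical; this is not code from milc and
not the pr80718.c test in the patch.  Whether GCC actually vectorizes the loop
and which instructions it picks depend on the options and the target.

```c
#include <stddef.h>

/* Hypothetical example, not taken from milc or from pr80718.c.  */
struct coeffs
{
  double pad;		/* Gives 'scale' a nonzero offset within the struct.  */
  double scale;
};

/* Every iteration multiplies by the same double, which lives in memory at
   a register+offset address (p->scale).  If the loop is vectorized, that
   scalar has to be replicated into both halves of a V2DF vector, i.e. a
   splat of a double from memory.  */
void
scale_array (double *restrict out, const double *restrict in,
	     const struct coeffs *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] * p->scale;
}
```

Assuming a VSX-capable target, compiling something like this with
optimization and vectorization enabled and inspecting the assembly is one way
to see which load and splat sequence the compiler chooses.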

Now, it turns out that while the above code is inefficient, it is not the cause
of the slowdown in the milc benchmark.  However, there might be other places
where using a load, direct move, and double word permute causes a performance
problem, so I made this patch.

The patch splits the splat into a register splat and a memory splat.  This
forces the register allocator to convert the load to the indexed form which the
LXVDSX instruction uses.  I did a SPEC 2006 run with these changes, and there
were no significant performance differences.

In the mult_adj_su3_mat_vec function, there were previously 5 GPR load, direct
move, and permute sequences along with one LXVDSX.  With this patch, those
sequences have been replaced with LXVDSX instructions.

Can I apply this patch to the trunk, and later apply it to the GCC 7 and 6
branches?

[gcc]
2017-05-22  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/80718
	* config/rs6000/vsx.md (vsx_splat_<mode>, VSX_D iterator): Split
	V2DF/V2DI splat into two separate patterns, one that handles
	registers, and the other that only handles memory.  Drop support
	for splatting from a GPR on ISA 2.07 and then splitting the
	splat into direct move and splat.
	(vsx_splat_<mode>_reg): Likewise.
	(vsx_splat_<mode>_mem): Likewise.

[gcc/testsuite]
2017-05-22  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/80718
	* gcc.target/powerpc/pr80718.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Attachment: pr80718.patch03b
Description: Text document

Follow-Ups:
- Re: [PATCH], PR target/80718, Improve PowerPC splat double word
  - From: Segher Boessenkool
- Re: [PATCH], PR target/80718, Improve PowerPC splat double word
  - From: Richard Sandiford
