This is the mail archive of the mailing list for the GCC project.


[PATCH, 1 of 4 or 5], Enhance PowerPC vec_extract support for power8/power9 machines

These patches enhance the vec_extract built-in on modern PowerPC server
systems.  Currently, vec_extract is optimized for constant element numbers for
vector double/vector long on any VSX system, and constant element numbers for
vector char/vector short/vector int on ISA 3.0 (power9) systems.

If a vec_extract is not one of the optimized cases, the compiler currently
stores the vector into memory, and then loads the element via normal memory
addressing.  This creates a store-load hit.
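
As a rough illustration of the fallback described above, here is a minimal
sketch in portable C.  It uses the generic GCC vector extension rather than
<altivec.h>, and the names v2df and extract_via_memory are hypothetical names
made up for this example, not anything in the patch.

```c
#include <assert.h>
#include <string.h>

/* Generic GCC/Clang vector type standing in for "vector double".
   The typedef name is an assumption made for this sketch.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical helper: extract element N the slow way, by storing the
   whole vector to memory and loading one element back.  */
double
extract_via_memory (v2df v, unsigned n)
{
  double tmp[2];

  assert (n < 2);
  memcpy (tmp, &v, sizeof tmp);	/* store the full vector */
  return tmp[n];		/* dependent scalar load of one element */
}
```

The scalar load depends on a vector store to the same address that has only
just been issued, which is the store-load hit mentioned above.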

This patch and the successive patches will enable better code generation of
vec_extract on 64-bit systems with direct move (power8) and above.

This particular patch only changes the infrastructure, so that the next patch
can add support for extracting a variable element of vector double or vector
long.  It does not change the code generation.

In addition, I discovered that a previous change for ISA 3.0 extraction
misspelled an instruction name (MFVSRLD), and that is fixed here.  It turns out
that I also messed up the constraints, so the register allocator would never
generate this instruction.  This patch just uses the correct name; the
constraints will not be fixed until the next patch, so the instruction cannot
be generated until then.

I have tested this patch and there are no regressions.  Can I apply this to the
trunk?  This set of patches depends on the DImode in Altivec registers patches,
which have not been backported to GCC 6.2, so it is for trunk only.

The next patch will enhance vec_extract to allow the element number to be
variable for vec_extract of vector double/vector long on 64-bit systems with
direct move using the VSLO instruction.
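
For reference, the kind of source this is aimed at looks roughly like the
sketch below.  To keep it buildable on any GCC or Clang target, it uses the
generic vector subscript as a stand-in for vec_extract, and the typedef and
function names are hypothetical.  What matters is that the element number is
only known at run time.

```c
#include <assert.h>

/* Generic vector types standing in for "vector double" and "vector long".
   The typedef names are assumptions made for this sketch.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Hypothetical helper: N is a run-time value, not a constant.  */
double
extract_variable_df (v2df v, unsigned n)
{
  assert (n < 2);
  return v[n];		/* stand-in for vec_extract (v, n) */
}

/* Same operation for the 64-bit integer element type.  */
long long
extract_variable_di (v2di v, unsigned n)
{
  assert (n < 2);
  return v[n];
}
```

With a constant N, the element number can be folded into the choice of
instruction; with a variable N it has to be applied at run time.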

The third patch will enhance vec_extract to better optimize extracting elements
if the vector is in memory.  Right now, the code only optimizes extracting
element 0.  The new patch will allow both constant and variable element
numbers.

The fourth patch will enhance vec_extract for the other vector types.  I might
split it up into two patches, one for vector float, and the other for vector
char, vector short, and vector long.

I built SPEC 2006 with all of the patches.  Some benchmarks showed a few code
generation changes, and a few benchmarks showed a lot (gamess had over 500
places that were optimized).  I ran a comparison between the old compiler and
one with the patches installed on several of the benchmarks that showed the
most changes.  I did not see any performance changes on the benchmarks that I
ran.  I believe this is because vec_extract is typically generated at the end
of vector reductions, and it does not account for much time in the whole
benchmark.  User written code that uses vec_extract would hopefully see speed
improvements.
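
To make the reduction point concrete, here is a hand-written sketch of the
shape such code takes (the compiler's vectorizer would produce something of
this form from a plain scalar loop).  It uses the generic GCC vector extension,
and the names v2df and sum_reduce are hypothetical.

```c
#include <stddef.h>

/* Generic vector type standing in for "vector double".  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical example: sum LEN doubles, two lanes at a time.  */
double
sum_reduce (const double *a, size_t len)
{
  v2df acc = { 0.0, 0.0 };
  size_t i = 0;
  double sum;

  /* Hot loop: pure vector adds, no element extraction.  */
  for (; i + 2 <= len; i += 2)
    {
      v2df x = { a[i], a[i + 1] };
      acc += x;
    }

  /* The extracts happen exactly once, after the loop.  */
  sum = acc[0] + acc[1];

  /* Scalar tail for an odd trailing element.  */
  for (; i < len; i++)
    sum += a[i];

  return sum;
}
```

The extraction runs once per reduction, no matter how many iterations the loop
executes, so a faster extract has little effect on the total run time.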

2016-07-27  Michael Meissner  <>

	* config/rs6000/ (vec_extract<mode>): Change the calling
	signature of rs6000_expand_vector_extract so that the element
	number is a RTX instead of a constant integer.
	* config/rs6000/rs6000-protos.h (rs6000_expand_vector_extract):
	Likewise.
	* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Likewise.
	(altivec_expand_vec_ext_builtin): Likewise.
	* config/rs6000/ (reduc_plus_scal_<mode>): Likewise.
	* config/rs6000/ (vsx_extract_<mode>): Fix spelling of the
	MFVSRLD instruction.

Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email:, phone: +1 (978) 899-4797

Attachment: gcc-stage7.extract004b
Description: Text document