This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] rs6000: Remove WORD_REGISTER_OPERATIONS
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: gcc-patches at gcc dot gnu dot org
- Cc: dje dot gcc at gmail dot com, Segher Boessenkool <segher at kernel dot crashing dot org>
- Date: Thu, 18 Jun 2015 10:08:42 -0700
- Subject: [PATCH] rs6000: Remove WORD_REGISTER_OPERATIONS
The macro WORD_REGISTER_OPERATIONS, if defined, means that all reg-reg
operations on data smaller than words are performed on the full word.
For TARGET_POWERPC64 words are 64 bits; but many operations on SImode
do not behave as if on DImode. So rs6000 should not define the macro.
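As a rough illustration of that mismatch, here is a sketch that uses a rotate as the example operation. The choice of rotate and all helper names are assumptions made for illustration; none of this comes from the patch. It compares a rotate done in 32 bits with the low 32 bits of the same rotate done on the zero-extended 64-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left within 32 bits; n must be in 1..31.  */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Rotate left within 64 bits; n must be in 1..63.  */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Hypothetical "do it on the full word" variant: zero-extend, rotate
   in 64 bits, then keep only the low 32 bits.  */
static uint32_t rotl32_via_word(uint32_t x, unsigned n)
{
    return (uint32_t) rotl64((uint64_t) x, n);
}

/* The two disagree whenever a set bit wraps around the 32-bit boundary.  */
static void check_rotate_differs(void)
{
    assert(rotl32(0x80000000u, 1) == 1u);
    assert(rotl32_via_word(0x80000000u, 1) == 0u);
}
```

The top bit wraps around to bit 0 in the 32-bit rotate, but in the 64-bit rotate it moves to bit 32 and is lost on truncation, so the narrow result cannot be read off the full-word one.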
Bootstrapped and tested as usual (-m32, -m32/-mpowerpc64, -m64, -m64/-mlra);
no regressions. Is this okay for mainline?
I did some analysis on the code differences this causes.
- For both 32-bit and 64-bit, combine can combine more AND instructions,
including to a whole bunch of dot forms. This is mostly because combine
thinks it should "simplify" to a smaller mode (because it has more info
about zero bits), but we have no compare instructions in smaller modes.
- Range checks (x >= a && x <= b) are problematic. They are folded (in
the frontend already) to the usual x-a u<= b-a affair, but often in
less than 32 bits. This survives in that form throughout the middle end,
and then expand makes it a minus, a zero_extend from the smaller mode to
SImode, and a compare as SI. Without WORD_REGISTER_OPERATIONS combine
can never get rid of the zero_extend (and with it, only sometimes). Had
it been a zero_extend, minus, compare in that order (with slightly
modified constants to adjust for the wider mode), the zero_extend could
more often be removed. The same thing happens on almost all targets.
- For 64-bit, many 64-bit loads are changed to 32-bit loads. This is
fine in most places; the one case that looks nasty is where it spills
a 64-bit reg to stack and immediately loads it back as 32-bit (with
an ori 2,2,0 in between, thankfully). Only reload does this; LRA makes
better code (with a clrldi), not worse than with W_R_O defined.
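The range-check folding from the second item can be sketched in C as below. This is only an illustration under assumptions: an 8-bit narrow type stands in for the "less than 32 bits" mode, a <= b is assumed, and the function names are hypothetical. The first folded form does the minus in the narrow mode and then zero-extends; the second zero-extends first and does the minus and compare in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* The range check as written in the source program.  */
static int in_range_naive(uint8_t x, uint8_t a, uint8_t b)
{
    return x >= a && x <= b;
}

/* Folded form as described: minus in the narrow mode (wraps mod 256),
   then zero_extend to 32 bits, then an unsigned compare.  */
static int in_range_narrow(uint8_t x, uint8_t a, uint8_t b)
{
    uint8_t d = (uint8_t) (x - a);
    return (uint32_t) d <= (uint32_t) (uint8_t) (b - a);
}

/* Alternative order: zero_extend first, then minus and compare in
   32 bits (the minus now wraps mod 2^32 instead of mod 256).  */
static int in_range_wide(uint8_t x, uint8_t a, uint8_t b)
{
    uint32_t d = (uint32_t) x - (uint32_t) a;
    return d <= (uint32_t) b - (uint32_t) a;
}

/* Exhaustively check that all three agree whenever a <= b.  */
static void check_range_fold(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = a; b < 256; b++)
            for (int x = 0; x < 256; x++) {
                int want = in_range_naive((uint8_t) x, (uint8_t) a, (uint8_t) b);
                assert(in_range_narrow((uint8_t) x, (uint8_t) a, (uint8_t) b) == want);
                assert(in_range_wide((uint8_t) x, (uint8_t) a, (uint8_t) b) == want);
            }
}
```

In the wide form the input is extended before any arithmetic, so the extension sits directly on the value being tested rather than on the result of the subtraction.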
In all, you get about 1 in 20000 extra insns (and a bit more for the
compiler itself, it does a *lot* of range checks). Following patches
to improve the rotate insns more than make up for it (and get better
results than with the macro defined even :-) )
Segher
2015-06-18 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.h (WORD_REGISTER_OPERATIONS): Delete.
---
gcc/config/rs6000/rs6000.h | 4 ----
1 file changed, 4 deletions(-)
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 1b1145f..ef8ff38 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2039,10 +2039,6 @@ do { \
is undesirable. */
#define SLOW_BYTE_ACCESS 1
-/* Define if operations between registers always perform the operation
- on the full register even if a narrower mode is specified. */
-#define WORD_REGISTER_OPERATIONS
-
/* Define if loading in MODE, an integral mode narrower than BITS_PER_WORD
will either zero-extend or sign-extend. The value of this macro should
be the code that says which one of the two operations is implicitly
--
1.8.1.4