This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] Check TRULY_NOOP_TRUNCATION in make_extraction and force_to_mode
- From: Adam Nemet <anemet at caviumnetworks dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Tue, 10 Jan 2006 13:47:40 -0800
- Subject: [PATCH] Check TRULY_NOOP_TRUNCATION in make_extraction and force_to_mode
Hi,
As I mentioned in the thread started by the apply_distributive_law patch
(http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00038.html), combine.c
has further missing TRULY_NOOP_TRUNCATION checks.
In the first test below (20060110-1.c) on MIPS64, where truncation
from DI to SI needs an explicit operation, make_extraction does:
(ashiftrt:DI (ashift:DI (reg:DI 4) 32) 32)
-> (sign_extend:DI (reg:SI 4))
Sign-extension from SI to DI is a nop on MIPS64, so in the end the
function compiles into a single move $2,$4.
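As a rough illustration (not part of the patch), the intended result of
the shift pair can be written in plain C and compared with what a bare
register move produces. The helper names sext_low32 and plain_move are
made up for this sketch.

```c
#include <stdint.h>

/* Intended result of (ashiftrt:DI (ashift:DI x 32) 32): keep the low
   32 bits of X and sign-extend them to 64 bits.  Written without
   shifts of negative values so it stays portable C.  */
static int64_t
sext_low32 (int64_t x)
{
  uint64_t low = (uint64_t) x & 0xffffffffULL;

  if (low & 0x80000000ULL)
    return (int64_t) low - 0x100000000LL;
  return (int64_t) low;
}

/* Model of the miscompiled function: "move $2,$4" copies all 64
   bits, so the upper half of the input survives.  */
static int64_t
plain_move (int64_t x)
{
  return x;
}
```

For the value used in 20060110-1.c, 0x1234567876543210LL, sext_low32
gives 0x76543210 while plain_move returns the input unchanged, which is
the mismatch the test detects.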
A similar thing happens in the second test (20060110-2.c), but the
culprit there is force_to_mode. It does this:
(plus:DI (reg:DI 4) (reg:DI 5))
-> (plus:SI (reg:SI 4) (reg:SI 5))
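A small C sketch of why this narrowing is only safe with an explicit
truncation: arithmetically, adding in 64 bits and then keeping the low
32 bits gives the same answer as truncating each operand first and
adding in 32 bits. The rewrite goes wrong only when the DI registers
are reinterpreted as SI values without that truncation actually being
performed. The function names below are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on
   implementation-defined conversions.  */
static int64_t
sext32 (uint32_t v)
{
  if (v & 0x80000000u)
    return (int64_t) v - 0x100000000LL;
  return (int64_t) v;
}

/* Reference semantics: 64-bit add, then keep the low 32 bits.  */
static int64_t
add_then_narrow (int64_t a, int64_t b)
{
  uint64_t sum = (uint64_t) a + (uint64_t) b;
  return sext32 ((uint32_t) sum);
}

/* Narrowed form: each operand is explicitly truncated to 32 bits
   (the casts play the role of the truncate) before the 32-bit add.  */
static int64_t
narrow_then_add (int64_t a, int64_t b)
{
  uint32_t sum = (uint32_t) a + (uint32_t) b;
  return sext32 (sum);
}
```

With the operands from 20060110-2.c both functions agree, which is the
behaviour the patched force_to_mode is meant to preserve by emitting a
truncate where a subreg is not safe.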
Two things complicate this patch, however. One is that
the interface of force_to_mode is such that it always converts the
input to the requested mode. So I have to use a truncate instead of a
subreg if taking a subreg in the requested mode is not safe.
The other is that putting all the correctness bits in place has a
significant negative impact on the generated code. On
gcc.c-torture/execute, using eabi and with only the correctness
changes, the difference is:
189 files changed, 5294 insertions(+), 4886 deletions(-)
The problem lies in how combine deals with compound operations
(extensions and extracts). While combining related instructions it
expands compound operations (expand_compound_operation) into simpler
operations hoping that it can merge them. If merging is not possible
it counts on being able to recover the original compound operations in
make_compound_operation.
If expand_compound_operation created a paradoxical subreg and we are
either dealing with hard-regs ((reg:SI 4) -> (reg:DI 4)) or with
already subreg'ed expressions ((subreg:SI (reg:DI)) -> (reg:DI)) then
we lose the information that the original value is already truncated
and we will fail to recover the original compound operation.
To compensate for this, the optimization part of my patch extends the
simple local data-flow analysis in combine to track truncated values
and uses the new information to convert registers more freely when
trying to recover compound operations. This is actually very similar
to the promoted-subreg analysis.
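The bookkeeping can be pictured with a toy model. This is only a
sketch under simplifying assumptions: modes are replaced by bit widths,
the target is assumed never to have no-op truncations, and every toy_*
name is invented for illustration. The real logic is in
record_truncated_value and reg_truncated_to_mode in the patch below.

```c
#include <stdbool.h>

enum { TOY_NREGS = 8 };

struct toy_reg_stat
{
  int truncation_label;  /* toy_label_tick when the width was recorded */
  int truncated_bits;    /* 0 = nothing known, else known width */
};

static struct toy_reg_stat toy_stat[TOY_NREGS];
static int toy_label_tick = 1;  /* bumped whenever a new block starts */

/* Remember that register REGNO was seen holding a value truncated to
   BITS.  Stale or wider records are replaced.  */
static void
toy_record_truncation (int regno, int bits)
{
  struct toy_reg_stat *s = &toy_stat[regno];

  if (s->truncated_bits == 0
      || s->truncation_label < toy_label_tick
      || bits < s->truncated_bits)
    {
      s->truncated_bits = bits;
      s->truncation_label = toy_label_tick;
    }
}

/* True if REGNO is known, in the current block, to hold a value
   already truncated to BITS or narrower.  */
static bool
toy_reg_truncated_to (int regno, int bits)
{
  const struct toy_reg_stat *s = &toy_stat[regno];

  if (s->truncated_bits == 0 || s->truncation_label != toy_label_tick)
    return false;
  return s->truncated_bits <= bits;
}
```

After toy_record_truncation (4, 32), a query for 32 or 64 bits on
register 4 succeeds and a query for 16 bits fails. Incrementing
toy_label_tick invalidates the record, mirroring the rule that
truncated_to_mode is only valid while truncation_label == label_tick.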
With this change I get most of the loss back:
21 files changed, 972 insertions(+), 935 deletions(-)
The remaining degradation has to do with zero_extend. In the testcase:
unsigned long long
f (unsigned long long l)
{
return l & 0xffffffff;
}
we used to rely on the combiner to synthesize (zero_extend:DI (reg:SI
4)) here. This no longer happens after the patch. The reason this
used to work was that zero_extend does not require proper SI values.
I think we should express the same thing in the backend by adding an
(and:DI (match_operand) (const_int 0xffffffff)) pattern that emits the
same code as the standard zero-extend. I will submit a
MIPS-specific patch for this.
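The equivalence such a pattern would rely on is easy to state in C: a
mask with 0xffffffff gives the same result as zero-extending the low 32
bits. The two helper names below are made up for this sketch.

```c
#include <stdint.h>

/* The source-level form from the testcase: l & 0xffffffff.  */
static uint64_t
mask_low32 (uint64_t l)
{
  return l & 0xffffffffULL;
}

/* The zero_extend form: truncate to 32 bits, then widen with zeros.
   Both conversions are well defined for unsigned types.  */
static uint64_t
zext_low32 (uint64_t l)
{
  return (uint64_t) (uint32_t) l;
}
```

The two functions return the same value for every input, so the
backend could emit the zero-extend sequence for either form.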
Bootstrapped and tested on x86_64-linux. Tested on mipsisa64-elf and
mips64-elf. In addition to the new tests, the patch fixes the
following tests on mipsisa64-elf:
gcc.c-torture/execute/20020529-1.c execution, -O3 -fomit-frame-pointer -funroll-loops
gcc.c-torture/execute/20020529-1.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
We used to get an UNPREDICTABLE fault from the simulator because the
operands of subu (32-bit subtraction) were not sign-extended. Now we
generate dsubu.
Adam
* combine.c (struct reg_stat): Add new fields truncation_label and
truncated_to_mode.
(record_value_for_reg): Reset truncated_to_mode.
(record_truncated_value): New function.
(check_promoted_subreg): Call it. Rename to check_conversions.
(combine_instructions): Rename check_promoted_subreg to
check_conversions.
(reg_truncated_to_mode): New function.
(make_extraction): Use it. Check TRULY_NOOP_TRUNCATION.
(gen_lowpart_or_truncate): New function.
(force_to_mode): Use it instead of gen_lowpart.
* gcc.c-torture/execute/20060110-1.c: New test.
* gcc.c-torture/execute/20060110-2.c: New test.
Index: combine.c
===================================================================
*** combine.c (revision 109261)
--- combine.c (working copy)
*************** struct reg_stat {
*** 220,226 ****
unsigned HOST_WIDE_INT last_set_nonzero_bits;
char last_set_sign_bit_copies;
! ENUM_BITFIELD(machine_mode) last_set_mode : 8;
/* Set nonzero if references to register n in expressions should not be
used. last_set_invalid is set nonzero when this register is being
--- 220,226 ----
unsigned HOST_WIDE_INT last_set_nonzero_bits;
char last_set_sign_bit_copies;
! ENUM_BITFIELD(machine_mode) last_set_mode : 8;
/* Set nonzero if references to register n in expressions should not be
used. last_set_invalid is set nonzero when this register is being
*************** struct reg_stat {
*** 243,248 ****
--- 243,261 ----
unsigned char sign_bit_copies;
unsigned HOST_WIDE_INT nonzero_bits;
+
+ /* Record the value of the label_tick when the last truncation
+ happened. The field truncated_to_mode is only valid if
+ truncation_label == label_tick. */
+
+ int truncation_label;
+
+ /* Record the last truncation seen for this register. If truncation
+ is not a nop to this mode we might be able to save an explicit
+ truncation if we know that value already contains a truncated
+ value. */
+
+ ENUM_BITFIELD(machine_mode) truncated_to_mode : 8;
};
static struct reg_stat *reg_stat;
*************** static rtx gen_lowpart_for_combine (enum
*** 408,414 ****
static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *);
static void update_table_tick (rtx);
static void record_value_for_reg (rtx, rtx, rtx);
! static void check_promoted_subreg (rtx, rtx);
static void record_dead_and_set_regs_1 (rtx, rtx, void *);
static void record_dead_and_set_regs (rtx);
static int get_last_value_validate (rtx *, rtx, int, int);
--- 421,427 ----
static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *);
static void update_table_tick (rtx);
static void record_value_for_reg (rtx, rtx, rtx);
! static void check_conversions (rtx, rtx);
static void record_dead_and_set_regs_1 (rtx, rtx, void *);
static void record_dead_and_set_regs (rtx);
static int get_last_value_validate (rtx *, rtx, int, int);
*************** static int insn_cuid (rtx);
*** 425,430 ****
--- 438,446 ----
static void record_promoted_value (rtx, rtx);
static int unmentioned_reg_p_1 (rtx *, void *);
static bool unmentioned_reg_p (rtx, rtx);
+ static void record_truncated_value (rtx);
+ static bool reg_truncated_to_mode (enum machine_mode, rtx);
+ static rtx gen_lowpart_or_truncate (enum machine_mode, rtx);
/* It is not safe to use ordinary gen_lowpart in combine.
*************** combine_instructions (rtx f, unsigned in
*** 762,768 ****
{
/* See if we know about function return values before this
insn based upon SUBREG flags. */
! check_promoted_subreg (insn, PATTERN (insn));
/* Try this insn with each insn it links back to. */
--- 778,784 ----
{
/* See if we know about function return values before this
insn based upon SUBREG flags. */
! check_conversions (insn, PATTERN (insn));
/* Try this insn with each insn it links back to. */
*************** make_extraction (enum machine_mode mode,
*** 5838,5843 ****
--- 5854,5864 ----
&& ! (spans_byte && inner_mode != tmode)
&& ((pos_rtx == 0 && (pos % BITS_PER_WORD) == 0
&& !MEM_P (inner)
+ && (inner_mode == tmode
+ || !REG_P (inner)
+ || TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (tmode),
+ GET_MODE_BITSIZE (inner_mode))
+ || reg_truncated_to_mode (tmode, inner))
&& (! in_dest
|| (REG_P (inner)
&& have_insn_for (STRICT_LOW_PART, tmode))))
*************** canon_reg_for_combine (rtx x, rtx reg)
*** 6607,6612 ****
--- 6628,6649 ----
return x;
}
+ /* Return X converted to MODE. If the value is already truncated to
+ MODE we can just return a subreg even though in the general case we
+ would need an explicit truncation. */
+
+ static rtx
+ gen_lowpart_or_truncate (enum machine_mode mode, rtx x)
+ {
+ if (GET_MODE_SIZE (GET_MODE (x)) <= GET_MODE_SIZE (mode)
+ || TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
+ GET_MODE_BITSIZE (GET_MODE (x)))
+ || (REG_P (x) && reg_truncated_to_mode (mode, x)))
+ return gen_lowpart (mode, x);
+ else
+ return gen_rtx_TRUNCATE (mode, x);
+ }
+
/* See if X can be simplified knowing that we will only refer to it in
MODE and will only refer to those bits that are nonzero in MASK.
If other bits are being computed or if masking operations are done
*************** force_to_mode (rtx x, enum machine_mode
*** 6872,6882 ****
/* For most binary operations, just propagate into the operation and
change the mode if we have an operation of that mode. */
! op0 = gen_lowpart (op_mode,
! force_to_mode (XEXP (x, 0), mode, mask,
! next_select));
! op1 = gen_lowpart (op_mode,
! force_to_mode (XEXP (x, 1), mode, mask,
next_select));
if (op_mode != GET_MODE (x) || op0 != XEXP (x, 0) || op1 != XEXP (x, 1))
--- 6909,6919 ----
/* For most binary operations, just propagate into the operation and
change the mode if we have an operation of that mode. */
! op0 = gen_lowpart_or_truncate (op_mode,
! force_to_mode (XEXP (x, 0), mode, mask,
! next_select));
! op1 = gen_lowpart_or_truncate (op_mode,
! force_to_mode (XEXP (x, 1), mode, mask,
next_select));
if (op_mode != GET_MODE (x) || op0 != XEXP (x, 0) || op1 != XEXP (x, 1))
*************** force_to_mode (rtx x, enum machine_mode
*** 6909,6917 ****
else
mask = fuller_mask;
! op0 = gen_lowpart (op_mode,
! force_to_mode (XEXP (x, 0), op_mode,
! mask, next_select));
if (op_mode != GET_MODE (x) || op0 != XEXP (x, 0))
x = simplify_gen_binary (code, op_mode, op0, XEXP (x, 1));
--- 6946,6954 ----
else
mask = fuller_mask;
! op0 = gen_lowpart_or_truncate (op_mode,
! force_to_mode (XEXP (x, 0), op_mode,
! mask, next_select));
if (op_mode != GET_MODE (x) || op0 != XEXP (x, 0))
x = simplify_gen_binary (code, op_mode, op0, XEXP (x, 1));
*************** force_to_mode (rtx x, enum machine_mode
*** 7115,7123 ****
mask = fuller_mask;
unop:
! op0 = gen_lowpart (op_mode,
! force_to_mode (XEXP (x, 0), mode, mask,
! next_select));
if (op_mode != GET_MODE (x) || op0 != XEXP (x, 0))
x = simplify_gen_unary (code, op_mode, op0, op_mode);
break;
--- 7152,7160 ----
mask = fuller_mask;
unop:
! op0 = gen_lowpart_or_truncate (op_mode,
! force_to_mode (XEXP (x, 0), mode, mask,
! next_select));
if (op_mode != GET_MODE (x) || op0 != XEXP (x, 0))
x = simplify_gen_unary (code, op_mode, op0, op_mode);
break;
*************** force_to_mode (rtx x, enum machine_mode
*** 7140,7150 ****
written in a narrower mode. We play it safe and do not do so. */
SUBST (XEXP (x, 1),
! gen_lowpart (GET_MODE (x), force_to_mode (XEXP (x, 1), mode,
! mask, next_select)));
SUBST (XEXP (x, 2),
! gen_lowpart (GET_MODE (x), force_to_mode (XEXP (x, 2), mode,
! mask, next_select)));
break;
default:
--- 7177,7189 ----
written in a narrower mode. We play it safe and do not do so. */
SUBST (XEXP (x, 1),
! gen_lowpart_or_truncate (GET_MODE (x),
! force_to_mode (XEXP (x, 1), mode,
! mask, next_select)));
SUBST (XEXP (x, 2),
! gen_lowpart_or_truncate (GET_MODE (x),
! force_to_mode (XEXP (x, 2), mode,
! mask, next_select)));
break;
default:
*************** force_to_mode (rtx x, enum machine_mode
*** 7152,7158 ****
}
/* Ensure we return a value of the proper mode. */
! return gen_lowpart (mode, x);
}
/* Return nonzero if X is an expression that has one of two values depending on
--- 7191,7197 ----
}
/* Ensure we return a value of the proper mode. */
! return gen_lowpart_or_truncate (mode, x);
}
/* Return nonzero if X is an expression that has one of two values depending on
*************** record_value_for_reg (rtx reg, rtx insn,
*** 10720,10725 ****
--- 10759,10765 ----
reg_stat[i].last_set_nonzero_bits = 0;
reg_stat[i].last_set_sign_bit_copies = 0;
reg_stat[i].last_death = 0;
+ reg_stat[i].truncated_to_mode = 0;
}
/* Mark registers that are being referenced in this value. */
*************** record_dead_and_set_regs (rtx insn)
*** 10853,10858 ****
--- 10893,10899 ----
reg_stat[i].last_set_nonzero_bits = 0;
reg_stat[i].last_set_sign_bit_copies = 0;
reg_stat[i].last_death = 0;
+ reg_stat[i].truncated_to_mode = 0;
}
last_call_cuid = mem_last_set = INSN_CUID (insn);
*************** record_promoted_value (rtx insn, rtx sub
*** 10916,10930 ****
}
}
! /* Scan X for promoted SUBREGs. For each one found,
! note what it implies to the registers used in it. */
static void
! check_promoted_subreg (rtx insn, rtx x)
{
! if (GET_CODE (x) == SUBREG && SUBREG_PROMOTED_VAR_P (x)
! && REG_P (SUBREG_REG (x)))
! record_promoted_value (insn, x);
else
{
const char *format = GET_RTX_FORMAT (GET_CODE (x));
--- 10957,11036 ----
}
}
! /* Check if X, a register, is known to contain a value already
! truncated to MODE. In this case we can use a subreg to refer to
! the truncated value even though in the generic case we would need
! an explicit truncation. */
!
! static bool
! reg_truncated_to_mode (enum machine_mode mode, rtx x)
! {
! enum machine_mode truncated = reg_stat[REGNO (x)].truncated_to_mode;
!
! if (truncated == 0 || reg_stat[REGNO (x)].truncation_label != label_tick)
! return false;
! if (GET_MODE_SIZE (truncated) <= GET_MODE_SIZE (mode))
! return true;
! if (TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
! GET_MODE_BITSIZE (truncated)))
! return true;
! return false;
! }
!
! /* X is a REG or a SUBREG. If X is some sort of a truncation record
! it. For non-TRULY_NOOP_TRUNCATION targets we might be able to turn
! a truncate into a subreg using this information. */
static void
! record_truncated_value (rtx x)
{
! enum machine_mode truncated_mode;
!
! if (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x)))
! {
! enum machine_mode original_mode = GET_MODE (SUBREG_REG (x));
! truncated_mode = GET_MODE (x);
!
! if (GET_MODE_SIZE (original_mode) <= GET_MODE_SIZE (truncated_mode))
! return;
!
! if (TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (truncated_mode),
! GET_MODE_BITSIZE (original_mode)))
! return;
!
! x = SUBREG_REG (x);
! }
! /* ??? For hard-regs we now record everything. We might be able to
! optimize this using last_set_mode. */
! else if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
! truncated_mode = GET_MODE (x);
! else
! return;
!
! if (reg_stat[REGNO (x)].truncated_to_mode == 0
! || reg_stat[REGNO (x)].truncation_label < label_tick
! || (GET_MODE_SIZE (truncated_mode)
! < GET_MODE_SIZE (reg_stat[REGNO (x)].truncated_to_mode)))
! {
! reg_stat[REGNO (x)].truncated_to_mode = truncated_mode;
! reg_stat[REGNO (x)].truncation_label = label_tick;
! }
! }
!
! /* Scan X for promoted SUBREGs and truncated REGs. For each one
! found, note what it implies to the registers used in it. */
!
! static void
! check_conversions (rtx insn, rtx x)
! {
! if (GET_CODE (x) == SUBREG || REG_P (x))
! {
! if (GET_CODE (x) == SUBREG && SUBREG_PROMOTED_VAR_P (x)
! && REG_P (SUBREG_REG (x)))
! record_promoted_value (insn, x);
!
! record_truncated_value (x);
! }
else
{
const char *format = GET_RTX_FORMAT (GET_CODE (x));
*************** check_promoted_subreg (rtx insn, rtx x)
*** 10934,10946 ****
switch (format[i])
{
case 'e':
! check_promoted_subreg (insn, XEXP (x, i));
break;
case 'V':
case 'E':
if (XVEC (x, i) != 0)
for (j = 0; j < XVECLEN (x, i); j++)
! check_promoted_subreg (insn, XVECEXP (x, i, j));
break;
}
}
--- 11040,11052 ----
switch (format[i])
{
case 'e':
! check_conversions (insn, XEXP (x, i));
break;
case 'V':
case 'E':
if (XVEC (x, i) != 0)
for (j = 0; j < XVECLEN (x, i); j++)
! check_conversions (insn, XVECEXP (x, i, j));
break;
}
}
Index: testsuite/gcc.c-torture/execute/20060110-1.c
===================================================================
*** testsuite/gcc.c-torture/execute/20060110-1.c (revision 0)
--- testsuite/gcc.c-torture/execute/20060110-1.c (revision 0)
***************
*** 0 ****
--- 1,16 ----
+ extern void abort (void);
+
+ long long
+ f (long long a)
+ {
+ return (a << 32) >> 32;
+ }
+ long long a = 0x1234567876543210LL;
+ long long b = (0x1234567876543210LL << 32) >> 32;
+ int
+ main ()
+ {
+ if (f (a) != b)
+ abort ();
+ return 0;
+ }
Index: testsuite/gcc.c-torture/execute/20060110-2.c
===================================================================
*** testsuite/gcc.c-torture/execute/20060110-2.c (revision 0)
--- testsuite/gcc.c-torture/execute/20060110-2.c (revision 0)
***************
*** 0 ****
--- 1,19 ----
+ extern void abort (void);
+
+ long long
+ f (long long a, long long b)
+ {
+ return ((a + b) << 32) >> 32;
+ }
+
+ long long a = 0x1234567876543210LL;
+ long long b = 0x2345678765432101LL;
+ long long c = ((0x1234567876543210LL + 0x2345678765432101LL) << 32) >> 32;
+
+ int
+ main ()
+ {
+ if (f (a, b) != c)
+ abort ();
+ return 0;
+ }