Bug 84071

Summary: [7/8 regression] wrong elimination of zero-extension after sign-extended load
Product: gcc Reporter: Wilco <wilco>
Component: rtl-optimizationAssignee: Eric Botcazou <ebotcazou>
Status: RESOLVED FIXED    
Severity: major CC: mpf, wdijkstr
Priority: P2 Keywords: wrong-code
Version: 7.0   
Target Milestone: 7.4   
Host: Target: arm-none-eabi
Build: Known to work:
Known to fail: Last reconfirmed: 2018-01-28 00:00:00

Description Wilco 2018-01-26 19:02:28 UTC
PR59461 changed nonzero_bits1 incorrectly for subregs:

          /* On many CISC machines, accessing an object in a wider mode
             causes the high-order bits to become undefined.  So they are
             not known to be zero.  */
          rtx_code extend_op;
          if ((!WORD_REGISTER_OPERATIONS
               /* If this is a typical RISC machine, we only have to worry
                  about the way loads are extended.  */
               || ((extend_op = load_extend_op (inner_mode)) == SIGN_EXTEND
                   ? val_signbit_known_set_p (inner_mode, nonzero)
                   : extend_op != ZERO_EXTEND)
               || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
              && xmode_width > inner_width)
            nonzero
              |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK (inner_mode));

If WORD_REGISTER_OPERATIONS is set and load_extend_op is ZERO_EXTEND, rtl like

(subreg:SI (reg:HI 125) 0)

is assumed to be always zero-extended. This is incorrect since modes that are smaller than WORD_MODE may contain random top bits. This is equally true for RISC and CISC ISAs and independent of WORD_REGISTER_OPERATIONS, so it's unclear why the !REG_P check was added.

On ARM this causes the following bug:

arm-none-eabi-gcc -march=armv7-a -marm -O2 -S -o- -mbig-endian

typedef union 
{
  signed short ss;
  unsigned short us;
  int x;
} U;

int f(int x, int y, int z, int a, U u)
{
  return (u.ss <= 0) + u.us;
}

	ldrsh	r3, [sp]
	sxth	r0, r3
	cmp	r0, #0  // correctly uses sign-extended value
	movgt	r0, r3  // wrong - must be zero-extended!!!
	addle	r0, r3, #1
	bx	lr
Comment 1 Andrew Pinski 2018-01-26 23:07:11 UTC
I think this is related to PR 81443.  Can you see if the 7.3 release has it fixed?
Comment 2 Wilco 2018-01-27 17:40:45 UTC
(In reply to Andrew Pinski from comment #1)
> I think this is related to PR 81443.  Can you see if the 7.3 release has it
> fixed?

It's related indeed, in both cases it incorrectly uses load_extend_op on registers rather than MEM. The first primarily fails on targets that return ZERO_EXTEND, the 2nd on targets that use SIGN_EXTEND.
Comment 3 Eric Botcazou 2018-01-28 22:37:42 UTC
> PR59461 changed nonzero_bits1 incorrectly for subregs:
> 
>           /* On many CISC machines, accessing an object in a wider mode
>              causes the high-order bits to become undefined.  So they are
>              not known to be zero.  */
>           rtx_code extend_op;
>           if ((!WORD_REGISTER_OPERATIONS
>                /* If this is a typical RISC machine, we only have to worry
>                   about the way loads are extended.  */
>                || ((extend_op = load_extend_op (inner_mode)) == SIGN_EXTEND
>                    ? val_signbit_known_set_p (inner_mode, nonzero)
>                    : extend_op != ZERO_EXTEND)
>                || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
>               && xmode_width > inner_width)
>             nonzero
>               |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK
> (inner_mode));
> 
> If WORD_REGISTER_OPERATIONS is set and load_extend_op is ZERO_EXTEND, rtl
> like
> 
> (subreg:SI (reg:HI 125) 0)
> 
> is assumed to be always zero-extended.

That's not what the code is supposed to do.  As explained in the comment, the code is intended to compute the nonzero bits of the subreg from the nonzero_bits of the inner reg:

	  nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
					  known_x, known_mode, known_ret);

> This is incorrect since modes that are smaller than WORD_MODE may contain
> random top bits. This is equally true for RISC and CISC ISAs and independent 
> of WORD_REGISTER_OPERATIONS, so it's unclear why the !REG_P check was added.

No, that's wrong, WORD_REGISTER_OPERATIONS precisely means that the bits up to the word are defined when operations operate in mode smaller than a word.
Comment 4 Eric Botcazou 2018-01-29 07:35:30 UTC
Investigating.
Comment 5 Wilco 2018-01-29 13:19:44 UTC
(In reply to Eric Botcazou from comment #3)
> > PR59461 changed nonzero_bits1 incorrectly for subregs:
> > 
> >           /* On many CISC machines, accessing an object in a wider mode
> >              causes the high-order bits to become undefined.  So they are
> >              not known to be zero.  */
> >           rtx_code extend_op;
> >           if ((!WORD_REGISTER_OPERATIONS
> >                /* If this is a typical RISC machine, we only have to worry
> >                   about the way loads are extended.  */
> >                || ((extend_op = load_extend_op (inner_mode)) == SIGN_EXTEND
> >                    ? val_signbit_known_set_p (inner_mode, nonzero)
> >                    : extend_op != ZERO_EXTEND)
> >                || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
> >               && xmode_width > inner_width)
> >             nonzero
> >               |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK
> > (inner_mode));
> > 
> > If WORD_REGISTER_OPERATIONS is set and load_extend_op is ZERO_EXTEND, rtl
> > like
> > 
> > (subreg:SI (reg:HI 125) 0)
> > 
> > is assumed to be always zero-extended.
> 
> That's not what the code is supposed to do.  As explained in the comment,
> the code is intended to compute the nonzero bits of the subreg from the
> nonzero_bits of the inner reg:
> 
> 	  nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
> 					  known_x, known_mode, known_ret);

That's based on the inner type alone and not correct for WORD_REGISTER_OPERATIONS. The

nonzero |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK;

adds in the unknown bits for the wider type. And that's the bit that is no longer triggering.

> > This is incorrect since modes that are smaller than WORD_MODE may contain
> > random top bits. This is equally true for RISC and CISC ISAs and independent 
> > of WORD_REGISTER_OPERATIONS, so it's unclear why the !REG_P check was added.
> 
> No, that's wrong, WORD_REGISTER_OPERATIONS precisely means that the bits up
> to the word are defined when operations operate in mode smaller than a word.

They are always written but have an undefined value. Adding 2 8-bit values results in a 9-bit value with WORD_REGISTER_OPERATIONS.
Comment 6 Eric Botcazou 2018-01-30 08:42:21 UTC
> They are always written but have an undefined value. Adding 2 8-bit values
> results in a 9-bit value with WORD_REGISTER_OPERATIONS.

If they have an undefined value, then WORD_REGISTER_OPERATIONS must not be defined for ARM.  Here's the definition:

 -- Macro: WORD_REGISTER_OPERATIONS
     Define this macro to 1 if operations between registers with
     integral mode smaller than a word are always performed on the
     entire register.  Most RISC machines have this property and most
     CISC machines do not.

If the 8-bit addition is not performed on the entire 32-bit register, then this is not a WORD_REGISTER_OPERATIONS target.
Comment 7 Eric Botcazou 2018-01-30 09:35:04 UTC
The problem is that reg_nonzero_bits_for_combine returns 0xffff for (reg:HI 121) when queried for SImode after:

(insn 10 7 11 2 (set (subreg:SI (reg:HI 121) 0)
        (sign_extend:SI (mem/c:HI (plus:SI (reg/f:SI 103 afp)
                    (const_int 4 [0x4])) [1 u+0 S2 A32]))) 171 {*arm_extendhisi2_v6}
Comment 8 Wilco 2018-01-30 10:47:25 UTC
(In reply to Eric Botcazou from comment #6)
> > They are always written but have an undefined value. Adding 2 8-bit values
> > results in a 9-bit value with WORD_REGISTER_OPERATIONS.
> 
> If they have an undefined value, then WORD_REGISTER_OPERATIONS must not be
> defined for ARM.  Here's the definition:
> 
>  -- Macro: WORD_REGISTER_OPERATIONS
>      Define this macro to 1 if operations between registers with
>      integral mode smaller than a word are always performed on the
>      entire register.  Most RISC machines have this property and most
>      CISC machines do not.
> 
> If the 8-bit addition is not performed on the entire 32-bit register, then
> this is not a WORD_REGISTER_OPERATIONS target.

The addition is performed on the full 32-bit register, so this obviously means that the top 24 bits have an undefined value.
Comment 9 Eric Botcazou 2018-01-30 11:39:32 UTC
The problematic code is the SUBREG case of record_dead_and_set_regs_1:

  if (REG_P (dest))
    {
      /* If we are setting the whole register, we know its value.  Otherwise
	 show that we don't know the value.  We can handle SUBREG in
	 some cases.  */
      if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
	record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
      else if (GET_CODE (setter) == SET
	       && GET_CODE (SET_DEST (setter)) == SUBREG
	       && SUBREG_REG (SET_DEST (setter)) == dest
	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
	       && subreg_lowpart_p (SET_DEST (setter)))
	record_value_for_reg (dest, record_dead_insn,
			      gen_lowpart (GET_MODE (dest),
						       SET_SRC (setter)));
      else
	record_value_for_reg (dest, record_dead_insn, NULL_RTX);
    }

It's very old (RMS r357) and effectively synthesizes a wrong SET operation.
Comment 10 Eric Botcazou 2018-01-30 11:47:01 UTC
> The addition is performed on the full 32-bit register, so this obviously
> means that the top 24 bits have an undefined value.

Not if the entire registers have a defined value before the addition.  The point of WORD_REGISTER_OPERATIONS is the following: you have a pair of SImode registers with known values and you do a QImode addition on them.  If the macro is defined, then the compiler considers that the result has a defined value in SImode.

In any case, that's not really the issue here and I'll just fix the combiner.
Comment 11 Eric Botcazou 2018-01-30 17:53:28 UTC
Recategorizing.
Comment 12 Eric Botcazou 2018-01-31 10:03:38 UTC
Author: ebotcazou
Date: Wed Jan 31 10:03:06 2018
New Revision: 257224

URL: https://gcc.gnu.org/viewcvs?rev=257224&root=gcc&view=rev
Log:
	PR rtl-optimization/84071
	* combine.c (record_dead_and_set_regs_1): Record the source unmodified
	for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/20180131-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog
Comment 13 Eric Botcazou 2018-01-31 10:08:39 UTC
Author: ebotcazou
Date: Wed Jan 31 10:08:08 2018
New Revision: 257226

URL: https://gcc.gnu.org/viewcvs?rev=257226&root=gcc&view=rev
Log:
	PR rtl-optimization/84071
	* combine.c (record_dead_and_set_regs_1): Record the source unmodified
	for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.c-torture/execute/20180131-1.c
      - copied unchanged from r257224, trunk/gcc/testsuite/gcc.c-torture/execute/20180131-1.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/combine.c
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 14 Eric Botcazou 2018-01-31 10:11:01 UTC
Thanks for reporting the problem and distilling a reduced testcase.
Comment 15 Wilco 2018-01-31 12:56:44 UTC
(In reply to Eric Botcazou from comment #10)
> > The addition is performed on the full 32-bit register, so this obviously
> > means that the top 24 bits have an undefined value.
> 
> Not if the entire registers have a defined value before the addition.  The
> point of WORD_REGISTER_OPERATIONS is the following: you have a pair of
> SImode registers with known values and you do a QImode addition on them.  If
> the macro is defined, then the compiler considers that the result has a
> defined value in SImode.

That's the best description of WORD_REGISTER_OPERATIONS I've seen - maybe we should fix the docs to be clearer? Also I wonder whether this means AArch64 should set it since targets like MIPS and Sparc already set it.

> In any case, that's not really the issue here and I'll just fix the combiner.

Thanks for fixing this. I'm still not convinced that the logic of this is right:

          if ((!WORD_REGISTER_OPERATIONS
               /* If this is a typical RISC machine, we only have to worry
                  about the way loads are extended.  */
               || ((extend_op = load_extend_op (inner_mode)) == SIGN_EXTEND
                   ? val_signbit_known_set_p (inner_mode, nonzero)
                   : extend_op != ZERO_EXTEND)
               || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
              && xmode_width > inner_width)

So assuming WORD_REGISTER_OPERATIONS and load_extend_op is ZERO_EXTEND, we fall into the (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))). But that effectively means that load_extend_op applies to REG_P as well as MEM_P, which can't be right...
Comment 16 Eric Botcazou 2018-01-31 14:05:46 UTC
> That's the best description of WORD_REGISTER_OPERATIONS I've seen - maybe we
> should fix the docs to be clearer?

Yes, I can add a blurb indeed.

> Also I wonder whether this means AArch64 should set it since targets like MIPS 
> and Sparc already set it.

There seems to be a good reason against that:

/* WORD_REGISTER_OPERATIONS does not hold for AArch64.
   The assigned word_mode is DImode but operations narrower than SImode
   behave as 32-bit operations if using the W-form of the registers rather
   than as word_mode (64-bit) operations as WORD_REGISTER_OPERATIONS
   expects.  */
#define WORD_REGISTER_OPERATIONS 0

> Thanks for fixing this. I'm still not convinced that the logic of this is
> right:
> 
>           if ((!WORD_REGISTER_OPERATIONS
>                /* If this is a typical RISC machine, we only have to worry
>                   about the way loads are extended.  */
>                || ((extend_op = load_extend_op (inner_mode)) == SIGN_EXTEND
>                    ? val_signbit_known_set_p (inner_mode, nonzero)
>                    : extend_op != ZERO_EXTEND)
>                || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
>               && xmode_width > inner_width)
> 
> So assuming WORD_REGISTER_OPERATIONS and load_extend_op is ZERO_EXTEND, we
> fall into the (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))). But that
> effectively means that load_extend_op applies to REG_P as well as MEM_P,
> which can't be right...

You need to be sure that the upper bits in the paradoxical SUBREG are preserved in the case of spilling to memory.  In other words, you need to be sure that zeros in the upper bits are preserved through spilling and a simple way to do it is to require that loads be zero-extended, in case the SUBREG and not the entire REG is spilled.  reload and LRA have specific code for WORD_REGISTER_OPERATIONS targets but I'm not sure they guarantee a full word spill in all cases, at least reload historically hasn't I think.
Comment 17 Eric Botcazou 2018-01-31 15:02:11 UTC
Author: ebotcazou
Date: Wed Jan 31 15:01:40 2018
New Revision: 257237

URL: https://gcc.gnu.org/viewcvs?rev=257237&root=gcc&view=rev
Log:
	PR rtl-optimization/84071
	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
	* doc/tm.texi: Regenerate.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
Comment 18 Eric Botcazou 2018-01-31 15:02:27 UTC
Author: ebotcazou
Date: Wed Jan 31 15:01:53 2018
New Revision: 257238

URL: https://gcc.gnu.org/viewcvs?rev=257238&root=gcc&view=rev
Log:
	PR rtl-optimization/84071
	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
	* doc/tm.texi: Regenerate.

Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/doc/tm.texi
    branches/gcc-7-branch/gcc/doc/tm.texi.in
Comment 19 Wilco 2018-01-31 16:49:01 UTC
(In reply to Eric Botcazou from comment #16)

> > Also I wonder whether this means AArch64 should set it since targets like MIPS 
> > and Sparc already set it.
> 
> There seems to be a good reason against that:
> 
> /* WORD_REGISTER_OPERATIONS does not hold for AArch64.
>    The assigned word_mode is DImode but operations narrower than SImode
>    behave as 32-bit operations if using the W-form of the registers rather
>    than as word_mode (64-bit) operations as WORD_REGISTER_OPERATIONS
>    expects.  */
> #define WORD_REGISTER_OPERATIONS 0

How is this any different from 32-bit operations in say MIPS? The only difference seems to be that MIPS sign-extends 32-bit operations to 64 bit while AArch64 zero-extends. If that's correct for MIPS it seems that WORD_REGISTER_OPERATIONS only applies to types smaller than SImode.
Comment 20 Eric Botcazou 2018-02-02 11:08:10 UTC
> How is this any different from 32-bit operations in say MIPS? The only
> difference seems to be that MIPS sign-extends 32-bit operations to 64 bit
> while AArch64 zero-extends. If that's correct for MIPS it seems that
> WORD_REGISTER_OPERATIONS only applies to types smaller than SImode.

Interesting question.  One indeed could argue that, if 64-bit MIPS defines it, then Aarch64 could do it too since they are symmetric wrt sign/zero-extension
but I don't have a definitive answer as I don't really know the history of WORD_REGISTER_OPERATIONS on MIPS.  Maybe worth a try if this brings measurable benefits, typically elimination of redundant 32->64 zero-extensions.
Comment 21 mpf 2018-02-02 11:21:40 UTC
(In reply to Eric Botcazou from comment #20)
> > How is this any different from 32-bit operations in say MIPS? The only
> > difference seems to be that MIPS sign-extends 32-bit operations to 64 bit
> > while AArch64 zero-extends. If that's correct for MIPS it seems that
> > WORD_REGISTER_OPERATIONS only applies to types smaller than SImode.
> 
> Interesting question.  One indeed could argue that, if 64-bit MIPS defines
> it, then Aarch64 could do it too since they are symmetric wrt
> sign/zero-extension
> but I don't have a definitive answer as I don't really know the history of
> WORD_REGISTER_OPERATIONS on MIPS.

I believe the critical part for MIPS is that our 32-bit instructions do care what the upper 32-bits of a 64-bit register contain. It is undefined if they operate on a non-canonical 32-bit form and the instructions that are width agnostic (conditional branch and logical ops) rely on canonical forms to work correctly with 32-bit values.

So MIPS fundamentally needs this feature to work correctly; whether AArch64 needs it or may just benefit from it depends on a lot of detailed knowledge of the ISA and architecture. Given Richard Sandiford is currently working on ARM ports but also fully understands the MIPS arch then he may be a good person to consult.
Comment 22 Eric Botcazou 2018-02-02 17:25:56 UTC
> So MIPS fundamentally needs this feature to work correctly; whether AArch64
> needs it or may just benefit from it depends on a lot of detailed knowledge
> of the ISA and architecture. Given Richard Sandiford is currently working on
> ARM ports but also fully understands the MIPS arch then he may be a good
> person to consult.

If "this feature" is WORD_REGISTER_OPERATIONS, then I don't think that any target, including MIPS, really needs it.  Defining the macro is only an optimization and a way to convey multiple properties of the target with only one setting.  And, for MIPS, the implementation of the sign-extension invariant is done by other means.