Bug 29589 - incorrect conversion of (ior (ashiftrt (plus ...))) in combine.c
Summary: incorrect conversion of (ior (ashiftrt (plus ...))) in combine.c
Status: RESOLVED DUPLICATE of bug 57829
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.1.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2006-10-25 02:09 UTC by Charles Fu
Modified: 2013-12-26 23:13 UTC (History)
3 users (show)

See Also:
Host:
Target: powerpc-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-02-03 19:30:25


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Charles Fu 2006-10-25 02:09:42 UTC
We add an optimization phase in GCC, and our changes trigger GCC generates incorrect code during the combine phase. GCC incorrectly combine three instructions into two intructions:

  from:
    insn_1    set (SI reg_a) (plus:SI     (SI: reg_b) (const_int -1))
    insn_2    set (SI reg_c) (ashiftrt:SI (SI: reg_a) (const_int 31))
    insn_3    set (SI reg_d) (ior:SI      (SI: reg_c) (const_int C))

  into
    insn_1    set (SI reg_a) (plus:SI     (SI: reg_b) (const_int -1))
    insn_2    set (SI reg_c) (ashiftrt:SI (SI: reg_a) (const_int 31))

where the third instruction has been deleted incorrectly.

The buggy code is in  simplify_logical (rtx x) in combine.c:
       ....
      /* If OP0 is (ashiftrt (plus ...) C), it might actually be
         a (sign_extend (plus ...)).  If so, OP1 is a CONST_INT, and the PLUS
         does not affect any of the bits in OP1, it can really be done
         as a PLUS and we can associate.  We do this by seeing if OP1
         can be safely shifted left C bits.  */
      if (GET_CODE (op1) == CONST_INT && GET_CODE (op0) == ASHIFTRT
          && GET_CODE (XEXP (op0, 0)) == PLUS
          && GET_CODE (XEXP (XEXP (op0, 0), 1)) == CONST_INT
          && GET_CODE (XEXP (op0, 1)) == CONST_INT
          && INTVAL (XEXP (op0, 1)) < HOST_BITS_PER_WIDE_INT)
        {
          int count = INTVAL (XEXP (op0, 1));
          HOST_WIDE_INT mask = INTVAL (op1) << count;

          if (mask >> count == INTVAL (op1)
              && (mask & nonzero_bits (XEXP (op0, 0), mode)) == 0)
            {
              SUBST (XEXP (XEXP (op0, 0), 1),
                     GEN_INT (INTVAL (XEXP (XEXP (op0, 0), 1)) | mask));
              return op0;
            }
        }
        ...

Where x = 
        (ior:SI (ashiftrt:SI (plus:SI (reg:SI 127)
                                      (const_int -1 [0xffffffffffffffff]))
                             (const_int 31 [0x1f]))
                (const_int 60 [0x3c]))

After the simplification:
      x = op0 =
        (ashiftrt:SI (plus:SI (reg:SI 127)
                              (const_int -1 [0xffffffffffffffff]))
        (const_int 31 [0x1f]))

We think the buggy code is this line:
   ...
   (mask & nonzero_bits (XEXP (op0, 0), mode)) == 0
    ^^^^
   
where nonzero_bits returns 0xffffffff.

We think may be what wanted is:
    ...
   (INTVAL (op1) & nonzero_bits (XEXP (op0, 0), mode)) == 0
    ^^^^^^^^^^^^ 

With this patch, the optimization for this case is undoable and should just be disabled.

The following is the source code and rtl dumps before combine (test.c.26.life1) and after combine phase (test.c.27.combine). 

The option we uses: gcc -O3 -fno-inline (+with our optimization changes)

The instructions to be combined are in foo() in test.c.26.life1:
 ...
 (insn 51 50 52 0 (set (reg:SI 128)
        (plus:SI (reg:SI 127)
            (const_int -1 [0xffffffffffffffff]))) 39 {*addsi3_internal1} (insn_l
ist:REG_DEP_TRUE 50 (nil))
    (expr_list:REG_DEAD (reg:SI 127)
        (nil)))

(insn 52 51 54 0 (set (reg:SI 129)
        (ashiftrt:SI (reg:SI 128)
            (const_int 31 [0x1f]))) 134 {ashrsi3_no_power} (insn_list:REG_DEP_TR
UE 51 (nil))
    (expr_list:REG_DEAD (reg:SI 128)
        (nil)))

(insn 54 52 26 0 (set (reg/v:SI 119 [ b ])
        (ior:SI (reg:SI 129)
            (const_int 60 [0x3c]))) 85 {*boolsi3_internal1} (insn_list:REG_DEP_T
RUE 52 (nil))
    (expr_list:REG_DEAD (reg:SI 129)
        (nil)))
...
------------------- source code ---------------
extern void abort (void);
extern void exit (int);

_Bool a;

int foo ()
{
  int b;

  if (a) goto L1; else goto L0;

L0:
  b = -1; goto L2;

L1:
  b = 0x3c;

L2:
  return b;
}

int main(int argc, char **argv)
{
    a = 1;
    if (foo() != 0x3c)
       abort();
    exit(0);
}

----------------------------------------
rtl-dump: test.c.26.life1

;; Function foo (foo)

try_optimize_cfg iteration 1

try_optimize_cfg iteration 1

(note 2 0 6 NOTE_INSN_DELETED)

(note 6 2 9 NOTE_INSN_FUNCTION_BEG)

;; Start of basic block 0, registers live: 1 [1] 2 [2] 31 [31] 67 [ap] 109 [vrsa
ve] 110 [vscr] 113 [sfp]
(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (use (symbol_ref:DI ("a") <var_decl 0x2aaaaac0f4d0 a>)) -1 (nil)
    (nil))

(insn 12 11 13 0 (set (reg/f:DI 121 [ A8 ])
        (zero_extend:DI (mem/u/c:SI (plus:DI (reg:DI 2 2)
                    (const:DI (minus:DI (symbol_ref/u:DI ("*.LC0") [flags 0x2])
                            (symbol_ref:DI ("*.LCTOC1"))))) [5 S8 A64]))) 2 {*ze
ro_extendsidi2_internal1} (nil)
    (expr_list:REG_DEAD (reg:DI 2 2)
        (expr_list:REG_EQUAL (symbol_ref:DI ("a") <var_decl 0x2aaaaac0f4d0 a>)
            (nil))))

(insn 13 12 14 0 (set (reg:QI 123 [ a ])
        (mem/c/i:QI (reg/f:DI 121 [ A8 ]) [3 a+0 S1 A8])) 257 {*movqi_internal} 
(insn_list:REG_DEP_TRUE 12 (nil))
    (expr_list:REG_DEAD (reg/f:DI 121 [ A8 ])
        (nil)))

(insn 14 13 50 0 (set (reg:SI 122 [ a ])
        (zero_extend:SI (reg:QI 123 [ a ]))) 20 {*rs6000.md:474} (insn_list:REG_
DEP_TRUE 13 (nil))
    (expr_list:REG_DEAD (reg:QI 123 [ a ])
        (nil)))

(insn 50 14 51 0 (parallel [
            (set (reg:SI 127)
                (abs:SI (reg:SI 122 [ a ])))
            (clobber (scratch:SI))
        ]) 54 {abssi2_nopower} (insn_list:REG_DEP_TRUE 14 (nil))
    (expr_list:REG_DEAD (reg:SI 122 [ a ])
        (expr_list:REG_UNUSED (scratch:SI)
            (nil))))

(insn 51 50 52 0 (set (reg:SI 128)
        (plus:SI (reg:SI 127)
            (const_int -1 [0xffffffffffffffff]))) 39 {*addsi3_internal1} (insn_l
ist:REG_DEP_TRUE 50 (nil))
    (expr_list:REG_DEAD (reg:SI 127)
        (nil)))

(insn 52 51 54 0 (set (reg:SI 129)
        (ashiftrt:SI (reg:SI 128)
            (const_int 31 [0x1f]))) 134 {ashrsi3_no_power} (insn_list:REG_DEP_TR
UE 51 (nil))
    (expr_list:REG_DEAD (reg:SI 128)
        (nil)))

(insn 54 52 26 0 (set (reg/v:SI 119 [ b ])
        (ior:SI (reg:SI 129)
            (const_int 60 [0x3c]))) 85 {*boolsi3_internal1} (insn_list:REG_DEP_T
RUE 52 (nil))
    (expr_list:REG_DEAD (reg:SI 129)
        (nil)))

(note 26 54 28 0 ("L2") NOTE_INSN_DELETED_LABEL 4)

(insn 28 26 32 0 (set (reg:DI 126 [ b ])
        (sign_extend:DI (reg/v:SI 119 [ b ]))) 17 {*rs6000.md:406} (insn_list:RE
G_DEP_TRUE 54 (nil))
    (expr_list:REG_DEAD (reg/v:SI 119 [ b ])
        (nil)))

(note 32 28 35 0 NOTE_INSN_FUNCTION_END)

(insn 35 32 41 0 (set (reg/i:DI 3 3 [ <result> ])
        (reg:DI 126 [ b ])) 274 {*movdi_internal64} (insn_list:REG_DEP_TRUE 28 (
nil))
    (expr_list:REG_DEAD (reg:DI 126 [ b ])
        (nil)))

(insn 41 35 0 0 (use (reg/i:DI 3 3 [ <result> ])) -1 (insn_list:REG_DEP_TRUE 35 
(nil))
    (nil))
;; End of basic block 0, registers live:
 1 [1] 3 [3] 31 [31] 67 [ap] 109 [vrsave] 110 [vscr] 113 [sfp]


------------------------------------------------------------
rtl-dump after combine: test.c.27.combine


;; Function foo (foo)

insn_cost 11: 0
insn_cost 12: 8
insn_cost 13: 8
insn_cost 14: 4
insn_cost 50: 4
insn_cost 51: 4
insn_cost 52: 4
insn_cost 54: 4
insn_cost 28: 4
insn_cost 35: 4
insn_cost 41: 0
(note 2 0 6 NOTE_INSN_DELETED)

(note 6 2 9 NOTE_INSN_FUNCTION_BEG)

;; Start of basic block 0, registers live: 1 [1] 2 [2] 31 [31] 67 [ap] 109 [vrsa
ve] 110 [vscr] 113 [sfp]
(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (use (symbol_ref:DI ("a") <var_decl 0x2aaaaac0f4d0 a>)) -1 (nil)
    (nil))

(insn 12 11 13 0 (set (reg/f:DI 121 [ A8 ])
        (zero_extend:DI (mem/u/c:SI (plus:DI (reg:DI 2 2)
                    (const:DI (minus:DI (symbol_ref/u:DI ("*.LC0") [flags 0x2])
                            (symbol_ref:DI ("*.LCTOC1"))))) [5 S8 A64]))) 2 {*ze
ro_extendsidi2_internal1} (nil)
    (expr_list:REG_DEAD (reg:DI 2 2)
        (expr_list:REG_EQUAL (symbol_ref:DI ("a") <var_decl 0x2aaaaac0f4d0 a>)
            (nil))))

(note 13 12 14 0 NOTE_INSN_DELETED)

(note 14 13 50 0 NOTE_INSN_DELETED)

(insn 50 14 51 0 (set (reg:SI 127)
        (zero_extend:SI (mem/c/i:QI (reg/f:DI 121 [ A8 ]) [3 a+0 S1 A8]))) 20 {*
rs6000.md:474} (insn_list:REG_DEP_TRUE 12 (nil))
    (expr_list:REG_DEAD (reg/f:DI 121 [ A8 ])
        (nil)))

(note 51 50 52 0 NOTE_INSN_DELETED)

(insn 52 51 54 0 (set (reg:SI 129)
        (plus:SI (reg:SI 127)
            (const_int -1 [0xffffffffffffffff]))) 39 {*addsi3_internal1} (insn_l
ist:REG_DEP_TRUE 50 (nil))
    (expr_list:REG_DEAD (reg:SI 127)
        (nil)))

(note 54 52 26 0 NOTE_INSN_DELETED)

(note 26 54 28 0 ("L2") NOTE_INSN_DELETED_LABEL 4)

(insn 28 26 32 0 (set (reg:DI 126 [ b ])
        (ashift:DI (subreg:DI (reg:SI 129) 0)
            (const_int 32 [0x20]))) 224 {*ashldi3_internal1} (insn_list:REG_DEP_
TRUE 52 (nil))
    (expr_list:REG_DEAD (reg:SI 129)
        (nil)))

(note 32 28 35 0 NOTE_INSN_FUNCTION_END)

(insn 35 32 41 0 (set (reg/i:DI 3 3 [ <result> ])
        (ashiftrt:DI (reg:DI 126 [ b ])
            (const_int 63 [0x3f]))) 236 {*ashrdi3_internal1} (insn_list:REG_DEP_
TRUE 28 (nil))
    (expr_list:REG_DEAD (reg:DI 126 [ b ])
        (nil)))

(insn 41 35 0 0 (use (reg/i:DI 3 3 [ <result> ])) -1 (insn_list:REG_DEP_TRUE 35 
(nil))
    (nil))
;; End of basic block 0, registers live:
 1 [1] 3 [3] 31 [31] 67 [ap] 109 [vrsave] 110 [vscr] 113 [sfp]
Comment 1 Doug Evans 2006-10-25 02:23:24 UTC
Re: "We think may be what wanted is:" ...

That's just off the cuff speculation.  The curious things are:

- op1 is shifted outside the mode of the operation (0x3c << 31) (HOST_WIDE_INT is 64 bits) and this value is the value AND'd with the result of nonzero_bits.

- nonzero_bits returns bits that may be one, not bits that are one, so it's not clear this optimization is valid regardless of anything else
Comment 2 Doug Evans 2006-10-25 02:41:09 UTC
Thinking about it some more, disregard this (I think):

- nonzero_bits returns bits that may be one, not bits that are one, so it's not
clear this optimization is valid regardless of anything else

I _think_ this is the culprit, though it's not worded very well:

- op1 is shifted outside the mode of the operation (0x3c << 31) (HOST_WIDE_INT
is 64 bits) and this value is the value AND'd with the result of nonzero_bits.

I think that's what you want, but "We do this by seeing if OP1 can be safely shifted left C bits" doesn't take into account the mode of the operation.

Just guessing though.
Comment 3 Andrew Pinski 2006-10-25 03:07:05 UTC
Revision 1.169 / (download) - annotate - [select for diffs] , Fri May 6 16:42:40 1994 UTC (12 years, 5 months ago) by kenner
Branch: MAIN
Changes since 1.168: +123 -91 lines
Diff to previous 1.168 (colored)

(simplify_rtx, case MULT): Don't convert MULT to shift here.
(simplify_logical, case IOR): Convert back to PLUS if valid and it will
combine with another PLUS.
(extract_left_shift): New function.
(make_compound_operation, case ASHIFTRT): Simplify by calling it.
(force_to_mode): Don't ignore if X is a SUBREG.
(force_to_mode, case AND): Try to turn unchecked bits on instead of just off
and see which is cheaper.
Comment 4 Andrew Pinski 2007-06-07 21:00:42 UTC
And here is a testcase which makes this reproduciable on the trunk (on powerpc-linux):
extern void abort (void);
extern void exit (int);

_Bool a;

int foo ()
{
  int b;
  int c = a;
  c = __builtin_abs(c);
  c = c-1;
  c = (c) >>31;
  b = c | 60;
  return b;
}

int main(int argc, char **argv)
{
    a = 1;
    if (foo() != 0x3c)
       abort();
    exit(0);
}
Comment 5 Andrew Pinski 2007-06-11 04:44:14 UTC
I have a fix from our local tree which also fixes up the regression which we found with a different patch.
Comment 6 Steven Bosscher 2008-02-03 14:39:41 UTC
Andrew, ping about your fix?
Comment 7 Andrew Pinski 2008-02-03 19:30:25 UTC
(In reply to comment #6)
> Andrew, ping about your fix?

I have to extract it, it has changed since the bug was originally sent out.
Comment 8 Andrew Pinski 2011-11-29 23:16:23 UTC
No longer working on this and I lost the patch.
Comment 9 Mikael Pettersson 2011-11-30 08:37:49 UTC
I can reproduce the bug on powerpc64-linux with gcc-4.2.4 -O2 -m32/-m64, but not with gcc-4.3.6, 4.4.6, 4.5.3, 4.6.1, or 4.7-current, with either -m32 or -m64.
Close as fixed?
Comment 10 Bernhard Kaindl 2012-03-02 19:16:13 UTC
Verification using gcc-4.4.6 --target=powerpc-linux --host=i686-linux:

The test case works with -O2, -Os as well as -O3 -no-inline with this cross-gcc,
but when using -O1, the test case still calls abort() -> foo returns 0 in gdb,
so it has been improved, but not completely fixed in 4.4.6
Comment 11 Steven Bosscher 2013-12-26 23:13:23 UTC
Merry Christmas!

*** This bug has been marked as a duplicate of bug 57829 ***