Bug 23837 - [4.0 regression] Wrong code with REG_NO_CONFLICT notes (caused by combine)
Summary: [4.0 regression] Wrong code with REG_NO_CONFLICT notes (caused by combine)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.0.2
: P3 critical
Target Milestone: 4.0.3
Assignee: Steven Bosscher
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2005-09-12 19:01 UTC by Debian GCC Maintainers
Modified: 2005-12-19 12:36 UTC (History)
5 users (show)

See Also:
Host:
Target: hppa2.0w-hp-hpux11.11, hppa-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-12-15 06:42:28


Attachments
proposed patch (523 bytes, patch)
2005-09-20 20:53 UTC, Jorn Wolfgang Rennecke
Details | Diff
Instruction stream (stripped) before scheduling (874 bytes, text/plain)
2005-12-12 21:38 UTC, Steven Bosscher
Details
Instruction stream (stripped) after scheduling (934 bytes, text/plain)
2005-12-12 21:38 UTC, Steven Bosscher
Details
a full set of debugging dumps (15.72 KB, application/x-bzip2)
2005-12-12 22:14 UTC, Steven Bosscher
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Debian GCC Maintainers 2005-09-12 19:01:33 UTC
[forwarded from http://bugs.debian.org/326026]

falk@paer:~% cat test.c
void abort(void);

unsigned long long f(unsigned long long x) {
    return ((x >> 8) | (x << 56)) ^ ((x >> 48) | (x << 16)) ^ (x >> 1);
}

int main() {
    volatile unsigned long long v = 0x1122334455667788ULL;
    if (f(v) != 0xb3c46ef7196e4c91ULL)
        abort();
    return 0;
}

falk@paer:~% gcc-4.0 -O1 -fno-schedule-insns test.c && ./a.out
falk@paer:~% gcc-4.0 -O1 -fschedule-insns test.c && ./a.out   
zsh: abort (core dumped)  ./a.out

Reproduced with 4.0.2 20050821 on hppa-linux and with 4.1.0 20050705 on
i686-linux.
Comment 1 Andrew Pinski 2005-09-12 19:05:03 UTC
Confirmed but works with "4.0.0 20050225".
Comment 2 Jakub Jelinek 2005-09-13 09:04:11 UTC
At least on i?86, this seems to be introduced by reload, .lreg dump is still
sane.
Comment 3 janis187 2005-09-19 19:42:56 UTC
A regression hunt on i686-linux showed the failure starting with this patch
from amylaar@gcc.gnu.org:

  http://gcc.gnu.org/ml/gcc-cvs/2005-05/msg00640.html
Comment 4 Jorn Wolfgang Rennecke 2005-09-19 21:04:38 UTC
(In reply to comment #3)
> A regression hunt on i686-linux showed the failure starting with this patch
> from amylaar@gcc.gnu.org:
> 
>   http://gcc.gnu.org/ml/gcc-cvs/2005-05/msg00640.html

If the lreg dump is still sane that indicates that the problem was not
caused, by the patch, merely a latent bug was triggered.  emit_no_conflict_block
should not be called during reload.  to double-check, you can set a breakpoint
on emit_no_conflict_block when reload starts.

However, while looking for possible connections, I found that emit_libcall_block
has a similar bug as emit_no_conflict_block used to have, i.e. it ignores the
result of SETs if an insn has more than one.

FWIW, emit_libcall_block is used in i386.c in legitimize_tls_address, which in
turn is used in ix86_expand_move, which is used by the i386.md move expanders,
which are used by emit_move_insn_1, which is used by gen_move_insn, which
is used by gen_reload.
Comment 5 Jorn Wolfgang Rennecke 2005-09-20 19:06:43 UTC
Confirmed with gcc version 4.1.0 20050919 (experimental) on i686-pc-linux-gnu.
Comment 6 Jorn Wolfgang Rennecke 2005-09-20 20:52:11 UTC
(In reply to comment #3)
> A regression hunt on i686-linux showed the failure starting with this patch
> from amylaar@gcc.gnu.org:
> 
>   http://gcc.gnu.org/ml/gcc-cvs/2005-05/msg00640.html

The i386.md x86_shrd_1 pattern is bogus, is should use lshiftrt,
not ashiftrt.  However, that is not the reason for this regression.

The reason is that we have a bogus REG_NO_CONFLICT block.  The generic
rotate expander expects emit_no_conflict_block to re-order its instructions,
but that is now suppresed by a match between clobbers of the flags register.
Comment 7 Jorn Wolfgang Rennecke 2005-09-20 20:53:33 UTC
Created attachment 9783 [details]
proposed patch
Comment 8 joern.rennecke@st.com 2005-09-22 17:37:15 UTC
Subject: RFA: fix PR 23837 (Re:  [4.0/4.1 regression] Wrong code with -fschedule-insns)

I have regtested the attached patch on i686-pc-linux-gnu in mainline
from 2005-09-19 18:00 UTC.
2005-09-22  J"orn Rennecke <joern.rennecke@st.com>

	PR rtl-optimization/23837
	*  optabs.c (no_conflict_move_test): Don't set must_stay for a
	clobber / clobber match between dest and p->first.

Index: optabs.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/optabs.c,v
retrieving revision 1.291
diff -p -r1.291 optabs.c
*** optabs.c	15 Sep 2005 21:51:13 -0000	1.291
--- optabs.c	22 Sep 2005 17:31:13 -0000
*************** no_conflict_move_test (rtx dest, rtx set
*** 3021,3027 ****
      return;
    /* If this insn sets / clobbers a register that feeds one of the insns
       already in the list, this insn has to stay too.  */
!   else if (reg_mentioned_p (dest, PATTERN (p->first))
  	   || reg_used_between_p (dest, p->first, p->insn)
  	   /* Likewise if this insn depends on a register set by a previous
  	      insn in the list.  */
--- 3021,3028 ----
      return;
    /* If this insn sets / clobbers a register that feeds one of the insns
       already in the list, this insn has to stay too.  */
!   else if (reg_overlap_mentioned_p (dest, PATTERN (p->first))
! 	   || (CALL_P (p->first) && (find_reg_fusage (p->first, USE, dest)))
  	   || reg_used_between_p (dest, p->first, p->insn)
  	   /* Likewise if this insn depends on a register set by a previous
  	      insn in the list.  */
Comment 9 GCC Commits 2005-09-23 12:06:12 UTC
Subject: Bug 23837

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	amylaar@gcc.gnu.org	2005-09-23 12:05:58

Modified files:
	gcc            : ChangeLog optabs.c 

Log message:
	PR rtl-optimization/23837
	*  optabs.c (no_conflict_move_test): Don't set must_stay for a
	clobber / clobber match between dest and p->first.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.10007&r2=2.10008
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&r1=1.291&r2=1.292

Comment 10 Andrew Pinski 2005-10-10 04:29:26 UTC
Only a hppa-linux bug now, which means this should be targetted at 4.2.0.
Comment 11 Steven Bosscher 2005-10-21 11:21:25 UTC
How do you know this is hppa-linux only now?  There's a bit if information
missing about how you got to that conclusion.

Is there some simple way to test this bug on HPPA? (Maybe add a HPPA 
maintainer to the CC: list? ;-)
Comment 12 Steven Bosscher 2005-11-06 11:50:52 UTC
Folks, what's up with this bug?  Is it HPPA-only now?  Can someone comment on the status of this bug please?
Comment 13 John David Anglin 2005-11-06 17:32:03 UTC
Fails on hppa2.0w-hp-hpux11.11 with 4.0.0, 4.0.1 and 4.0.2.  However, it
doesn't fail using 3.4.4.
Comment 14 Andrew Pinski 2005-11-19 01:28:08 UTC
This also effects hppa2.0w-hp-hpux11.11 which means this effects a primary target.
Comment 15 Steven Bosscher 2005-12-12 20:19:04 UTC
I can reproduce this on hppa2.0-suse-linux-gnu with the "4.2-20051210" snapshot.
Comment 16 Jorn Wolfgang Rennecke 2005-12-12 20:28:07 UTC
(In reply to comment #15)
> I can reproduce this on hppa2.0-suse-linux-gnu with the "4.2-20051210"
> snapshot.

Could you make a full set of debugging dumps (i.e. from compiling with -da) available somewhere?
 

Comment 17 Steven Bosscher 2005-12-12 21:38:21 UTC
Created attachment 10461 [details]
Instruction stream (stripped) before scheduling
Comment 18 Steven Bosscher 2005-12-12 21:38:46 UTC
Created attachment 10462 [details]
Instruction stream (stripped) after scheduling
Comment 19 Steven Bosscher 2005-12-12 22:03:49 UTC
The dumps before and after scheduling look OK to me.

There are three groups of instructions for libcalls:
1) 19-14-18-20
   inputs: regs 95, 99, and 102 (all of them for x)
   result: reg 97
   clobbers: nothing
2) 29-24-28-30
   inputs: regs 95, 99, and 102
   result: reg 104
   clobbers: nothing
3) 37-34-35-36-38
   inputs: regs 95, and 113
   result: reg 112
   clobbers: nothing

Surrounding those, you have the following additional insns:
* insn 6 loads %r25 into reg 95
* insn 11 and insn 15 load the low and high parts of 95 into new regs 99
  and 102, so they must go before groups 1 and 2
* insn 33 sets reg 113 <- reg 102 << 31, so it goes before group 3 
* insn 31 sets reg 111 <- reg 97 XOR reg 104, so groups 1 and two need to
  be scheduled before this instruction
* insn 46 sets reg %28 <- reg 111 XOR reg 112, so it goes after insn 31 and
  also after group 3.

Before scheduling, the order is,
insn 6    --> out 95
insn 11   --> out 99, in 95
insn 15   --> out 102, in 95
group 1   --> out 97, in 95 99 102
group 2   --> out 104, in 95 99 102
insn 31   --> out 111, in 97 104
insn 33   --> out 113, in 102
group 3   --> out 112, in 95 113 
insn 46   --> out %26, in 111 112

After scheduling, the order is,
insn 6    --> out 95
insn 15   --> out 102, in 95
insn 33   --> out 113, in 102
insn 11   --> out 99, in 95
group 3   --> out 112, in 95 113 
group 1   --> out 97, in 95 99 102
group 2   --> out 104, in 95 99 102
insn 31   --> out 111, in 97 104
insn 46   --> out %26, in 111 112

The groups are scheduled properly as groups, and the required regs are all scheduled before the groups that they are needed in.

In short, unless I'm getting it all wrong, the scheduler is not to blame here.
Comment 20 Steven Bosscher 2005-12-12 22:14:02 UTC
Created attachment 10463 [details]
a full set of debugging dumps

Re. comment #16, sorry, I didn't read it until after going through the scheduler dumps.  Here is the complete set of dumps.
Comment 21 Jorn Wolfgang Rennecke 2005-12-13 13:16:57 UTC
(In reply to comment #17)
> Created an attachment (id=10461) [edit]
> Instruction stream (stripped) before scheduling
> 

(insn/s 24 0 (set (subreg:SI (reg:DI 104) 0)
        (ior:SI (ashift:SI (reg:SI 102)
                (const_int 16 [0x10]))
            (lshiftrt:SI (subreg:SI (reg/v:DI 95 [ x ]) 4)
                (const_int 16 [0x10])))) 181 {*pa.md:6669} (insn_list:REG_DEP_ANTI 29 (nil))
    (expr_list:REG_NO_CONFLICT (reg/v:DI 95 [ x ])
        (nil)))

(insn/s 28 0 (set (subreg:SI (reg:DI 104) 4)
        (ior:SI (ashift:SI (reg:SI 99 [ x ])
                (const_int 16 [0x10]))
            (lshiftrt:SI (subreg:SI (reg/v:DI 95 [ x ]) 0)
                (const_int 16 [0x10])))) 181 {*pa.md:6669} (insn_list:REG_DEP_ANTI 24 (nil))
    (expr_list:REG_NO_CONFLICT (reg/v:DI 95 [ x ])
        (nil)))


If (reg/v:DI 95 [ x ]) and (reg:DI 104) are assigned to the same hard register,
(subreg:SI (reg/v:DI 95 [ x ]) 0) gets clobbered in insn 24, before it is used
in insn 28.  Therefore, the REG_NO_CONFLICT block containing these two
instructions is bogus.
Comment 22 Jorn Wolfgang Rennecke 2005-12-13 13:41:37 UTC
(In reply to comment #20)
> Created an attachment (id=10463) [edit]
>  a full set of debugging dumps
> 
> Re. comment #16, sorry, I didn't read it until after going through the
> scheduler dumps.  Here is the complete set of dumps.
> 

The REG_NO_CONFLICT block in f from insn 29 to insn 30 is still sound in
t.c.26.life1.  In t.c.27.combine, it is bogus.  Apparently, the problem arises
because insns 25 and 27 have been combined into insn 28.  Suppressing this
combination or removing the violated REG_NO_CONFLICT note when the
combination is done should solve this problem.
It appears that local-alloc has different algorithms for combining pseudos
with and without REG_NO_CONFLICT notes.  Therefore, I suspect that having
a REG_NO_CONFLICT note on only some of the insns of a REG_NO_CONFLICT block
is worse than not having the note at all inside that block.  Thus,
if the REG_NO_CONFLICT note is removed, it might be best to remove it in all
insns of the affected REG_NO_CONLFICT block, and dissolve the block if that
was the last REG_NO_CONFLICT note.
Comment 23 Steven Bosscher 2005-12-15 00:00:43 UTC
lreg decides to do this:
;; Register 95 in 25.
;; Register 97 in 28.
;; Register 99 in 22.
;; Register 102 in 21.
;; Register 104 in 25.
;; Register 111 in 28.
;; Register 112 in 19.
;; Register 113 in 28.

and reload/greg agree (note the missing page breaks in the .greg dump btw):
;; Register dispositions:
95 in 25
97 in 28
99 in 22
102 in 21
104 in 25
111 in 28
112 in 19
113 in 28

So reg 95 and reg 104 are indeed assigned the same hard register.
Comment 24 Steven Bosscher 2005-12-15 00:09:10 UTC
I think we can blame combine for this bug.
Comment 25 Steven Bosscher 2005-12-15 00:52:27 UTC
Smarter folks than me (iant ;-) suggest that "a multi-word rotate will normally need all the input bits when setting any of the output bits", so the entire no-conflict thing doesn't make sense here.

So, let's not do that:

--- optabs.c    2005-12-15 01:49:23.000000000 +0100
+++ optabs.c.jj 2005-12-15 01:49:55.000000000 +0100
@@ -1525,7 +1525,14 @@ expand_binop (enum machine_mode mode, op
          else
            equiv_value = 0;

-         emit_insn (insns);
+         /* We can't make this a no conflict block if this is a word swap,
+            because the word swap case fails if the input and output values
+            are in the same register.  */
+         if (shift_count != BITS_PER_WORD)
+           emit_no_conflict_block (insns, target, op0, op1, equiv_value);
+         else
+           emit_insn (insns);
+

          return target;
        }
Comment 26 Steven Bosscher 2005-12-15 00:54:08 UTC
Needless to say really, but the patchlet in comment #25 is inverted...
Comment 27 Steven Bosscher 2005-12-15 06:42:28 UTC
accept while testing a patch...
Comment 28 Jorn Wolfgang Rennecke 2005-12-16 14:37:29 UTC
(In reply to comment #25)
> Smarter folks than me (iant ;-) suggest that "a multi-word rotate will normally
> need all the input bits when setting any of the output bits", so the entire
> no-conflict thing doesn't make sense here.

This is not really relevant, since the the insns that are using other words than
the result word are computing their (intermediate) results into pseudos, and
thus get emitted in front of (i.e. outside) the REG_NO_CONFLICT block.
combine is really to blame for combining insns into a REG_NO_CONFLICT block
in a way to invalidate the REG_NO_CONFLICT note without removing the note.
Comment 29 Steven Bosscher 2005-12-16 22:19:13 UTC
Subject: Bug 23837

Author: steven
Date: Fri Dec 16 22:19:09 2005
New Revision: 108690

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108690
Log:
	PR rtl-optimization/23837
	* optabs.c (expand_binop): For a multi-word rotate, never emit
	a REG_NO_CONFLICT block.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/optabs.c

Comment 30 Steven Bosscher 2005-12-16 22:25:09 UTC
Should be fixed on the trunk.

I guess the patch could be backported to all active branches, the bug is latent but present in all GCCs releases since 2.early.

Comment 31 John David Anglin 2005-12-17 17:27:16 UTC
Subject: Bug 23837

Author: danglin
Date: Sat Dec 17 17:27:11 2005
New Revision: 108718

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108718
Log:
	PR rtl-optimization/23837
	* optabs.c (expand_binop): For a multi-word rotate, never emit
	a REG_NO_CONFLICT block.


Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/optabs.c

Comment 32 John David Anglin 2005-12-17 17:30:00 UTC
Subject: Bug 23837

Author: danglin
Date: Sat Dec 17 17:29:57 2005
New Revision: 108719

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108719
Log:
	PR rtl-optimization/23837
	* optabs.c (expand_binop): For a multi-word rotate, never emit
	a REG_NO_CONFLICT block.


Modified:
    branches/gcc-4_0-branch/gcc/ChangeLog
    branches/gcc-4_0-branch/gcc/optabs.c

Comment 33 Andrew Pinski 2005-12-17 17:30:33 UTC
Fixed also in 4.1.0.
Comment 34 Andrew Pinski 2005-12-17 17:31:10 UTC
Fixed overall.
Comment 35 John David Anglin 2005-12-17 17:32:39 UTC
Subject: Bug 23837

Author: danglin
Date: Sat Dec 17 17:32:36 2005
New Revision: 108721

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108721
Log:
	PR rtl-optimization/23837
	* optabs.c (expand_binop): For a multi-word rotate, never emit
	a REG_NO_CONFLICT block.


Modified:
    branches/gcc-3_4-branch/gcc/ChangeLog
    branches/gcc-3_4-branch/gcc/optabs.c

Comment 36 Jorn Wolfgang Rennecke 2005-12-19 12:01:18 UTC
While the patch will stop the bug from being triggered by the test case,
it does not fix th underlying combiner bug.

Any REG_NO_CONFLICT block could potentially be rendered invalid by an
inappropriate instruction combination.  The REG_NO_CONFLICT note must be removed
when such a combination is made.
Comment 37 Steven Bosscher 2005-12-19 12:36:12 UTC
That would be a different bug, and the fix would still be to not have a no-conflict block to begin with.
Comment 38 joern.rennecke@st.com 2005-12-19 13:34:35 UTC
Subject: Re:  [4.0 regression] Wrong code with REG_NO_CONFLICT notes (caused by combine)

steven at gcc dot gnu dot org wrote:

>------- Comment #37 from steven at gcc dot gnu dot org  2005-12-19 12:36 -------
>That would be a different bug, and the fix would still be to not have a
>no-conflict block to begin with.
>  
>
No, it's the same bug.  combine rendering REG_NO_CONFLICT notes invalid.
And you seem to advocate  removing REG_NO_CLONFLICT notes from gcc 
altogether here.