[forwarded from http://bugs.debian.org/326026] falk@paer:~% cat test.c void abort(void); unsigned long long f(unsigned long long x) { return ((x >> 8) | (x << 56)) ^ ((x >> 48) | (x << 16)) ^ (x >> 1); } int main() { volatile unsigned long long v = 0x1122334455667788ULL; if (f(v) != 0xb3c46ef7196e4c91ULL) abort(); return 0; } falk@paer:~% gcc-4.0 -O1 -fno-schedule-insns test.c && ./a.out falk@paer:~% gcc-4.0 -O1 -fschedule-insns test.c && ./a.out zsh: abort (core dumped) ./a.out Reproduced with 4.0.2 20050821 on hppa-linux and with 4.1.0 20050705 on i686-linux.
Confirmed but works with "4.0.0 20050225".
At least on i?86, this seems to be introduced by reload, .lreg dump is still sane.
A regression hunt on i686-linux showed the failure starting with this patch from amylaar@gcc.gnu.org: http://gcc.gnu.org/ml/gcc-cvs/2005-05/msg00640.html
(In reply to comment #3) > A regression hunt on i686-linux showed the failure starting with this patch > from amylaar@gcc.gnu.org: > > http://gcc.gnu.org/ml/gcc-cvs/2005-05/msg00640.html If the lreg dump is still sane that indicates that the problem was not caused, by the patch, merely a latent bug was triggered. emit_no_conflict_block should not be called during reload. to double-check, you can set a breakpoint on emit_no_conflict_block when reload starts. However, while looking for possible connections, I found that emit_libcall_block has a similar bug as emit_no_conflict_block used to have, i.e. it ignores the result of SETs if an insn has more than one. FWIW, emit_libcall_block is used in i386.c in legitimize_tls_address, which in turn is used in ix86_expand_move, which is used by the i386.md move expanders, which are used by emit_move_insn_1, which is used by gen_move_insn, which is used by gen_reload.
Confirmed with gcc version 4.1.0 20050919 (experimental) on i686-pc-linux-gnu.
(In reply to comment #3) > A regression hunt on i686-linux showed the failure starting with this patch > from amylaar@gcc.gnu.org: > > http://gcc.gnu.org/ml/gcc-cvs/2005-05/msg00640.html The i386.md x86_shrd_1 pattern is bogus, is should use lshiftrt, not ashiftrt. However, that is not the reason for this regression. The reason is that we have a bogus REG_NO_CONFLICT block. The generic rotate expander expects emit_no_conflict_block to re-order its instructions, but that is now suppresed by a match between clobbers of the flags register.
Created attachment 9783 [details] proposed patch
Subject: RFA: fix PR 23837 (Re: [4.0/4.1 regression] Wrong code with -fschedule-insns) I have regtested the attached patch on i686-pc-linux-gnu in mainline from 2005-09-19 18:00 UTC. 2005-09-22 J"orn Rennecke <joern.rennecke@st.com> PR rtl-optimization/23837 * optabs.c (no_conflict_move_test): Don't set must_stay for a clobber / clobber match between dest and p->first. Index: optabs.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/optabs.c,v retrieving revision 1.291 diff -p -r1.291 optabs.c *** optabs.c 15 Sep 2005 21:51:13 -0000 1.291 --- optabs.c 22 Sep 2005 17:31:13 -0000 *************** no_conflict_move_test (rtx dest, rtx set *** 3021,3027 **** return; /* If this insn sets / clobbers a register that feeds one of the insns already in the list, this insn has to stay too. */ ! else if (reg_mentioned_p (dest, PATTERN (p->first)) || reg_used_between_p (dest, p->first, p->insn) /* Likewise if this insn depends on a register set by a previous insn in the list. */ --- 3021,3028 ---- return; /* If this insn sets / clobbers a register that feeds one of the insns already in the list, this insn has to stay too. */ ! else if (reg_overlap_mentioned_p (dest, PATTERN (p->first)) ! || (CALL_P (p->first) && (find_reg_fusage (p->first, USE, dest))) || reg_used_between_p (dest, p->first, p->insn) /* Likewise if this insn depends on a register set by a previous insn in the list. */
Subject: Bug 23837 CVSROOT: /cvs/gcc Module name: gcc Changes by: amylaar@gcc.gnu.org 2005-09-23 12:05:58 Modified files: gcc : ChangeLog optabs.c Log message: PR rtl-optimization/23837 * optabs.c (no_conflict_move_test): Don't set must_stay for a clobber / clobber match between dest and p->first. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.10007&r2=2.10008 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&r1=1.291&r2=1.292
Only a hppa-linux bug now, which means this should be targetted at 4.2.0.
How do you know this is hppa-linux only now? There's a bit if information missing about how you got to that conclusion. Is there some simple way to test this bug on HPPA? (Maybe add a HPPA maintainer to the CC: list? ;-)
Folks, what's up with this bug? Is it HPPA-only now? Can someone comment on the status of this bug please?
Fails on hppa2.0w-hp-hpux11.11 with 4.0.0, 4.0.1 and 4.0.2. However, it doesn't fail using 3.4.4.
This also effects hppa2.0w-hp-hpux11.11 which means this effects a primary target.
I can reproduce this on hppa2.0-suse-linux-gnu with the "4.2-20051210" snapshot.
(In reply to comment #15) > I can reproduce this on hppa2.0-suse-linux-gnu with the "4.2-20051210" > snapshot. Could you make a full set of debugging dumps (i.e. from compiling with -da) available somewhere?
Created attachment 10461 [details] Instruction stream (stripped) before scheduling
Created attachment 10462 [details] Instruction stream (stripped) after scheduling
The dumps before and after scheduling look OK to me. There are three groups of instructions for libcalls: 1) 19-14-18-20 inputs: regs 95, 99, and 102 (all of them for x) result: reg 97 clobbers: nothing 2) 29-24-28-30 inputs: regs 95, 99, and 102 result: reg 104 clobbers: nothing 3) 37-34-35-36-38 inputs: regs 95, and 113 result: reg 112 clobbers: nothing Surrounding those, you have the following additional insns: * insn 6 loads %r25 into reg 95 * insn 11 and insn 15 load the low and high parts of 95 into new regs 99 and 102, so they must go before groups 1 and 2 * insn 33 sets reg 113 <- reg 102 << 31, so it goes before group 3 * insn 31 sets reg 111 <- reg 97 XOR reg 104, so groups 1 and two need to be scheduled before this instruction * insn 46 sets reg %28 <- reg 111 XOR reg 112, so it goes after insn 31 and also after group 3. Before scheduling, the order is, insn 6 --> out 95 insn 11 --> out 99, in 95 insn 15 --> out 102, in 95 group 1 --> out 97, in 95 99 102 group 2 --> out 104, in 95 99 102 insn 31 --> out 111, in 97 104 insn 33 --> out 113, in 102 group 3 --> out 112, in 95 113 insn 46 --> out %26, in 111 112 After scheduling, the order is, insn 6 --> out 95 insn 15 --> out 102, in 95 insn 33 --> out 113, in 102 insn 11 --> out 99, in 95 group 3 --> out 112, in 95 113 group 1 --> out 97, in 95 99 102 group 2 --> out 104, in 95 99 102 insn 31 --> out 111, in 97 104 insn 46 --> out %26, in 111 112 The groups are scheduled properly as groups, and the required regs are all scheduled before the groups that they are needed in. In short, unless I'm getting it all wrong, the scheduler is not to blame here.
Created attachment 10463 [details] a full set of debugging dumps Re. comment #16, sorry, I didn't read it until after going through the scheduler dumps. Here is the complete set of dumps.
(In reply to comment #17) > Created an attachment (id=10461) [edit] > Instruction stream (stripped) before scheduling > (insn/s 24 0 (set (subreg:SI (reg:DI 104) 0) (ior:SI (ashift:SI (reg:SI 102) (const_int 16 [0x10])) (lshiftrt:SI (subreg:SI (reg/v:DI 95 [ x ]) 4) (const_int 16 [0x10])))) 181 {*pa.md:6669} (insn_list:REG_DEP_ANTI 29 (nil)) (expr_list:REG_NO_CONFLICT (reg/v:DI 95 [ x ]) (nil))) (insn/s 28 0 (set (subreg:SI (reg:DI 104) 4) (ior:SI (ashift:SI (reg:SI 99 [ x ]) (const_int 16 [0x10])) (lshiftrt:SI (subreg:SI (reg/v:DI 95 [ x ]) 0) (const_int 16 [0x10])))) 181 {*pa.md:6669} (insn_list:REG_DEP_ANTI 24 (nil)) (expr_list:REG_NO_CONFLICT (reg/v:DI 95 [ x ]) (nil))) If (reg/v:DI 95 [ x ]) and (reg:DI 104) are assigned to the same hard register, (subreg:SI (reg/v:DI 95 [ x ]) 0) gets clobbered in insn 24, before it is used in insn 28. Therefore, the REG_NO_CONFLICT block containing these two instructions is bogus.
(In reply to comment #20) > Created an attachment (id=10463) [edit] > a full set of debugging dumps > > Re. comment #16, sorry, I didn't read it until after going through the > scheduler dumps. Here is the complete set of dumps. > The REG_NO_CONFLICT block in f from insn 29 to insn 30 is still sound in t.c.26.life1. In t.c.27.combine, it is bogus. Apparently, the problem arises because insns 25 and 27 have been combined into insn 28. Suppressing this combination or removing the violated REG_NO_CONFLICT note when the combination is done should solve this problem. It appears that local-alloc has different algorithms for combining pseudos with and without REG_NO_CONFLICT notes. Therefore, I suspect that having a REG_NO_CONFLICT note on only some of the insns of a REG_NO_CONFLICT block is worse than not having the note at all inside that block. Thus, if the REG_NO_CONFLICT note is removed, it might be best to remove it in all insns of the affected REG_NO_CONLFICT block, and dissolve the block if that was the last REG_NO_CONFLICT note.
lreg decides to do this: ;; Register 95 in 25. ;; Register 97 in 28. ;; Register 99 in 22. ;; Register 102 in 21. ;; Register 104 in 25. ;; Register 111 in 28. ;; Register 112 in 19. ;; Register 113 in 28. and reload/greg agree (note the missing page breaks in the .greg dump btw): ;; Register dispositions: 95 in 25 97 in 28 99 in 22 102 in 21 104 in 25 111 in 28 112 in 19 113 in 28 So reg 95 and reg 104 are indeed assigned the same hard register.
I think we can blame combine for this bug.
Smarter folks than me (iant ;-) suggest that "a multi-word rotate will normally need all the input bits when setting any of the output bits", so the entire no-conflict thing doesn't make sense here. So, let's not do that: --- optabs.c 2005-12-15 01:49:23.000000000 +0100 +++ optabs.c.jj 2005-12-15 01:49:55.000000000 +0100 @@ -1525,7 +1525,14 @@ expand_binop (enum machine_mode mode, op else equiv_value = 0; - emit_insn (insns); + /* We can't make this a no conflict block if this is a word swap, + because the word swap case fails if the input and output values + are in the same register. */ + if (shift_count != BITS_PER_WORD) + emit_no_conflict_block (insns, target, op0, op1, equiv_value); + else + emit_insn (insns); + return target; }
Needless to say really, but the patchlet in comment #25 is inverted...
accept while testing a patch...
(In reply to comment #25) > Smarter folks than me (iant ;-) suggest that "a multi-word rotate will normally > need all the input bits when setting any of the output bits", so the entire > no-conflict thing doesn't make sense here. This is not really relevant, since the the insns that are using other words than the result word are computing their (intermediate) results into pseudos, and thus get emitted in front of (i.e. outside) the REG_NO_CONFLICT block. combine is really to blame for combining insns into a REG_NO_CONFLICT block in a way to invalidate the REG_NO_CONFLICT note without removing the note.
Subject: Bug 23837 Author: steven Date: Fri Dec 16 22:19:09 2005 New Revision: 108690 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108690 Log: PR rtl-optimization/23837 * optabs.c (expand_binop): For a multi-word rotate, never emit a REG_NO_CONFLICT block. Modified: trunk/gcc/ChangeLog trunk/gcc/optabs.c
Should be fixed on the trunk. I guess the patch could be backported to all active branches, the bug is latent but present in all GCCs releases since 2.early.
Subject: Bug 23837 Author: danglin Date: Sat Dec 17 17:27:11 2005 New Revision: 108718 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108718 Log: PR rtl-optimization/23837 * optabs.c (expand_binop): For a multi-word rotate, never emit a REG_NO_CONFLICT block. Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/optabs.c
Subject: Bug 23837 Author: danglin Date: Sat Dec 17 17:29:57 2005 New Revision: 108719 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108719 Log: PR rtl-optimization/23837 * optabs.c (expand_binop): For a multi-word rotate, never emit a REG_NO_CONFLICT block. Modified: branches/gcc-4_0-branch/gcc/ChangeLog branches/gcc-4_0-branch/gcc/optabs.c
Fixed also in 4.1.0.
Fixed overall.
Subject: Bug 23837 Author: danglin Date: Sat Dec 17 17:32:36 2005 New Revision: 108721 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108721 Log: PR rtl-optimization/23837 * optabs.c (expand_binop): For a multi-word rotate, never emit a REG_NO_CONFLICT block. Modified: branches/gcc-3_4-branch/gcc/ChangeLog branches/gcc-3_4-branch/gcc/optabs.c
While the patch will stop the bug from being triggered by the test case, it does not fix th underlying combiner bug. Any REG_NO_CONFLICT block could potentially be rendered invalid by an inappropriate instruction combination. The REG_NO_CONFLICT note must be removed when such a combination is made.
That would be a different bug, and the fix would still be to not have a no-conflict block to begin with.
Subject: Re: [4.0 regression] Wrong code with REG_NO_CONFLICT notes (caused by combine) steven at gcc dot gnu dot org wrote: >------- Comment #37 from steven at gcc dot gnu dot org 2005-12-19 12:36 ------- >That would be a different bug, and the fix would still be to not have a >no-conflict block to begin with. > > No, it's the same bug. combine rendering REG_NO_CONFLICT notes invalid. And you seem to advocate removing REG_NO_CLONFLICT notes from gcc altogether here.