Bug 46614

Summary: [4.5 Regression] gcc.dg/vect/vect-strided-u8-i8-gap4.c FAILs with -fno-rename-registers -fsched2-use-superblocks
Product: gcc Reporter: Zdenek Sojka <zsojka>
Component: rtl-optimizationAssignee: Jakub Jelinek <jakub>
Status: RESOLVED FIXED    
Severity: normal CC: hjl.tools, rth
Priority: P2 Keywords: wrong-code
Version: 4.5.1   
Target Milestone: 4.5.2   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40552
Host: x86_64-pc-linux-gnu Target: x86_64-pc-linux-gnu
Build: Known to work: 4.4.6, 4.6.0
Known to fail: 4.5.0, 4.5.2 Last reconfirmed: 2010-11-23 01:52:12
Attachments: auto-reduced testcase
more reduced testcase
pr46614.c
gcc46-pr46614.patch

Description Zdenek Sojka 2010-11-22 23:42:23 UTC
Flags:
-O2 -fno-rename-registers -fsched2-use-superblocks -ftree-vectorize -funroll-loops
or
-O -fgcse -fno-rename-registers -fsched2-use-superblocks -fschedule-insns2 -ftree-vectorize -funroll-all-loops

Output:
$ gcc -O -fgcse -fno-rename-registers -fsched2-use-superblocks -fschedule-insns2 -ftree-vectorize -funroll-loops vect-strided-u8-i8-gap4.i
$ valgrind -q ./a.out 
==7908== Conditional jump or move depends on uninitialised value(s)
==7908==    at 0x400CBB: main1 (vect-strided-u8-i8-gap4.c:45)
==7908==    by 0x401862: main (vect-strided-u8-i8-gap4.c:96)
==7908== 
Aborted

Tested revisions:
r167054 - fail
r158095 - fail
r153685 - OK
4.5 r166509 - fail
4.4 r166509 - OK
Comment 1 Zdenek Sojka 2010-11-22 23:44:09 UTC
Created attachment 22489 [details]
auto-reduced testcase

$ gcc -O -fgcse -fno-rename-registers -fsched2-use-superblocks -fschedule-insns2 -ftree-vectorize -funroll-loops testcase.c
$ ./a.out
Aborted
Comment 2 Zdenek Sojka 2010-11-22 23:47:55 UTC
Created attachment 22490 [details]
more reduced testcase

This one needs less flags, but it gives warnings with -Wall -W - though they should be harmless.

$ gcc -O -fno-rename-registers -fsched2-use-superblocks -fschedule-insns2 -funroll-loops testcase2.c -Wall -W
testcase2.c: In function 'main1':
testcase2.c:30:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
testcase2.c:31:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
testcase2.c:38:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
$ ./a.out 
Aborted
Comment 3 H.J. Lu 2010-11-23 01:52:12 UTC
It is caused by revision 154667:

http://gcc.gnu.org/ml/gcc-cvs/2009-11/msg00890.html
Comment 4 Jakub Jelinek 2010-11-23 09:42:26 UTC
Created attachment 22492 [details]
pr46614.c

Adjusted testcase which reports no warnings.  Looking into it...
Comment 5 Uroš Bizjak 2010-11-23 10:01:03 UTC
It looks that sched2 pass is to blame. For the original case, we have:

_.split4:

;; Function main1 (main1)
...
  207 [sp:DI+0x30]=xmm8:V16QI
      REG_DEAD: xmm8:V16QI
 1158 xmm1:V16QI=xmm6:V16QI
  208 xmm1:V16QI=vec_select
      REG_EQUIV: [frame:DI-0x60]
  209 [sp:DI+0x40]=xmm1:V16QI
      REG_DEAD: xmm1:V16QI
  210 xmm6:V16QI=vec_select
      REG_DEAD: xmm5:V16QI
      REG_EQUIV: [frame:DI-0x50]
  211 [sp:DI+0x50]=xmm6:V16QI
      REG_DEAD: xmm6:V16QI
...
  468 NOTE_INSN_BASIC_BLOCK
 1327 r12:SI=zero_extend([di:DI+0x35])
 1328 r13:SI=zero_extend([sp:DI+0x50])
  471 {bp:SI=bp:SI+r12:SI;clobber flags:CC;}
...

_.sched2:

;; Function main1 (main1)
...
 1406 NOTE_INSN_PROLOGUE_END
  223 dx:QI=[di:DI+0x2]
  221 ax:QI=[di:DI+0x1]
    8 xmm2:V16QI=unspec[[di:DI+0x11]] 40
 1339 r12:SI=zero_extend([sp:DI+0x48])
 1328 r13:SI=zero_extend([sp:DI+0x50])
    6 xmm1:V16QI=unspec[[di:DI+0x1]] 40
...
  211 [sp:DI+0x50]=xmm6:V16QI
      REG_DEAD: xmm6:V16QI
 1383 r10:SI=zero_extend([sp:DI+0x28])
  209 [sp:DI+0x40]=xmm1:V16QI
      REG_DEAD: xmm1:V16QI
...

See how (insn 1339) and (insn 1328) passed (insn 209) and (insn 211).
Comment 6 Jakub Jelinek 2010-11-23 13:07:37 UTC
Yeah, on the #c4 testcase it works with additional
--param max-pending-list-length=33
(compared to the default 32), so it is related to the flush of pending list.
Comment 7 Jakub Jelinek 2010-11-23 14:47:43 UTC
So, the bug is that we have two kinds of INSNs in deps->last_pending_memory_flush
1) jumps added there via:
            deps->last_pending_memory_flush
              = alloc_INSN_LIST (insn, deps->last_pending_memory_flush);
   in deps_analyze_insn
2) any instructions added there via:
      deps->last_pending_memory_flush = alloc_INSN_LIST (insn, NULL_RTX);
   in flush_pending_lists

This list is used to add dependencies in flush_pending_lists (correct), sched_analyze_1 (for writes, also correct, and one correct for write spot in sched_analyze_insn) and then in sched_analyze_2 and sched_analyze_insn for reads/DEBUG_INSNs, where it differentiates between 1) and 2) above using JUMP_P test on the insn in the last_pending_memory_flush list.  E.g. in sched_analyze_2
it adds deps always for the (assumed) category 2) and only adds something for category 1) if it the memory can trap.  The problem is that JUMP_P is not a good
test between 1) and 2), as flush_pending_lists can be also called on JUMP_Ps:
      /* Flush pending lists on jumps, but not on speculative checks.  */
      if (JUMP_P (insn) && !(sel_sched_p ()
                             && sel_insn_is_speculation_check (insn)))
        flush_pending_lists (deps, insn, true, true);
and (what happens here):
      if (!deps->readonly
          && JUMP_P (insn)
          && !(sel_sched_p ()
               && sel_insn_is_speculation_check (insn)))
        {
          /* Keep the list a reasonable size.  */
          if (deps->pending_flush_length++ > MAX_PENDING_LIST_LENGTH)
            flush_pending_lists (deps, insn, true, true);
          else
            deps->last_pending_memory_flush
              = alloc_INSN_LIST (insn, deps->last_pending_memory_flush);
        }

What happens on the testcase is that pending_flush_length gets too large and
so we flush_pending_lists and flushes everything, adding needed dependencies
and sets deps->last_pending_memory_flush to contain just that jump.
A few insns later we process a read and go through last_pending_memory_flush,
see it only contains a JUMP_P and that the memory can't trap and don't add any dependency.  But in this case, because the jump was added as result of flush_pending_lists and thus represents all the memory writes before it as well,
it is wrong, we need to add a dependency.

I guess we need to keep the information whether a jump was added as part of flush_pending_lists or for 1) somewhere, wonder if mode of the INSN_LIST wouldn't be usable in this case for it.
Comment 8 Jakub Jelinek 2010-11-23 15:12:54 UTC
Created attachment 22497 [details]
gcc46-pr46614.patch

Untested fix that cures this testcase.
Comment 9 Jakub Jelinek 2010-11-24 16:56:49 UTC
Author: jakub
Date: Wed Nov 24 16:56:44 2010
New Revision: 167121

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167121
Log:
	PR rtl-optimization/46614
	* sched-deps.c (NON_FLUSH_JUMP_KIND, NON_FLUSH_JUMP_P): Define.
	 (deps_analyze_insn): Mark JUMP_INSNs in
	last_pending_memory_flush that weren't added through
	flush_pending_lists with NON_FLUSH_JUMP_KIND.
	(sched_analyze_2, sched_analyze_insn): Check NON_FLUSH_JUMP_P
	on INSN_LIST instead of JUMP_P check on its operand.
	* sched-rgn.c (concat_INSN_LIST): Copy over REG_NOTE_KIND.

	* gcc.dg/pr46614.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr46614.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-deps.c
    trunk/gcc/sched-rgn.c
    trunk/gcc/testsuite/ChangeLog
Comment 10 Jakub Jelinek 2010-11-24 17:08:26 UTC
Fixed on the trunk so far.
Comment 11 Jakub Jelinek 2010-12-07 15:25:53 UTC
Author: jakub
Date: Tue Dec  7 15:25:48 2010
New Revision: 167546

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167546
Log:
	Backport from mainline
	2010-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/46614
	* sched-deps.c (NON_FLUSH_JUMP_KIND, NON_FLUSH_JUMP_P): Define.
	 (deps_analyze_insn): Mark JUMP_INSNs in
	last_pending_memory_flush that weren't added through
	flush_pending_lists with NON_FLUSH_JUMP_KIND.
	(sched_analyze_2, sched_analyze_insn): Check NON_FLUSH_JUMP_P
	on INSN_LIST instead of JUMP_P check on its operand.
	* sched-rgn.c (concat_INSN_LIST): Copy over REG_NOTE_KIND.

	PR rtl-optimization/46614
	* gcc.dg/pr46614.c: New test.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/pr46614.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/sched-deps.c
    branches/gcc-4_5-branch/gcc/sched-rgn.c
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
Comment 12 Jakub Jelinek 2010-12-07 15:45:08 UTC
Fixed.