The following is reduced from a Gelato superblock scheduling test case that produces similar interesting predicated code. See http://gcc.gelato.org/SuperblockScheduling_2fInvestigation20050401 for the original.

$ cat t.c
extern void exit(int) __attribute__((noreturn));
char *inbuf, *outbuf;

int __attribute__((noreturn))
foo (text_len)
{
  char c;
  int i;

  for (i = 0; i < text_len; i++)
    {
      c = inbuf[i];
      if (c >= 'a' && c <= 'm')
        c += 13;
      else
        c -= 13;
      outbuf[i] = c;
    }
  exit (0);
}
$ ./cc1 -O2 t.c
$ cat t.s
        .file   "t.c"
        .pred.safe_across_calls p1-p5,p16-p63
        .text
        .align 16
        .global foo#
        .proc foo#
foo:
        .prologue 12, 33
        .mmi
        .save ar.pfs, r34
        alloc r34 = ar.pfs, 1, 4, 1, 0
        adds r14 = -1, r32
        .save ar.lc, r36
        mov r36 = ar.lc
        .mmi
        addl r18 = @ltoffx(inbuf#), r1
        addl r17 = @ltoffx(outbuf#), r1
        cmp4.ge p6, p7 = 0, r32
        ;;
        .mmb
        addp4 r14 = r14, r0
        ld8.mov r18 = [r18], inbuf#
        nop 0
        .mmi
        mov r16 = r0
        ld8.mov r17 = [r17], outbuf#
        .save rp, r33
        mov r33 = b0
        .body
        .mib
        nop 0
        nop 0
        (p6) br.cond.dpnt .L2
        ;;
        .mii
        nop 0
        mov ar.lc = r14
        nop 0
.L4:
        .mmi
        ld8 r14 = [r18]
        ;;
        add r14 = r16, r14
        nop 0
        ;;
        .mmi
        ld1 r14 = [r14]
        ;;
        nop 0
        sxt1 r14 = r14
        ;;
        .mii
        mov r15 = r14
        adds r14 = -97, r14
        ;;
        zxt1 r14 = r14
        ;;
        .mmi
        cmp4.ltu p6, p7 = 12, r14
        ;;
        (p7) adds r14 = 13, r15
        (p6) adds r14 = -13, r15
        ;;
        .mii
        nop 0
        (p7) sxt1 r14 = r14
        (p6) sxt1 r14 = r14
        ;;
        .mmi
        nop 0
        (p7) mov r15 = r14
        (p6) mov r15 = r14
        .mmi
        ld8 r14 = [r17]
        ;;
        add r14 = r16, r14
        adds r16 = 1, r16
        ;;
        .mib
        st1 [r14] = r15
        nop 0
        br.cloop.sptk.few .L4
.L2:
        .mib
        nop 0
        mov r37 = r0
        br.call.sptk.many b0 = exit#
        ;;
        break.f 0
        ;;
        .endp foo#
        .common inbuf#,8,8
        .common outbuf#,8,8
        .ident  "GCC: (GNU) 4.1.0 20050528 (experimental)"

Notice this incredibly charming predicated code in there:

        .mmi
        cmp4.ltu p6, p7 = 12, r14
        ;;
        (p7) adds r14 = 13, r15
        (p6) adds r14 = -13, r15
        ;;
        .mii
        nop 0
        (p7) sxt1 r14 = r14
        (p6) sxt1 r14 = r14
        ;;
        .mmi
        nop 0
        (p7) mov r15 = r14
        (p6) mov r15 = r14
The resulting code is much better with --param min-crossjump-insns=1.
One possible fix would be to look for common tail (and head?) sequences in cond_exec_process_if_block. The code for tail merging in cfgcleanup.c could be used for this.
*** Bug 42496 has been marked as a duplicate of this bug. ***
Patch at http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01536.html Only tested on ARM (same issue as PR42496), but should also solve the ia64 problem.
I had seen this too, and I thought I had filed a bug about the same problem, but I cannot find it right now.
I am testing this patch on ia64 now.
With the patch linked to in comment #4, I get an ICE on ia64:

../../trunk/gcc/fortran/trans-intrinsic.c: In function 'gfc_conv_intrinsic_minmaxloc':
../../trunk/gcc/fortran/trans-intrinsic.c:2529:1: internal compiler error: in cond_exec_process_insns, at ifcvt.c:273
Please submit a full bug report,
with preprocessed source if appropriate.

This is ifcvt.c of trunk r157868. The ICE happens on this line:

  gcc_assert (NONJUMP_INSN_P (insn) || CALL_P (insn));
Subject: Bug 21803

Author: bernds
Date: Wed Apr 14 20:42:02 2010
New Revision: 158357

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158357
Log:
gcc/
        PR target/21803
        * ifcvt.c (cond_exec_process_if_block): Look for identical sequences
        at the start and end of the then/else blocks, and omit them from the
        conversion.
        * cfgcleanup.c (flow_find_cross_jump): No longer static.  Remove MODE
        argument; all callers changed.  Pass zero to old_insns_match_p instead.
        (flow_find_head_matching_sequence): New function.
        (old_insns_match_p): Check REG_EH_REGION notes for calls.
        * basic-block.h (flow_find_cross_jump,
        flow_find_head_matching_sequence): Declare functions.

gcc/testsuite/
        PR target/21803
        * gcc.target/arm/pr42496.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr42496.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/basic-block.h
    trunk/gcc/cfgcleanup.c
    trunk/gcc/ifcvt.c
    trunk/gcc/testsuite/ChangeLog