21803 – [ia64] gcc produces really odd predicated code

Summary: [ia64] gcc produces really odd predicated code

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	4.1.0

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Duplicates (1):	42496
Depends on:	20070
Blocks:

Reported:	2005-05-28 23:48 UTC by Steven Bosscher
Modified:	2010-05-28 10:24 UTC
CC List:	5 users

See Also:
Host:
Target:	ia64
Build:
Known to work:
Known to fail:
Last reconfirmed:	2005-10-24 03:19:46

Description Steven Bosscher 2005-05-28 23:48:22 UTC

The following is reduced from a Gelato superblock scheduling test case 
that produces similar interesting predicated code. 
See http://gcc.gelato.org/SuperblockScheduling_2fInvestigation20050401 
for the original. 
 
$ cat t.c 
extern void exit(int) __attribute__((noreturn)); 
 
char *inbuf, *outbuf; 
 
int __attribute__((noreturn)) 
foo (text_len) 
{ 
  char c; 
  int i; 
 
  for (i = 0; i < text_len; i++) 
    { 
      c = inbuf[i]; 
 
      if (c >= 'a' && c <= 'm') 
        c += 13; 
      else 
        c -= 13; 
 
      outbuf[i] = c; 
    } 
 
  exit (0); 
} 
 
$ ./cc1 -O2 t.c 
$ cat t.s 
        .file   "t.c" 
        .pred.safe_across_calls p1-p5,p16-p63 
        .text 
        .align 16 
        .global foo# 
        .proc foo# 
foo: 
        .prologue 12, 33 
        .mmi 
        .save ar.pfs, r34 
        alloc r34 = ar.pfs, 1, 4, 1, 0 
        adds r14 = -1, r32 
        .save ar.lc, r36 
        mov r36 = ar.lc 
        .mmi 
        addl r18 = @ltoffx(inbuf#), r1 
        addl r17 = @ltoffx(outbuf#), r1 
        cmp4.ge p6, p7 = 0, r32 
        ;; 
 
        .mmb 
        addp4 r14 = r14, r0 
        ld8.mov r18 = [r18], inbuf# 
        nop 0 
        .mmi 
        mov r16 = r0 
        ld8.mov r17 = [r17], outbuf# 
        .save rp, r33 
        mov r33 = b0 
        .body 
        .mib 
        nop 0 
        nop 0 
        (p6) br.cond.dpnt .L2 
        ;; 
        .mii 
        nop 0 
        mov ar.lc = r14 
        nop 0 
.L4: 
        .mmi 
        ld8 r14 = [r18] 
        ;; 
        add r14 = r16, r14 
        nop 0 
        ;; 
        .mmi 
        ld1 r14 = [r14] 
        ;; 
        nop 0 
        sxt1 r14 = r14 
        ;; 
        .mii 
        mov r15 = r14 
        adds r14 = -97, r14 
        ;; 
        zxt1 r14 = r14 
        ;; 
        .mmi 
        cmp4.ltu p6, p7 = 12, r14 
        ;; 
        (p7) adds r14 = 13, r15 
        (p6) adds r14 = -13, r15 
        ;; 
        .mii 
        nop 0 
        (p7) sxt1 r14 = r14 
        (p6) sxt1 r14 = r14 
        ;; 
        .mmi 
        nop 0 
        (p7) mov r15 = r14 
        (p6) mov r15 = r14 
        .mmi 
        ld8 r14 = [r17] 
        ;; 
        add r14 = r16, r14 
        adds r16 = 1, r16 
        ;; 
        .mib 
        st1 [r14] = r15 
        nop 0 
        br.cloop.sptk.few .L4 
.L2: 
        .mib 
        nop 0 
        mov r37 = r0 
        br.call.sptk.many b0 = exit# 
        ;; 
        break.f 0 
        ;; 
        .endp foo# 
        .common inbuf#,8,8 
        .common outbuf#,8,8 
        .ident  "GCC: (GNU) 4.1.0 20050528 (experimental)" 
 
 
Notice this incredibly charming predicated code in there: 
 
        .mmi 
        cmp4.ltu p6, p7 = 12, r14 
        ;; 
        (p7) adds r14 = 13, r15 
        (p6) adds r14 = -13, r15 
        ;; 
        .mii 
        nop 0 
        (p7) sxt1 r14 = r14 
        (p6) sxt1 r14 = r14 
        ;; 
        .mmi 
        nop 0 
        (p7) mov r15 = r14 
        (p6) mov r15 = r14
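
To make the redundancy explicit, here is a rough C rendering of that sequence. It is a sketch for illustration, not compiler output: the function names `rot_predicated`, `rot_merged` and `check_all_chars` are made up, `p7` mirrors the predicate register, and the unsigned range check mirrors the `adds -97` / `zxt1` / `cmp4.ltu` sequence. Only the add differs between the two arms; the sign extension and the move are identical under both predicates and would not need a predicate at all.

```c
#include <assert.h>

/* Mirrors the predicated sequence: each step appears once under p7
   and once under p6 (the complement of p7).  */
signed char rot_predicated(signed char c)
{
    int r15 = c;
    int r14;
    /* adds r14 = -97, r14 ; zxt1 r14 = r14 ; cmp4.ltu p6, p7 = 12, r14 */
    int p7 = (unsigned char)(c - 'a') <= 12;

    if (p7) r14 = r15 + 13; else r14 = r15 - 13;                 /* adds */
    /* Narrowing to signed char wraps with GCC (implementation-defined). */
    if (p7) r14 = (signed char)r14; else r14 = (signed char)r14; /* sxt1 */
    if (p7) r15 = r14; else r15 = r14;                           /* mov  */
    return (signed char)r15;
}

/* Same computation with the identical tail shared by both arms.  */
signed char rot_merged(signed char c)
{
    int p7 = (unsigned char)(c - 'a') <= 12;
    int r14 = p7 ? c + 13 : c - 13;   /* the only step that differs */
    return (signed char)r14;          /* one sxt1, one mov */
}

/* Checks that the two versions agree for every possible char value.  */
void check_all_chars(void)
{
    for (int v = -128; v <= 127; v++)
        assert(rot_predicated((signed char)v) == rot_merged((signed char)v));
}
```

Calling `check_all_chars()` from a test driver exercises all 256 inputs; the two functions differ only in how often the shared tail is written out.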

Comment 1 Steven Bosscher 2005-05-29 00:02:23 UTC

The resulting code is much better with --param min-crossjump-insns=1.

Comment 2 Steven Bosscher 2005-06-26 13:12:25 UTC

One possible fix would be to look for common tail (and head?) sequences 
in cond_exec_process_if_block.  The code for tail merging in cfgcleanup.c 
could be used for this.
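
As a rough illustration of that idea, the sketch below counts the identical leading and trailing instructions of two arms. It is a simplified model under stated assumptions, not the cfgcleanup.c code: instructions are plain strings compared with `strcmp`, whereas GCC compares RTL insns, and the names `common_head`, `common_tail`, `then_arm`, `else_arm` and `check_example` are hypothetical. The two arms are the predicated instructions from the description with the predicates dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of identical instructions at the start of both arms.  */
size_t common_head(const char *const *a, size_t na,
                   const char *const *b, size_t nb)
{
    size_t n = 0;
    while (n < na && n < nb && strcmp(a[n], b[n]) == 0)
        n++;
    return n;
}

/* Number of identical instructions at the end of both arms, never
   reaching into the first `skip` instructions already matched as head.  */
size_t common_tail(const char *const *a, size_t na,
                   const char *const *b, size_t nb, size_t skip)
{
    size_t n = 0;
    while (n + skip < na && n + skip < nb
           && strcmp(a[na - 1 - n], b[nb - 1 - n]) == 0)
        n++;
    return n;
}

/* The two arms from the description, without their predicates.  */
const char *const then_arm[] = {
    "adds r14 = 13, r15", "sxt1 r14 = r14", "mov r15 = r14"
};
const char *const else_arm[] = {
    "adds r14 = -13, r15", "sxt1 r14 = r14", "mov r15 = r14"
};

void check_example(void)
{
    size_t head = common_head(then_arm, 3, else_arm, 3);
    size_t tail = common_tail(then_arm, 3, else_arm, 3, head);
    assert(head == 0);  /* the adds differ, so no common head */
    assert(tail == 2);  /* sxt1 and mov match and need no predicate */
}
```

In this model only the instructions between the common head and the common tail (here the single `adds`) would still need to be predicated.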

Comment 3 Steven Bosscher 2009-12-27 00:45:55 UTC

*** Bug 42496 has been marked as a duplicate of this bug. ***

Comment 4 Bernd Schmidt 2010-03-31 21:36:12 UTC

Patch at
  http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01536.html

Only tested on ARM (same issue as PR42496), but should also solve the ia64 problem.

Comment 5 Andrew Pinski 2010-03-31 21:37:07 UTC

I saw this too, and I thought I had filed a bug about the same problem, but I cannot find it right now.

Comment 6 Steven Bosscher 2010-04-01 16:17:38 UTC

I am testing this patch on ia64 now.

Comment 7 Steven Bosscher 2010-04-01 17:57:10 UTC

With the patch linked to in comment #4, I get an ICE on ia64:

../../trunk/gcc/fortran/trans-intrinsic.c: In function 'gfc_conv_intrinsic_minmaxloc':
../../trunk/gcc/fortran/trans-intrinsic.c:2529:1: internal compiler error: in cond_exec_process_insns, at ifcvt.c:273
Please submit a full bug report,
with preprocessed source if appropriate.


This is ifcvt.c of trunk r157868. The line where the ICE happens is this:

      gcc_assert(NONJUMP_INSN_P (insn) || CALL_P (insn));

Comment 8 Bernd Schmidt 2010-04-14 20:42:15 UTC

Subject: Bug 21803

Author: bernds
Date: Wed Apr 14 20:42:02 2010
New Revision: 158357

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158357
Log:
gcc/
	PR target/21803
	* ifcvt.c (cond_exec_process_if_block): Look for identical sequences
	at the start and end of the then/else blocks, and omit them from the
	conversion.
	* cfgcleanup.c (flow_find_cross_jump): No longer static.  Remove MODE
	argument; all callers changed.  Pass zero to old_insns_match_p instead.
	(flow_find_head_matching_sequence): New function.
	(old_insns_match_p): Check REG_EH_REGION notes for calls.
	* basic-block.h (flow_find_cross_jump,
	flow_find_head_matching_sequence): Declare functions.

gcc/testsuite/
	PR target/21803
	* gcc.target/arm/pr42496.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/arm/pr42496.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/basic-block.h
    trunk/gcc/cfgcleanup.c
    trunk/gcc/ifcvt.c
    trunk/gcc/testsuite/ChangeLog

Comment 9 Steven Bosscher 2010-05-28 10:24:23 UTC