This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RFA: Improve doloop_begin support


ARCompact is one of the architectures that have zero-overhead loops that
are initiated with an instruction at the loop top.  There is a way to
set up loops before jumping into their middle, by poking values into
control registers, but that method is more costly and thus only pays off
with a higher minimum iteration count.
Thus, the iteration count passed to doloop_end isn't all that helpful
without an indication of whether the loop is entered at its top.
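
To illustrate the trade-off, here is a minimal self-contained C sketch (not
GCC code) of the decision a port could make once it knows whether the loop is
entered at its top.  The function name and both threshold values are
hypothetical; real minimum iteration counts are port-specific.

```c
#include <stdbool.h>

/* Hypothetical thresholds: setting up a loop entered in the middle is
   assumed to be more costly, so it needs more iterations to pay off.  */
enum { MIN_ITER_TOP_ENTRY = 2, MIN_ITER_MID_ENTRY = 8 };

/* Decide whether a zero-overhead loop is worthwhile.
   ITERATIONS is zero if the count is unknown, mirroring operand 1 of
   doloop_end.  ENTERED_AT_TOP models the proposed extra operand.  */
static bool
hw_loop_profitable (unsigned long iterations, bool entered_at_top)
{
  unsigned long min = entered_at_top ? MIN_ITER_TOP_ENTRY
				     : MIN_ITER_MID_ENTRY;

  /* Unknown count: this sketch only accepts the loop when the cheap
     setup at the loop top is available.  */
  if (iterations == 0)
    return entered_at_top;

  return iterations >= min;
}
```

With these assumed thresholds, a loop of four iterations would be accepted
when entered at the top and rejected otherwise.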

Also, loops that are well-formed and entered at the top at rtl expansion
time can get mangled by the rtl optimizers, and/or have their doloop_begin
pattern moved away so that it no longer matches the loop.  In order to give
the port a chance at machine_dependent_reorg / instruction output time to
verify that matching patterns are present, it first has to be able to record
which doloop_begin and doloop_end instructions belong together.

The patch attached below as doloop-patch-2-2 addresses these two issues by
adding an operand to doloop_end to indicate if the loop is entered at its
top, and one to doloop_begin which is the doloop_end instruction.
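
As a rough illustration of what recording the pairing makes possible, the
following self-contained C sketch models a late check that a doloop_begin
still matches its doloop_end.  It does not use GCC's rtx types; struct insn,
its fields and hw_loop_pair_valid are all hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an instruction; not GCC's rtx.  */
struct insn
{
  int uid;
  bool deleted;			/* set if an optimizer removed the insn */
  const struct insn *loop_end;	/* doloop_begin only: recorded doloop_end */
  int loop_top_label;		/* doloop_end only: label it branches to */
};

/* Return true if BEGIN still sits in front of the loop that its recorded
   doloop_end closes.  LABEL_AFTER_BEGIN is the label found right after
   BEGIN in the final instruction stream.  */
static bool
hw_loop_pair_valid (const struct insn *begin, int label_after_begin)
{
  const struct insn *end = begin->loop_end;

  if (begin->deleted || end == NULL || end->deleted)
    return false;

  /* If doloop_begin was moved away, the label after it is no longer the
     one doloop_end jumps back to.  */
  return end->loop_top_label == label_after_begin;
}
```

A port that finds the pair invalid could then fall back to an ordinary
decrement-and-branch sequence.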

I have tested this patch with a variant of contrib/config-list.mk that
I trimmed to test configurations of existing ports with doloop_end
patterns, i.e.:
LIST = \
arm-linux-androideabi arm-uclinux_eabi arm-eabi \
arm-symbianelf \
bfin-elf bfin-uclinux bfin-linux-uclibc bfin-rtems bfin-openbsd \
c6x-elf c6x-uclinux \
ia64-elf \
ia64-freebsd6 ia64-linux ia64-hpux \
mep-elf \
powerpc-darwin8 \
powerpc-darwin7 powerpc64-darwin powerpc-freebsd6 powerpc-netbsd \
powerpc-eabispe powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
powerpc-rtems4.11OPT-enable-threads=yes powerpc-linux_spe \
powerpc-linux_paired powerpc64-linux_altivec \
powerpc-wrs-vxworks powerpc-wrs-vxworksae powerpc-lynxos powerpcle-elf \
powerpcle-eabisim powerpcle-eabi rs6000-ibm-aix4.3 rs6000-ibm-aix5.1.0 \
rs6000-ibm-aix5.2.0 rs6000-ibm-aix5.3.0 rs6000-ibm-aix6.0 \
s390-linux-gnu s390x-linux-gnu s390x-ibm-tpf sh-elf \
shle-linux sh-netbsdelf sh-superh-elf sh5el-netbsd sh64-netbsd sh64-linux \
sh64-elfOPT-with-newlib sh-rtems sh-wrs-vxworks \
spu-elf tilegx-linux-gnu tilepro-linux-gnu \


FWIW, I left out arm-wrs-vxworks, arm-netbsdelf and ia64-hp-vms because
these configurations are currently broken and there are sufficient working configurations to cover arm / ia64.


As a baseline for testing I used revision 191658, with a patch set to get
c6x / mep / rs6000 / tilegx / tilepro to build, to be found in the
second attachment doloop-patch-2-1.  This allows the above mentioned list of
configurations to build, except for powerpc*-darwin*.

To be clear, I'm asking here for approval of doloop-patch-2-2, not the
collection of patches to get a working baseline.
If and how the c6x / mep / rs6000 / tilegx / tilepro ports should be fixed
would be subject to separate discussions.

2012-09-26  J"orn Rennecke  <joern.rennecke@arc.com>

        * loop-doloop.c (doloop_modify): Pass doloop_end pattern to
        gen_doloop_begin.
        * loop-doloop.c (doloop_optimize): Pass flag to indicate if loop is
        entered at top to gen_doloop_end.
	* config/arm/thumb2.md (doloop_end): Accept extra operand.
	* config/bfin/bfin.md (doloop_end): Likewise.
	* config/c6x/c6x.md (doloop_end): Likewise.
	* config/ia64/ia64.md (doloop_end): Likewise.
	* config/mep/mep.md (doloop_begin, doloop_end): Likewise.
	* config/rs6000/rs6000.md (doloop_end): Likewise.
	* config/s390/s390.md (doloop_end): Likewise.
	* config/sh/sh.md (doloop_end): Likewise.
	* config/spu/spu.md (doloop_end): Likewise.
	* config/tilegx/tilegx.md (doloop_end): Likewise.
	* config/tilepro/tilepro.md (doloop_end): Likewise.
	* doc/md.texi (doloop_end): Document new operand.

Index: gcc/gcc/config/arm/thumb2.md
===================================================================
--- gcc/gcc/config/arm/thumb2.md	(revision 191658)
+++ gcc/gcc/config/arm/thumb2.md	(working copy)
@@ -996,7 +996,8 @@ (define_expand "doloop_end"
    (use (match_operand 1 "" ""))      ; iterations; zero if unknown
    (use (match_operand 2 "" ""))      ; max iterations
    (use (match_operand 3 "" ""))      ; loop level
-   (use (match_operand 4 "" ""))]     ; label
+   (use (match_operand 4 "" ""))      ; label
+   (use (match_operand 5 "" ""))]     ; flag: 1 if loop entered at top, else 0
   "TARGET_32BIT"
   "
  {
Index: gcc/gcc/config/bfin/bfin.md
===================================================================
--- gcc/gcc/config/bfin/bfin.md	(revision 191658)
+++ gcc/gcc/config/bfin/bfin.md	(working copy)
@@ -1933,6 +1933,7 @@ (define_insn "*tablejump_internal"
 ; operand 2 is the maximum number of loop iterations
 ; operand 3 is the number of levels of enclosed loops
 ; operand 4 is the label to jump to at the top of the loop
+; operand 5 indicates if the loop is entered at the top
 (define_expand "doloop_end"
   [(parallel [(set (pc) (if_then_else
 			  (ne (match_operand:SI 0 "" "")
@@ -1943,7 +1944,7 @@ (define_expand "doloop_end"
 		   (plus:SI (match_dup 0)
 			    (const_int -1)))
 	      (unspec [(const_int 0)] UNSPEC_LSETUP_END)
-	      (clobber (match_scratch:SI 5 ""))])]
+	      (clobber (match_operand 5 ""))])] ; match_scratch
   ""
 {
   /* The loop optimizer doesn't check the predicates... */
@@ -1956,6 +1957,7 @@ (define_expand "doloop_end"
       && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
     FAIL;
   bfin_hardware_loop ();
+  operands[5] = gen_rtx_SCRATCH (SImode);
 })
 
 (define_insn "loop_end"
Index: gcc/gcc/config/c6x/c6x.md
===================================================================
--- gcc/gcc/config/c6x/c6x.md	(revision 191658)
+++ gcc/gcc/config/c6x/c6x.md	(working copy)
@@ -1425,6 +1425,7 @@ (define_insn_and_split "eh_return"
 ; operand 2 is the maximum number of loop iterations
 ; operand 3 is the number of levels of enclosed loops
 ; operand 4 is the label to jump to at the top of the loop
+; operand 5 indicates if the loop is entered at the top
 (define_expand "doloop_end"
   [(parallel [(set (pc) (if_then_else
 			  (ne (match_operand:SI 0 "" "")
@@ -1434,12 +1435,13 @@ (define_expand "doloop_end"
 	      (set (match_dup 0)
 		   (plus:SI (match_dup 0)
 			    (const_int -1)))
-	      (clobber (match_scratch:SI 5 ""))])]
+	      (clobber (match_operand 5 ""))])] ; match_scratch
   "TARGET_INSNS_64PLUS && optimize"
 {
   /* The loop optimizer doesn't check the predicates... */
   if (GET_MODE (operands[0]) != SImode)
     FAIL;
+  operands[5] = gen_rtx_SCRATCH (SImode);
 })
 
 (define_insn "mvilc"
Index: gcc/gcc/config/ia64/ia64.md
===================================================================
--- gcc/gcc/config/ia64/ia64.md	(revision 191658)
+++ gcc/gcc/config/ia64/ia64.md	(working copy)
@@ -3960,7 +3960,8 @@ (define_expand "doloop_end"
    (use (match_operand 1 "" ""))	; iterations; zero if unknown
    (use (match_operand 2 "" ""))	; max iterations
    (use (match_operand 3 "" ""))	; loop level
-   (use (match_operand 4 "" ""))]	; label
+   (use (match_operand 4 "" ""))	; label
+   (use (match_operand 5 "" ""))]	; flag: 1 if loop entered at top, else 0
   ""
 {
   /* Only use cloop on innermost loops.  */
Index: gcc/gcc/config/mep/mep.md
===================================================================
--- gcc/gcc/config/mep/mep.md	(revision 191658)
+++ gcc/gcc/config/mep/mep.md	(working copy)
@@ -2079,7 +2079,8 @@ (define_expand "doloop_begin"
   [(use (match_operand 0 "register_operand" ""))
    (use (match_operand:QI 1 "const_int_operand" ""))
    (use (match_operand:QI 2 "const_int_operand" ""))
-   (use (match_operand:QI 3 "const_int_operand" ""))]
+   (use (match_operand:QI 3 "const_int_operand" ""))
+   (use (match_operand 4 "" ""))]
   "!profile_arc_flag && TARGET_OPT_REPEAT"
   "if (INTVAL (operands[3]) > 1)
      FAIL;
@@ -2115,7 +2116,8 @@ (define_expand "doloop_end"
    (use (match_operand:QI 1 "const_int_operand" ""))
    (use (match_operand:QI 2 "const_int_operand" ""))
    (use (match_operand:QI 3 "const_int_operand" ""))
-   (use (label_ref (match_operand 4 "" "")))]
+   (use (label_ref (match_operand 4 "" "")))
+   (use (match_operand 5 "" ""))]
   "!profile_arc_flag && TARGET_OPT_REPEAT"
   "if (INTVAL (operands[3]) > 1)
      FAIL;
Index: gcc/gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/gcc/config/rs6000/rs6000.md	(revision 191658)
+++ gcc/gcc/config/rs6000/rs6000.md	(working copy)
@@ -13158,7 +13158,8 @@ (define_expand "doloop_end"
    (use (match_operand 1 "" ""))	; iterations; zero if unknown
    (use (match_operand 2 "" ""))	; max iterations
    (use (match_operand 3 "" ""))	; loop level
-   (use (match_operand 4 "" ""))]	; label
+   (use (match_operand 4 "" ""))	; label
+   (use (match_operand 5 "" ""))]	; flag: 1 if loop entered at top, else 0
   ""
   "
 {
Index: gcc/gcc/config/s390/s390.md
===================================================================
--- gcc/gcc/config/s390/s390.md	(revision 191658)
+++ gcc/gcc/config/s390/s390.md	(working copy)
@@ -8093,7 +8093,8 @@ (define_expand "doloop_end"
    (use (match_operand 1 "" ""))        ; iterations; zero if unknown
    (use (match_operand 2 "" ""))        ; max iterations
    (use (match_operand 3 "" ""))        ; loop level
-   (use (match_operand 4 "" ""))]       ; label
+   (use (match_operand 4 "" ""))        ; label
+   (use (match_operand 5 "" ""))]       ; flag: 1 if loop entered at top, else 0
   ""
 {
   if (GET_MODE (operands[0]) == SImode && !TARGET_CPU_ZARCH)
Index: gcc/gcc/config/sh/sh.md
===================================================================
--- gcc/gcc/config/sh/sh.md	(revision 191658)
+++ gcc/gcc/config/sh/sh.md	(working copy)
@@ -8223,11 +8223,14 @@ (define_expand "doloop_end"
 			  (pc)))
 	      (set (match_dup 0)
 		   (plus:SI (match_dup 0) (const_int -1)))
-	      (clobber (reg:SI T_REG))])]
+	      (clobber (reg:SI T_REG))])
+   (match_operand 5 "" "")]
   "TARGET_SH2"
 {
   if (GET_MODE (operands[0]) != SImode)
     FAIL;
+  emit_insn (gen_doloop_end_split (operands[0], operands[4], operands[0]));
+  DONE;
 })
 
 (define_insn_and_split "doloop_end_split"
Index: gcc/gcc/config/spu/spu.md
===================================================================
--- gcc/gcc/config/spu/spu.md	(revision 191658)
+++ gcc/gcc/config/spu/spu.md	(working copy)
@@ -4490,7 +4490,8 @@ (define_insn "dsync"
     (use (match_operand 1 "" ""))      ; iterations; zero if unknown
     (use (match_operand 2 "" ""))      ; max iterations
     (use (match_operand 3 "" ""))      ; loop level
-    (use (match_operand 4 "" ""))]     ; label
+    (use (match_operand 4 "" ""))      ; label
+    (match_operand 5 "" "")]
    ""
    "
  {
Index: gcc/gcc/config/tilegx/tilegx.md
===================================================================
--- gcc/gcc/config/tilegx/tilegx.md	(revision 191658)
+++ gcc/gcc/config/tilegx/tilegx.md	(working copy)
@@ -2316,7 +2316,8 @@ (define_expand "doloop_end"
    (use (match_operand 1 "" ""))    ;; iterations; zero if unknown
    (use (match_operand 2 "" ""))    ;; max iterations
    (use (match_operand 3 "" ""))    ;; loop level
-   (use (match_operand 4 "" ""))]   ;; label
+   (use (match_operand 4 "" ""))    ;; label
+   (use (match_operand 5 "" ""))]   ;; flag: 1 if loop entered at top, else 0
    ""
 {
   if (optimize > 0 && flag_modulo_sched)
Index: gcc/gcc/config/tilepro/tilepro.md
===================================================================
--- gcc/gcc/config/tilepro/tilepro.md	(revision 191658)
+++ gcc/gcc/config/tilepro/tilepro.md	(working copy)
@@ -1322,7 +1322,8 @@ (define_expand "doloop_end"
    (use (match_operand 1 "" ""))    ;; iterations; zero if unknown
    (use (match_operand 2 "" ""))    ;; max iterations
    (use (match_operand 3 "" ""))    ;; loop level
-   (use (match_operand 4 "" ""))]   ;; label
+   (use (match_operand 4 "" ""))    ;; label
+   (use (match_operand 5 "" ""))]   ;; flag: 1 if loop entered at top, else 0
    ""
 {
   if (optimize > 0)
Index: gcc/gcc/doc/md.texi
===================================================================
--- gcc/gcc/doc/md.texi	(revision 191658)
+++ gcc/gcc/doc/md.texi	(working copy)
@@ -5501,7 +5501,9 @@ minus the smallest one (both inclusive).
 determined until run-time; operand 2 is the actual or estimated maximum
 number of iterations as a @code{const_int}; operand 3 is the number of
 enclosed loops as a @code{const_int} (an innermost loop has a value of
-1); operand 4 is the label to jump to if the register is nonzero.
+1); operand 4 is the label to jump to if the register is nonzero;
+operand 5 is const1_rtx if the loop is entered at its top, const0_rtx
+otherwise.
 @xref{Looping Patterns}.
 
 This optional instruction pattern should be defined for machines with
Index: gcc/gcc/loop-doloop.c
===================================================================
--- gcc/gcc/loop-doloop.c	(revision 191658)
+++ gcc/gcc/loop-doloop.c	(working copy)
@@ -551,7 +551,8 @@ doloop_modify (struct loop *loop, struct
     init = gen_doloop_begin (counter_reg,
 			     desc->const_iter ? desc->niter_expr : const0_rtx,
 			     GEN_INT (desc->niter_max),
-			     GEN_INT (level));
+			     GEN_INT (level),
+			     doloop_seq);
     if (init)
       {
 	start_sequence ();
@@ -608,6 +609,7 @@ doloop_optimize (struct loop *loop)
   struct niter_desc *desc;
   unsigned word_mode_size;
   unsigned HOST_WIDE_INT word_mode_max;
+  int entered_at_top;
 
   if (dump_file)
     fprintf (dump_file, "Doloop: Processing loop %d.\n", loop->num);
@@ -666,8 +668,10 @@ doloop_optimize (struct loop *loop)
      not like.  */
   start_label = block_label (desc->in_edge->dest);
   doloop_reg = gen_reg_rtx (mode);
+  entered_at_top = loop_preheader_edge (loop)->dest == desc->in_edge->dest;
   doloop_seq = gen_doloop_end (doloop_reg, iterations, iterations_max,
-			       GEN_INT (level), start_label);
+			       GEN_INT (level), start_label,
+			       GEN_INT (entered_at_top));
 
   word_mode_size = GET_MODE_PRECISION (word_mode);
   word_mode_max
@@ -697,7 +701,8 @@ doloop_optimize (struct loop *loop)
 	}
       PUT_MODE (doloop_reg, word_mode);
       doloop_seq = gen_doloop_end (doloop_reg, iterations, iterations_max,
-				   GEN_INT (level), start_label);
+				   GEN_INT (level), start_label,
+				   GEN_INT (entered_at_top));
     }
   if (! doloop_seq)
     {

Attachment: doloop-patch-2-1
Description: Text document

