This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Fix code quality regression on UltraSPARC


Hello,

I regularly inspect the code emitted for the main loop of gzip to assess the 
effectiveness of the loop optimizers on UltraSPARC.  I did so yesterday and 
came up with the following results:

32-bit:
cc: "-fast -xarch=v8plusb -xchip=ultra3" (Sun ONE Studio 8)
gcc: "-O2 -mcpu=ultrasparc3" (3.3.6pre, 3.4.4pre, 4.0.0pre)

./gzip33 gcc.tar  28.99s user 0.78s system 98% cpu 30.343 total
./gzip33 gcc.tar  29.03s user 0.64s system 98% cpu 29.989 total
./gzip33 gcc.tar  29.01s user 0.64s system 98% cpu 29.994 total

./gzip34 gcc.tar  24.24s user 0.70s system 98% cpu 25.383 total
./gzip34 gcc.tar  24.29s user 0.65s system 98% cpu 25.296 total
./gzip34 gcc.tar  24.30s user 0.64s system 98% cpu 25.231 total

./gzip40 gcc.tar  24.76s user 0.63s system 98% cpu 25.737 total
./gzip40 gcc.tar  24.75s user 0.63s system 98% cpu 25.668 total
./gzip40 gcc.tar  24.75s user 0.58s system 98% cpu 25.696 total

./gzipcc gcc.tar  24.53s user 0.99s system 95% cpu 26.663 total
./gzipcc gcc.tar  24.45s user 0.73s system 98% cpu 25.476 total
./gzipcc gcc.tar  24.51s user 0.65s system 98% cpu 25.513 total


64-bit:
cc: "-fast -xarch=v9b -xchip=ultra3" (Sun ONE Studio 8)
gcc: "-O2 -mcpu=ultrasparc3" (3.3.6pre, 3.4.4pre, 4.0.0pre)

./gzip33 gcc.tar  33.54s user 0.65s system 99% cpu 34.458 total
./gzip33 gcc.tar  33.40s user 0.64s system 99% cpu 34.154 total
./gzip33 gcc.tar  33.45s user 0.54s system 99% cpu 34.101 total

./gzip34 gcc.tar  28.54s user 0.71s system 99% cpu 29.365 total
./gzip34 gcc.tar  28.64s user 0.60s system 99% cpu 29.345 total
./gzip34 gcc.tar  28.78s user 0.43s system 99% cpu 29.295 total

./gzip40 gcc.tar  30.77s user 0.48s system 99% cpu 31.369 total
./gzip40 gcc.tar  30.51s user 0.72s system 99% cpu 31.367 total
./gzip40 gcc.tar  30.65s user 0.62s system 99% cpu 31.378 total

./gzipcc gcc.tar  26.67s user 0.83s system 98% cpu 27.840 total
./gzipcc gcc.tar  26.69s user 0.67s system 99% cpu 27.438 total
./gzipcc gcc.tar  26.58s user 0.75s system 99% cpu 27.457 total


So GCC 4.0.0pre has regressed compared to 3.4.4pre.  Upon closer 
inspection, the results are more mixed.  GCC 4.0.0pre better optimizes 
induction variables and is able to hoist one more insn out of the main loop.  
But the final code is plagued by a kind of trampoline in the CFG, very 
similar to the one that plagues it in the 3.3.x series.

The problem originates in t027.ch: the copy header pass turns

<L2>:;
  [...]
  if (prev_length.2_25 >= good_match.8_43) goto <L3>; else goto <L4>;

<L3>:;
  chain_length_120 = chain_length_19 >> 2;
  [...]

<L4>:;
  [...]
  if (scan_end_12 != D.1622_51) goto <L22>; else goto <L6>;

into

<L2>:;
  [...]
  if (prev_length.2_25 >= good_match.8_43) goto <L3>; else goto <L4>;

<L4>:;
  goto <bb 5> (<L27>);

<L3>:;
  chain_length_120 = chain_length_19 >> 2;
  goto <bb 3> (<L4>);

<L27>:;
  [...]
  if (scan_end_12 != D.1622_51) goto <L22>; else goto <L6>;

and no subsequent pass is able to repair the damage.  The problem is even 
aggravated by the RTL BB reordering pass, which moves the <L3> block just 
before the epilogue.
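For reference, the shape of the loop being transformed can be sketched roughly
as follows (a toy reconstruction with made-up names and data, not the actual
deflate.c source): a guarded reduction of the chain length ahead of a do-while
scan, which is the pattern the copy-header pass rewrites above.

```c
/* Toy reconstruction of the longest_match-like loop shape.  All names
   and values here are hypothetical, for illustration only.  */
static int
toy_match (const int *window, int n, int prev_length,
	   int good_match, int chain_length, int scan_end)
{
  int cur = 0, best = -1;

  /* <L2>/<L3>: guarded reduction of the iteration bound.  */
  if (prev_length >= good_match)
    chain_length >>= 2;

  /* <L4> onward: the scan proper, with the scan_end comparison.  */
  do
    {
      if (window[cur] == scan_end)
	best = cur;
      cur++;
    }
  while (cur < n && --chain_length != 0);

  return best;
}
```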


The code responsible for the change is in create_preheader:

  /* Reorganize blocks so that the preheader is not stuck in the middle of the
      loop.  */
  FOR_EACH_EDGE (e, ei, dummy->preds)
    if (e->src != loop->latch)
      break;
  move_block_after (dummy, e->src);

As demonstrated by the testcase, I think it should be more careful in placing 
the newly created preheader.  However, I'm not really sure what the best 
approach would be: walking the edges backwards? somehow ranking the 
predecessors?  So I've come up with the following trick, which makes sure 
that the preheader is not inserted in the middle of a run of consecutive 
predecessors.  It is sufficient to fix the code quality regression:

32-bit
./gzip40.1 gcc.tar  23.97s user 0.68s system 98% cpu 24.993 total
./gzip40.1 gcc.tar  23.95s user 0.61s system 98% cpu 24.820 total
./gzip40.1 gcc.tar  23.72s user 0.78s system 98% cpu 24.813 total

64-bit:
./gzip40.1 gcc.tar  26.64s user 0.80s system 98% cpu 27.817 total
./gzip40.1 gcc.tar  26.64s user 0.66s system 99% cpu 27.383 total
./gzip40.1 gcc.tar  26.51s user 0.80s system 99% cpu 27.371 total
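To illustrate the difference, the old and new placement choices can be modeled
on block layout positions (a toy sketch with hypothetical block numbers; the
real code below works on basic_block/edge structures, not integers):

```c
/* Toy model of the placement choice in create_preheader.  Blocks are
   identified by their position in layout order; preds[] lists the
   predecessors of the new preheader in edge order.  */

/* Old behavior: place the preheader right after the first predecessor
   that is not the loop latch.  */
static int
place_after_old (const int *preds, int npreds, int latch)
{
  int i;
  for (i = 0; i < npreds; i++)
    if (preds[i] != latch)
      return preds[i];
  return latch;
}

/* Patched behavior: advance along a run of layout-consecutive
   predecessors, so the preheader lands after the run instead of
   splitting it.  */
static int
place_after_new (const int *preds, int npreds, int latch)
{
  int best = -1, i;
  for (i = 0; i < npreds; i++)
    {
      if (preds[i] == latch)
	continue;
      if (best == -1 || best + 1 == preds[i])
	best = preds[i];
      else
	break;
    }
  return best;
}
```

With predecessors {2, 3, 4} consecutive in layout and a latch elsewhere, the
old code puts the preheader after block 2, in the middle of the run; the
patched code puts it after block 4.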


Bootstrapped/regtested on amd64-mandrake-linux-gnu.


2004-11-29  Eric Botcazou  <ebotcazou@libertysurf.fr>

	PR tree-optimization/18707
	* cfgloopmanip.c (create_preheader): Make sure the preheader
	is not inserted in the middle of consecutive predecessors.


-- 
Eric Botcazou
Index: cfgloopmanip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloopmanip.c,v
retrieving revision 1.38
diff -u -p -r1.38 cfgloopmanip.c
--- cfgloopmanip.c	22 Nov 2004 17:13:59 -0000	1.38
+++ cfgloopmanip.c	29 Nov 2004 11:18:03 -0000
@@ -1137,7 +1137,7 @@ mfb_update_loops (basic_block jump)
 static basic_block
 create_preheader (struct loop *loop, int flags)
 {
-  edge e, fallthru;
+  edge e, fallthru, best_pred = 0;
   basic_block dummy;
   struct loop *cloop, *ploop;
   int nentry = 0;
@@ -1179,9 +1179,19 @@ create_preheader (struct loop *loop, int
   /* Reorganize blocks so that the preheader is not stuck in the middle of the
      loop.  */
   FOR_EACH_EDGE (e, ei, dummy->preds)
-    if (e->src != loop->latch)
-      break;
-  move_block_after (dummy, e->src);
+    {
+      if (e->src == loop->latch)
+	continue;
+
+      /* Try to be a bit clever though, that is not to put the preheader
+	 between two consecutive predecessors.  */
+      if (!best_pred || best_pred->src->next_bb == e->src)
+	best_pred = e;
+      else
+	break;
+    }
+
+  move_block_after (dummy, best_pred->src);
 
   loop->header->loop_father = loop;
   add_bb_to_loop (dummy, cloop);
