This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PATCH] Fix code quality regression on UltraSPARC
- From: Eric Botcazou <ebotcazou at libertysurf dot fr>
- To: gcc-patches at gcc dot gnu dot org
- Date: Mon, 29 Nov 2004 19:38:37 +0100
- Subject: [PATCH] Fix code quality regression on UltraSPARC
Hello,
I regularly inspect the code emitted for the main loop of gzip to assess the
effectiveness of the loop optimizers on UltraSPARC. I did so yesterday and
came up with the following results:
32-bit:
cc: "-fast -xarch=v8plusb -xchip=ultra3" (Sun ONE Studio 8)
gcc: "-O2 -mcpu=ultrasparc3" (3.3.6pre, 3.4.4pre, 4.0.0pre)
./gzip33 gcc.tar 28.99s user 0.78s system 98% cpu 30.343 total
./gzip33 gcc.tar 29.03s user 0.64s system 98% cpu 29.989 total
./gzip33 gcc.tar 29.01s user 0.64s system 98% cpu 29.994 total
./gzip34 gcc.tar 24.24s user 0.70s system 98% cpu 25.383 total
./gzip34 gcc.tar 24.29s user 0.65s system 98% cpu 25.296 total
./gzip34 gcc.tar 24.30s user 0.64s system 98% cpu 25.231 total
./gzip40 gcc.tar 24.76s user 0.63s system 98% cpu 25.737 total
./gzip40 gcc.tar 24.75s user 0.63s system 98% cpu 25.668 total
./gzip40 gcc.tar 24.75s user 0.58s system 98% cpu 25.696 total
./gzipcc gcc.tar 24.53s user 0.99s system 95% cpu 26.663 total
./gzipcc gcc.tar 24.45s user 0.73s system 98% cpu 25.476 total
./gzipcc gcc.tar 24.51s user 0.65s system 98% cpu 25.513 total
64-bit:
cc: "-fast -xarch=v9b -xchip=ultra3" (Sun ONE Studio 8)
gcc: "-O2 -mcpu=ultrasparc3" (3.3.6pre, 3.4.4pre, 4.0.0pre)
./gzip33 gcc.tar 33.54s user 0.65s system 99% cpu 34.458 total
./gzip33 gcc.tar 33.40s user 0.64s system 99% cpu 34.154 total
./gzip33 gcc.tar 33.45s user 0.54s system 99% cpu 34.101 total
./gzip34 gcc.tar 28.54s user 0.71s system 99% cpu 29.365 total
./gzip34 gcc.tar 28.64s user 0.60s system 99% cpu 29.345 total
./gzip34 gcc.tar 28.78s user 0.43s system 99% cpu 29.295 total
./gzip40 gcc.tar 30.77s user 0.48s system 99% cpu 31.369 total
./gzip40 gcc.tar 30.51s user 0.72s system 99% cpu 31.367 total
./gzip40 gcc.tar 30.65s user 0.62s system 99% cpu 31.378 total
./gzipcc gcc.tar 26.67s user 0.83s system 98% cpu 27.840 total
./gzipcc gcc.tar 26.69s user 0.67s system 99% cpu 27.438 total
./gzipcc gcc.tar 26.58s user 0.75s system 99% cpu 27.457 total
So GCC 4.0.0pre has regressed relative to 3.4.4pre: its gzip needs about 2%
more user time in 32-bit mode and about 7% more in 64-bit mode. Upon closer
inspection, the results are more mixed. GCC 4.0.0pre optimizes induction
variables better and is able to hoist one more insn out of the main loop, but
the final code is plagued by a kind of trampoline in the CFG, very similar to
the one that plagues the code generated by the 3.3.x series.
The problem first shows up in the t027.ch dump: the loop header copying pass turns
<L2>:;
[...]
if (prev_length.2_25 >= good_match.8_43) goto <L3>; else goto <L4>;
<L3>:;
chain_length_120 = chain_length_19 >> 2;
[...]
<L4>:;
[...]
if (scan_end_12 != D.1622_51) goto <L22>; else goto <L6>;
into
<L2>:;
[...]
if (prev_length.2_25 >= good_match.8_43) goto <L3>; else goto <L4>;
<L4>:;
goto <bb 5> (<L27>);
<L3>:;
chain_length_120 = chain_length_19 >> 2;
goto <bb 3> (<L4>);
<L27>:;
[...]
if (scan_end_12 != D.1622_51) goto <L22>; else goto <L6>;
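and no subsequent pass is able to repair the damage: the new preheader <L4>
now sits between its two predecessors <L2> and <L3>, so <L3> has to jump back
to <L4>, which immediately jumps forward to <L27>. This is made even worse by
the RTL basic block reordering pass, which moves the <L3> block just before
the epilogue.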
The code responsible for the change is in create_preheader:
/* Reorganize blocks so that the preheader is not stuck in the middle of the
loop. */
FOR_EACH_EDGE (e, ei, dummy->preds)
if (e->src != loop->latch)
break;
move_block_after (dummy, e->src);
As this testcase demonstrates, I think create_preheader should be more careful
when placing the newly created preheader. However, I'm not really sure what
the best approach would be: walking the edges backwards? Ranking the
predecessors somehow? So I've come up with the following trick, which makes
sure that the preheader is not inserted between two consecutive predecessors.
It is sufficient to fix the code quality regression:
32-bit:
./gzip40.1 gcc.tar 23.97s user 0.68s system 98% cpu 24.993 total
./gzip40.1 gcc.tar 23.95s user 0.61s system 98% cpu 24.820 total
./gzip40.1 gcc.tar 23.72s user 0.78s system 98% cpu 24.813 total
64-bit:
./gzip40.1 gcc.tar 26.64s user 0.80s system 98% cpu 27.817 total
./gzip40.1 gcc.tar 26.64s user 0.66s system 99% cpu 27.383 total
./gzip40.1 gcc.tar 26.51s user 0.80s system 99% cpu 27.371 total
Bootstrapped/regtested on amd64-mandrake-linux-gnu.
2004-11-29 Eric Botcazou <ebotcazou@libertysurf.fr>
PR tree-optimization/18707
* cfgloopmanip.c (create_preheader): Make sure the preheader
is not inserted in the middle of consecutive predecessors.
--
Eric Botcazou
Index: cfgloopmanip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloopmanip.c,v
retrieving revision 1.38
diff -u -p -r1.38 cfgloopmanip.c
--- cfgloopmanip.c 22 Nov 2004 17:13:59 -0000 1.38
+++ cfgloopmanip.c 29 Nov 2004 11:18:03 -0000
@@ -1137,7 +1137,7 @@ mfb_update_loops (basic_block jump)
static basic_block
create_preheader (struct loop *loop, int flags)
{
- edge e, fallthru;
+ edge e, fallthru, best_pred = 0;
basic_block dummy;
struct loop *cloop, *ploop;
int nentry = 0;
@@ -1179,9 +1179,19 @@ create_preheader (struct loop *loop, int
/* Reorganize blocks so that the preheader is not stuck in the middle of the
loop. */
FOR_EACH_EDGE (e, ei, dummy->preds)
- if (e->src != loop->latch)
- break;
- move_block_after (dummy, e->src);
+ {
+ if (e->src == loop->latch)
+ continue;
+
+ /* Try to be a bit clever though, that is not to put the preheader
+ between two consecutive predecessors. */
+ if (!best_pred || best_pred->src->next_bb == e->src)
+ best_pred = e;
+ else
+ break;
+ }
+
+ move_block_after (dummy, best_pred->src);
loop->header->loop_father = loop;
add_bb_to_loop (dummy, cloop);