This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
Re: optimization/8092: cross-jump triggers too often
- From: Anton Ertl <anton at a0 dot complang dot tuwien dot ac dot at>
- To: bernd dot paysan at gmx dot de, rth at gcc dot gnu dot org, gcc-bugs at gcc dot gnu dot org, gcc-prs at gcc dot gnu dot org, nobody at gcc dot gnu dot org, gcc-gnats at gcc dot gnu dot org
- Date: Sat, 5 Oct 2002 11:57:34 +0200 (MET DST)
- Subject: Re: optimization/8092: cross-jump triggers too often
- Reply-to: anton at mips dot complang dot tuwien dot ac dot at
PR 8092 reports essentially the same problems as
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&pr=7953
Here's some additional data to give you an idea how far the thumb is
sticking out:
Here you see user times of four benchmarks in seconds (i.e. smaller is
better) on a Pentium 4 2.26GHz:
Overall slowdown from gcc-3.2 is around a factor of 5:
0.26 0.29 0.32 0.37 gcc-2.95.3 with explicit reg vars
1.20 1.46 1.79 1.96 gcc-3.2 with explicit reg vars
Indirect slowdown from disabling Piumarta-style interpreter inlining
is around a factor of 2.5 (interpreter inlining does not work with
gcc-3.2 because of cross-jumping):
0.26 0.29 0.32 0.37 gcc-2.95.3 with reg vars
0.62 0.76 1.15 0.89 gcc-2.95.3 with reg vars, --no-dynamic
Direct slowdown from gcc-3.2 pessimisations is around a factor of 2:
0.62 0.76 1.15 0.89 gcc-2.95.3 with reg vars, --no-dynamic
1.20 1.46 1.79 1.96 gcc-3.2 with explicit reg vars
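The three factors quoted above can be rechecked from the tables. The following is only a sketch of that arithmetic: the array and function names are made up for illustration, and averaging the per-benchmark ratios with an arithmetic mean is an assumption, since the message does not say how the "around a factor of" figures were obtained.

```c
#include <assert.h>
#include <stddef.h>

#define NBENCH 4

/* User times in seconds, copied from the tables in this message. */
const double gcc295_dynamic[NBENCH]    = { 0.26, 0.29, 0.32, 0.37 };
const double gcc295_no_dynamic[NBENCH] = { 0.62, 0.76, 1.15, 0.89 };
const double gcc32[NBENCH]             = { 1.20, 1.46, 1.79, 1.96 };

/* Arithmetic mean of the per-benchmark slowdown ratios slow[i] / fast[i].
   (Assumption: the message does not state which average was used.) */
double mean_slowdown(const double *slow, const double *fast, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += slow[i] / fast[i];
    return sum / (double)n;
}
```

Under that assumption, mean_slowdown(gcc32, gcc295_dynamic, NBENCH) comes out at roughly 5.1, mean_slowdown(gcc295_no_dynamic, gcc295_dynamic, NBENCH) at roughly 2.8, and mean_slowdown(gcc32, gcc295_no_dynamic, NBENCH) at roughly 1.9, in line with the rounded factors of 5, 2.5 and 2.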
Note that at least the direct slowdowns will apply to all threaded
code interpreters (e.g. Gforth, Ocaml bytecode, various Prolog
implementations, possibly Perl6), and maybe to other interpreters as
well.
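For readers unfamiliar with the dispatch style at issue, here is a minimal sketch using GNU C's labels-as-values extension. The VM, its three instructions, and the names run, NEXT and OP_* are invented for illustration and are not taken from Gforth or any other interpreter named above; for brevity it dispatches through an opcode table instead of storing label addresses directly in the program. The relevant property is that every VM instruction ends in its own copy of the indirect jump. Cross-jumping merges identical code tails, so it can replace those per-instruction jumps with a single shared one.

```c
#include <assert.h>

/* Hypothetical three-instruction VM; all names are illustrative only. */
enum { OP_LIT, OP_ADD, OP_HALT };

#define STACK_SIZE 16

long run(const long *program)
{
    /* GNU C "labels as values": addresses of the VM instruction bodies. */
    static void *const dispatch[] = { &&do_lit, &&do_add, &&do_halt };
    long stack[STACK_SIZE];
    long *sp = stack;            /* points at the next free stack slot */
    const long *ip = program;    /* VM instruction pointer */

/* Fetch the next opcode and jump to its body. Each use of NEXT is a
   separate indirect jump in the source. */
#define NEXT goto *dispatch[*ip++]

    NEXT;
do_lit:                          /* push the literal that follows the opcode */
    assert(sp < stack + STACK_SIZE);
    *sp++ = *ip++;
    NEXT;
do_add:                          /* replace the top two items by their sum */
    assert(sp - stack >= 2);
    sp--;
    sp[-1] += sp[0];
    NEXT;
do_halt:                         /* return the top of stack */
    assert(sp > stack);
    return sp[-1];
#undef NEXT
}
```

Running the program { OP_LIT, 2, OP_LIT, 3, OP_ADD, OP_HALT } through run() returns 5. Whether the compiled code keeps one indirect jump per VM instruction depends on the compiler version and options, which is the subject of this PR.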
As for the explicit reg vars, they are optional and are used only
because they produce faster code than gcc's own register allocation;
on platforms where gcc does well on its own, we don't define explicit
register variables; hopefully one day the 386 platform will be among those.
Giving an internal compiler error when explicit register allocation
does not work is fine with me; in some cases I have also found wrong
code generated, but did not consider it important enough to report a
bug.
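As an illustration of what "optional" means here, the following sketch uses GNU C's explicit register variable syntax behind a per-platform macro. The macro names, the function, and the choice of esi and edi are assumptions made for this example and are not the declarations used in any real interpreter. Note also that GCC only documents local register variables as reliable for asm operands, so this is a hint of the technique rather than a guarantee of the allocation.

```c
#include <assert.h>

/* Hypothetical configuration: request specific registers only on a
   platform where that is believed to pay off, and fall back to gcc's
   own register allocation everywhere else. */
#if defined(__GNUC__) && defined(__i386__)
#define IP_REG  __asm__("esi")
#define ACC_REG __asm__("edi")
#else
#define IP_REG
#define ACC_REG
#endif

/* Sum n cells; stands in for an interpreter's hot loop. */
long sum_cells(const long *cells, long n)
{
    register const long *ip IP_REG = cells;  /* explicit reg var, if enabled */
    register long acc ACC_REG = 0;           /* explicit reg var, if enabled */
    assert(n >= 0);
    while (n-- > 0)
        acc += *ip++;
    return acc;
}
```

On platforms where the macros expand to nothing, the declarations are ordinary local variables, so the same source builds with or without explicit register variables.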
On the Pentium 4, explicit register allocation seems to make little difference:
0.26 0.29 0.32 0.37 gcc-2.95.3 with explicit reg vars
0.24 0.31 0.28 0.40 gcc-2.95.3 without explicit reg vars
1.20 1.46 1.79 1.96 gcc-3.2 with explicit reg vars
1.33 1.59 1.89 1.76 gcc-3.2 without explicit reg vars
However, on the Athlon and the Pentium III, register allocation has a
large influence on performance (timings from an Athlon 1200):
0.37 0.55 0.25 0.61 gcc-2.95.1, reg vars
0.77 1.06 1.34 1.31 gcc-2.95.1, no reg vars
And here's my wishlist:
1) Add a -fno-cross-jump flag or similar, as in Bernd's patch.
2) Fix the bug that moves unrelated code into virtual machine
instructions even with -fno-gcse; this can be worked around in the
present case (not yet done in the timings above), but I, at least, did
not find where this code comes from and thus not the workaround either.
3) Make register allocation good enough that explicit reg vars don't
pay off even on the Athlon. :-)
- anton