This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Re: optimization/8092: cross-jump triggers too often

From: Anton Ertl <anton at a0 dot complang dot tuwien dot ac dot at>
To: bernd dot paysan at gmx dot de, rth at gcc dot gnu dot org, gcc-bugs at gcc dot gnu dot org, gcc-prs at gcc dot gnu dot org, nobody at gcc dot gnu dot org, gcc-gnats at gcc dot gnu dot org
Date: Sat, 5 Oct 2002 11:57:34 +0200 (MET DST)
Subject: Re: optimization/8092: cross-jump triggers too often
Reply-to: anton at mips dot complang dot tuwien dot ac dot at

PR 8092 reports essentially the same problems as

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&pr=7953

Here's some additional data to give you an idea of how far the thumb is
sticking out:

Here are the user times of four benchmarks in seconds (i.e., smaller is
better) on a 2.26GHz Pentium 4:

The overall slowdown from switching to gcc-3.2 is around a factor of 5:

0.26    0.29    0.32    0.37 gcc-2.95.3 with explicit reg vars
1.20    1.46    1.79    1.96 gcc-3.2 with explicit reg vars

The indirect slowdown, caused by having to disable Piumarta-style
interpreter inlining, is around a factor of 2.5 (interpreter inlining
does not work with gcc-3.2 because of cross-jumping):

0.26    0.29    0.32    0.37 gcc-2.95.3 with reg vars
0.62    0.76    1.15    0.89 gcc-2.95.3 with reg vars, --no-dynamic

The direct slowdown from gcc-3.2's pessimisations, comparing builds that both run without interpreter inlining, is around a factor of 2:

0.62    0.76    1.15    0.89 gcc-2.95.3 with reg vars, --no-dynamic
1.20    1.46    1.79    1.96 gcc-3.2 with explicit reg vars

Note that at least the direct slowdowns will apply to all threaded-code
interpreters (e.g., Gforth, the OCaml bytecode interpreter, various
Prolog implementations, possibly Perl 6), and perhaps to other
interpreters as well.

As for the explicit reg vars, they are optional; we use them only
because they produce faster code than gcc's own register allocation.
On platforms where gcc does well on its own, we don't define explicit
register variables; hopefully one day the 386 platform will be among
those. Giving an internal compiler error when explicit register
allocation does not work is fine with me; in some cases I have also
seen wrong code generated, but did not consider it important enough
to report as a bug.

On the Pentium 4, register allocation seems to be unimportant:

0.26    0.29    0.32    0.37 gcc-2.95.3 with explicit reg vars
0.24    0.31    0.28    0.40 gcc-2.95.3 without explicit reg vars

1.20    1.46    1.79    1.96 gcc-3.2 with explicit reg vars
1.33    1.59    1.89    1.76 gcc-3.2 without explicit reg vars

However, on the Athlon and the Pentium III, register allocation has a
large influence on performance (timings from an Athlon 1200):

0.37    0.55    0.25    0.61 gcc-2.95.1, reg vars
0.77    1.06    1.34    1.31 gcc-2.95.1, no reg vars

And here's my wishlist:

1) Add a -fno-cross-jump flag or similar, as in Bernd's patch.

2) Fix the bug that moves unrelated code into virtual machine
instructions even with -fno-gcse; we can work around that in the
present case (not yet done in the timings above), but I, for one, did
not find the source of this code, and thus did not find the workaround.

3) Make register allocation good enough that explicit reg vars don't
pay off even on the Athlon. :-)

- anton

Follow-Ups:
- Re: optimization/8092: cross-jump triggers too often
  - From: Bernd Paysan
- Re: optimization/8092: cross-jump triggers too often
  - From: Bernd Paysan
