This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Slowdowns in code generated by GCC>=3.3


On Wednesday 20 October 2004 14:34, Remko Troncon wrote:
> Hi,
>
> I am a developer of a bytecode emulator for the Prolog language. With the
> release of GCC-3.3, our emulator was slowed down by a factor of 3 on x86
> with -O3 turned on (we didn't measure other platforms; the optimization
> flag doesn't seem to matter).

Which x86 architecture variant?

> We were hoping this was a temporary issue,
> but the situation didn't improve in any of the newer releases :(
> I don't know whether i should file this as a bug report, so i first ask
> for advice here.

Filing a bug report is only going to be useful if you can report your
problem in a way such that we can reproduce it: test case, output of
"gcc -v", etc.  See http://gcc.gnu.org/bugs.html for the details ;-)


> I'll try to explain on a high level what happens. If this isn't sufficient,
> i can try to give some code, but this will take me some time to isolate the
> code. This is the situation:
> - Since the program counter in our emulator is very crucial, we use the
>   'register' and 'asm ("bx")' hints.

Is the program counter a global variable or a local one?  And if you
remove those hints, does that make your code worse?
I would actually expect it to improve if you remove those hints.  x86
is a register-starved architecture, and as the documentation mentions:

  "Defining such a register variable does not reserve the register; it
remains available for other uses in places where flow control determines
the variable's value is not live.  However, these registers are made
unavailable for use in the reload pass; excessive use of this feature
leaves the compiler too few available registers to compile certain
functions."
(see "info gcc", look for "Explicit Reg Vars")

For an architecture with basically only 6 usable registers, taking up
just one is probably "excessive use" already.
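
One way to check whether the hint helps or hurts is to build the same
loop with and without it and compare the timings.  The following is only
a minimal sketch: the function name sum_until_zero, the USE_REG_HINT
macro and the zero-terminated data layout are made up for illustration
and are not taken from the emulator in question.  The hinted declaration
is compiled only when USE_REG_HINT is defined on an i386 target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a zero-terminated array of ints. */
long sum_until_zero(const int *code)
{
#if defined(USE_REG_HINT) && defined(__GNUC__) && defined(__i386__)
    /* Hinted form, as in the report: ask GCC to keep pc in "bx".
       __asm__ is used instead of asm so this also builds with -std=c99. */
    register const int *pc __asm__("bx") = code;
#else
    /* Plain local: the register allocator is free to choose. */
    const int *pc = code;
#endif
    long total = 0;

    assert(code != NULL);
    while (*pc != 0)
        total += *pc++;
    return total;
}
```

Building once with -DUSE_REG_HINT and once without, then comparing the
generated assembly (gcc -S) and the run time, should show whether the
hint is still worth keeping.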


> - For each instruction in the bytecode, we store the address of the label
>   of the code which has to be executed for the instruction. Therefore,
>   the program counter always points to an address of code to
>   be executed, and after each instruction we do a
> 	goto  **(void **)program_counter
> Previous versions of GCC keep the program counter in ebx, and do a
> jmp *(%ebx) after the instructions (as expected). The newer GCCs seem
> to unnecessarily move the program counter around between registers, and
> don't do the jmp*(%ebx) after each instruction, but seem to jump to a
> 'common' piece of code doing this jump.

Yes.  Computed jumps are incredibly expensive at compile time (each one
adds a control flow edge to every label whose address is taken), so what
the compiler does is "factor" the computed jumps, i.e. given

  goto *x;
  [ ... ]

  goto *x;
  [ ... ]

  goto *x;
  [ ... ]


factoring the computed jumps results in the following code sequence,
which has a much simpler control flow graph:

  goto y;
  [ ... ]

  goto y;
  [ ... ]

  goto y;
  [ ... ]

y:
  goto *x;

The compiler is supposed to unfactor this in the basic block reordering
pass; perhaps that is not happening for your code for some reason.
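
For reference, a reduced test case for this dispatch style could look
roughly like the sketch below.  It is an assumption about the general
shape of such an interpreter, not the actual emulator: the opcodes, the
function name run and the MAX_CODE limit are invented for illustration.
It relies on the GNU C "labels as values" extension (&&label and
goto *expr).  Each handler ends in its own computed goto, which is the
unfactored form one would hope to see again in the final assembly.

```c
#include <stddef.h>

/* Hypothetical opcodes for this sketch only. */
enum { OP_INC, OP_DEC, OP_HALT };
enum { MAX_CODE = 64 };

/* Translates n opcodes into label addresses, then executes them.
   Returns the accumulator, or -1 if the program is malformed. */
int run(const unsigned char *ops, size_t n)
{
    /* GNU C extension: addresses of labels in this function. */
    static void *const labels[] = { &&do_inc, &&do_dec, &&do_halt };
    void *code[MAX_CODE];
    void **pc = code;   /* program counter: points at a label address */
    int acc = 0;
    size_t i;

    /* Require a non-empty program that ends in OP_HALT, so that the
       dispatch below can never run past the end of code[]. */
    if (n == 0 || n > MAX_CODE || ops[n - 1] != OP_HALT)
        return -1;
    for (i = 0; i < n; i++) {
        if (ops[i] > OP_HALT)
            return -1;
        code[i] = labels[ops[i]];
    }

    goto **pc;          /* dispatch the first instruction */

do_inc:
    acc++;
    pc++;
    goto **pc;          /* every handler has its own computed jump */
do_dec:
    acc--;
    pc++;
    goto **pc;
do_halt:
    return acc;
}
```

Compiling this with gcc -O3 -S and checking whether each handler ends in
its own indirect jmp, or in a jump to one shared block, should show
whether the unfactoring happened.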


> Looking at the changelog of gcc-3.3, i can only deduce this has to do with
> the new DFA scheduler, but of course i can not tell for sure.

I can tell almost for sure that this is not the problem.  In GCC 3.3,
only the pentium has a DFA scheduler description; all other architecture
variants still use the old scheduler.  Besides, scheduling on i386 is
local list scheduling, whereas your problem seems to be control flow
related.

> I don't know if any of this information is useful, but we could use some
> pointers in places to look where things are going wrong in the code
> generation. The factor 3 of slowdown is really a lot.

I would first try to remove that "register ... asm (...)" junk, and try
to optimize for something more advanced than i386 (which is the default
x86 architecture, see the manual, -march=*).  If that does not help,
please file a bug report including a test case as explained on bugs.html.

Gr.
Steven



