This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Slowdowns in code generated by GCC>=3.3
- From: Földy Lajos <foldy at rmki dot kfki dot hu>
- To: remko dot troncon at cs dot kuleuven dot ac dot be
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 20 Oct 2004 18:20:13 +0200 (CEST)
> Hi,
>
> I am a developer of a bytecode emulator for the Prolog language. With
> the release of GCC-3.3, our emulator was slowed down by a factor of 3 on
> x86 with -O3 turned on (we didn't measure other platforms; the
> optimization flag doesn't seem to matter). We were hoping this was a
> temporary issue, but the situation didn't improve in any of the newer
> releases :( I don't know whether I should file this as a bug report, so
> I am first asking for advice here.
>
> I'll try to explain on a high level what happens. If this isn't
> sufficient, I can try to give some code, but this will take me some time
> to isolate the code. This is the situation:
> - Since the program counter in our emulator is very crucial, we use the
> 'register' and 'asm ("bx")' hints.
> - For each instruction in the bytecode, we store the address of the
> label of the code which has to be executed for the instruction.
> Therefore, the program counter always points to the address of the
> code to be executed, and after each instruction we do a
> goto **(void **)program_counter
> Previous versions of GCC keep the program counter in ebx, and do a
> jmp *(%ebx) after the instructions (as expected). The newer GCCs seem
> to unnecessarily move the program counter around between registers, and
> don't do the jmp *(%ebx) after each instruction, but seem to jump to a
> 'common' piece of code doing this jump.
>
> Looking at the changelog of gcc-3.3, I can only deduce this has to do
> with the new DFA scheduler, but of course I cannot tell for sure.
>
> I don't know if any of this information is useful, but we could use some
> pointers in places to look where things are going wrong in the code
> generation. The factor 3 of slowdown is really a lot.
>
> Does anyone have any ideas?
>
> thanks a lot,
> Remko
Hi,
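Not portable, but on i386 you can try using good old inline assembly:

    void *pc;                          /* address of the next handler */
    ...
    pc = &&lab;                        /* GNU C: take the address of a label */
    __asm__("jmp *%0" : : "a" (pc));   /* pc goes into eax, then jump through it */
    ...
    lab: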
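
For comparison, here is a minimal sketch of the dispatch scheme described in the quoted message, written with GNU C labels-as-values and no inline assembly. It is only an illustration, not the emulator's actual code: the opcodes, the `handlers` and `threaded` arrays, and the function name `run_demo` are all made up for the example, and it assumes GCC or a compatible compiler.

```c
/* Sketch of direct-threaded dispatch with GNU C "labels as values".
   Requires GCC or a compatible compiler; all names are hypothetical. */
#include <stddef.h>

enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* Runs a tiny fixed program (inc, inc, double, halt) on an accumulator. */
int run_demo(void)
{
    /* Label addresses, indexed by opcode. */
    static void *const handlers[OP_COUNT] = { &&do_inc, &&do_double, &&do_halt };
    static const int bytecode[] = { OP_INC, OP_INC, OP_DOUBLE, OP_HALT };
    enum { LEN = sizeof bytecode / sizeof bytecode[0] };

    /* "Threaded" program: each slot holds the address of its handler. */
    void *threaded[LEN];
    for (size_t i = 0; i < LEN; i++)
        threaded[i] = handlers[bytecode[i]];

    void **pc = threaded;   /* program counter: points at a code address */
    int acc = 0;

    goto **pc;              /* dispatch the first instruction */

do_inc:
    acc += 1;
    pc++;
    goto **pc;              /* each handler ends with its own indirect jump */
do_double:
    acc *= 2;
    pc++;
    goto **pc;
do_halt:
    return acc;             /* (0 + 1 + 1) * 2 */
}
```

Calling `run_demo()` from a small `main` should return 4. Whether the compiler keeps a separate indirect jump at the end of each handler or merges them into one shared jump depends on the GCC version and options; the GCC manual suggests trying `-fno-gcse` for programs that use computed gotos, so that may be worth measuring.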
best regards,
lajos foldy