Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug
Bug#: 15242
Product:  
Component:  
Status: RESOLVED
Resolution: FIXED
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Host:
Reported against  
Priority:  
Severity:  
Target Milestone:  
 
 
Target:
Reporter: anton@mips.complang.tuwien.ac.at
Add CC:
CC:
Remove selected CCs
Build:
Patch URL:
Summary:
Keywords:
Known to work:
Known to fail:

Attachment Description Type Created Size Actions
ef.i test input (file ef.i mentioned in the bug report) text/plain 2004-05-01 14:30 46.15 KB Edit
patch Updated patch for this problem patch 2005-01-27 08:33 2.24 KB Edit | Diff
Create a New Attachment (proposed patch, testcase, etc.) View All

Bug 15242 depends on: 15243 16585 20648 Show dependency tree
Show dependency graph
Bug 15242 blocks:

Additional Comments:






View Bug Activity   |   Format For Printing   |   Clone This Bug


Description:   Last confirmed: 2005-10-07 03:18 Opened: 2004-05-01 14:26
This is essentially the 3.4.0 incarnation of the problem discussed at
the end of PR8092.  mmitchel suggested to create a new bug report, so
here it is:

Code example and observed behaviour
-----------------------------------

In the attached code, the fragment between I_question_branch and
J_question_branch_lp_plus_store_number contains the following code
(slightly edited for readability):

I_question_branch:   
{
  Cell * a_target;
  Bool f;
  ;
  (( a_target )=(Cell *)( ( (* (ip) )   )  )) ;
  (( f )=(Bool)( (sp[0])  )) ;
  ({ ip+=( 1 );}) ;
  sp += 1;
  {
    if (f==0) {
      ({ip=( (Xt *)a_target );  ;}) ;
      ;
      (ip++) ;
      ;
      ({asm("":"=X"(cfa));  goto **(ip-1);}) ;
    }
    ;
  }
  ;
  (ip++) ;
  ;
K_question_branch : 
  ({asm("":"=X"(cfa));  goto **(ip-1);}) ;
}
J_question_branch_lp_plus_store_number :

When the file is compiled with

gcc-3.4.0  -I../../gforth-0.6.2/engine/../arch/386 -I. -Wall -O2
-fomit-frame-pointer -fforce-addr -fforce-mem -march=pentium -DHAVE_CONFIG_H
-DDEFAULTPATH='".:/usr/local/lib/gforth/site-forth:/usr/local/share/gforth/site-forth:/usr/local/lib/gforth/0.6.2:/usr/local/share/gforth/0.6.2"'
-fno-gcse -fno-strict-aliasing -fno-defer-pop -fcaller-saves -fno-inline -S ef.i
-o ef-3.4.0-def.s

this produces the following code:

.L20:
	movl	(%edi), %eax
	movl	(%ebp), %edx
	addl	$4, %edi
	addl	$4, %ebp
	testl	%eax, %eax
	jne	.L1073
	leal	4(%edx), %ebp
	movl	-4(%ebp), %eax
	jmp	.L1373
.L739:
... #actually the following fragment comes earlier
.L1073:
	addl	$4, %ebp
.L376:
	movl	-4(%ebp), %eax
	jmp	.L1373
... #the following fragment comes even earlier.
.L1373: 
	jmp	*%eax


What is the problem?
--------------------

The main problem here is that the code jumps to a shared "jmp *eax"
instead of just putting "jmp *eax" (or, in this case, preferably "jmp
*-4(%ebp)") where the jump to the shared indirect jump is.  All
indirect jumps are converted into jumps to a single shared indirect
jump in this way.

This obstructs effective usage of the BTB for branch prediction on
CPUs such as the Athlon, Opteron, and Pentium III (see below), and
most interpreters using GCC's labels-as-values extension will be
affected by this.

It also makes it impossible for us to use dynamic superinstructions
portably (aka selective inlining, PLDI'98 p. 291).  I could eventually
find ways to work around the problems we have had with 3.2 and 3.3,
but this time I don't see a workaround (and a quick look for new
options revealed nothing).

A smaller problem is that the basic blocks are distributed throughout
the code unless we use -fno-reorder-blocks; this reduces the number of
cases where we can apply dynamic superinstructions.  For gcc-3.3,
using -fno-crossjumping suppressed the main problem, but not in
combination with -fno-reorder-blocks.  So if you fix the main problem,
it would be nice if it also worked with -fno-reorder-blocks.


How much does this cost in performance?
---------------------------------------

Here are the results of doing a

configure --enable-force-reg CC=...; make; make onebench
ENGINE_FAST="./gforth-fast --dynamic"; make onebench ENGINE_FAST="./gforth-fast
--no-dynamic"

for gforth-0.6.2 with various compilers (The ef.i file used for the
code example above was also created from gforth-0.6.2, with gcc-2.95
and without --enable-force-reg):

Pentium III 1000MHz; numbers are times in seconds user time
	   gcc-2.95.3	   gcc-3.3	gcc-3.4.0
	dynamic no-dyn	dynamic	no-dyn	no-dyn
siev    0.56    0.85    0.67    0.90    2.31
bubble  0.72	1.28	0.93	1.35	2.68
matrix  0.35	1.43	0.35	1.38	1.90
fib     0.76	1.11	0.95	1.27	2.90

Here you see a factor 3.7-5.4 between the best result (gcc-2.95
dynamic) and the gcc-3.4.0 result for matrix.  This slowdown has two
main components:

- we cannot apply the dynamic optimizations (and I don't see a
portable way to work around this problem), resulting in a slowdown
factors of 1.5-4.

- from the shared indirect jump; the jump to this indirect jump costs
something, but the more important component of the slowdown is
probably the reduction in branch prediction accuracy for the indirect
jumps; the resulting slowdown is about a factor of 1.4-2.5.

On the Athlon I have seen similar results in the past.

Here are results from a Pentium 4:

Pentium 4 2.26GHz; numbers are times in seconds user time
	   gcc-2.95.3	   gcc-3.3	gcc-3.4.0
	dynamic no-dyn	dynamic	no-dyn	no-dyn
siev    0.24    0.48    0.31    0.47    0.50
bubble  0.30	0.78	0.36	0.77	0.78
matrix  0.19	0.94	0.17	0.92	0.96
fib     0.34	0.57	0.41	0.58	0.59

On the Pentium 4 the second component apparently does not play a big
role, probably due to the effects of the trace cache, but the first
component results in even higher slowdowns (1.7-5) than on the
PentiumIII.


Do flag variations change the problem?
--------------------------------------

Adding -fno-reorder-buffers moves .L1073 right below the first "jmp
.L1373" (as it should), but the jumps to .L1373 are still there.
Adding -fno-crossjumping in either case changes essentially nothing.
Reducing the command line to

gcc-3.4.0 -Wall -O2 -fomit-frame-pointer -S ef.i -o ef-3.4.0-nofs.s

produces similar results (but the register allocation is worse).


Suggested fix
-------------

AFAIK gcc introduces the shared indirect jump for the benefit of
data-flow analysis.  In gcc-3.3 with some combinations of flags this
change was undone later (but the pass that did this was suppressed
with -fno-reorder-jumps).  In 3.4.0 the undoing is apparently
completely disabled.

One fix would be to put the undoing into a pass that is invoked after
the data-flow analysis, but before the last combining pass, and that
is independent of flags such as -fno-reorder-blocks (maybe a separate
pass?).  An alternative would be to produce the desired control-flow
graph for data-flow analysis without changing the code (i.e., without
introducing shared jumps).

I might give writing one of these approaches pass a try, but would
prefer to see it done by people who know more about gcc internals than
I do.  In case I do it, I would apreciate feedback and suggestions on
how to proceed.

- anton

------- Comment #1 From anton@mips.complang.tuwien.ac.at 2004-05-01 14:30 -------
Created an attachment (id=6206) [edit]
test input (file ef.i mentioned in the bug report)

------- Comment #2 From Andrew Pinski 2004-05-01 14:38 -------
Note jumping out of a statement expression is really illegal and should not be
done.

------- Comment #3 From Andrew Pinski 2004-05-01 14:42 -------
One more thing statement expressions should end with a non-void expression.

------- Comment #4 From anton@a0.complang.tuwien.ac.at 2004-05-01 15:03 -------
Subject: Re:  pessimization of "goto *"

pinskia at gcc dot gnu dot org wrote:
> Note jumping out of a statement expression is really illegal and should not be done.

Statement expressions are used here just to produce a way to include a
statement sequence in a macro.  Do you recommend that we should switch
from 

({ ... })

to

do { ... } while (0)

?

Or do you have a different suggestion?

BTW, looking in the gcc-3.4.0 manual, I don't see the restrictions you
mention, and gcc-3.4.0 -Wall also does not complain about this usage.
Actually the manual explicitly says:

|(If you use some other kind of statement last within the braces, the
|construct has type `void', and thus effectively no value.)

which I would interpret as "Anything goes, as long as you don't use
the value of the statement expression".

- anton

------- Comment #5 From Andrew Pinski 2004-05-01 15:13 -------
The reason why I say this is because of on the tree-ssa, it does warn about
using about void statements 
as the end, and on the tree-ssa right now ICEs on the tree-ssa, see PR 15243. 
Also this is not 
documented there is another bug for that, PR 772.

------- Comment #6 From anton@a0.complang.tuwien.ac.at 2004-05-02 07:24 -------
Subject: Re:  pessimization of "goto *"

pinskia at gcc dot gnu dot org wrote:
> 
> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-05-01 15:13 -------
> The reason why I say this is because of on the tree-ssa, it does warn about using about void statements 
> as the end, and on the tree-ssa right now ICEs on the tree-ssa, see PR 15243.  Also this is not 
> documented there is another bug for that, PR 772.

And that refers to a discussion in the mailing list, and the only
mention of a problem with jumping out that I found there was in
<http://gcc.gnu.org/ml/gcc/2000-07/msg00721.html>.  For this example
the semantics seem pretty obvious and well-defined to me (in contrast
to the jump-into example), and this was just a compiler bug.

If you find this bug hard to fix, it's ok with me if you document that
jumping out does not work and give a warning (and eventually an error)
on compiling it, and I will work around this restriction.

If you want, I can also prepare a version of the test file that uses
do...while(0) instead of statement expressions in some places.

- anton

------- Comment #7 From Joseph S. Myers 2004-05-02 19:06 -------
Subject: Re:  pessimization of "goto *"

On Sun, 2 May 2004, anton at a0 dot complang dot tuwien dot ac dot at wrote:

> And that refers to a discussion in the mailing list, and the only
> mention of a problem with jumping out that I found there was in
> <http://gcc.gnu.org/ml/gcc/2000-07/msg00721.html>.  For this example
> the semantics seem pretty obvious and well-defined to me (in contrast
> to the jump-into example), and this was just a compiler bug.

The semantics are neither obvious nor well-defined, because other parts of
the larger expression containing the statement expression might or might
not have been evaluated.  The only thing in C giving rise to such a
situation (where parts of an expression might or might not have been
evaluated when a jump occurs) is setjmp/longjmp, and you should be able to
use longjmp in statement expressions; goto can't have such effects in C so
it's dubious allowing them in GNU C.


------- Comment #8 From anton@a0.complang.tuwien.ac.at 2004-05-04 15:06 -------
Subject: Re:  pessimization of "goto *"

jsm at polyomino dot org dot uk wrote:
> 
> 
> ------- Additional Comments From jsm at polyomino dot org dot uk  2004-05-02 19:06 -------
> Subject: Re:  pessimization of "goto *"
> 
> On Sun, 2 May 2004, anton at a0 dot complang dot tuwien dot ac dot at wrote:
> 
> > And that refers to a discussion in the mailing list, and the only
> > mention of a problem with jumping out that I found there was in
> > <http://gcc.gnu.org/ml/gcc/2000-07/msg00721.html>.  For this example
> > the semantics seem pretty obvious and well-defined to me (in contrast
> > to the jump-into example), and this was just a compiler bug.
> 
> The semantics are neither obvious nor well-defined, because other parts of
> the larger expression containing the statement expression might or might
> not have been evaluated.

The example is:

  a:
     f ("%d\n", ({goto a;})

The only other part that might or might not have been evaluated is the
"%d\n"; since this has no side effect, it does not matter whether it
is evaluated or not.  So the obvious (to me) and well-defined
semantics of this example is to be equivalent to

a: goto a;

In general, true, if the other parts of the expression have side
effects, the whole expression is undefined, but that's just the same
as in other cases with side effects in expression evaluation, so the
semantics still seem obvious to me.

>  The only thing in C giving rise to such a
> situation (where parts of an expression might or might not have been
> evaluated when a jump occurs) is setjmp/longjmp, and you should be able to
> use longjmp in statement expressions; goto can't have such effects in C so
> it's dubious allowing them in GNU C.

Variable definitions also give rise to such situations (where parts of
an expression might or might not have been evaluated when the
definition occurs), and this may also lead to undefined behaviour, and
this cannot happen in C.  Does this make allowing variable definitions
in statement expressions dubious in GNU C?

I don't think so (after all, they are used in the first example for
statement expressions), and I don't think that this kind of argument
is valid for other things that cannot occur in C expressions, either.

If only things were allowed in statement expressions that are allowed
in C expressions, then statement expressions would be redundant; you
could simply write "(expr, expr, expr)" instead of "({expr; expr;
expr;})".

- anton



------- Comment #9 From Andrew Pinski 2004-05-06 04:45 -------
Confirmed a reduced testcase:
int f(int i, int j)
{
  static void *a[5] = {&&b,&&c,&&d,&&e,&&f};

    if (i >=5) i = 0; if (i<=0)i=5;
    goto *a[i];
  e:
    i++;  if (i >=5) i = 0; if (i<=0)i=5;
    j++;
    goto *a[i];
  f:
    i--;  if (i >=5) i = 0; if (i<=0)i=5;
    j--;
    goto *a[i];
  b:
    i+=2;  if (i >=5) i = 0; if (i<=0)i=5;
    goto *a[i];
  c:
    i*=2;  if (i >=5) i = 0; if (i<=0)i=5;
    goto *a[i];
  d:;
}

------- Comment #10 From Josef Zlomek 2004-05-24 14:03 -------
Proposed patch:
http://gcc.gnu.org/ml/gcc-patches/2004-05/msg01526.html

------- Comment #11 From Andrew Pinski 2004-05-26 12:29 -------
Newest Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-05/msg01670.html>.

------- Comment #12 From anton@mips.complang.tuwien.ac.at 2004-07-06 20:35 -------
Subject: Re:  pessimization of "goto *"

pinskia at gcc dot gnu dot org wrote:
> Newest Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-05/msg01670.html>.

I tried this patch with gcc-3.5-20040523, and the problem was still
there, for both the reduced test case, and a variant (work around PR
15243) of the original testcase.

I also tried this patch with gcc-core-3.5-20040704, and it applies
cleanly, but then produces a compilation error:

../../gcc-3.5-20040704/gcc/bb-reorder.c: In function `duplicate_computed_gotos':
../../gcc-3.5-20040704/gcc/bb-reorder.c:1989: error: structure has no member named `computed_goto_common_label'
make[2]: *** [bb-reorder.o] Error 1
make[2]: Leaving directory `/nfs/nfstmp/anton/gcc-patched/gcc-3.5-20040704-i386/gcc'

- anton

------- Comment #13 From Andrew Pinski 2004-07-06 22:00 -------
It looks like you can remove:
+   if (!cfun->computed_goto_common_label)
+     return;

from the patch.

------- Comment #14 From zlomj9am@artax.karlin.mff.cuni.cz 2004-07-07 04:49 -------
Subject: Re:  pessimization of "goto *"

> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-07-06 22:00 -------
> It looks like you can remove:
> +   if (!cfun->computed_goto_common_label)
> +     return;
> 
> from the patch.

cfun->computed_goto_common_label can be replaced by cfun->has_computed_jump.
The test is there not to scan the function if the computed goto is not
used in the function.

Josef

------- Comment #15 From Josef Zlomek 2004-07-07 13:34 -------
> I tried this patch with gcc-3.5-20040523, and the problem was still
there

The optimization is enabled at -O2 (or -O -freorder-blocks).
If there is a uncond jump to a block containing a computed jump and the block is
small enough (several instructions) the block is copied to eliminate the uncond
jump.

When I was testing the patch with current mainline I have noticed that the
current_function_has_computed_jump flag is incorrectly unset in make_edges()
when compiling ef.i. IMHO the flag should not be set initialized to 0 (or 1 if
flag_reorder_blocks_and_partition) in make_edges() but somewhere earlier because
make_edges() can be called several times and only a subset of blocks may be
scanned for computed jumps in it.

------- Comment #16 From anton@mips.complang.tuwien.ac.at 2004-07-08 13:03 -------
Subject: Re:  pessimization of "goto *"

zlomek at gcc dot gnu dot org wrote:
> The optimization is enabled at -O2

Not as far as I can tell (tried it on i386 with patched versions of
gcc-3.5-20040704 and gcc-3.5-20040523).

> (or -O -freorder-blocks).

That seems to work, though.

We would like to use -fno-reorder-blocks with
duplicate_computed_gotos, either through an extra option, or by just
having duplicate_computed_gotos all the time (at least for our code
these gotos were duplicated already in the source code, so duplicating
them again in the back end is closer to the idea of
-fno-reorder-blocks than not duplicating them).

> If there is a uncond jump to a block containing a computed jump and the block is
> small enough (several instructions) the block is copied to eliminate the uncond
> jump.
> 
> When I was testing the patch with current mainline I have noticed that the
> current_function_has_computed_jump flag is incorrectly unset in make_edges()
> when compiling ef.i.

Ok, that explains why this patch does not work for ef.i.

One other thing I noticed is that apparently there is no combining
pass (or its tree-ssa equivalent) after duplicate_computed_gotos,
resulting in compiling code like

    goto *a[i];

into

        movl    a.0(,%ebx,4), %eax
        jmp     *%eax

instead of

	jmp	*a.0(,%ebx,4)

Thanks for your efforts,

- anton

------- Comment #17 From zlomj9am@artax.karlin.mff.cuni.cz 2004-07-16 07:01 -------
Subject: Re:  pessimization of "goto *"

> > The optimization is enabled at -O2
> Not as far as I can tell (tried it on i386 with patched versions of
> gcc-3.5-20040704 and gcc-3.5-20040523).

Yes, it is enabled by the patch. The patch duplicates block with computed goto
to place where a uncond jump to the conputed goto block is if the block is
small enough.
The computed jump in the small testcase is not duplicated, because the uncond
jumps from other places do not jump to a basic block which contains the
computed jump but to other block:

f:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    8(%ebp), %ebx
        .p2align 2,,3
.L37:
        cmpl    $4, %ebx
        jle     .L39
.L34:
        movl    $5, %ebx
.L9:
        movl    a.1118(,%ebx,4), %eax
        jmp     *%eax
        .p2align 2,,3
.L39:
        testl   %ebx, %ebx
        jg      .L9
        jmp     .L34
        .p2align 2,,3
.L3:
        popl    %ebx
        leave
        ret
        .p2align 2,,3
.L2:
        sall    %ebx
        jmp     .L37
        .p2align 2,,3
.L1:
        addl    $2, %ebx
        jmp     .L37
        .p2align 2,,3
.L5:
        decl    %ebx
        jmp     .L37
        .p2align 2,,3
.L4:
        incl    %ebx
        jmp     .L37

However, I see that after the L39 there is a cond jump to the computed jump block.
This is not handled by my patch but could be.

> We would like to use -fno-reorder-blocks with
> duplicate_computed_gotos, either through an extra option, or by just
> having duplicate_computed_gotos all the time (at least for our code
> these gotos were duplicated already in the source code, so duplicating
> them again in the back end is closer to the idea of
> -fno-reorder-blocks than not duplicating them).

Adding an new flag is possible. However, I do not see why you explicitelly want
to disable reorder-blocks.

> One other thing I noticed is that apparently there is no combining
> pass (or its tree-ssa equivalent) after duplicate_computed_gotos,
> resulting in compiling code like
> 
>     goto *a[i];
> 
> into
> 
>         movl    a.0(,%ebx,4), %eax
>         jmp     *%eax
> 
> instead of
> 
> 	jmp	*a.0(,%ebx,4)

The combiner should have already combined these insns before the duplication of
computed gotos is run. This pass just duplicates some insns.

Josef

------- Comment #18 From anton@mips.complang.tuwien.ac.at 2004-07-16 08:17 -------
Subject: Re:  pessimization of "goto *"

zlomj9am at artax dot karlin dot mff dot cuni dot cz wrote:
> > > The optimization is enabled at -O2
> > Not as far as I can tell (tried it on i386 with patched versions of
> > gcc-3.5-20040704 and gcc-3.5-20040523).
> 
> Yes, it is enabled by the patch. The patch duplicates block with computed goto
> to place where a uncond jump to the conputed goto block is if the block is
> small enough.
> The computed jump in the small testcase is not duplicated, because the uncond
> jumps from other places do not jump to a basic block which contains the
> computed jump but to other block:
...

Thanks for opening my eyes.

> > We would like to use -fno-reorder-blocks with
> > duplicate_computed_gotos, either through an extra option, or by just
> > having duplicate_computed_gotos all the time (at least for our code
> > these gotos were duplicated already in the source code, so duplicating
> > them again in the back end is closer to the idea of
> > -fno-reorder-blocks than not duplicating them).
> 
> Adding an new flag is possible. However, I do not see why you explicitelly want
> to disable reorder-blocks.

We need it for the dynamic superinstruction optimization (aka
selective inlining, PLDI'98 p. 291) in Gforth and further
optimizations building on that (other people working on efficient
interpreters/dynamic code generators have the same problem; e.g., I
recently talked to Etienne Gagnon (http://sablevm.org/) about this;
qemu also uses -fno-reorder-blocks).

For dynamic superinstructions we need to ensure that the machine code
corresponding to source code between two labels is between the
addresses that gcc produces for the labels.  The flag
-fno-reorder-blocks almost gives us that (but the sharing of the
"goto *"s breaks that).

Even for just plain interpreters (without dynamic superinstructions),
the sharing of the "goto *"s resulting from the merging of code after
conditional jumps (as shown in the simplified testcase) will often
reduce performance significantly through lower BTB accuracy.  So for
performance one probably wants to use -fno-reorder-blocks there,
unless this results in merging the "goto *"s, as it does now.

> > One other thing I noticed is that apparently there is no combining
> > pass (or its tree-ssa equivalent) after duplicate_computed_gotos,
> > resulting in compiling code like
> > 
> >     goto *a[i];
> > 
> > into
> > 
> >         movl    a.0(,%ebx,4), %eax
> >         jmp     *%eax
> > 
> > instead of
> > 
> > 	jmp	*a.0(,%ebx,4)
> 
> The combiner should have already combined these insns before the duplication of
> computed gotos is run. This pass just duplicates some insns.

My guess is that the combiner does not combine these insns before
duplication, because they are in different basic blocks before
duplication.  Without duplication the code would look like:

         movl    a.0(,%ebx,4), %eax
         jmp     .L9
         ...
.L9:
         jmp     *%eax

- anton

------- Comment #19 From zlomj9am@artax.karlin.mff.cuni.cz 2004-07-16 08:42 -------
Subject: Re:  pessimization of "goto *"

> > Adding an new flag is possible. However, I do not see why you explicitelly want
> > to disable reorder-blocks.
> 
> We need it for the dynamic superinstruction optimization (aka
> selective inlining, PLDI'98 p. 291) in Gforth and further
> optimizations building on that (other people working on efficient
> interpreters/dynamic code generators have the same problem; e.g., I
> recently talked to Etienne Gagnon (http://sablevm.org/) about this;
> qemu also uses -fno-reorder-blocks).
> 
> For dynamic superinstructions we need to ensure that the machine code
> corresponding to source code between two labels is between the
> addresses that gcc produces for the labels.  The flag
> -fno-reorder-blocks almost gives us that (but the sharing of the
> "goto *"s breaks that).

I see.

> Even for just plain interpreters (without dynamic superinstructions),
> the sharing of the "goto *"s resulting from the merging of code after
> conditional jumps (as shown in the simplified testcase) will often
> reduce performance significantly through lower BTB accuracy.  So for
> performance one probably wants to use -fno-reorder-blocks there,
> unless this results in merging the "goto *"s, as it does now.

I see. So duplication of computed jumps probably should be at -O
and have a special flag too.

Josef

------- Comment #20 From bernd.paysan@gmx.de 2004-10-03 20:25 -------
On Wednesday 07 July 2004 06:49, zlomj9am at artax dot karlin dot mff dot cuni  
dot cz wrote: 
> cfun->computed_goto_common_label can be replaced by 
> cfun->has_computed_jump. The test is there not to scan the function if the 
> computed goto is not used in the function. 

Since your patch is still not in the gcc 4.0.0-xxx codebase, I've modified it  
to work with the current CVS version. There's still the problem that the jump  
does not combine with the previous instruction, and that therefore one  
valuable (especially on x86!) register is used up. As trigger condition for  
goto duplication I used the -fno-crossjumping switch, together with  
optimization. 

I've put the patch online under 

http://www.jwdt.com/~paysan/gcc-4.0.0.patch 

------- Comment #21 From Giovanni Bajo 2004-10-04 12:21 -------
Bernd, the patch needs to be posted to gcc-patches@ after a full bootstrap & 
test cycle to gather attention. Probably Josef should take care of this (being 
the assigneed of this bug).

------- Comment #22 From Steven Bosscher 2004-10-25 21:10 -------
I have a patch that fixes this.  I need to figure out a way to do it in a way 
that doesn't require you to pass "-fno-crossjumping -fno-gcse" but I'll figure 
out something... 
 
 

------- Comment #23 From Steven Bosscher 2004-12-18 16:46 -------
Comes from factoring computed gotos, so this is a code quality regression. 
 
 
 
 

------- Comment #24 From Steven Bosscher 2005-01-27 08:33 -------
Created an attachment (id=8079) [edit]
Updated patch for this problem

------- Comment #25 From Andrew Pinski 2005-01-27 13:44 -------
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2005-01/msg02001.html>.

------- Comment #26 From CVS Commits 2005-02-01 10:03 -------
Subject: Bug 15242

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	steven@gcc.gnu.org	2005-02-01 10:03:27

Modified files:
	gcc            : ChangeLog basic-block.h bb-reorder.c 
	                 cfgcleanup.c params.def passes.c 
	gcc/doc        : invoke.texi 

Log message:
	PR optimization/15242
	* params.def (PARAM_MAX_GOTO_DUPLICATION_INSNS): New param.
	* basic-block.h (duplicate_computed_gotos): Add prototype.
	* bb-reorder.c (duplicate_computed_gotos): New function to
	duplicate sufficiently small blocks ending in a computed jump.
	* passes.c (rest_of_compilation): Call duplicate_computed_gotos
	if not optimizing for size.
	* cfgcleanup.c (try_crossjump_bb): If not optimizing for size,
	never do tail merging for blocks ending in a computed jump.
	* doc/invoke.texi: Document the max-goto-duplication-insns param.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7359&r2=2.7360
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/basic-block.h.diff?cvsroot=gcc&r1=1.236&r2=1.237
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/bb-reorder.c.diff?cvsroot=gcc&r1=1.88&r2=1.89
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cfgcleanup.c.diff?cvsroot=gcc&r1=1.138&r2=1.139
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/params.def.diff?cvsroot=gcc&r1=1.53&r2=1.54
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/passes.c.diff?cvsroot=gcc&r1=2.64&r2=2.65
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doc/invoke.texi.diff?cvsroot=gcc&r1=1.571&r2=1.572


------- Comment #27 From Steven Bosscher 2005-02-01 10:11 -------
Should be fixed on mainline now. 

------- Comment #28 From Kazu Hirata 2005-02-07 05:28 -------
Is this PR fixed?  If so, please close it.

------- Comment #29 From Andrew Pinski 2005-02-07 05:32 -------
(In reply to comment #28)
> Is this PR fixed?  If so, please close it.

it is fixed on the mainline but still not fixed on the 3.4 branch as this is a regression we should keep it 
open until The RM says otherwise.

------- Comment #30 From anton@mips.complang.tuwien.ac.at 2005-02-27 10:47 -------
Subject: Re:  [3.3/3.4/4.0 regression] pessimization of "goto *"

steven at gcc dot gnu dot org wrote:
> Updated patch for this problem

Ok, I have now tried it with gcc-4.0-20050220 and gforth-0.6.2 on i386
and it works ok.  There is a minor remaining regression related to this
issue:

gcc-4.0 compiles "goto **(ip-1);" into

0x804b50d <engine+1341>:        mov    0xfffffffc(%ebx),%eax
0x804b510 <engine+1344>:        jmp    *%eax

gcc-2.95 compiles it into

0x804b103 <engine+227>: jmp    *0xfffffffc(%ebx)

which looks better.

Maybe there should be another combining pass after the duplication of the indirect jumps.  Should I create another PR for this?

Here are updated timing results (gforth-0.6.2 configured with
--enable-force-reg):

Pentium 4 2.26GHz; numbers are times in seconds user time
                                                  gcc-4.0-20050220 
	   gcc-2.95.3	   gcc-3.3	gcc-3.4.0 default no-reorder-blocks
	dynamic no-dyn	dynamic	no-dyn	no-dyn	  dynamic dynamic
siev    0.24    0.48    0.31    0.47    0.50      0.37    0.28
bubble  0.30	0.78	0.36	0.77	0.78	  0.40	  0.36
matrix  0.19	0.94	0.17	0.92	0.96	  0.18	  0.18
fib     0.34	0.57	0.41	0.58	0.59	  0.43	  0.40

- anton



------- Comment #31 From Steven Bosscher 2005-03-10 12:48 -------
> Maybe there should be another combining pass after the duplication
> of the indirect jumps.  Should I create another PR for this?

There should not be another "combining" pass (you really mean constant
propagation).  This new unfactoring stuff runs after register allocation,
so such a pass would not really help, except maybe to make the code look
prettier to you.

But, is this:

        mov    0xfffffffc(%ebx),%eax
        jmp    *%eax

slower than this:

        jmp    *0xfffffffc(%ebx)

or have you not tried that (e.g. by hacking the assembly by hand)?

------- Comment #32 From anton@mips.complang.tuwien.ac.at 2005-03-12 21:38 -------
Subject: Re:  [3.3/3.4 regression] pessimization of "goto *"

steven at gcc dot gnu dot org wrote:
> 
> 
> ------- Additional Comments From steven at gcc dot gnu dot org  2005-03-10 12:48 -------
> > Maybe there should be another combining pass after the duplication
> > of the indirect jumps.  Should I create another PR for this?
> 
> There should not be another "combining" pass (you really mean constant
> propagation).

I meant "Instruction combination (`combine.c')".  Not sure if this is
replaced by something else in the recent gccs.  Why do you think I
mean constant propagation?

>  This new unfactoring stuff runs after register allocation,
> so such a pass would not really help, except maybe to make the code look
> prettier to you.

Ouch.  No way to fix that?  That's the cost we wanted to avoid.

> But, is this:
> 
>         mov    0xfffffffc(%ebx),%eax
>         jmp    *%eax
> 
> slower than this:
> 
>         jmp    *0xfffffffc(%ebx)
> 
> or have you not tried that (e.g. by hacking the assembly by hand)?

Ok, I hacked the assembly by hand, and this is what I got:

All numbers are user times in seconds for gforth-fast-0.6.2:

Pentium-4 2.26 GHz (i386 code):
           no-dynamic  no-super   dynamic    
combined?  yes  no     yes  no	  yes  no    
siev       0.47 0.49   0.36 0.36  0.33 0.33  
bubble     0.81 0.81   0.52 0.53  0.47 0.47  
matrix     1.03 1.01   0.30 0.30  0.36 0.35  
fib        0.70 0.68   0.75 0.60  0.53 0.58  

Opteron 2GHz (i386 code):
           no-dynamic  no-super   dynamic    
combined?  yes  no     yes  no	  yes  no    
siev       0.46 0.47   0.37 0.36  0.33 0.32
bubble     0.73	0.74   0.50 0.51  0.50 0.51
matrix     0.93	0.95   0.35 0.34  0.31 0.32
fib        0.63	0.64   0.49 0.50  0.44 0.45

"No-super" performs the same number of indirect branches (and anything
else) as "no-dynamic", but has better branch prediction.  "Dynamic" is
like "no-super", but eliminates many of the indirect branches.

So, overall the instruction combination alone does not make much of a
difference on these CPUs.

- anton


------- Comment #33 From Steven Bosscher 2005-03-12 21:54 -------
Subject: Re:  [3.3/3.4 regression] pessimization of "goto *"

Combine runs before register allocation.  You cannot run it after
register allocation.  I don't think you can run it twice, even.

And no, after register allocation you are not magically going to
win back that register.  Tough luck.

Given your numbers, replacing the reg with its known value might
still be a win.  But I'm not going to do it ;-)

------- Comment #34 From Steven Bosscher 2005-10-07 14:09 -------
This is basically unfixable without serious hacking that is not appropriate for
GCC 3.4, so Gaby, with your permission, I'd like to close this one as
WONTFIX...

------- Comment #35 From Gabriel Dos Reis 2005-10-07 20:00 -------
Subject: Re:  [3.4 regression] pessimization of "goto *"

"steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:

| This is basically unfixable without serious hacking that is not
| appropriate for GCC 3.4, so Gaby, with your permission, I'd like to
| close this one as WONTFIX...

I agree with your analysis.  

Thanks,

-- Gaby

------- Comment #36 From Steven Bosscher 2005-10-07 20:06 -------
Not really fixable for 3.4.

------- Comment #37 From Andrew Pinski 2006-04-26 19:13 -------
Reopening to close as ...

------- Comment #38 From Andrew Pinski 2006-04-26 19:14 -------
Fixed in 4.0.0.

------- Comment #39 From Andrew Pinski 2006-04-26 19:14 -------
*** Bug 8092 has been marked as a duplicate of this bug. ***

Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug