This is GCC Bugzilla
This is GCC Bugzilla Version 2.20+
View Bug Activity | Format For Printing | Clone This Bug
This is essentially the 3.4.0 incarnation of the problem discussed at the end of PR8092. mmitchel suggested to create a new bug report, so here it is: Code example and observed behaviour ----------------------------------- In the attached code, the fragment between I_question_branch and J_question_branch_lp_plus_store_number contains the following code (slightly edited for readability): I_question_branch: { Cell * a_target; Bool f; ; (( a_target )=(Cell *)( ( (* (ip) ) ) )) ; (( f )=(Bool)( (sp[0]) )) ; ({ ip+=( 1 );}) ; sp += 1; { if (f==0) { ({ip=( (Xt *)a_target ); ;}) ; ; (ip++) ; ; ({asm("":"=X"(cfa)); goto **(ip-1);}) ; } ; } ; (ip++) ; ; K_question_branch : ({asm("":"=X"(cfa)); goto **(ip-1);}) ; } J_question_branch_lp_plus_store_number : When the file is compiled with gcc-3.4.0 -I../../gforth-0.6.2/engine/../arch/386 -I. -Wall -O2 -fomit-frame-pointer -fforce-addr -fforce-mem -march=pentium -DHAVE_CONFIG_H -DDEFAULTPATH='".:/usr/local/lib/gforth/site-forth:/usr/local/share/gforth/site-forth:/usr/local/lib/gforth/0.6.2:/usr/local/share/gforth/0.6.2"' -fno-gcse -fno-strict-aliasing -fno-defer-pop -fcaller-saves -fno-inline -S ef.i -o ef-3.4.0-def.s this produces the following code: .L20: movl (%edi), %eax movl (%ebp), %edx addl $4, %edi addl $4, %ebp testl %eax, %eax jne .L1073 leal 4(%edx), %ebp movl -4(%ebp), %eax jmp .L1373 .L739: ... #actually the following fragment comes earlier .L1073: addl $4, %ebp .L376: movl -4(%ebp), %eax jmp .L1373 ... #the following fragment comes even earlier. .L1373: jmp *%eax What is the problem? -------------------- The main problem here is that the code jumps to a shared "jmp *eax" instead of just putting "jmp *eax" (or, in this case, preferably "jmp *-4(%ebp)") where the jump to the shared indirect jump is. All indirect jumps are converted into jumps to a single shared indirect jump in this way. This obstructs effective usage of the BTB for branch prediction on CPUs such as the Athlon, Opteron, and Pentium III (see below), and most interpreters using GCC's labels-as-values extension will be affected by this. It also makes it impossible for us to use dynamic superinstructions portably (aka selective inlining, PLDI'98 p. 291). I could eventually find ways to work around the problems we have had with 3.2 and 3.3, but this time I don't see a workaround (and a quick look for new options revealed nothing). A smaller problem is that the basic blocks are distributed throughout the code unless we use -fno-reorder-blocks; this reduces the number of cases where we can apply dynamic superinstructions. For gcc-3.3, using -fno-crossjumping suppressed the main problem, but not in combination with -fno-reorder-blocks. So if you fix the main problem, it would be nice if it also worked with -fno-reorder-blocks. How much does this cost in performance? --------------------------------------- Here are the results of doing a configure --enable-force-reg CC=...; make; make onebench ENGINE_FAST="./gforth-fast --dynamic"; make onebench ENGINE_FAST="./gforth-fast --no-dynamic" for gforth-0.6.2 with various compilers (The ef.i file used for the code example above was also created from gforth-0.6.2, with gcc-2.95 and without --enable-force-reg): Pentium III 1000MHz; numbers are times in seconds user time gcc-2.95.3 gcc-3.3 gcc-3.4.0 dynamic no-dyn dynamic no-dyn no-dyn siev 0.56 0.85 0.67 0.90 2.31 bubble 0.72 1.28 0.93 1.35 2.68 matrix 0.35 1.43 0.35 1.38 1.90 fib 0.76 1.11 0.95 1.27 2.90 Here you see a factor 3.7-5.4 between the best result (gcc-2.95 dynamic) and the gcc-3.4.0 result for matrix. This slowdown has two main components: - we cannot apply the dynamic optimizations (and I don't see a portable way to work around this problem), resulting in a slowdown factors of 1.5-4. - from the shared indirect jump; the jump to this indirect jump costs something, but the more important component of the slowdown is probably the reduction in branch prediction accuracy for the indirect jumps; the resulting slowdown is about a factor of 1.4-2.5. On the Athlon I have seen similar results in the past. Here are results from a Pentium 4: Pentium 4 2.26GHz; numbers are times in seconds user time gcc-2.95.3 gcc-3.3 gcc-3.4.0 dynamic no-dyn dynamic no-dyn no-dyn siev 0.24 0.48 0.31 0.47 0.50 bubble 0.30 0.78 0.36 0.77 0.78 matrix 0.19 0.94 0.17 0.92 0.96 fib 0.34 0.57 0.41 0.58 0.59 On the Pentium 4 the second component apparently does not play a big role, probably due to the effects of the trace cache, but the first component results in even higher slowdowns (1.7-5) than on the PentiumIII. Do flag variations change the problem? -------------------------------------- Adding -fno-reorder-buffers moves .L1073 right below the first "jmp .L1373" (as it should), but the jumps to .L1373 are still there. Adding -fno-crossjumping in either case changes essentially nothing. Reducing the command line to gcc-3.4.0 -Wall -O2 -fomit-frame-pointer -S ef.i -o ef-3.4.0-nofs.s produces similar results (but the register allocation is worse). Suggested fix ------------- AFAIK gcc introduces the shared indirect jump for the benefit of data-flow analysis. In gcc-3.3 with some combinations of flags this change was undone later (but the pass that did this was suppressed with -fno-reorder-jumps). In 3.4.0 the undoing is apparently completely disabled. One fix would be to put the undoing into a pass that is invoked after the data-flow analysis, but before the last combining pass, and that is independent of flags such as -fno-reorder-blocks (maybe a separate pass?). An alternative would be to produce the desired control-flow graph for data-flow analysis without changing the code (i.e., without introducing shared jumps). I might give writing one of these approaches pass a try, but would prefer to see it done by people who know more about gcc internals than I do. In case I do it, I would apreciate feedback and suggestions on how to proceed. - anton
Created an attachment (id=6206) [edit] test input (file ef.i mentioned in the bug report)
Note jumping out of a statement expression is really illegal and should not be done.
One more thing statement expressions should end with a non-void expression.
Subject: Re: pessimization of "goto *" pinskia at gcc dot gnu dot org wrote: > Note jumping out of a statement expression is really illegal and should not be done. Statement expressions are used here just to produce a way to include a statement sequence in a macro. Do you recommend that we should switch from ({ ... }) to do { ... } while (0) ? Or do you have a different suggestion? BTW, looking in the gcc-3.4.0 manual, I don't see the restrictions you mention, and gcc-3.4.0 -Wall also does not complain about this usage. Actually the manual explicitly says: |(If you use some other kind of statement last within the braces, the |construct has type `void', and thus effectively no value.) which I would interpret as "Anything goes, as long as you don't use the value of the statement expression". - anton
The reason why I say this is because of on the tree-ssa, it does warn about using about void statements as the end, and on the tree-ssa right now ICEs on the tree-ssa, see PR 15243. Also this is not documented there is another bug for that, PR 772.
Subject: Re: pessimization of "goto *" pinskia at gcc dot gnu dot org wrote: > > > ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-05-01 15:13 ------- > The reason why I say this is because of on the tree-ssa, it does warn about using about void statements > as the end, and on the tree-ssa right now ICEs on the tree-ssa, see PR 15243. Also this is not > documented there is another bug for that, PR 772. And that refers to a discussion in the mailing list, and the only mention of a problem with jumping out that I found there was in <http://gcc.gnu.org/ml/gcc/2000-07/msg00721.html>. For this example the semantics seem pretty obvious and well-defined to me (in contrast to the jump-into example), and this was just a compiler bug. If you find this bug hard to fix, it's ok with me if you document that jumping out does not work and give a warning (and eventually an error) on compiling it, and I will work around this restriction. If you want, I can also prepare a version of the test file that uses do...while(0) instead of statement expressions in some places. - anton
Subject: Re: pessimization of "goto *" On Sun, 2 May 2004, anton at a0 dot complang dot tuwien dot ac dot at wrote: > And that refers to a discussion in the mailing list, and the only > mention of a problem with jumping out that I found there was in > <http://gcc.gnu.org/ml/gcc/2000-07/msg00721.html>. For this example > the semantics seem pretty obvious and well-defined to me (in contrast > to the jump-into example), and this was just a compiler bug. The semantics are neither obvious nor well-defined, because other parts of the larger expression containing the statement expression might or might not have been evaluated. The only thing in C giving rise to such a situation (where parts of an expression might or might not have been evaluated when a jump occurs) is setjmp/longjmp, and you should be able to use longjmp in statement expressions; goto can't have such effects in C so it's dubious allowing them in GNU C.
Subject: Re: pessimization of "goto *" jsm at polyomino dot org dot uk wrote: > > > ------- Additional Comments From jsm at polyomino dot org dot uk 2004-05-02 19:06 ------- > Subject: Re: pessimization of "goto *" > > On Sun, 2 May 2004, anton at a0 dot complang dot tuwien dot ac dot at wrote: > > > And that refers to a discussion in the mailing list, and the only > > mention of a problem with jumping out that I found there was in > > <http://gcc.gnu.org/ml/gcc/2000-07/msg00721.html>. For this example > > the semantics seem pretty obvious and well-defined to me (in contrast > > to the jump-into example), and this was just a compiler bug. > > The semantics are neither obvious nor well-defined, because other parts of > the larger expression containing the statement expression might or might > not have been evaluated. The example is: a: f ("%d\n", ({goto a;}) The only other part that might or might not have been evaluated is the "%d\n"; since this has no side effect, it does not matter whether it is evaluated or not. So the obvious (to me) and well-defined semantics of this example is to be equivalent to a: goto a; In general, true, if the other parts of the expression have side effects, the whole expression is undefined, but that's just the same as in other cases with side effects in expression evaluation, so the semantics still seem obvious to me. > The only thing in C giving rise to such a > situation (where parts of an expression might or might not have been > evaluated when a jump occurs) is setjmp/longjmp, and you should be able to > use longjmp in statement expressions; goto can't have such effects in C so > it's dubious allowing them in GNU C. Variable definitions also give rise to such situations (where parts of an expression might or might not have been evaluated when the definition occurs), and this may also lead to undefined behaviour, and this cannot happen in C. Does this make allowing variable definitions in statement expressions dubious in GNU C? I don't think so (after all, they are used in the first example for statement expressions), and I don't think that this kind of argument is valid for other things that cannot occur in C expressions, either. If only things were allowed in statement expressions that are allowed in C expressions, then statement expressions would be redundant; you could simply write "(expr, expr, expr)" instead of "({expr; expr; expr;})". - anton
Confirmed a reduced testcase: int f(int i, int j) { static void *a[5] = {&&b,&&c,&&d,&&e,&&f}; if (i >=5) i = 0; if (i<=0)i=5; goto *a[i]; e: i++; if (i >=5) i = 0; if (i<=0)i=5; j++; goto *a[i]; f: i--; if (i >=5) i = 0; if (i<=0)i=5; j--; goto *a[i]; b: i+=2; if (i >=5) i = 0; if (i<=0)i=5; goto *a[i]; c: i*=2; if (i >=5) i = 0; if (i<=0)i=5; goto *a[i]; d:; }
Proposed patch: http://gcc.gnu.org/ml/gcc-patches/2004-05/msg01526.html
Newest Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-05/msg01670.html>.
Subject: Re: pessimization of "goto *" pinskia at gcc dot gnu dot org wrote: > Newest Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-05/msg01670.html>. I tried this patch with gcc-3.5-20040523, and the problem was still there, for both the reduced test case, and a variant (work around PR 15243) of the original testcase. I also tried this patch with gcc-core-3.5-20040704, and it applies cleanly, but then produces a compilation error: ../../gcc-3.5-20040704/gcc/bb-reorder.c: In function `duplicate_computed_gotos': ../../gcc-3.5-20040704/gcc/bb-reorder.c:1989: error: structure has no member named `computed_goto_common_label' make[2]: *** [bb-reorder.o] Error 1 make[2]: Leaving directory `/nfs/nfstmp/anton/gcc-patched/gcc-3.5-20040704-i386/gcc' - anton
It looks like you can remove: + if (!cfun->computed_goto_common_label) + return; from the patch.
Subject: Re: pessimization of "goto *" > ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-07-06 22:00 ------- > It looks like you can remove: > + if (!cfun->computed_goto_common_label) > + return; > > from the patch. cfun->computed_goto_common_label can be replaced by cfun->has_computed_jump. The test is there not to scan the function if the computed goto is not used in the function. Josef
> I tried this patch with gcc-3.5-20040523, and the problem was still there The optimization is enabled at -O2 (or -O -freorder-blocks). If there is a uncond jump to a block containing a computed jump and the block is small enough (several instructions) the block is copied to eliminate the uncond jump. When I was testing the patch with current mainline I have noticed that the current_function_has_computed_jump flag is incorrectly unset in make_edges() when compiling ef.i. IMHO the flag should not be set initialized to 0 (or 1 if flag_reorder_blocks_and_partition) in make_edges() but somewhere earlier because make_edges() can be called several times and only a subset of blocks may be scanned for computed jumps in it.
Subject: Re: pessimization of "goto *" zlomek at gcc dot gnu dot org wrote: > The optimization is enabled at -O2 Not as far as I can tell (tried it on i386 with patched versions of gcc-3.5-20040704 and gcc-3.5-20040523). > (or -O -freorder-blocks). That seems to work, though. We would like to use -fno-reorder-blocks with duplicate_computed_gotos, either through an extra option, or by just having duplicate_computed_gotos all the time (at least for our code these gotos were duplicated already in the source code, so duplicating them again in the back end is closer to the idea of -fno-reorder-blocks than not duplicating them). > If there is a uncond jump to a block containing a computed jump and the block is > small enough (several instructions) the block is copied to eliminate the uncond > jump. > > When I was testing the patch with current mainline I have noticed that the > current_function_has_computed_jump flag is incorrectly unset in make_edges() > when compiling ef.i. Ok, that explains why this patch does not work for ef.i. One other thing I noticed is that apparently there is no combining pass (or its tree-ssa equivalent) after duplicate_computed_gotos, resulting in compiling code like goto *a[i]; into movl a.0(,%ebx,4), %eax jmp *%eax instead of jmp *a.0(,%ebx,4) Thanks for your efforts, - anton
Subject: Re: pessimization of "goto *" > > The optimization is enabled at -O2 > Not as far as I can tell (tried it on i386 with patched versions of > gcc-3.5-20040704 and gcc-3.5-20040523). Yes, it is enabled by the patch. The patch duplicates block with computed goto to place where a uncond jump to the conputed goto block is if the block is small enough. The computed jump in the small testcase is not duplicated, because the uncond jumps from other places do not jump to a basic block which contains the computed jump but to other block: f: pushl %ebp movl %esp, %ebp pushl %ebx movl 8(%ebp), %ebx .p2align 2,,3 .L37: cmpl $4, %ebx jle .L39 .L34: movl $5, %ebx .L9: movl a.1118(,%ebx,4), %eax jmp *%eax .p2align 2,,3 .L39: testl %ebx, %ebx jg .L9 jmp .L34 .p2align 2,,3 .L3: popl %ebx leave ret .p2align 2,,3 .L2: sall %ebx jmp .L37 .p2align 2,,3 .L1: addl $2, %ebx jmp .L37 .p2align 2,,3 .L5: decl %ebx jmp .L37 .p2align 2,,3 .L4: incl %ebx jmp .L37 However, I see that after the L39 there is a cond jump to the computed jump block. This is not handled by my patch but could be. > We would like to use -fno-reorder-blocks with > duplicate_computed_gotos, either through an extra option, or by just > having duplicate_computed_gotos all the time (at least for our code > these gotos were duplicated already in the source code, so duplicating > them again in the back end is closer to the idea of > -fno-reorder-blocks than not duplicating them). Adding an new flag is possible. However, I do not see why you explicitelly want to disable reorder-blocks. > One other thing I noticed is that apparently there is no combining > pass (or its tree-ssa equivalent) after duplicate_computed_gotos, > resulting in compiling code like > > goto *a[i]; > > into > > movl a.0(,%ebx,4), %eax > jmp *%eax > > instead of > > jmp *a.0(,%ebx,4) The combiner should have already combined these insns before the duplication of computed gotos is run. This pass just duplicates some insns. Josef
Subject: Re: pessimization of "goto *" zlomj9am at artax dot karlin dot mff dot cuni dot cz wrote: > > > The optimization is enabled at -O2 > > Not as far as I can tell (tried it on i386 with patched versions of > > gcc-3.5-20040704 and gcc-3.5-20040523). > > Yes, it is enabled by the patch. The patch duplicates block with computed goto > to place where a uncond jump to the conputed goto block is if the block is > small enough. > The computed jump in the small testcase is not duplicated, because the uncond > jumps from other places do not jump to a basic block which contains the > computed jump but to other block: ... Thanks for opening my eyes. > > We would like to use -fno-reorder-blocks with > > duplicate_computed_gotos, either through an extra option, or by just > > having duplicate_computed_gotos all the time (at least for our code > > these gotos were duplicated already in the source code, so duplicating > > them again in the back end is closer to the idea of > > -fno-reorder-blocks than not duplicating them). > > Adding an new flag is possible. However, I do not see why you explicitelly want > to disable reorder-blocks. We need it for the dynamic superinstruction optimization (aka selective inlining, PLDI'98 p. 291) in Gforth and further optimizations building on that (other people working on efficient interpreters/dynamic code generators have the same problem; e.g., I recently talked to Etienne Gagnon (http://sablevm.org/) about this; qemu also uses -fno-reorder-blocks). For dynamic superinstructions we need to ensure that the machine code corresponding to source code between two labels is between the addresses that gcc produces for the labels. The flag -fno-reorder-blocks almost gives us that (but the sharing of the "goto *"s breaks that). Even for just plain interpreters (without dynamic superinstructions), the sharing of the "goto *"s resulting from the merging of code after conditional jumps (as shown in the simplified testcase) will often reduce performance significantly through lower BTB accuracy. So for performance one probably wants to use -fno-reorder-blocks there, unless this results in merging the "goto *"s, as it does now. > > One other thing I noticed is that apparently there is no combining > > pass (or its tree-ssa equivalent) after duplicate_computed_gotos, > > resulting in compiling code like > > > > goto *a[i]; > > > > into > > > > movl a.0(,%ebx,4), %eax > > jmp *%eax > > > > instead of > > > > jmp *a.0(,%ebx,4) > > The combiner should have already combined these insns before the duplication of > computed gotos is run. This pass just duplicates some insns. My guess is that the combiner does not combine these insns before duplication, because they are in different basic blocks before duplication. Without duplication the code would look like: movl a.0(,%ebx,4), %eax jmp .L9 ... .L9: jmp *%eax - anton
Subject: Re: pessimization of "goto *" > > Adding an new flag is possible. However, I do not see why you explicitelly want > > to disable reorder-blocks. > > We need it for the dynamic superinstruction optimization (aka > selective inlining, PLDI'98 p. 291) in Gforth and further > optimizations building on that (other people working on efficient > interpreters/dynamic code generators have the same problem; e.g., I > recently talked to Etienne Gagnon (http://sablevm.org/) about this; > qemu also uses -fno-reorder-blocks). > > For dynamic superinstructions we need to ensure that the machine code > corresponding to source code between two labels is between the > addresses that gcc produces for the labels. The flag > -fno-reorder-blocks almost gives us that (but the sharing of the > "goto *"s breaks that). I see. > Even for just plain interpreters (without dynamic superinstructions), > the sharing of the "goto *"s resulting from the merging of code after > conditional jumps (as shown in the simplified testcase) will often > reduce performance significantly through lower BTB accuracy. So for > performance one probably wants to use -fno-reorder-blocks there, > unless this results in merging the "goto *"s, as it does now. I see. So duplication of computed jumps probably should be at -O and have a special flag too. Josef
On Wednesday 07 July 2004 06:49, zlomj9am at artax dot karlin dot mff dot cuni dot cz wrote: > cfun->computed_goto_common_label can be replaced by > cfun->has_computed_jump. The test is there not to scan the function if the > computed goto is not used in the function. Since your patch is still not in the gcc 4.0.0-xxx codebase, I've modified it to work with the current CVS version. There's still the problem that the jump does not combine with the previous instruction, and that therefore one valuable (especially on x86!) register is used up. As trigger condition for goto duplication I used the -fno-crossjumping switch, together with optimization. I've put the patch online under http://www.jwdt.com/~paysan/gcc-4.0.0.patch
Bernd, the patch needs to be posted to gcc-patches@ after a full bootstrap & test cycle to gather attention. Probably Josef should take care of this (being the assigneed of this bug).
I have a patch that fixes this. I need to figure out a way to do it in a way that doesn't require you to pass "-fno-crossjumping -fno-gcse" but I'll figure out something...
Comes from factoring computed gotos, so this is a code quality regression.
Created an attachment (id=8079) [edit] Updated patch for this problem
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2005-01/msg02001.html>.
Subject: Bug 15242 CVSROOT: /cvs/gcc Module name: gcc Changes by: steven@gcc.gnu.org 2005-02-01 10:03:27 Modified files: gcc : ChangeLog basic-block.h bb-reorder.c cfgcleanup.c params.def passes.c gcc/doc : invoke.texi Log message: PR optimization/15242 * params.def (PARAM_MAX_GOTO_DUPLICATION_INSNS): New param. * basic-block.h (duplicate_computed_gotos): Add prototype. * bb-reorder.c (duplicate_computed_gotos): New function to duplicate sufficiently small blocks ending in a computed jump. * passes.c (rest_of_compilation): Call duplicate_computed_gotos if not optimizing for size. * cfgcleanup.c (try_crossjump_bb): If not optimizing for size, never do tail merging for blocks ending in a computed jump. * doc/invoke.texi: Document the max-goto-duplication-insns param. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7359&r2=2.7360 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/basic-block.h.diff?cvsroot=gcc&r1=1.236&r2=1.237 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/bb-reorder.c.diff?cvsroot=gcc&r1=1.88&r2=1.89 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cfgcleanup.c.diff?cvsroot=gcc&r1=1.138&r2=1.139 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/params.def.diff?cvsroot=gcc&r1=1.53&r2=1.54 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/passes.c.diff?cvsroot=gcc&r1=2.64&r2=2.65 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doc/invoke.texi.diff?cvsroot=gcc&r1=1.571&r2=1.572
Should be fixed on mainline now.
Is this PR fixed? If so, please close it.
(In reply to comment #28) > Is this PR fixed? If so, please close it. it is fixed on the mainline but still not fixed on the 3.4 branch as this is a regression we should keep it open until The RM says otherwise.
Subject: Re: [3.3/3.4/4.0 regression] pessimization of "goto *" steven at gcc dot gnu dot org wrote: > Updated patch for this problem Ok, I have now tried it with gcc-4.0-20050220 and gforth-0.6.2 on i386 and it works ok. There is a minor remaining regression related to this issue: gcc-4.0 compiles "goto **(ip-1);" into 0x804b50d <engine+1341>: mov 0xfffffffc(%ebx),%eax 0x804b510 <engine+1344>: jmp *%eax gcc-2.95 compiles it into 0x804b103 <engine+227>: jmp *0xfffffffc(%ebx) which looks better. Maybe there should be another combining pass after the duplication of the indirect jumps. Should I create another PR for this? Here are updated timing results (gforth-0.6.2 configured with --enable-force-reg): Pentium 4 2.26GHz; numbers are times in seconds user time gcc-4.0-20050220 gcc-2.95.3 gcc-3.3 gcc-3.4.0 default no-reorder-blocks dynamic no-dyn dynamic no-dyn no-dyn dynamic dynamic siev 0.24 0.48 0.31 0.47 0.50 0.37 0.28 bubble 0.30 0.78 0.36 0.77 0.78 0.40 0.36 matrix 0.19 0.94 0.17 0.92 0.96 0.18 0.18 fib 0.34 0.57 0.41 0.58 0.59 0.43 0.40 - anton
> Maybe there should be another combining pass after the duplication > of the indirect jumps. Should I create another PR for this? There should not be another "combining" pass (you really mean constant propagation). This new unfactoring stuff runs after register allocation, so such a pass would not really help, except maybe to make the code look prettier to you. But, is this: mov 0xfffffffc(%ebx),%eax jmp *%eax slower than this: jmp *0xfffffffc(%ebx) or have you not tried that (e.g. by hacking the assembly by hand)?
Subject: Re: [3.3/3.4 regression] pessimization of "goto *" steven at gcc dot gnu dot org wrote: > > > ------- Additional Comments From steven at gcc dot gnu dot org 2005-03-10 12:48 ------- > > Maybe there should be another combining pass after the duplication > > of the indirect jumps. Should I create another PR for this? > > There should not be another "combining" pass (you really mean constant > propagation). I meant "Instruction combination (`combine.c')". Not sure if this is replaced by something else in the recent gccs. Why do you think I mean constant propagation? > This new unfactoring stuff runs after register allocation, > so such a pass would not really help, except maybe to make the code look > prettier to you. Ouch. No way to fix that? That's the cost we wanted to avoid. > But, is this: > > mov 0xfffffffc(%ebx),%eax > jmp *%eax > > slower than this: > > jmp *0xfffffffc(%ebx) > > or have you not tried that (e.g. by hacking the assembly by hand)? Ok, I hacked the assembly by hand, and this is what I got: All numbers are user times in seconds for gforth-fast-0.6.2: Pentium-4 2.26 GHz (i386 code): no-dynamic no-super dynamic combined? yes no yes no yes no siev 0.47 0.49 0.36 0.36 0.33 0.33 bubble 0.81 0.81 0.52 0.53 0.47 0.47 matrix 1.03 1.01 0.30 0.30 0.36 0.35 fib 0.70 0.68 0.75 0.60 0.53 0.58 Opteron 2GHz (i386 code): no-dynamic no-super dynamic combined? yes no yes no yes no siev 0.46 0.47 0.37 0.36 0.33 0.32 bubble 0.73 0.74 0.50 0.51 0.50 0.51 matrix 0.93 0.95 0.35 0.34 0.31 0.32 fib 0.63 0.64 0.49 0.50 0.44 0.45 "No-super" performs the same number of indirect branches (and anything else) as "no-dynamic", but has better branch prediction. "Dynamic" is like "no-super", but eliminates many of the indirect branches. So, overall the instruction combination alone does not make much of a difference on these CPUs. - anton
Subject: Re: [3.3/3.4 regression] pessimization of "goto *" Combine runs before register allocation. You cannot run it after register allocation. I don't think you can run it twice, even. And no, after register allocation you are not magically going to win back that register. Tough luck. Given your numbers, replacing the reg with its known value might still be a win. But I'm not going to do it ;-)
This is basically unfixable without serious hacking that is not appropriate for GCC 3.4, so Gaby, with your permission, I'd like to close this one as WONTFIX...
Subject: Re: [3.4 regression] pessimization of "goto *" "steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: | This is basically unfixable without serious hacking that is not | appropriate for GCC 3.4, so Gaby, with your permission, I'd like to | close this one as WONTFIX... I agree with your analysis. Thanks, -- Gaby
Not really fixable for 3.4.
Reopening to close as ...
Fixed in 4.0.0.
*** Bug 8092 has been marked as a duplicate of this bug. ***