This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch] Improving jump-thread pass for PR 54742
- From: Richard Biener <richard.guenther@gmail.com>
- To: Sebastian Pop <sebpop@gmail.com>
- Cc: Jeff Law <law@redhat.com>, James Greenhalgh <james.greenhalgh@arm.com>, Steve Ellcey <sellcey@mips.com>, GCC Patches <gcc-patches@gcc.gnu.org>
- Date: Tue, 25 Nov 2014 10:37:10 +0100
- Subject: Re: [Patch] Improving jump-thread pass for PR 54742
On Mon, Nov 24, 2014 at 11:05 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Jeff Law wrote:
>> On 11/23/14 15:22, Sebastian Pop wrote:
>> >The second patch attached limits the search for FSM jump threads to loops. With
>> >that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap
>> >(and 424 jump threads on powerpc64-linux bootstrap.)
>> >
>> Yea, that was one of the things I was going to poke at, as a quick
>> scan of your patch gave me the impression it wasn't limited to
>> loops.
>>
>> Again, I haven't looked much at the patch, but I got the impression
>> you're doing a backwards walk through the predecessors to discover
>> the result of the COND_EXPR. Correct?
>
> Yes.
>
>>
>> That's something I'd been wanting to do -- basically start with a
>> COND_EXPR, then walk the dataflow backwards substituting values into
>> the COND_EXPR (possibly creating non-gimple). Ultimately the goal
>> is to substitute and fold, getting to a constant :-)
>>
>> The forward exhaustive stuff we do now is crazy. The backwards
>> approach could be decoupled from DOM & VRP into an independent pass,
>> which I think would be wise.
>>
>> Using a SEME region copier is also something I really wanted to do
>> long term. In fact, I believe a lot of tree-ssa-threadupdate.c
>> ought to be ripped out and replaced with a SEME based copier.
>
> I did an experiment along these lines over the week-end, and now that you
> mention it, I feel less shy to speak about it; the patch does not yet pass
> bootstrap, and there are still about 20 failing test-cases. I feel better
> reading the code generation part of jump-threading after this patch ;-)
> Basically I think all of tree-ssa-threadupdate.c can be replaced by
> duplicate_seme_region, which generalizes the code generation.
Btw I once thought about doing on-the-fly lattice use/update and folding
during basic-block copying (or even re-generating expressions via
simplifying gimple_build ()). Or having a substitute-and-fold-like
facility that can run on SEME regions and do this.
Richard.
>> It appears you've built at least parts of two pieces needed to do all
>> this as a Bodik-style optimizer. Which is exactly the long-term
>> direction I think this code ought to take.
>>
>>
>> >
>> >One of the reasons I think we see more branches is that in sese region copying we
>> >do not use the knowledge of the value of the condition for the last branch in a
>> >jump-thread path: we rely on other propagation passes to remove the branch. The
>> >last attached patch adds:
>> >
>> > /* Remove the last branch in the jump thread path. */
>> > remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest);
>> That's certainly a possibility. But I would expect that even with
>> this limitation something would be picking up the fact that the
>> branch is statically computable (even if it's an RTL optimizer).
>> But it's definitely something to look for.
>>
>> >
>> >Please let me know if the attached patches are producing better results on gcc.
>>
>> For the trunk:
>> instructions: 1339016494968
>> branches: 243568982489
>>
>> First version of your patch:
>>
>> instructions: 1339739533291
>> branches: 243806615986
>>
>> Latest version of your patch:
>>
>> instructions: 1339749122609
>> branches: 243809838262
>
> I think I got about the same results.
>
> I got my scripts installed on the gcc-farm. I first used gcc75, an x86_64
> machine, where valgrind crashed because it could not decode an instruction. I
> then moved to gcc112, a powerpc64-linux machine, where I got this data from
> stage2 cc1plus compiling the same file alias.ii at -O2 (I did 3 runs of each,
> mostly because there is a bit of noise in all these numbers):
>
> $ valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ./cc1plus -O2 ~/alias.ii
>
> all 4 patches:
>
> ==153617== I refs: 13,914,038,211
> ==153617==
> ==153617== Branches: 1,926,407,760 (1,879,827,481 cond + 46,580,279 ind)
> ==153617== Mispredicts: 144,890,904 ( 132,094,105 cond + 12,796,799 ind)
> ==153617== Mispred rate: 7.5% ( 7.0% + 27.4% )
>
> ==34993== I refs: 13,915,335,629
> ==34993==
> ==34993== Branches: 1,926,597,919 (1,880,017,558 cond + 46,580,361 ind)
> ==34993== Mispredicts: 144,974,266 ( 132,177,440 cond + 12,796,826 ind)
> ==34993== Mispred rate: 7.5% ( 7.0% + 27.4% )
>
> ==140841== I refs: 13,915,334,459
> ==140841==
> ==140841== Branches: 1,926,597,819 (1,880,017,458 cond + 46,580,361 ind)
> ==140841== Mispredicts: 144,974,296 ( 132,177,470 cond + 12,796,826 ind)
> ==140841== Mispred rate: 7.5% ( 7.0% + 27.4% )
>
> patch 1:
>
> ==99902== I refs: 13,915,069,710
> ==99902==
> ==99902== Branches: 1,926,963,813 (1,880,376,148 cond + 46,587,665 ind)
> ==99902== Mispredicts: 145,501,564 ( 132,656,576 cond + 12,844,988 ind)
> ==99902== Mispred rate: 7.5% ( 7.0% + 27.5% )
>
> ==3907== I refs: 13,915,082,469
> ==3907==
> ==3907== Branches: 1,926,965,218 (1,880,377,471 cond + 46,587,747 ind)
> ==3907== Mispredicts: 145,501,569 ( 132,656,554 cond + 12,845,015 ind)
> ==3907== Mispred rate: 7.5% ( 7.0% + 27.5% )
>
> ==44271== I refs: 13,915,111,997
> ==44271==
> ==44271== Branches: 1,926,968,863 (1,880,380,952 cond + 46,587,911 ind)
> ==44271== Mispredicts: 145,501,858 ( 132,656,789 cond + 12,845,069 ind)
> ==44271== Mispred rate: 7.5% ( 7.0% + 27.5% )
>
> master no-patch:
>
> ==129233== I refs: 13,910,221,913
> ==129233==
> ==129233== Branches: 1,925,715,095 (1,879,277,776 cond + 46,437,319 ind)
> ==129233== Mispredicts: 144,133,332 ( 131,510,534 cond + 12,622,798 ind)
> ==129233== Mispred rate: 7.4% ( 6.9% + 27.1% )
>
> ==147659== I refs: 13,910,216,249
> ==147659==
> ==147659== Branches: 1,925,714,029 (1,879,276,708 cond + 46,437,321 ind)
> ==147659== Mispredicts: 144,127,970 ( 131,505,172 cond + 12,622,798 ind)
> ==147659== Mispred rate: 7.4% ( 6.9% + 27.1% )
>
> ==155206== I refs: 13,910,201,237
> ==155206==
> ==155206== Branches: 1,925,712,267 (1,879,275,030 cond + 46,437,237 ind)
> ==155206== Mispredicts: 144,128,313 ( 131,505,542 cond + 12,622,771 ind)
> ==155206== Mispred rate: 7.4% ( 6.9% + 27.1% )
>
>
> I think that there are about 5 million more instructions executed with the first
> patch, and the other patches on top do not really help.
>
>>
>> Which is in the noise for this test. Which makes me wonder if I
>> botched something on the latest run. It doesn't appear so, but I'm
>> re-running just to be sure. I'm also turning on -g so that I can
>> use cg_annotate to poke a bit deeper and perhaps identify one or
>> more concrete examples where your patch is making this worse.
>
> Thanks,
> Sebastian