This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][0/n] Merge from match-and-simplify
- From: Richard Biener <rguenther at suse dot de>
- To: Sebastian Pop <sebpop at gmail dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 17 Oct 2014 09:55:52 +0200 (CEST)
- Subject: Re: [PATCH][0/n] Merge from match-and-simplify
- References: <alpine dot LSU dot 2 dot 11 dot 1410151450430 dot 20733 at zhemvz dot fhfr dot qr> <20141016203852 dot GB29134 at f1 dot c dot bardezibar dot internal>
On Thu, 16 Oct 2014, Sebastian Pop wrote:
> Richard Biener wrote:
> >
> > I have posted 5 patches as part of a larger series to merge
> > (parts) from the match-and-simplify branch. While I think
> > there was overall consensus that the idea behind the project
> > is sound there are technical questions left for how the
> > thing should look in the end. I've raised them in 3/n
> > which is the only patch of the series that contains any
> > patterns so far.
> >
> > To re-iterate here (as I expect most people will only look
> > at [0/n] patches ;)), the question is whether we are fine
> > with making fold-const (thus fold_{unary,binary,ternary})
> > not handle some cases it handles currently.
>
> I have tested on aarch64 all the code in the match-and-simplify branch against
> trunk as of the last merge at r216315:
>
> 2014-10-16 Richard Biener <rguenther@suse.de>
>
> Merge from trunk r216235 through r216315.
>
> Overall, I see more perf regressions (about 2/3 of the tests) than
> improvements (1/3 of the tests). I will try to reduce the tests.
Note that the branch goes much further in exercising the machinery
than I want to merge at this point (that applies mostly to all
passes using the SSA propagator such as CCP and VRP and passes
exercising value-numbering - FRE and PRE).

It may also simply show the effect of now folding all statements
from tree-ssa-forwprop.c. I have yet to investigate the testsuite
fallout of [1/n] to [5/n] - test results have been very noisy lately
due to the C11 change and now ICF.
> For instance, saxpy regresses at -O3 on aarch64:
>
> void saxpy(double* x, double* y, double* z) {
>   int i=0;
>   for (i = 0 ; i < ARRAY_SIZE; i++) {
>     z[i] = x[i] + scalar*y[i];
>   }
> }
>
> $ diff -u base.s mas.s
> --- base.s 2014-10-16 15:30:15.351430000 -0500
> +++ mas.s 2014-10-16 15:30:16.183035000 -0500
> @@ -2,12 +2,14 @@
>      add x1, x2, 800
>      ldr q0, [x0, x2]
>      add x3, x2, 1600
> +    cmp x0, 784
>      ldr q1, [x0, x1]
> +    add x1, x0, 16
>      fmla v0.2d, v1.2d, v2.2d
>      str q0, [x0, x3]
> -    add x0, x0, 16
> -    cmp x0, 800
> +    mov x0, x1
>      bne .L140
>  .LBE179:
> -    subs w4, w4, #1
> +    cmp w4, 1
> +    sub w4, w4, #1
>      bne .L139
I don't understand AArch64 assembly very well, but the above looks like
an RTL and/or IVOPTs issue?
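
For anyone who wants to look at this outside the benchmark, a standalone
version of the quoted loop might look like the sketch below. The value of
ARRAY_SIZE, the definition of scalar, and the saxpy_check helper are all
assumptions for illustration; the quoted snippet does not show them, and
the original benchmark may define them differently.

```c
#include <assert.h>  /* for callers that want to assert on saxpy_check() */

/* Assumed definitions: not shown in the quoted snippet. */
#define ARRAY_SIZE 100
static double scalar = 2.0;

/* The loop from the quoted message, unchanged apart from formatting. */
void saxpy(double *x, double *y, double *z) {
    int i;
    for (i = 0; i < ARRAY_SIZE; i++) {
        z[i] = x[i] + scalar * y[i];
    }
}

/* Hypothetical helper: fills the inputs with small integers, runs saxpy,
   and returns the number of elements that differ from the expected value.
   With x[i] = i, y[i] = 2*i and scalar = 2.0 the result is exactly 5*i,
   whether or not the multiply-add is fused. */
int saxpy_check(void) {
    static double x[ARRAY_SIZE], y[ARRAY_SIZE], z[ARRAY_SIZE];
    int i, mismatches = 0;

    for (i = 0; i < ARRAY_SIZE; i++) {
        x[i] = (double)i;
        y[i] = 2.0 * (double)i;
    }
    saxpy(x, y, z);
    for (i = 0; i < ARRAY_SIZE; i++) {
        if (z[i] != 5.0 * (double)i) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Compiling such a file with -O3 -S on trunk and on the branch and diffing
the two .s files should give a comparison of the same kind as the one
quoted above, though the exact code generated will depend on how the
arrays and scalar are really declared.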
Thanks for doing performance measurements.
Richard.