This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Enabling -ftree-slp-vectorize on -O2/Os

From: Richard Biener <richard dot guenther at gmail dot com>
To: GCC Development <gcc at gcc dot gnu dot org>, Allan Sandfeld Jensen <linux at carewolf dot com>
Date: Mon, 28 May 2018 12:58:20 +0200
Subject: Re: Enabling -ftree-slp-vectorize on -O2/Os
References: <2659301.XPQk3P0qmd@twilight> <5A85555D-FF52-4666-88EE-FFBD8C498294@gmail.com>

On Sat, May 26, 2018 at 12:36 PM Richard Biener <richard.guenther@gmail.com>
wrote:

> On May 26, 2018 11:32:29 AM GMT+02:00, Allan Sandfeld Jensen <linux@carewolf.com> wrote:
> >I brought this subject up earlier, and was told to suggest it again for
> >gcc 9, so I have attached the preliminary changes.
> >
> >My studies have shown that with generic x86-64 optimization it reduces
> >binary size by around 0.5%, and when optimizing for x64 targets with
> >SSE4 or better, it reduces binary size by 2-3% on average. The
> >performance changes are negligible however*, and I haven't been able to
> >detect changes in compile time big enough to penetrate general noise on
> >my platform, but perhaps someone has a better setup for that?
> >
> >* I believe that is because it currently works best on non-optimized
> >code; it is better at big basic blocks doing all kinds of things than
> >at tightly written inner loops.
> >
> >Anything else I should test or report?

> If you have access to SPEC CPU I'd like to see performance, size and
> compile-time effects of the patch on that. Embedded folks may want to run
> their favorite benchmark and report results as well.

So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
and run. The compile-time effect, where measurable (SPEC records at
one-second granularity), is within one second per benchmark apart from
410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
a train run so far). I'll re-run with ref input now and will post those
numbers.

Binary size numbers show an increase for 403.gcc, 433.milc and 444.namd,
and otherwise decreases or no changes. The changes are in the
sub-percentage area of course.

Overall 12583 "BBs" are vectorized.  I need to improve that reporting for
multiple (non-)overlapping instances.

I realize that combining -O2 with -march=haswell might not be what people
do, but I tried to increase the number of vectorized BBs.
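
For readers unfamiliar with the pass, here is a minimal sketch of the kind of
straight-line code that basic-block (SLP) vectorization looks for: several
independent, identically shaped operations on adjacent memory within one
basic block. The type and function names (vec4, add4, add4_example) are
hypothetical and do not come from the patch or from SPEC; whether a given
GCC actually vectorizes this depends on the version, the target flags and
the cost model.

```c
#include <assert.h>

/* Hypothetical type and function names, for illustration only. */
struct vec4 {
    float v[4];
};

/* Four independent, isomorphic additions on adjacent elements inside one
   basic block. Whether the compiler turns them into a single vector add
   depends on the GCC version, the target flags and the cost model. */
void add4(struct vec4 *restrict out,
          const struct vec4 *restrict a,
          const struct vec4 *restrict b)
{
    out->v[0] = a->v[0] + b->v[0];
    out->v[1] = a->v[1] + b->v[1];
    out->v[2] = a->v[2] + b->v[2];
    out->v[3] = a->v[3] + b->v[3];
}

/* Usage sketch: the result must be identical with or without
   vectorization. */
void add4_example(void)
{
    struct vec4 a = { { 1.0f, 2.0f, 3.0f, 4.0f } };
    struct vec4 b = { { 10.0f, 20.0f, 30.0f, 40.0f } };
    struct vec4 out;

    add4(&out, &a, &b);
    assert(out.v[0] == 11.0f && out.v[1] == 22.0f);
    assert(out.v[2] == 33.0f && out.v[3] == 44.0f);
}
```

Compiling such a file once with and once without -ftree-slp-vectorize, and
adding -fopt-info-vec, is one way to see whether the block was vectorized
on a particular setup.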

Richard.

> Richard.

> >Best regards
> >'Allan
> >
> >
> >diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> >index beba295bef5..05851229354 100644
> >--- a/gcc/doc/invoke.texi
> >+++ b/gcc/doc/invoke.texi
> >@@ -7612,6 +7612,7 @@ also turns on the following optimization flags:
> > -fstore-merging @gol
> > -fstrict-aliasing @gol
> > -ftree-builtin-call-dce @gol
> >+-ftree-slp-vectorize @gol
> > -ftree-switch-conversion -ftree-tail-merge @gol
> > -fcode-hoisting @gol
> > -ftree-pre @gol
> >@@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags:
> > -floop-interchange @gol
> > -floop-unroll-and-jam @gol
> > -fsplit-paths @gol
> >--ftree-slp-vectorize @gol
> > -fvect-cost-model @gol
> > -ftree-partial-pre @gol
> > -fpeel-loops @gol
> >@@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at
> > @item -ftree-slp-vectorize
> > @opindex ftree-slp-vectorize
> > Perform basic block vectorization on trees. This flag is enabled by default at
> >-@option{-O3} and when @option{-ftree-vectorize} is enabled.
> >+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled.
> >
> > @item -fvect-cost-model=@var{model}
> > @opindex fvect-cost-model
> >diff --git a/gcc/opts.c b/gcc/opts.c
> >index 33efcc0d6e7..11027b847e8 100644
> >--- a/gcc/opts.c
> >+++ b/gcc/opts.c
> >@@ -523,6 +523,7 @@ static const struct default_options default_options_table[] =
> >     { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
> >     { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
> >     { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
> >+    { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> >
> >     /* -O3 optimizations.  */
> >     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> >@@ -539,7 +540,6 @@ static const struct default_options default_options_table[] =
> >     { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
> >     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
> >     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> >-    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> >     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
> >     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
> >     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
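
The opts.c hunks above move one table entry from the -O3 group to the -O2
group. As a rough illustration of why that is enough to change the default,
here is a simplified, hypothetical model of a level-keyed defaults table.
None of the identifiers below are GCC's real ones (the real table uses
OPT_LEVELS_* and OPT_f* entries as shown in the diff), and the matching
logic is an assumption made for the sketch, not a copy of GCC's code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model; these are not GCC's real identifiers. */
enum level_class { LEVELS_2_PLUS, LEVELS_3_PLUS };

enum option_id {
    OPT_STORE_MERGING,
    OPT_SLP_VECTORIZE,
    OPT_LOOP_VECTORIZE,
    OPT_COUNT
};

struct default_option {
    enum level_class levels;  /* which optimization levels enable it */
    enum option_id option;    /* which flag is set */
    int value;                /* value given to the flag */
};

/* Mirrors the proposed change: SLP vectorization sits in the -O2 group,
   loop vectorization stays in the -O3 group. */
static const struct default_option defaults_table[] = {
    { LEVELS_2_PLUS, OPT_STORE_MERGING, 1 },
    { LEVELS_2_PLUS, OPT_SLP_VECTORIZE, 1 },
    { LEVELS_3_PLUS, OPT_LOOP_VECTORIZE, 1 },
};

/* Assumed matching rule for this sketch: "N plus" means opt_level >= N. */
static int level_matches(enum level_class levels, int opt_level)
{
    switch (levels) {
    case LEVELS_2_PLUS: return opt_level >= 2;
    case LEVELS_3_PLUS: return opt_level >= 3;
    }
    return 0;
}

/* Fill flags[] (OPT_COUNT entries) with the defaults for opt_level. */
void apply_defaults(int opt_level, int flags[OPT_COUNT])
{
    size_t i;

    for (i = 0; i < (size_t)OPT_COUNT; i++)
        flags[i] = 0;

    for (i = 0; i < sizeof defaults_table / sizeof defaults_table[0]; i++) {
        if (level_matches(defaults_table[i].levels, opt_level))
            flags[defaults_table[i].option] = defaults_table[i].value;
    }
}
```

In this model, calling apply_defaults(2, flags) sets the SLP flag but leaves
the loop-vectorization flag clear, which is the behaviour the patch proposes
for -O2.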

Follow-Ups:
- Re: Enabling -ftree-slp-vectorize on -O2/Os
  - From: Allan Sandfeld Jensen

References:
- Enabling -ftree-slp-vectorize on -O2/Os
  - From: Allan Sandfeld Jensen
- Re: Enabling -ftree-slp-vectorize on -O2/Os
  - From: Richard Biener
