This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Propose moving vectorization from -O3 to -O2.

From: Richard Biener <richard dot guenther at gmail dot com>
To: Xinliang David Li <davidxl at google dot com>,Cong Hou <congh at google dot com>,Zdenek Dvorak <ook at ucw dot cz>
Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
Date: Mon, 19 Aug 2013 20:53:30 +0200
Subject: Re: Propose moving vectorization from -O3 to -O2.
References: <CAK=A3=1fd07wXzYFn-t+arozGeSFrGoKDMgzgbdCyzoJkz99og at mail dot gmail dot com> <CAAkRFZLnRdZvzGCLdj7-gGN7oysV900=2cexqxLFX3JsmZVGAQ at mail dot gmail dot com>

Xinliang David Li <davidxl@google.com> wrote:
>+cc auto-vectorizer maintainers.
>
>David
>
>On Mon, Aug 19, 2013 at 10:37 AM, Cong Hou <congh@google.com> wrote:
>> Nowadays, SIMD instructions play an increasingly important role in
>> our daily computations. AVX and AVX2 have extended 128-bit registers
>> to 256-bit ones, and the newly announced AVX-512 further doubles the
>> size. The benefit we can get from vectorization will keep growing.
>> Enabling it at O2 is also common practice in other compilers:
>>
>> 1) Intel's ICC turns on the vectorizer at O2 by default and this has
>> been the case for many years;
>>
>> 2) Most recently, LLVM turns it on for both O2 and Os.
>>
>>
>> Here we propose moving vectorization from -O3 to -O2 in GCC. Three
>> main concerns about this change are: 1. Does vectorization greatly
>> increase the generated code size? 2. How much performance can be
>> improved? 3. Does vectorization increase compile time significantly?
>>
>>
>> I have fixed the GCC bootstrap failure with the vectorizer turned on
>> (http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00497.html). To
>> evaluate the size and performance impact, we ran experiments on
>> SPEC06 and internal benchmarks. Based on the data, I have tuned the
>> vectorizer parameters to reduce the code bloat without sacrificing
>> the performance gain. There are some performance regressions in
>> SPEC06, and the root causes have been analyzed and understood. I will
>> file bugs tracking them independently. The experiments failed on
>> three benchmarks (please refer to
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56993). The experiment
>> results are attached here as two pdf files. Below are our summaries
>> of the results:
>>
>>
>> 1) We noticed that vectorization could increase the generated code
>> size, so we tried to limit this with some tuning: setting a higher
>> loop bound so that loops with few iterations won't be vectorized,
>> and disabling loop versioning. The average size increase drops from
>> 9.84% to 7.08% after our tuning (13.93% to 10.75% for Fortran
>> benchmarks, and 3.55% to 1.44% for C/C++ benchmarks). The code size
>> increase for Fortran benchmarks can be significant (from 18.72% to
>> 34.15%), but the performance gain is also huge. Hence we think this
>> size increase is reasonable. For C/C++ benchmarks, the size increase
>> is very small (below 3% except 447.dealII).
>>
>>
>> 2) Vectorization improves the performance for most benchmarks by
>> around 2.5%-3% on average, and much more for Fortran benchmarks. On
>> Sandybridge machines, the improvement can be larger when using
>> -march=corei7 (3.27% on average) and -march=corei7-avx (4.81% on
>> average) (please see the attachment for details). We also noticed
>> that some performance degradations exist, and after investigation we
>> found some are caused by defects in GCC's vectorization (e.g. GCC's
>> SLP could not vectorize a group of accesses if the group size is not
>> divisible by VF, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955,
>> and any data dependence between statements can prevent
>> vectorization), which can be resolved in the future.
>>
>>
>> 3) Lastly, we found that introducing vectorization has almost no
>> effect on the build time. GCC bootstrap time increase is negligible.
>>
>>
>> As a reference, Richard Biener is also proposing to move
>> vectorization to O2 by improving the cost model
>> (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00904.html).

And my conclusion is that we are not ready for this.  The benefit does not outweigh the compile time cost.

Richard.

>>
>> Vectorization has great performance potential -- the more people use
>> it, the more likely it will be further improved -- turning it on at
>> O2 is the way to go ...
>>
>>
>> Thank you!
>>
>>
>> Cong Hou
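
For readers who want to see the kind of loops under discussion, here is a minimal C sketch. It is not taken from the thread or from the attached benchmarks: both function names are hypothetical, and whether a given GCC release actually vectorizes either loop depends on its version, target and cost model. The first loop is the simple unit-stride case; the second touches groups of three adjacent elements, which is the sort of group size that is not a multiple of a 4-lane VF, as in the SLP remark quoted above.

```c
#include <stddef.h>

/* Hypothetical example: unit-stride loop with no dependence between
   iterations.  `restrict` promises that x and y do not overlap, so the
   compiler typically has no need for a runtime alias check here. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical example: each iteration updates a group of three
   adjacent floats (r, g, b).  Three is not a multiple of a 4-lane
   vector factor, so this illustrates the group-size case mentioned
   for SLP; the scalar result is the same either way. */
void scale_rgb(size_t npixels, float s, float *restrict rgb)
{
    for (size_t i = 0; i < npixels; i++) {
        rgb[3 * i + 0] *= s;
        rgb[3 * i + 1] *= s;
        rgb[3 * i + 2] *= s;
    }
}
```

Building the file once with plain -O2 and once with -O2 -ftree-vectorize, then comparing the object sizes and the generated assembly, gives a small-scale view of the size and speed trade-off being debated. Newer GCC releases can also report which loops were vectorized via -fopt-info-vec.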

Follow-Ups:
- Re: Propose moving vectorization from -O3 to -O2.
  - From: Xinliang David Li

References:
- Re: Propose moving vectorization from -O3 to -O2.
  - From: Xinliang David Li
