This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Question about vectorization limit
- From: Dehao Chen <dehao at google dot com>
- To: Toon Moene <toon at moene dot org>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Thu, 30 May 2013 13:45:58 -0700
- Subject: Re: Question about vectorization limit
- References: <CAO2gOZX7_-08m_+AEybF0RwG=8Y_qPG_+wjmgsq6ymVWTr3=Vw at mail dot gmail dot com> <51A7A26C dot 2070301 at moene dot org>
Actually, you need another patch to make this work:
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c (revision 199416)
+++ gcc/tree-vect-loop-manip.c (working copy)
@@ -855,7 +855,6 @@
/* All loops have an outer scope; the only case loop->outer is NULL is for
the function itself. */
|| !loop_outer (loop)
- || loop->num_nodes != 2
|| !empty_block_p (loop->latch)
|| !single_exit (loop)
/* Verify that new loop exit condition can be trivially modified. */
Dehao
On Thu, May 30, 2013 at 12:03 PM, Toon Moene <toon@moene.org> wrote:
> On 05/30/2013 02:46 AM, Dehao Chen wrote:
>
>> In tree-vect-loop.c, it limits the vectorization only to loops that have 2
>> BBs:
>>
>> /* Inner-most loop. We currently require that the number of BBs is
>> exactly 2 (the header and latch). Vectorizable inner-most loops
>> look like this:
>>
>> (pre-header)
>> |
>> header<--------+
>> | | |
>> | +--> latch --+
>> |
>> (exit-bb) */
>>
>> if (loop->num_nodes != 2)
>> {
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> "not vectorized: control flow in loop.");
>> return NULL;
>> }
>
>
>> Any insights why the limit is set to 2? We found that removing this
>> limit actually improves performance for many applications.
>
>
> It might have been just "safety first" - we know how to do single basic
> block inner loops, let's stick with them for the moment (this development
> was started around a decade ago).
>
> Our 3.5 million lines of Fortran 90 code (mostly array expressions) and
> 125,000 lines of arbitrary C code are currently compiled with:
>
> $ gfortran -v
> Using built-in specs.
> COLLECT_GCC=gfortran
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.3-4'
> --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
> --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
> --program-suffix=-4.7 --enable-shared --enable-linker-build-id
> --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
> --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls
> --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
> --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin
> --with-system-zlib --enable-objc-gc --with-cloog --enable-cloog-backend=ppl
> --disable-cloog-version-check --disable-ppl-version-check --enable-multiarch
> --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32
> --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
> --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 4.7.3 (Debian 4.7.3-4)
>
> So I tried it with:
>
> $ /usr/snp/bin/gfortran -v
> Using built-in specs.
> COLLECT_GCC=/usr/snp/bin/gfortran
> COLLECT_LTO_WRAPPER=/usr/snp/libexec/gcc/x86_64-unknown-linux-gnu/4.7.4/lto-wrapper
> Target: x86_64-unknown-linux-gnu
> Configured with: ../gcc-4_7-branch/configure --prefix=/usr/snp --with-gnu-as
> --with-gnu-ld --enable-languages=fortran --disable-libmudflap
> --disable-multilib --disable-nls --with-arch=native --with-tune=native
> Thread model: posix
> gcc version 4.7.4 20130530 (prerelease) (GCC)
>
> augmented by this single change:
>
> toon@super:~/compilers/gcc-4_7-branch/gcc$ svn diff
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c (revision 199454)
> +++ tree-vect-loop.c (working copy)
> @@ -1002,6 +1002,8 @@
> |
> (exit-bb) */
>
> + /* Disabled check
>
> +
> if (loop->num_nodes != 2)
> {
> if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
> @@ -1009,6 +1011,8 @@
> return NULL;
> }
>
> + */
> +
> if (empty_block_p (loop->header))
> {
> if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
>
> Amazingly enough, I didn't hit *any* ICE. Also, running the generated
> executables produced reasonable results (you have to trust me that it is
> *very hard* to fake correct meteorological results if you blow up the
> generated code).
>
> Unfortunately, the relative importance of conditional code in inner loops is
> not sufficient to show any speedup on our code.
>
> Nevertheless, it would be a huge improvement on *other* codes if we could
> lift this restriction.
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
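
For readers following the thread, here is a minimal sketch of the kind of inner loop the "loop->num_nodes != 2" check is about. It is not taken from either message: the function names copy_positive and clamp_negative are hypothetical, and whether a given GCC version actually rejects the first loop depends on what if-conversion manages to flatten before the vectorizer runs. The first loop stores conditionally, so its body keeps a branch and the loop can end up with more than a header and a latch. The second computes the same kind of test as an unconditional select, which can typically be reduced to a single-block body.

```c
#include <stddef.h>

/* Hypothetical example (not from the thread): the store to dst[i] is
   conditional, so the loop body contains a branch.  If that branch is
   not if-converted away, the loop has more than two basic blocks.  */
void
copy_positive (const float *src, float *dst, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (src[i] > 0.0f)
      dst[i] = src[i];
}

/* Hypothetical counterpart: every iteration stores unconditionally and
   the condition only selects the value, so the body can usually be
   expressed as straight-line code.  */
void
clamp_negative (const float *src, float *dst, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] > 0.0f ? src[i] : 0.0f;
}
```

Compiling such a file with "gcc -O3 -fdump-tree-vect-details -c" and searching the vect dump for the "not vectorized: control flow in loop." message quoted above is one way to see which loops hit the check on a particular compiler build.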