This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Question about vectorization limit


Actually, you need another patch to make this work:

Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c (revision 199416)
+++ gcc/tree-vect-loop-manip.c (working copy)
@@ -855,7 +855,6 @@
       /* All loops have an outer scope; the only case loop->outer is NULL is for
          the function itself.  */
       || !loop_outer (loop)
-      || loop->num_nodes != 2
       || !empty_block_p (loop->latch)
       || !single_exit (loop)
       /* Verify that new loop exit condition can be trivially modified.  */

Dehao

On Thu, May 30, 2013 at 12:03 PM, Toon Moene <toon@moene.org> wrote:
> On 05/30/2013 02:46 AM, Dehao Chen wrote:
>
>> tree-vect-loop.c limits vectorization to loops that have exactly 2
>> BBs:
>>
>>        /* Inner-most loop.  We currently require that the number of BBs is
>>           exactly 2 (the header and latch).  Vectorizable inner-most loops
>>           look like this:
>>
>>                          (pre-header)
>>                             |
>>                            header<--------+
>>                             | |            |
>>                             | +-->  latch --+
>>                             |
>>                          (exit-bb)  */
>>
>>        if (loop->num_nodes != 2)
>>          {
>>            if (dump_enabled_p ())
>>              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                               "not vectorized: control flow in loop.");
>>            return NULL;
>>          }
>
>
>> Any insights why the limit is set to 2? We found that removing this
>> limit actually improves performance for many applications.
>
>
> It might have been just "safety first" - we know how to do single basic
> block inner loops, let's stick with them for the moment (this development
> was started around a decade ago).
>
> Our 3.5 million lines of Fortran 90 code (mostly array expressions) and
> 125,000 lines of arbitrary C code are currently compiled with:
>
> $ gfortran -v
> Using built-in specs.
> COLLECT_GCC=gfortran
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.3-4'
> --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
> --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
> --program-suffix=-4.7 --enable-shared --enable-linker-build-id
> --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
> --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls
> --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
> --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin
> --with-system-zlib --enable-objc-gc --with-cloog --enable-cloog-backend=ppl
> --disable-cloog-version-check --disable-ppl-version-check --enable-multiarch
> --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32
> --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
> --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 4.7.3 (Debian 4.7.3-4)
>
> So I tried it with:
>
> $ /usr/snp/bin/gfortran -v
> Using built-in specs.
> COLLECT_GCC=/usr/snp/bin/gfortran
> COLLECT_LTO_WRAPPER=/usr/snp/libexec/gcc/x86_64-unknown-linux-gnu/4.7.4/lto-wrapper
> Target: x86_64-unknown-linux-gnu
> Configured with: ../gcc-4_7-branch/configure --prefix=/usr/snp --with-gnu-as
> --with-gnu-ld --enable-languages=fortran --disable-libmudflap
> --disable-multilib --disable-nls --with-arch=native --with-tune=native
> Thread model: posix
> gcc version 4.7.4 20130530 (prerelease) (GCC)
>
> augmented by this single change:
>
> toon@super:~/compilers/gcc-4_7-branch/gcc$ svn diff
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c    (revision 199454)
> +++ tree-vect-loop.c    (working copy)
> @@ -1002,6 +1002,8 @@
>                             |
>                          (exit-bb)  */
>
> +      /* Disabled check
>
> +
>        if (loop->num_nodes != 2)
>          {
>            if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
> @@ -1009,6 +1011,8 @@
>            return NULL;
>          }
>
> +      */
> +
>        if (empty_block_p (loop->header))
>      {
>            if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
>
> Amazingly enough, I didn't hit *any* ICE.  Also, running the generated
> executables produced reasonable results (you have to trust me that it is
> *very hard* to fake correct meteorological results if you blow up the
> generated code).
>
> Unfortunately, conditional code in inner loops does not matter enough in
> our code for the change to show any speedup.
>
> Nevertheless, it would be a huge improvement on *other* codes if we could
> lift this restriction.
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
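
For readers following the thread, here is a rough sketch of the two loop shapes under discussion. The function names (scale, clamp_negative) are made up for illustration and do not come from GCC or from either poster's code. The comments describe the likely CFG shape only; the actual basic-block count the vectorizer sees depends on the GCC version and on which earlier passes (for example if-conversion) have already run.

```c
#include <stddef.h>

/* Straight-line body: after loop header copying this is typically just a
   header block plus an (empty) latch, i.e. the two-BB shape that the
   quoted loop->num_nodes != 2 check accepts.  */
void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Conditional in the body: the if/else introduces extra basic blocks
   inside the loop.  Unless an earlier pass flattens the branch, the loop
   has more than two BBs and the quoted check reports
   "not vectorized: control flow in loop."  */
void clamp_negative(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0.0f)
            dst[i] = 0.0f;
        else
            dst[i] = src[i];
    }
}
```

Both functions compute the same results whether or not they are vectorized; the check only decides if the vectorizer will attempt the loop at all.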

