This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Question about vectorization limit


On 05/30/2013 02:46 AM, Dehao Chen wrote:

tree-vect-loop.c limits vectorization to loops that have exactly 2 BBs:

       /* Inner-most loop.  We currently require that the number of BBs is
          exactly 2 (the header and latch).  Vectorizable inner-most loops
          look like this:

                         (pre-header)
                            |
                           header<--------+
                            | |            |
                            | +-->  latch --+
                            |
                         (exit-bb)  */

       if (loop->num_nodes != 2)
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                              "not vectorized: control flow in loop.");
           return NULL;
         }

Any insight into why the limit is set to 2? We found that removing this
limit actually improves performance for many applications.

It might have been just "safety first": we know how to handle single-basic-block inner loops, so let's stick with them for the moment (this development was started around a decade ago).
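
To make the distinction concrete, here is a minimal sketch of the two loop shapes in question. The function names are hypothetical and not taken from GCC or from our code. The first loop has a straight-line body, so it consists of just a header and a latch. The second has a conditional store in its body; assuming no earlier pass (such as if-conversion) flattens the branch, the store ends up in a basic block of its own and the loop would fail the num_nodes check quoted above.

```c
#include <stddef.h>

/* Straight-line body: only a header and a latch, which is the shape
   the quoted check accepts. */
void scale(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Conditional store in the body: unless an earlier pass flattens the
   branch, the store sits in its own basic block, so the loop has more
   than two blocks (an assumption about the resulting CFG, not a
   guarantee for every GCC version or option set). */
void copy_positive(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (src[i] > 0.0f)
            dst[i] = src[i];
}
```

Whether the second loop really reaches the vectorizer with extra blocks depends on the compiler version and options, so the vectorizer's dump output is the thing to check rather than this sketch.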

Our 3.5 million lines of Fortran 90 code (mostly array expressions) and 125,000 lines of arbitrary C code are normally compiled with:

$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.3-4' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --with-system-zlib --enable-objc-gc --with-cloog --enable-cloog-backend=ppl --disable-cloog-version-check --disable-ppl-version-check --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.3 (Debian 4.7.3-4)

So I tried it with:

$ /usr/snp/bin/gfortran -v
Using built-in specs.
COLLECT_GCC=/usr/snp/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/snp/libexec/gcc/x86_64-unknown-linux-gnu/4.7.4/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4_7-branch/configure --prefix=/usr/snp --with-gnu-as --with-gnu-ld --enable-languages=fortran --disable-libmudflap --disable-multilib --disable-nls --with-arch=native --with-tune=native
Thread model: posix
gcc version 4.7.4 20130530 (prerelease) (GCC)

augmented by this single change:

toon@super:~/compilers/gcc-4_7-branch/gcc$ svn diff
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 199454)
+++ tree-vect-loop.c	(working copy)
@@ -1002,6 +1002,8 @@
                            |
                         (exit-bb)  */

+      /* Disabled check
+
       if (loop->num_nodes != 2)
         {
           if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
@@ -1009,6 +1011,8 @@
           return NULL;
         }

+      */
+
       if (empty_block_p (loop->header))
     {
           if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))

Amazingly enough, I didn't hit *any* ICE. Also, running the generated executables produced reasonable results (you have to trust me that it is *very hard* to fake correct meteorological results if you blow up the generated code).

Unfortunately, conditional code in inner loops does not carry enough weight in our code for the change to show any speedup.

Nevertheless, it would be a huge improvement on *other* codes if we could lift this restriction.

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
