This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Question about vectorization limit

From: Toon Moene <toon at moene dot org>
To: Dehao Chen <dehao at google dot com>
Cc: GCC Development <gcc at gcc dot gnu dot org>
Date: Thu, 30 May 2013 21:03:08 +0200
Subject: Re: Question about vectorization limit
References: <CAO2gOZX7_-08m_+AEybF0RwG=8Y_qPG_+wjmgsq6ymVWTr3=Vw at mail dot gmail dot com>

On 05/30/2013 02:46 AM, Dehao Chen wrote:

In tree-vect-loop.c, it limits the vectorization only to loops that have 2 BBs:

       /* Inner-most loop.  We currently require that the number of BBs is
          exactly 2 (the header and latch).  Vectorizable inner-most loops
          look like this:

                         (pre-header)
                            |
                           header<--------+
                            | |            |
                            | +-->  latch --+
                            |
                         (exit-bb)  */

       if (loop->num_nodes != 2)
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                              "not vectorized: control flow in loop.");
           return NULL;
         }

Any insights into why the limit is set to 2? We found that removing this
limit actually improves performance for many applications.

It might have been just "safety first": we know how to do single-basic-block inner loops, so let's stick with them for the moment (this development was started around a decade ago).
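
As an illustration of the two loop shapes at issue, here is a minimal sketch. The functions scale and clamp_negative are hypothetical and are not taken from GCC or from the codes discussed in this thread. The first loop has a straight-line body, which corresponds to the header-plus-latch form in the quoted comment. The second has a branch in its body; unless an earlier pass flattens that branch, the loop contains more than two basic blocks and would be rejected by the check.

```c
#include <stddef.h>

/* Straight-line body: nothing but the loop test and the back edge,
   i.e. the "header and latch" shape from the quoted comment. */
void scale(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}

/* Conditional body: the if/else introduces extra basic blocks inside
   the loop unless the compiler converts the branch into straight-line
   code before the vectorizer looks at it. */
void clamp_negative(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0.0f)
            dst[i] = 0.0f;
        else
            dst[i] = src[i];
    }
}
```

Whether the second loop is actually vectorized depends on the compiler version and options. On the 4.7 series, compiling with something like -O3 -ftree-vectorizer-verbose=2 may help to see what the vectorizer reports for each loop.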

Our 3.5 million lines of Fortran 90 code (mostly array expressions) and 125,000 lines of arbitrary C code are normally compiled with:


$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu

Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.3-4' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --with-system-zlib --enable-objc-gc --with-cloog --enable-cloog-backend=ppl --disable-cloog-version-check --disable-ppl-version-check --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu

Thread model: posix
gcc version 4.7.3 (Debian 4.7.3-4)

So I tried it with:

$ /usr/snp/bin/gfortran -v
Using built-in specs.
COLLECT_GCC=/usr/snp/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/snp/libexec/gcc/x86_64-unknown-linux-gnu/4.7.4/lto-wrapper
Target: x86_64-unknown-linux-gnu

Configured with: ../gcc-4_7-branch/configure --prefix=/usr/snp --with-gnu-as --with-gnu-ld --enable-languages=fortran --disable-libmudflap --disable-multilib --disable-nls --with-arch=native --with-tune=native

Thread model: posix
gcc version 4.7.4 20130530 (prerelease) (GCC)

augmented by this single change:

toon@super:~/compilers/gcc-4_7-branch/gcc$ svn diff
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 199454)
+++ tree-vect-loop.c	(working copy)
@@ -1002,6 +1002,8 @@
                            |
                         (exit-bb)  */

+      /* Disabled check
+
       if (loop->num_nodes != 2)
         {
           if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
@@ -1009,6 +1011,8 @@
           return NULL;
         }

+      */
+
       if (empty_block_p (loop->header))
     {
           if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))

Amazingly enough, I didn't hit *any* ICE. Also, running the generated executables produced reasonable results (you have to trust me that it is *very hard* to fake correct meteorological results if you blow up the generated code).

Unfortunately, conditional code in inner loops does not account for enough of our run time to show any speedup on our code.

Nevertheless, it would be a huge improvement on *other* codes if we could lift this restriction.


--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news

Follow-Ups:
- Re: Question about vectorization limit
  - From: Dehao Chen

References:
- Question about vectorization limit
  - From: Dehao Chen
