This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Question about vectorization limit
- From: Toon Moene <toon at moene dot org>
- To: Dehao Chen <dehao at google dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Thu, 30 May 2013 21:03:08 +0200
- Subject: Re: Question about vectorization limit
- References: <CAO2gOZX7_-08m_+AEybF0RwG=8Y_qPG_+wjmgsq6ymVWTr3=Vw at mail dot gmail dot com>
On 05/30/2013 02:46 AM, Dehao Chen wrote:
> In tree-vect-loop.c, vectorization is limited to loops that have
> exactly 2 BBs:
>
>   /* Inner-most loop.  We currently require that the number of BBs is
>      exactly 2 (the header and latch).  Vectorizable inner-most loops
>      look like this:
>
>                (pre-header)
>                   |
>                  header <--------+
>                   | |            |
>                   | +--> latch --+
>                   |
>                (exit-bb)  */
>
>   if (loop->num_nodes != 2)
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "not vectorized: control flow in loop.");
>       return NULL;
>     }
>
> Any insights why the limit is set to 2? We found that removing this
> limit actually improves performance for many applications.

It might have been just "safety first" - we know how to do single basic
block inner loops, let's stick with them for the moment (this
development was started around a decade ago).
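
As an illustration (a minimal sketch, not taken from GCC or from the code
discussed here), the first loop below has a branch in its body, so its
control-flow graph has more blocks than just a header and a latch. The
second loop computes the same result with a per-element select, which is
the shape if-conversion aims to produce. The function names and the
clamping operation are hypothetical, and whether a given GCC version sees
either form as a two-block loop depends on the passes that run before the
vectorizer.

```c
#include <stddef.h>

/* Hypothetical example: the conditional store puts a branch inside the
   loop body, so the loop is more than "header + latch" unless an earlier
   pass flattens it.  */
void
clamp_branchy (float *a, size_t n, float limit)
{
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] > limit)
        a[i] = limit;
    }
}

/* Same result, written so that every iteration does the same work:
   load, compare, select, store.  */
void
clamp_select (float *a, size_t n, float limit)
{
  for (size_t i = 0; i < n; i++)
    a[i] = a[i] > limit ? limit : a[i];
}
```

Both functions leave the array with identical contents; the difference is
only that the second one stores to every element unconditionally.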
Our 3.5 million lines of Fortran 90 code (mostly array expressions) and
125,000 lines of arbitrary C code are normally compiled with:
$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.3-4'
--with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.7 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib
--enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-gnu-unique-object --enable-plugin --with-system-zlib
--enable-objc-gc --with-cloog --enable-cloog-backend=ppl
--disable-cloog-version-check --disable-ppl-version-check
--enable-multiarch --with-arch-32=i586 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.3 (Debian 4.7.3-4)
So I tried it with:
$ /usr/snp/bin/gfortran -v
Using built-in specs.
COLLECT_GCC=/usr/snp/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/snp/libexec/gcc/x86_64-unknown-linux-gnu/4.7.4/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4_7-branch/configure --prefix=/usr/snp
--with-gnu-as --with-gnu-ld --enable-languages=fortran
--disable-libmudflap --disable-multilib --disable-nls --with-arch=native
--with-tune=native
Thread model: posix
gcc version 4.7.4 20130530 (prerelease) (GCC)
augmented by this single change:
toon@super:~/compilers/gcc-4_7-branch/gcc$ svn diff
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c (revision 199454)
+++ tree-vect-loop.c (working copy)
@@ -1002,6 +1002,8 @@
                            |
                         (exit-bb)  */
 
+      /* Disabled check
+
       if (loop->num_nodes != 2)
         {
           if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
@@ -1009,6 +1011,8 @@
           return NULL;
         }
 
+      */
+
       if (empty_block_p (loop->header))
         {
           if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
Amazingly enough, I didn't hit *any* ICE. Also, running the generated
executables produced reasonable results (you have to trust me that it is
*very hard* to fake correct meteorological results if you blow up the
generated code).
Unfortunately, conditional code in inner loops accounts for too small a
share of our run time for this change to show any speedup on our code.
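
Why a small share hides the gain can be sketched with Amdahl's law. The
helper below is a hypothetical illustration, not a measurement of our
code: `fraction` is the part of the run time spent in the affected loops
and `factor` is how much faster those loops become.

```c
/* Overall speedup when a fraction of the run time (0.0 to 1.0) becomes
   `factor` times faster and the rest is unchanged (Amdahl's law).
   Hypothetical helper for illustration only.  */
double
overall_speedup (double fraction, double factor)
{
  return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

With made-up numbers, if 5% of the run time were spent in such loops and
vectorization made them 4 times faster, the whole program would speed up
by only about 4%, which is easily lost in measurement noise.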
Nevertheless, it would be a huge improvement on *other* codes if we
could lift this restriction.
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news