This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug lto/51497] [4.7 Regression] The run time for the polyhedron test nf.f90 is ~10% slower with -flto after revision 182107
- From: "dominiq at lps dot ens.fr" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 12 Dec 2011 13:09:49 +0000
- Subject: [Bug lto/51497] [4.7 Regression] The run time for the polyhedron test nf.f90 is ~10% slower with -flto after revision 182107
- Auto-submitted: auto-generated
- References: <bug-51497-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51497
--- Comment #4 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-12-12 13:09:49 UTC ---
> I can't see any vectorizer differences for the testcase in comment #2 and the
> patch you cite only (should) have debuginfo changes, no changes to the produced
> IL at statement level (eventually it has better type-based alias analysis).
>
> Not confirmed.
I have just done the following check:
(1) gfc -Ofast -funroll-loops nf.f90 -ftree-vectorizer-verbose=1 >& tmp1
(2) gfc -Ofast -funroll-loops nf.f90 -ftree-vectorizer-verbose=1 -flto >& tmp2
I noticed that the tmp2 file contains two sets of annotations, likely one for
the usual vectorization (up to line 334) and a second one for the LTO stage.
(3) I have split the file tmp2 into a new tmp2 keeping only the first 334 lines
and a second file, tmp3, containing the second part.
(4) I have used diff to compare the files: tmp1 and the new tmp2 are identical,
while I see missing vectorizations in tmp3:
--- tmp1 2011-12-12 13:49:06.000000000 +0100
+++ tmp3 2011-12-12 13:54:12.000000000 +0100
...
-206: LOOP VECTORIZED.
-nf.f90:204: note: vectorized 7 loops in function.
...
-nf.f90:256: note: vectorized 3 loops in function.
+nf.f90:256: note: vectorized 2 loops in function.
...
-nf.f90:288: note: vectorized 3 loops in function.
+nf.f90:288: note: vectorized 2 loops in function.
This confirms what I have seen in the disassembled executable.
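A minimal sketch of how steps (3) and (4) could be scripted, assuming a POSIX shell with head, tail and diff. The function name split_log and the output file names are hypothetical, and the boundary (334 here) has to be found by inspecting the log, as it depends on the test case:

```sh
# Hypothetical helper: split a combined vectorizer log into the part
# written by the ordinary compile stage and the part written by the
# LTO stage.
# Usage: split_log COMBINED BOUNDARY FIRST_OUT SECOND_OUT
split_log() {
    combined=$1
    boundary=$2
    first_out=$3
    second_out=$4

    # Lines 1..BOUNDARY: annotations from the usual vectorization.
    head -n "$boundary" "$combined" > "$first_out"

    # Lines BOUNDARY+1..end: annotations from the LTO stage.
    tail -n +"$((boundary + 1))" "$combined" > "$second_out"
}
```

With the files from steps (1) and (2), this would be called as `split_log tmp2 334 tmp2.first tmp3`, followed by `diff -u tmp1 tmp2.first` and `diff -u tmp1 tmp3`. The first part is written to a separate file rather than back to tmp2, since redirecting head's output onto its own input would truncate the file before it is read.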
Questions:
(1) Do you see the slowdown with -flto?
(2) Can you reproduce the above?
> The two else if blocks are related, not independent, independently reverting
> them makes no sense.
I am not suggesting removing one block. I was only interested in finding which
part of the patch caused/exposed the problem (which looks like yet another
instance of a bad choice of optimization for size: as pointed out in PR 51499,
the vectorization generates two loops, one vectorized and one not, hence
roughly doubling the code size).