[Bug target/38306] [4.4/4.5/4.6/4.7 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

manu at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sat Sep 10 09:52:00 GMT 2011


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Manuel López-Ibáñez <manu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manu at gcc dot gnu.org

--- Comment #25 from Manuel López-Ibáñez <manu at gcc dot gnu.org> 2011-09-10 09:43:58 UTC ---
(In reply to comment #24)
> 
> The issue is that at -O3 the subroutine PD2VAL is not vectorized, while it is
> at -O2.

If you want to investigate this yourself, I would suggest using the various
-fdump-* options to see what GCC does differently at the two optimization
levels.

1) Dump everything you can dump.

2) Then find the earliest optimization pass where they differ (you may even use
diff to make this faster).

3) Check subsequent dumps to see if that difference is actually what prevents
vectorization at -O3. (At this point you can play with the -f*/-fno-* options
to reduce the differences further and isolate the trigger.)
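
Step 2 can be sketched as a small Python helper. This is only an illustration
under several assumptions, not an official GCC tool: the two builds were made
with something like -fdump-tree-all -fdump-rtl-all -fdump-ipa-all, the -O2 and
-O3 dumps ended up in two separate directories, and the dump files are named
<source>.<NNN><t|i|r>.<pass> so that sorting on the number approximates pass
order. The function name first_differing_dump is made up for this example.

```python
import re
from pathlib import Path

# Assumed dump naming: <source>.<NNN><t|i|r>.<pass>, e.g. kernel.f.004t.gimple
DUMP_NAME = re.compile(r"\.(\d{3})([tir])\.([\w-]+)$")


def dump_key(path):
    """Return a sort key (pass number, file name), or None if not a dump file."""
    match = DUMP_NAME.search(path.name)
    return (int(match.group(1)), path.name) if match else None


def first_differing_dump(dir_a, dir_b):
    """Return the name of the earliest dump whose contents differ, else None.

    Only dumps present in both directories are compared; ordering by the
    numeric field is an approximation of the order in which passes ran.
    """
    dumps_a = {p.name: p for p in Path(dir_a).iterdir() if dump_key(p)}
    dumps_b = {p.name: p for p in Path(dir_b).iterdir() if dump_key(p)}
    common = sorted(dumps_a.keys() & dumps_b.keys(),
                    key=lambda name: dump_key(dumps_a[name]))
    for name in common:
        if dumps_a[name].read_bytes() != dumps_b[name].read_bytes():
            return name
    return None
```

Calling first_differing_dump("dumps-O2", "dumps-O3") (hypothetical directory
names) gives the file to run diff on first. A pass that runs at only one of
the two levels produces no common file and is skipped here, so comparing the
two directory listings as well is worthwhile.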


