This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug fortran/48636] Enable more inlining with -O2 and higher
- From: "dominiq at lps dot ens.fr" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 17 Apr 2011 14:12:32 +0000
- Subject: [Bug fortran/48636] Enable more inlining with -O2 and higher
- Auto-submitted: auto-generated
- References: <bug-48636-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636
--- Comment #5 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-04-17 14:12:30 UTC ---
I have investigated why test_fpu is slower with --param
max-inline-insns-auto=400 (11.18s) than with -finline-limit=600 (10.84s) in
the timings of comment #2. The difference comes from the inlining of dgemm in
the fourth test, Lapack 2:
[macbook] lin/test% gfc -Ofast -funroll-loops -fstack-arrays --param max-inline-insns-auto=385 test_lap.f90
[macbook] lin/test% time a.out
Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts 2.6 sec Err= 0.000000000000250
total = 2.6 sec
2.824u 0.081s 0:02.90 100.0% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -Ofast -funroll-loops -fstack-arrays --param max-inline-insns-auto=386 test_lap.f90
[macbook] lin/test% time a.out
Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts 3.0 sec Err= 0.000000000000250
total = 3.0 sec
3.214u 0.082s 0:03.29 100.0% 0+0k 0+0io 0pf+0w
Looking at the assembly, I see 'call _dgemm_' three times with
max-inline-insns-auto=385 and none with 386 (note that there are only two
calls in the source: one in dgetri, always inlined, and one in dgetrf, not
inlined). It would be interesting to understand why inlining dgemm slows down
the code.
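
For anyone who wants to repeat the check on the assembly, here is a minimal
Python sketch that counts the call instructions targeting a given symbol. It
is not what was used for the numbers above. The helper name count_calls and
the sample listing are made up for illustration, and the sketch assumes the
assembler output uses the 'call' or 'callq' mnemonic followed by the symbol
name.

```python
def count_calls(asm_text, symbol):
    """Count 'call'/'callq' instructions whose target is exactly `symbol`."""
    count = 0
    for line in asm_text.splitlines():
        fields = line.split()
        # An instruction line looks like: <mnemonic> <operand>
        if len(fields) >= 2 and fields[0] in ("call", "callq") and fields[1] == symbol:
            count += 1
    return count


# Hypothetical assembly fragment, not taken from the real test_lap output.
sample = """\
_dgetrf_:
\tcall\t_dgemm_
\tcall\t_dtrsm_
\tcall\t_dgemm_
\tret
"""

print(count_calls(sample, "_dgemm_"))
```

Running the same function on the assembly produced for the two parameter
values (for example, from files generated with -S) should show at which
threshold the 'call _dgemm_' lines disappear.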