This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug fortran/48636] Enable more inlining with -O2 and higher


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636

--- Comment #5 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-04-17 14:12:30 UTC ---
I have investigated why test_fpu is slower with --param
max-inline-insns-auto=400 (11.18s) than with -finline-limit=600 (10.84s) in
the timings of comment #2. The slowdown comes from the inlining of dgemm in the
fourth test, Lapack 2:

[macbook] lin/test% gfc -Ofast -funroll-loops -fstack-arrays --param
max-inline-insns-auto=385 test_lap.f90
[macbook] lin/test% time a.out
  Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts  2.6 sec  Err= 0.000000000000250
                             total =  2.6 sec

2.824u 0.081s 0:02.90 100.0%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -Ofast -funroll-loops -fstack-arrays --param
max-inline-insns-auto=386 test_lap.f90
[macbook] lin/test% time a.out
  Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts  3.0 sec  Err= 0.000000000000250
                             total =  3.0 sec

3.214u 0.082s 0:03.29 100.0%    0+0k 0+0io 0pf+0w

Looking at the assembly, I see 'call    _dgemm_' three times for 385 and none
for 386. (Note that there are only two calls in the source: one in dgetri,
which is always inlined, and one in dgetrf, which is not inlined.) It would be
interesting to understand why inlining dgemm slows down the code.
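
The call-site count above can be reproduced mechanically rather than by eye. The following is only a minimal sketch, assuming the assembly has already been written to text (for example with the compiler's -S option). The helper name count_calls and the sample listing are hypothetical illustrations, not output from the actual test_lap.f90 build.

```python
import re


def count_calls(asm_text: str, symbol: str = "_dgemm_") -> int:
    """Count 'call <symbol>' instructions in an assembly listing.

    Accepts 'call' with an optional suffix (e.g. 'callq') and matches the
    symbol name exactly, so '_dgemm_helper' is not counted as '_dgemm_'.
    """
    pattern = re.compile(
        r"^[ \t]*call[a-z]*[ \t]+" + re.escape(symbol) + r"\b",
        re.MULTILINE,
    )
    return len(pattern.findall(asm_text))


# Hypothetical listing, made up for illustration only.
sample = (
    "_dgetrf_:\n"
    "\tcall\t_dgemm_\n"
    "\tcall\t_dtrsm_\n"
    "\tcallq\t_dgemm_\n"
    "\tcall\t_dgemm_helper\n"
)

if __name__ == "__main__":
    print(count_calls(sample))
```

Running the same count on the listings produced with max-inline-insns-auto=385 and 386 would show at which threshold the remaining 'call _dgemm_' sites disappear.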

