[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
tkoenig at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue May 12 09:30:25 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #21 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #19)
> Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?).
Just plain -O2 (for size reasons), with matmul as an exception
where we add -funroll-loops and other options.
> so what's the speciality on POWER? Code growth should trigger with -O3 only.
> Given we have only a guessed profile (and that does not detect the inner
> loop as completely cold) we're allowing growth then. GCC has no idea the
> outer loop iterates more than the inner.
As a test, I changed the condition of the loop in question to
@@ -88,7 +88,7 @@ internal_pack_r4 (gfc_array_r4 * source)
       count[0]++;
       /* Advance to the next source element.  */
       index_type n = 0;
-      while (count[n] == extent[n])
+      while (unlikely (count[n] == extent[n]))
         {
           /* When we get to the end of a dimension, reset it and increment
              the next dimension.  */
which then results in
while (__builtin_expect(!!(count[n] == extent[n]), 0))
and the loop is still completely peeled on POWER at -O2, which
I do not understand.
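
For reference, a minimal sketch of how such a hint macro is commonly
defined on top of __builtin_expect, consistent with the expansion shown
above. The likely() counterpart is added here only for symmetry and is
an assumption; __builtin_expect is a GCC/Clang builtin, so this is not
portable C.

```c
/* Sketch of branch-hint macros built on the GCC/Clang builtin
   __builtin_expect.  The double negation normalizes any scalar
   condition to 0 or 1 before it is compared with the expected value.  */
#define likely(x)   __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)
```

The macro only feeds the compiler's branch-probability estimate; the
value of the condition itself is unchanged.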
> Thomas - where did you measure the slowness? For which dimensionality?
Actually, I didn't, I just made an assumption that it would be
bad for speed as well. The tests that I ran then didn't show any
such slowdown, so I guess the POWER9 branch predictor is doing
a good job here.
However, this kind of loop is the standard way of accessing multi-
dimensional arrays of unknown dimension in libgfortran. It occurs
in around 400 files there, sometimes more than once, so the size issue
is significant. I haven't checked if there is an actual degradation
for other use cases.
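
To make the pattern concrete, here is a simplified, self-contained
sketch of this kind of loop, modelled on the hunk quoted above. It is
not the libgfortran source: the names pack_sketch and SKETCH_MAX_RANK,
the float element type and the plain extent/stride arrays (in units of
elements) are assumptions made for illustration.

```c
#include <stddef.h>

#define SKETCH_MAX_RANK 15   /* hypothetical upper bound on the rank */

/* Copy a strided array of rank DIM into the contiguous buffer DEST.
   EXTENT[n] is the number of elements along dimension n, STRIDE[n]
   the distance between neighbours in that dimension, in elements.
   Returns the number of elements copied.  */
static size_t
pack_sketch (float *dest, const float *src, int dim,
             const ptrdiff_t *extent, const ptrdiff_t *stride)
{
  ptrdiff_t count[SKETCH_MAX_RANK] = { 0 };
  size_t copied = 0;

  if (dim < 1 || dim > SKETCH_MAX_RANK)
    return 0;
  for (int n = 0; n < dim; n++)
    if (extent[n] <= 0)
      return 0;   /* Empty array, nothing to copy.  */

  while (src)
    {
      *dest++ = *src;
      copied++;
      /* Advance to the next source element.  */
      src += stride[0];
      count[0]++;
      int n = 0;
      while (count[n] == extent[n])
        {
          /* End of a dimension: reset it and increment the next one.  */
          count[n] = 0;
          src -= stride[n] * extent[n];
          n++;
          if (n == dim)
            {
              src = NULL;   /* All dimensions exhausted.  */
              break;
            }
          count[n]++;
          src += stride[n];
        }
    }
  return copied;
}
```

The inner while loop runs zero times for most elements and at most DIM
times at the end of a row, which is why peeling it completely costs
code size for little expected gain.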
> I'm quite sure the loop structure will be sub-optimal for certain
> input shapes... (stride0 == 1 could even use memcpy for the inner dimension).
Yes. I plan to revisit this when looking at PR 93114, where I have
to touch that part of the code anyway.
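
As an illustration of the memcpy idea from comment #19, a hedged sketch
of what a stride0 == 1 variant could look like. This is not taken from
libgfortran or from any patch for this PR; pack_rows_sketch and
SKETCH_ROWS_MAX_RANK are hypothetical names, and extents and strides
are assumed to be in units of float elements.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ROWS_MAX_RANK 15   /* hypothetical upper bound on the rank */

/* Pack a strided array whose innermost stride is 1: each run of
   EXTENT[0] contiguous elements is copied with one memcpy, and only
   the outer dimensions go through the counter loop.  Returns the
   number of elements copied, or 0 if the preconditions do not hold.  */
static size_t
pack_rows_sketch (float *dest, const float *src, int dim,
                  const ptrdiff_t *extent, const ptrdiff_t *stride)
{
  ptrdiff_t count[SKETCH_ROWS_MAX_RANK] = { 0 };
  size_t copied = 0;

  if (dim < 1 || dim > SKETCH_ROWS_MAX_RANK || stride[0] != 1)
    return 0;
  for (int n = 0; n < dim; n++)
    if (extent[n] <= 0)
      return 0;   /* Empty array, nothing to copy.  */

  const size_t row = (size_t) extent[0];
  while (src)
    {
      /* Copy one contiguous run along dimension 0.  */
      memcpy (dest, src, row * sizeof *dest);
      dest += row;
      copied += row;

      /* Advance the outer dimensions, starting at dimension 1.  */
      int n = 1;
      if (n == dim)
        break;   /* Rank 1: a single run is the whole array.  */
      count[n]++;
      src += stride[n];
      while (count[n] == extent[n])
        {
          count[n] = 0;
          src -= stride[n] * extent[n];
          n++;
          if (n == dim)
            {
              src = NULL;   /* All dimensions exhausted.  */
              break;
            }
          count[n]++;
          src += stride[n];
        }
    }
  return copied;
}
```

Whether this pays off depends on the input shape: for a small
extent[0] the per-call overhead of memcpy may outweigh the gain.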