[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

tkoenig at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue May 12 09:30:25 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #21 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #19)
> Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?). 

Just plain -O2 (for size reasons), with matmul as an exception
where we add -funroll-loops and other options.

> so what's the speciality on POWER?  Code growth should trigger with -O3 only.
> Given we have only a guessed profile (and that does not detect the inner
> loop as completely cold) we're allowing growth then.  GCC has no idea the
> outer loop iterates more than the inner.

As a test, I changed the condition of the loop in question to

@@ -88,7 +88,7 @@ internal_pack_r4 (gfc_array_r4 * source)
       count[0]++;
       /* Advance to the next source element.  */
       index_type n = 0;
-      while (count[n] == extent[n])
+      while (unlikely (count[n] == extent[n]))
         {
           /* When we get to the end of a dimension, reset it and increment
              the next dimension.  */

which then results in

       while (__builtin_expect(!!(count[n] == extent[n]), 0))

and the loop is still completely peeled on POWER at -O2, which
I do not understand.

> Thomas - where did you measure the slowness?  For which dimensionality?

Actually, I didn't; I just assumed that it would be
bad for speed as well.  The tests that I ran then didn't show any
such slowdown, so I guess the POWER9 branch predictor is doing
a good job here.

However, this kind of loop is the standard way of accessing multi-
dimensional arrays of unknown dimension in libgfortran. It occurs
in around 400 files there, sometimes more than once, so the size issue
is significant.  I haven't checked if there is an actual degradation
for other use cases. 
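
For readers who do not know the pattern, here is a minimal, self-contained
sketch of this kind of loop.  It is not the actual libgfortran source:
pack_sketch, MAX_RANK and the plain extent/stride arrays are hypothetical
stand-ins for the array descriptor, and the unlikely macro is written out
locally to match the expansion shown above (__builtin_expect is a GCC/Clang
builtin).  The inner while loop is the "carry" step whose peeling is
discussed in this PR.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANK 15   /* hypothetical rank limit for this sketch */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Copy a strided array of `rank` dimensions into the contiguous buffer
   `dest`.  extent[n] is the number of elements along dimension n and
   stride[n] the distance (in elements) between neighbours along it.
   Returns the number of elements copied.  */
size_t
pack_sketch (float *dest, const float *src, int rank,
             const ptrdiff_t *extent, const ptrdiff_t *stride)
{
  ptrdiff_t count[MAX_RANK] = {0};
  ptrdiff_t off = 0;            /* current offset into src */
  size_t copied = 0;
  int done = 0;

  assert (rank >= 1 && rank <= MAX_RANK);
  for (int n = 0; n < rank; n++)
    if (extent[n] <= 0)
      return 0;                 /* empty array, nothing to copy */

  while (!done)
    {
      dest[copied++] = src[off];
      /* Advance to the next source element.  */
      off += stride[0];
      count[0]++;
      int n = 0;
      while (unlikely (count[n] == extent[n]))
        {
          /* End of a dimension: reset it and increment the next one.  */
          count[n] = 0;
          off -= stride[n] * extent[n];
          n++;
          if (n == rank)
            {
              done = 1;
              break;
            }
          count[n]++;
          off += stride[n];
        }
    }
  return copied;
}
```

With extent = {3, 2} and stride = {2, 8}, the sketch visits source offsets
0, 2, 4, 8, 10, 12.  The carry loop body runs once per completed row, so
it is executed far less often than the copy in the outer loop.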

> I'm quite sure the loop structure will be sub-optimal for certain
> input shapes... (stride0 == 1 could even use memcpy for the inner dimension).

Yes. I plan to revisit this when looking at PR 93114, where I have
to touch that part of the code anyway.
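
As a rough illustration of the memcpy idea from the quoted comment, here
is a sketch under the assumption that stride[0] == 1, so that each run
along dimension 0 is contiguous.  This is not a proposed patch:
pack_rows_sketch and MAX_RANK are hypothetical names, and the plain
extent/stride arrays again stand in for the array descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RANK 15   /* hypothetical rank limit for this sketch */

/* Like an element-wise pack, but copies one contiguous run of
   extent[0] elements per iteration.  Requires stride[0] == 1.
   Returns the number of elements copied.  */
size_t
pack_rows_sketch (float *dest, const float *src, int rank,
                  const ptrdiff_t *extent, const ptrdiff_t *stride)
{
  ptrdiff_t count[MAX_RANK] = {0};
  ptrdiff_t off = 0;            /* offset of the current run in src */
  size_t copied = 0;

  assert (rank >= 1 && rank <= MAX_RANK);
  assert (stride[0] == 1);
  for (int n = 0; n < rank; n++)
    if (extent[n] <= 0)
      return 0;                 /* empty array, nothing to copy */

  size_t row = (size_t) extent[0];

  for (;;)
    {
      /* Copy the whole innermost dimension at once.  */
      memcpy (dest + copied, src + off, row * sizeof *dest);
      copied += row;

      /* Advance the higher dimensions, carrying as needed.  */
      int n = 1;
      for (; n < rank; n++)
        {
          count[n]++;
          off += stride[n];
          if (count[n] < extent[n])
            break;
          /* End of this dimension: reset it and carry on.  */
          off -= stride[n] * extent[n];
          count[n] = 0;
        }
      if (n == rank)
        return copied;
    }
}
```

Whether this is actually faster would depend on extent[0]; for very
short rows the memcpy call overhead could outweigh the gain, so it
would need measuring.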

