Bug 108839 - Option for rerolling loops
Summary: Option for rerolling loops
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 13.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2023-02-17 17:16 UTC by Thomas Koenig
Modified: 2023-02-20 08:16 UTC
CC List: 0 users

See Also:
Known to work:
Known to fail:
Last reconfirmed: 2023-02-20 00:00:00


Description Thomas Koenig 2023-02-17 17:16:12 UTC
Code sometimes contains manual unrolling.  For example, the BLAS
reference implementation, subroutine DSCAL, has

      IF (INCX.EQ.1) THEN
*        code for increment equal to 1
*        clean-up loop
         M = MOD(N,5)
         IF (M.NE.0) THEN
            DO I = 1,M
               DX(I) = DA*DX(I)
            END DO
            IF (N.LT.5) RETURN
         END IF
         MP1 = M + 1
         DO I = MP1,N,5
            DX(I) = DA*DX(I)
            DX(I+1) = DA*DX(I+1)
            DX(I+2) = DA*DX(I+2)
            DX(I+3) = DA*DX(I+3)
            DX(I+4) = DA*DX(I+4)
         END DO

While such code may have been beneficial on older architectures,
today it interferes with the compiler's own unrolling and
vectorization, and it increases code size.
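For illustration, here is a C rendering of the INCX = 1 path quoted above, next to the rolled loop that a reroll transformation would aim to recover. This is a sketch, not the reference DSCAL: indexing is 0-based (Fortran DX(I) is dx[i - 1]), the non-unit-stride path is omitted, and the names dscal_unrolled, dscal_rolled and same_result are hypothetical.

```c
#include <assert.h>

/* Manually unrolled form, following the INCX = 1 path of the reference
   DSCAL, translated to 0-based indexing (Fortran DX(I) is dx[i - 1]). */
void dscal_unrolled(int n, double da, double *dx)
{
    int m = n % 5;
    if (m != 0) {
        /* Clean-up loop: scale the first n mod 5 elements one by one. */
        for (int i = 0; i < m; i++)
            dx[i] = da * dx[i];
        if (n < 5)
            return;
    }
    /* Main loop: five copies of the same statement per iteration.
       n - m is a multiple of 5, so dx[i + 4] never runs past dx[n - 1]. */
    for (int i = m; i < n; i += 5) {
        dx[i]     = da * dx[i];
        dx[i + 1] = da * dx[i + 1];
        dx[i + 2] = da * dx[i + 2];
        dx[i + 3] = da * dx[i + 3];
        dx[i + 4] = da * dx[i + 4];
    }
}

/* Rolled form: one statement per iteration, as a reroll pass would
   aim to produce. */
void dscal_rolled(int n, double da, double *dx)
{
    for (int i = 0; i < n; i++)
        dx[i] = da * dx[i];
}

/* Hypothetical helper: runs both forms on x[k] = k + 1 and returns 1 if
   they produce the same array, 0 otherwise. Assumes 0 <= n <= 64. */
int same_result(int n, double da)
{
    double a[64], b[64];
    assert(n >= 0 && n <= 64);
    for (int k = 0; k < n; k++)
        a[k] = b[k] = k + 1.0;
    dscal_unrolled(n, da, a);
    dscal_rolled(n, da, b);
    for (int k = 0; k < n; k++)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

Each element receives exactly one multiplication in either version, so the results are bit-identical; only the loop structure differs.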

It could be beneficial to have a -freroll-loops option which would
undo manual unrolling such as in the code above. This could be a
stand-alone option, or be included in options such as -Os.
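Such an option would first have to recognize that an unrolled body consists of copies of a single statement, each shifted by one element, with the loop step equal to the number of copies. The toy check below sketches only that condition, on a hypothetical mini-IR: struct stmt, struct loop, can_reroll and reroll are made up for illustration and do not correspond to GCC internals. A real pass would also need dependence analysis and would have to deal with the clean-up loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: one statement x[i + dst_off] = a OP x[i + src_off]. */
struct stmt {
    char op;      /* operator, e.g. '*' */
    int dst_off;  /* store offset relative to the induction variable */
    int src_off;  /* load offset relative to the induction variable */
};

/* Hypothetical toy loop: for (i = start; i < end; i += step) body[0..nbody-1]. */
struct loop {
    int step;
    size_t nbody;
    const struct stmt *body;
};

/* True if the body is `step` copies of body[0], copy k shifted by k elements.
   No dependence analysis is done here; a real pass would need it. */
bool can_reroll(const struct loop *lp)
{
    if (lp->nbody < 2 || (size_t)lp->step != lp->nbody)
        return false;
    const struct stmt *s0 = &lp->body[0];
    for (size_t k = 1; k < lp->nbody; k++) {
        const struct stmt *s = &lp->body[k];
        if (s->op != s0->op
            || s->dst_off != s0->dst_off + (int)k
            || s->src_off != s0->src_off + (int)k)
            return false;
    }
    return true;
}

/* Rerolls in place: keep only the first copy and use a unit step. */
bool reroll(struct loop *lp)
{
    if (!can_reroll(lp))
        return false;
    lp->step = 1;
    lp->nbody = 1;
    return true;
}
```

For the DSCAL main loop (five '*' statements with offsets 0 to 4 and step 5), can_reroll succeeds and reroll leaves a unit-step loop with the single statement x[i] = a * x[i].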
Comment 1 Andrew Pinski 2023-02-17 17:19:23 UTC
Note that the SLP vectorizer should kick in for most cases of manually unrolled loops.
Comment 2 Richard Biener 2023-02-20 08:16:20 UTC
Note that the SLP loop vectorizer could be used to re-roll loops (for -Os?) by vectorizing with one-element vectors and a "fractional" VF.  I just never got
around to playing with this idea (even when actually vectorizing, when facing
heavily manually unrolled code, vectorizing with a single vector and a
"fractional" VF might be worth it, for example for register-pressure reasons).
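One reading of the "fractional" VF idea, assuming VF counts iterations of the unrolled scalar loop per vector iteration: with a group of G = 5 isomorphic statements per iteration (as in DSCAL) and one-element vectors (n_units = 1), VF becomes a fraction, and the vectorized loop runs more often than the original, handling one element per iteration. That is the rerolled loop.

```latex
\[
\mathrm{VF} = \frac{n_{\mathrm{units}}}{G} = \frac{1}{5},
\qquad
\text{vector iterations} = \frac{n_{\mathrm{iters}}}{\mathrm{VF}} = 5\,n_{\mathrm{iters}}
\]
```

For N = 12, the clean-up loop handles M = 2 elements and the unrolled loop runs n_iters = (12 - 2)/5 = 2 times, so the rerolled loop would run 10 times, once for each of DX(3) to DX(12).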