Code sometimes contains manual unrolling. For example, the BLAS reference implementation, subroutine DSCAL, has

      IF (INCX.EQ.1) THEN
*
*        code for increment equal to 1
*
*
*        clean-up loop
*
         M = MOD(N,5)
         IF (M.NE.0) THEN
            DO I = 1,M
               DX(I) = DA*DX(I)
            END DO
            IF (N.LT.5) RETURN
         END IF
         MP1 = M + 1
         DO I = MP1,N,5
            DX(I) = DA*DX(I)
            DX(I+1) = DA*DX(I+1)
            DX(I+2) = DA*DX(I+2)
            DX(I+3) = DA*DX(I+3)
            DX(I+4) = DA*DX(I+4)
         END DO
      ELSE

While such code may have been beneficial on old architectures, it now interferes with the compiler's own unrolling and vectorization, and it increases code size. It could be beneficial to have a -freroll-loops option that undoes the manual unrolling of the code above. This could be stand-alone, or included in options such as -Os.
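To make the request concrete, here is a rough sketch in C (used for brevity, with 0-based indices) of the transformation that the proposed -freroll-loops would perform on the unit-stride path. The function names scale_unrolled and scale_rolled are made up for this illustration and do not exist in BLAS or GCC; the first mirrors the DSCAL structure quoted above, the second is the form a re-rolling pass would ideally recover.

```c
#include <stddef.h>

/* Hypothetical C transcription of the hand-unrolled DSCAL path:
   a clean-up loop for the first n % 5 elements, then a main loop
   that handles 5 elements per iteration. */
void scale_unrolled(size_t n, double da, double *dx)
{
    size_t m = n % 5;

    if (m != 0) {
        for (size_t i = 0; i < m; i++)
            dx[i] = da * dx[i];
        if (n < 5)
            return;
    }
    for (size_t i = m; i < n; i += 5) {
        dx[i]     = da * dx[i];
        dx[i + 1] = da * dx[i + 1];
        dx[i + 2] = da * dx[i + 2];
        dx[i + 3] = da * dx[i + 3];
        dx[i + 4] = da * dx[i + 4];
    }
}

/* What re-rolling would ideally produce: one simple loop that the
   compiler can then unroll or vectorize as it sees fit. */
void scale_rolled(size_t n, double da, double *dx)
{
    for (size_t i = 0; i < n; i++)
        dx[i] = da * dx[i];
}
```

Both functions apply exactly one multiplication to every element, so they produce identical results; only the loop structure differs.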
Note the SLP vectorizer should kick in for most cases of manually unrolled loops.
Note the SLP loop vectorizer could be used to re-roll loops (for -Os?) by vectorizing with one-element vectors and a "fractional" VF. I just never got around to playing with this idea. Even when actually vectorizing, but facing heavily manually unrolled code, vectorizing with a single vector and a "fractional" VF might be worth it, for example for register pressure reasons.
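As one possible reading of the "fractional" VF idea, sketched in C: if the source loop is hand-unrolled by 8 and the target has 4-lane vectors, a single vector statement covers only half of one original iteration, so the vectorized loop runs twice per original iteration (VF = 1/2). With one-element vectors the same scheme degenerates to a full re-roll. Everything below is an assumption made for illustration: the names scale_unrolled8, vmul4 and scale_vf_half are made up, vmul4 is a scalar stand-in for one vector multiply, and n is assumed to be a multiple of 8.

```c
#include <stddef.h>

/* Hand-unrolled by 8; assumes n is a multiple of 8 to keep the sketch short. */
void scale_unrolled8(size_t n, double da, double *dx)
{
    for (size_t i = 0; i < n; i += 8) {
        dx[i]     = da * dx[i];
        dx[i + 1] = da * dx[i + 1];
        dx[i + 2] = da * dx[i + 2];
        dx[i + 3] = da * dx[i + 3];
        dx[i + 4] = da * dx[i + 4];
        dx[i + 5] = da * dx[i + 5];
        dx[i + 6] = da * dx[i + 6];
        dx[i + 7] = da * dx[i + 7];
    }
}

/* Scalar stand-in for a single 4-lane vector multiply. */
static void vmul4(double da, double *p)
{
    for (int lane = 0; lane < 4; lane++)
        p[lane] = da * p[lane];
}

/* "VF = 1/2": two iterations of this loop cover one iteration of
   scale_unrolled8, using one live vector instead of two. */
void scale_vf_half(size_t n, double da, double *dx)
{
    for (size_t i = 0; i < n; i += 4)
        vmul4(da, dx + i);
}
```

The point of the sketch is only the iteration-count relationship; whether the vectorizer can be taught to do this is exactly the open question in the comment above.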