https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

--- Comment #5 from PeteVine <tulipawn at gmail dot com> ---
Clang, however, gets no further improvement from -funroll-loops, meaning a simple `-O3 -mcpu=cortex-a53` produces much better performance than gcc without unrolling.