[Bug c/87869] New: Unrolled loop leads to excessive code bloat with -Os on ARC EM.

nbowler at draconx dot ca gcc-bugzilla@gcc.gnu.org
Fri Nov 2 18:32:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87869

            Bug ID: 87869
           Summary: Unrolled loop leads to excessive code bloat with -Os
                    on ARC EM.
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nbowler at draconx dot ca
  Target Milestone: ---

Consider the following code:

  % cat >test.c <<'EOF'
  #include <stdint.h>

  void do_stuff_12iter(void)
  {
     volatile uint32_t *blah = (void *)0xf0000000;
     unsigned i;

     for (i = 0; i < 12; i++) {
        blah[i] = 3;
     }
  }

  void do_stuff_11iter(void)
  {
     volatile uint32_t *blah = (void *)0xf0000000;
     unsigned i;

     for (i = 0; i < 11; i++) {
        blah[i] = 3;
     }
  }
  EOF

When I compile this with gcc:

  % arc-unknown-elf-gcc -v
  Using built-in specs.
  COLLECT_GCC=/usr/x86_64-pc-linux-gnu/arc-unknown-elf/gcc-bin/8.2.0/arc-unknown-elf-gcc
  COLLECT_LTO_WRAPPER=/usr/libexec/gcc/arc-unknown-elf/8.2.0/lto-wrapper
  Target: arc-unknown-elf
  Configured with:
/var/tmp/portage/cross-arc-unknown-elf/gcc-8.2.0-r3/work/gcc-8.2.0/configure
--host=x86_64-pc-linux-gnu --target=arc-unknown-elf --build=x86_64-pc-linux-gnu
--prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/arc-unknown-elf/gcc-bin/8.2.0
--includedir=/usr/lib/gcc/arc-unknown-elf/8.2.0/include
--datadir=/usr/share/gcc-data/arc-unknown-elf/8.2.0
--mandir=/usr/share/gcc-data/arc-unknown-elf/8.2.0/man
--infodir=/usr/share/gcc-data/arc-unknown-elf/8.2.0/info
--with-gxx-include-dir=/usr/lib/gcc/arc-unknown-elf/8.2.0/include/g++-v8
--with-python-dir=/share/gcc-data/arc-unknown-elf/8.2.0/python
--enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror
--with-system-zlib --enable-nls --without-included-gettext
--enable-checking=release --with-bugurl=https://bugs.gentoo.org/
--with-pkgversion='Gentoo 8.2.0-r3' --disable-esp --enable-libstdcxx-time
--enable-poison-system-directories --disable-libstdcxx-time
--with-sysroot=/usr/arc-unknown-elf --disable-bootstrap --with-newlib
--enable-multilib --disable-altivec --disable-fixed-point --disable-libgomp
--disable-libmudflap --disable-libssp --disable-libmpx --disable-systemtap
--disable-vtable-verify --disable-libvtv --disable-libquadmath --enable-lto
--without-isl --disable-libsanitizer --disable-default-pie --enable-default-ssp
  Thread model: single
  gcc version 8.2.0 (Gentoo 8.2.0-r3) 

  % arc-unknown-elf-gcc -c -Os -mcpu=arcem -mno-sdata -mcode-density -mq-class \
      -mbarrel-shifter -mmpy-option=3 -mswap test.c

The 11-iteration loop gets fully unrolled with pretty horrible results:

00000018 <do_stuff_11iter>:
  18:   730c                    mov_s   r0,3
  1a:   1e00 7000 f000 0000     st      r0,[0xf0000000]
  22:   1e00 7000 f000 0004     st      r0,[0xf0000004]
  2a:   1e00 7000 f000 0008     st      r0,[0xf0000008]
  32:   1e00 7000 f000 000c     st      r0,[0xf000000c]
  3a:   1e00 7000 f000 0010     st      r0,[0xf0000010]
  42:   1e00 7000 f000 0014     st      r0,[0xf0000014]
  4a:   1e00 7000 f000 0018     st      r0,[0xf0000018]
  52:   1e00 7000 f000 001c     st      r0,[0xf000001c]
  5a:   1e00 7000 f000 0020     st      r0,[0xf0000020]
  62:   1e00 7000 f000 0024     st      r0,[0xf0000024]
  6a:   1e00 7000 f000 0028     st      r0,[0xf0000028]
  72:   7ee0                    j_s     [blink]

That's almost four times the size of the 12-iteration one (92 bytes
versus 24), which didn't get unrolled:

00000000 <do_stuff_12iter>:
   0:   41c3 f000 0000          mov_s   r1,0xf0000000
   6:   734c                    mov_s   r2,3
   8:   d80c                    mov_s   r0,0xc
   a:   240a 7000               mov     lp_count,r0
   e:   20a8 0140               lp      10      ;16 <do_stuff_12iter+0x16>
  12:   1904 0090               st.ab   r2,[r1,4]
  16:   7ee0                    j_s     [blink]
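
For reference, the function sizes can be read off the two listings: each
function ends with a 2-byte j_s, so its size is the address of that
instruction plus 2, minus the address of the first instruction.  The
helper names below are made up for illustration and are not part of the
test case; this is only a sketch of the arithmetic.

```c
/* Byte size of a function, given the address of its first instruction,
   the address of its last instruction, and that instruction's length. */
unsigned func_size(unsigned first, unsigned last, unsigned last_len)
{
    return last + last_len - first;
}

/* Addresses taken from the disassembly in this report; the final
   instruction of each function (j_s [blink]) is 2 bytes long. */
unsigned size_12iter(void) { return func_size(0x00, 0x16, 2); }
unsigned size_11iter(void) { return func_size(0x18, 0x72, 2); }

/* Size of the unrolled function as a percentage of the rolled one,
   truncated to an integer. */
unsigned growth_percent(void)
{
    return size_11iter() * 100 / size_12iter();
}
```

That works out to 92 bytes for do_stuff_11iter against 24 bytes for
do_stuff_12iter.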

The 12-iteration code is pretty good.  It could be a _tiny_ bit
better, because the constants moved to r2 and r0 could instead be
immediates in the instructions that use those registers, but I'm
not bothered by that.

Since I requested size optimizations, it would be nice if my code
size didn't nearly quadruple like this.
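
A possible source-level workaround, sketched here under assumptions: GCC 8
and later accept "#pragma GCC unroll n" directly before a loop, and the
documentation says values of 0 and 1 block unrolling of that loop.  Whether
this actually yields the compact lp-based code on ARC EM has not been
checked here.  The function name and the pointer parameter are
hypothetical; the parameter is only there so the sketch can also be run on
a host, whereas the test case above uses the fixed address 0xf0000000.

```c
#include <stdint.h>

/* Hypothetical variant of do_stuff_11iter, not part of the original
   test case: the base address is passed in instead of hard-coded. */
void do_stuff_11iter_rolled(volatile uint32_t *blah)
{
    unsigned i;

    /* Ask GCC (8 or later) not to unroll the loop that follows.
       Assumption: this also suppresses complete unrolling at -Os. */
    #pragma GCC unroll 1
    for (i = 0; i < 11; i++) {
        blah[i] = 3;
    }
}
```

This only sidesteps the problem for one loop; the size heuristic at -Os
would still make the same choice for any loop without the pragma.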

