[Bug c/87869] New: Unrolled loop leads to excessive code bloat with -Os on ARC EM.
nbowler at draconx dot ca
gcc-bugzilla@gcc.gnu.org
Fri Nov 2 18:32:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87869
Bug ID: 87869
Summary: Unrolled loop leads to excessive code bloat with -Os on ARC EM.
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: nbowler at draconx dot ca
Target Milestone: ---
Consider the following code:
% cat >test.c <<'EOF'
#include <stdint.h>

void do_stuff_12iter(void)
{
    volatile uint32_t *blah = (void *)0xf0000000;
    unsigned i;

    for (i = 0; i < 12; i++) {
        blah[i] = 3;
    }
}

void do_stuff_11iter(void)
{
    volatile uint32_t *blah = (void *)0xf0000000;
    unsigned i;

    for (i = 0; i < 11; i++) {
        blah[i] = 3;
    }
}
EOF
When I compile this with gcc:
% arc-unknown-elf-gcc -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/arc-unknown-elf/gcc-bin/8.2.0/arc-unknown-elf-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/arc-unknown-elf/8.2.0/lto-wrapper
Target: arc-unknown-elf
Configured with:
/var/tmp/portage/cross-arc-unknown-elf/gcc-8.2.0-r3/work/gcc-8.2.0/configure
--host=x86_64-pc-linux-gnu --target=arc-unknown-elf --build=x86_64-pc-linux-gnu
--prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/arc-unknown-elf/gcc-bin/8.2.0
--includedir=/usr/lib/gcc/arc-unknown-elf/8.2.0/include
--datadir=/usr/share/gcc-data/arc-unknown-elf/8.2.0
--mandir=/usr/share/gcc-data/arc-unknown-elf/8.2.0/man
--infodir=/usr/share/gcc-data/arc-unknown-elf/8.2.0/info
--with-gxx-include-dir=/usr/lib/gcc/arc-unknown-elf/8.2.0/include/g++-v8
--with-python-dir=/share/gcc-data/arc-unknown-elf/8.2.0/python
--enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror
--with-system-zlib --enable-nls --without-included-gettext
--enable-checking=release --with-bugurl=https://bugs.gentoo.org/
--with-pkgversion='Gentoo 8.2.0-r3' --disable-esp --enable-libstdcxx-time
--enable-poison-system-directories --disable-libstdcxx-time
--with-sysroot=/usr/arc-unknown-elf --disable-bootstrap --with-newlib
--enable-multilib --disable-altivec --disable-fixed-point --disable-libgomp
--disable-libmudflap --disable-libssp --disable-libmpx --disable-systemtap
--disable-vtable-verify --disable-libvtv --disable-libquadmath --enable-lto
--without-isl --disable-libsanitizer --disable-default-pie --enable-default-ssp
Thread model: single
gcc version 8.2.0 (Gentoo 8.2.0-r3)
% arc-unknown-elf-gcc -c -Os -mcpu=arcem -mno-sdata -mcode-density -mq-class \
    -mbarrel-shifter -mmpy-option=3 -mswap test.c
The 11-iteration loop gets fully unrolled with pretty horrible results:
00000018 <do_stuff_11iter>:
18: 730c mov_s r0,3
1a: 1e00 7000 f000 0000 st r0,[0xf0000000]
22: 1e00 7000 f000 0004 st r0,[0xf0000004]
2a: 1e00 7000 f000 0008 st r0,[0xf0000008]
32: 1e00 7000 f000 000c st r0,[0xf000000c]
3a: 1e00 7000 f000 0010 st r0,[0xf0000010]
42: 1e00 7000 f000 0014 st r0,[0xf0000014]
4a: 1e00 7000 f000 0018 st r0,[0xf0000018]
52: 1e00 7000 f000 001c st r0,[0xf000001c]
5a: 1e00 7000 f000 0020 st r0,[0xf0000020]
62: 1e00 7000 f000 0024 st r0,[0xf0000024]
6a: 1e00 7000 f000 0028 st r0,[0xf0000028]
72: 7ee0 j_s [blink]
That's 92 bytes, almost four times the size of the 12-iteration one
(24 bytes), which didn't get unrolled:
00000000 <do_stuff_12iter>:
0: 41c3 f000 0000 mov_s r1,0xf0000000
6: 734c mov_s r2,3
8: d80c mov_s r0,0xc
a: 240a 7000 mov lp_count,r0
e: 20a8 0140 lp 10 ;16 <do_stuff_12iter+0x16>
12: 1904 0090 st.ab r2,[r1,4]
16: 7ee0 j_s [blink]
That one's pretty good. This specific example could be a _tiny_
bit better, because the constants moved into r2 and r0 could be
immediates in the instructions where those registers are used, but
I'm not bothered by that.
Since I requested size optimization, it would be nice if my code
size didn't nearly quadruple like this.
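
As a possible stopgap (not a fix for the underlying cost estimate), the
loop can be marked with "#pragma GCC unroll 1", which the GCC manual
describes as blocking unrolling of the loop that follows. The sketch
below is untested on ARC EM, and it is an assumption that the pragma
keeps this loop rolled with this compiler and these flags. The function
name do_stuff_11iter_rolled and the pointer parameter are hypothetical;
the parameter replaces the fixed 0xf0000000 address only so the function
can be exercised on a host machine.

```c
#include <stdint.h>

/*
 * Hypothetical variant of do_stuff_11iter from the report. The base
 * address is passed in rather than hard-coded so the function can be
 * run against an ordinary buffer.
 */
void do_stuff_11iter_rolled(volatile uint32_t *blah)
{
    unsigned i;

    /* Request that GCC not unroll the following loop (assumed to be
     * honoured by the complete-unrolling pass; not verified on ARC). */
#pragma GCC unroll 1
    for (i = 0; i < 11; i++) {
        blah[i] = 3;
    }
}
```

On the target, the call would be
do_stuff_11iter_rolled((volatile uint32_t *)0xf0000000). Comparing the
disassembly with and without the pragma would show whether it has any
effect here.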