Bug 99286 - ivopts don't select the best candidates with -Os
Summary: ivopts don't select the best candidates with -Os
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 11.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2021-02-26 11:15 UTC by GengQi
Modified: 2021-02-26 11:49 UTC
CC List: 0 users

See Also:
Host:
Target: riscv
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
-c -march=rv32imafdc -mabi=ilp32d -Os ivopt_os.c -fdump-tree-ivopts-details (269 bytes, text/plain)
2021-02-26 11:15 UTC, GengQi

Description GengQi 2021-02-26 11:15:32 UTC
Created attachment 50261
-c -march=rv32imafdc -mabi=ilp32d -Os ivopt_os.c -fdump-tree-ivopts-details

I compared the assembly and object files generated by two different versions of GCC.

One is:
$ /lhome/gengq/riscv64-linux-mastertest/bin/riscv64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/lhome/gengq/riscv64-linux-mastertest/bin/riscv64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/lhome/gengq/riscv64-linux-mastertest/libexec/gcc/riscv64-unknown-linux-gnu/11.0.0/lto-wrapper
Target: riscv64-unknown-linux-gnu
Configured with: /lhome/gengq/riscv-gnu-toolchain-master/riscv-gnu-toolchain/riscv-gcc/configure --target=riscv64-unknown-linux-gnu --prefix=/lhome/gengq/riscv64-linux-mastertest --with-sysroot=/lhome/gengq/riscv64-linux-mastertest/sysroot --with-system-zlib --enable-shared --enable-tls --enable-languages=c,c++,fortran --disable-libmudflap --disable-libssp --disable-libquadmath --disable-libsanitizer --disable-nls --disable-bootstrap --src=.././riscv-gcc --disable-multilib --with-abi=lp64d --with-arch=rv64gc 'CFLAGS_FOR_TARGET=-O2   -mcmodel=medlow' 'CXXFLAGS_FOR_TARGET=-O2   -mcmodel=medlow'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20210209 (experimental) (GCC)

Command:
/lhome/gengq/riscv64-linux-mastertest/bin/riscv64-unknown-linux-gnu-gcc -march=rv32imafdc -mabi=ilp32d -Os ivopt_os.c -c

The other is:
$ /lhome/gengq/riscv64-linux-810test/bin/riscv32-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/lhome/gengq/riscv64-linux-810test/bin/riscv32-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/lhome/gengq/riscv64-linux-810test/libexec/gcc/riscv32-unknown-linux-gnu/8.1.0/lto-wrapper
Target: riscv32-unknown-linux-gnu
Configured with: /lhome/gengq/riscv-gnu-toolchain-master/riscv-gnu-toolchain/riscv-gcc/configure --target=riscv32-unknown-linux-gnu --prefix=/lhome/gengq/riscv64-linux-810test --with-sysroot=/lhome/gengq/riscv64-linux-810test/sysroot --with-newlib --without-headers --disable-shared --disable-threads --with-system-zlib --enable-tls --enable-languages=c --disable-libatomic --disable-libmudflap --disable-libssp --disable-libquadmath --disable-libgomp --disable-nls --disable-bootstrap --src=.././riscv-gcc --with-pkgversion= --disable-multilib --with-abi=ilp32d --with-arch=rv32gc 'CFLAGS_FOR_TARGET=-O2  -mcmodel=medlow' 'CXXFLAGS_FOR_TARGET=-O2  -mcmodel=medlow' CC=gcc CXX=g++
Thread model: single
gcc version 8.1.0 ()

Command:
/lhome/gengq/riscv64-linux-810test/bin/riscv32-unknown-linux-gnu-gcc -march=rv32imafdc -mabi=ilp32d -Os ivopt_os.c -fdump-tree-all-details -c

The code generated by gcc 11.0 is worse than the code generated by gcc 8.1.0. After some analysis, I think the difference is caused by the ivopts pass.

It seems that gcc 11.0 explores the ivopts candidates more thoroughly. For gcc 11.0 there are two best candidate sets:
One is equivalent to the set that gcc 8.1.0 used.
The other is gcc 11.0's final choice, and its cost is very close to the cost of the first one.
I noticed that the second set includes more invariants and fewer induction variables (IVs). The cost computation appears to favor invariants over IVs, and because the two costs differ only minimally, this bias can sway the final choice.
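
To illustrate the trade-off between the two kinds of candidate sets, here are two hand-written shapes of the same loop. This is a hypothetical loop, not the attached ivopt_os.c. The function names add_ptr_ivs and add_index_iv are made up for this sketch. Shape 1 keeps one pointer IV per array and has only one invariant. Shape 2 shares a single index IV, so the base addresses must stay live as invariants. On RISC-V, loads and stores only support base register plus immediate offset addressing, so each access in shape 2 also needs a temporary to hold base + i * sizeof(int).

```c
#include <stddef.h>

/* Shape 1 (hypothetical): more IVs, fewer invariants.
   Each array is walked by its own pointer IV; only `end` is invariant. */
void add_ptr_ivs(int *a, const int *b, const int *c, size_t n)
{
  const int *end = b + n;      /* loop invariant */
  while (b != end)             /* IVs: a, b, c */
    *a++ = *b++ + *c++;
}

/* Shape 2 (hypothetical): fewer IVs, more invariants.
   A single index IV is shared; a, b, c and n stay live as invariants,
   and every access needs an address temporary (base + i * sizeof(int)). */
void add_index_iv(int *a, const int *b, const int *c, size_t n)
{
  for (size_t i = 0; i < n; i++)   /* IV: i; invariants: a, b, c, n */
    a[i] = b[i] + c[i];
}
```

Both functions compute the same result. Compiling each with -Os -fdump-tree-ivopts-details and comparing the two dumps shows how ivopts handles each shape.
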
So why favor invariants over IVs, and is there a better way to handle this? The only reason I can think of, from my experience, is that invariants are more atomic and have more potential to be optimized further. But this also means that an invariant that is not optimized away may give rise to more intermediate variables. That is what happens in this case: the chosen set uses more invariants, which creates more intermediate variables, and in the end the number of live values exceeds the available registers.
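
To make this hypothesis concrete, here is a toy cost model in C. It is NOT GCC's ivopts cost function. The struct cand_set, the functions set_cost and pick, the spill penalty, and all the numbers used with them are hypothetical. The model charges a flat penalty for every live value beyond the available registers. It can either ignore or count the intermediate values that invariants give rise to.

```c
/* Toy model only: NOT GCC's ivopts cost computation.
   All names and numbers are hypothetical. */
struct cand_set {
  const char *name;
  int n_ivs;      /* induction variables live in the loop */
  int n_invs;     /* loop invariants live in the loop */
  int n_temps;    /* intermediate values the invariants give rise to */
  int base_cost;  /* summed use/candidate cost before register pressure */
};

/* Values competing for registers inside the loop. With count_temps == 0
   the intermediate values are ignored, mimicking a model that cannot see them. */
static int live_values(const struct cand_set *s, int count_temps)
{
  return s->n_ivs + s->n_invs + (count_temps ? s->n_temps : 0);
}

/* Base cost plus spill_cost for every live value that does not fit
   into avail_regs registers. */
int set_cost(const struct cand_set *s, int avail_regs, int spill_cost,
             int count_temps)
{
  int live = live_values(s, count_temps);
  int excess = live > avail_regs ? live - avail_regs : 0;
  return s->base_cost + excess * spill_cost;
}

/* Return the cheaper of two sets; on a tie, keep the first one. */
const struct cand_set *pick(const struct cand_set *x,
                            const struct cand_set *y, int avail_regs,
                            int spill_cost, int count_temps)
{
  return set_cost(y, avail_regs, spill_cost, count_temps)
             < set_cost(x, avail_regs, spill_cost, count_temps)
         ? y : x;
}
```

The example uses two sets. Set A has 3 IVs, 1 invariant, 0 temporaries and base cost 10. Set B has 1 IV, 3 invariants, 2 temporaries and base cost 9. It also assumes 5 available registers and a spill cost of 4.

- If temporaries are ignored, B looks cheaper (9 vs 10) and is chosen.
- If temporaries are counted, B has 6 live values and costs 13, so A is chosen.

The invariant-heavy set wins on a nearly equal base cost only because the values it creates are invisible to the model.
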

I'm not sure I've hit the nail on the head with my analysis, and I'd like to try to find a better solution.