This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/79149] bad optimization on MIPS and ARM leading to excessive stack usage in some cases
- From: "arnd at linaro dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 20 Jan 2017 11:31:10 +0000
- Auto-submitted: auto-generated
- References: <bug-79149-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Arnd Bergmann <arnd at linaro dot org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #40546|0                           |1
        is obsolete|                            |
--- Comment #2 from Arnd Bergmann <arnd at linaro dot org> ---
Created attachment 40554
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40554&action=edit
wp512 reference source code, standalone version
After checking a bit more, I found that the reference source code
implementation does behave exactly like the in-kernel version after all, and I
was able to do some performance timing (using qemu-user) on it as well.
Building Whirlpool.c with "mips64el-linux-gnuabi64-gcc-5 -O2
-Wframe-larger-than=100 Whirlpool.c -o Whirlpool-mips-smallstack
-fno-sched-critical-path-heuristic -fno-sched-dep-count-heuristic" uses
256 bytes of stack in processBuffer and runs for 87 seconds doing
10000000 iterations in qemu, while the version built without
"-fno-sched-critical-path-heuristic -fno-sched-dep-count-heuristic" takes 230
seconds and needs 1520 bytes of stack. The extra time is apparently spent
spilling registers to the stack.
The same test with arm32 shows a less pronounced version of the same behavior:
with the two -fno-sched-* options the stack shrinks from 832 to 352 bytes, and
the run time improves from 301 seconds to 217 seconds.
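
To put the MIPS and ARM figures on a common scale, the following is a small
sketch in C that turns the measurements quoted above into ratios. The helper
names (ratio, print_comparison) are made up for illustration and are not part
of Whirlpool.c or GCC; the only inputs are the stack sizes and qemu run times
reported in this comment.

```c
#include <stdio.h>

/* Hypothetical helper: how many times larger the default-scheduling
 * value is than the value measured with the two -fno-sched-* options. */
static double ratio(double default_value, double with_flags_value)
{
    return default_value / with_flags_value;
}

/* Hypothetical helper: print stack and run-time ratios for one target.
 * Arguments are the figures reported in this bug comment. */
static void print_comparison(const char *target,
                             double stack_default, double stack_flags,
                             double secs_default, double secs_flags)
{
    printf("%s: stack %.0f -> %.0f bytes (%.2fx), "
           "time %.0f -> %.0f s (%.2fx)\n",
           target,
           stack_default, stack_flags, ratio(stack_default, stack_flags),
           secs_default, secs_flags, ratio(secs_default, secs_flags));
}
```

Calling print_comparison("mips64el", 1520, 256, 230, 87) and
print_comparison("arm32", 832, 352, 301, 217) from a main() gives roughly a
5.9x stack and 2.6x time difference on MIPS, against roughly 2.4x and 1.4x on
ARM. These are qemu-user timings, so the time ratios in particular are only
indicative.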
Obviously it would be helpful to do the same tests on actual hardware, as
benchmarking in an emulated machine can be very misleading.