Created attachment 49967
.s file

Executing on host: /home/dave/gnu/gcc/objdir/gcc/xgcc -B/home/dave/gnu/gcc/objdir/gcc/ /home/dave/gnu/gcc/gcc/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c -fdiagnostics-plain-output -O1 -lm -o ./builtin-return-1.exe (timeout = 300)
spawn -ignore SIGHUP /home/dave/gnu/gcc/objdir/gcc/xgcc -B/home/dave/gnu/gcc/objdir/gcc/ /home/dave/gnu/gcc/gcc/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c -fdiagnostics-plain-output -O1 -lm -o ./builtin-return-1.exe
PASS: gcc.dg/torture/stackalign/builtin-return-1.c -O1 (test for excess errors)
Setting LD_LIBRARY_PATH to :/home/dave/gnu/gcc/objdir/gcc:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/./libatomic/.libs::/home/dave/gnu/gcc/objdir/gcc:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/./libatomic/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/src/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libssp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libphobos/src/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libgomp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libatomic/.libs:/home/dave/gnu/gcc/objdir/./gcc:/home/dave/gnu/gcc/objdir/./prev-gcc:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/src/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libssp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libphobos/src/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libgomp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libatomic/.libs:/home/dave/gnu/gcc/objdir/./gcc:/home/dave/gnu/gcc/objdir/./prev-gcc
Execution timeout is: 300
spawn [open ...]
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O1 execution test

Similar fails:
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O1 -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O3 -g -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -Os execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -Os -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 -flto -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 -flto -flto-partition=none -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 -flto execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 -flto -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O0 execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O0 -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O1 execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O1 -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O2 -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O3 -g -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -Os execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -Os -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O2 -flto -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O2 -flto -flto-partition=none -fpic execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O2 -flto execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-2.c -O2 -flto -fpic execution test
Revision c4a6b2dadcd:b9a7bc9531b:b5f24739632389d50903bfde6d1bfc06d522c56b was okay.
Introduced by the following commit:

commit 0b76990a9d75d97b84014e37519086b81824c307 (HEAD)
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Thu Dec 17 00:15:12 2020 +0000

    fwprop: Rewrite to use RTL SSA

    This patch rewrites fwprop.c to use the RTL SSA framework.  It tries
    as far as possible to mimic the old behaviour, even in cases where
    that doesn't fit naturally with the new framework.  I've added ???
    comments to mark those places, but I think “fixing” them should
    be done separately to make bisection easier.

    In particular:

    * The old implementation iterated over uses, and after a successful
      substitution, the new insn's uses were added to the end of the list.
      The pass still processed those uses, but because it processed them at
      the end, it didn't fully optimise one instruction before propagating
      it into the next.

      The new version follows the same approach for comparison purposes,
      but I'd like to drop that as a follow-on patch.

    * The old implementation operated on single use sites (DF_REF_LOCs).
      This doesn't work well for instructions with match_dups, where it's
      necessary to update both an operand and its dups at the same time.
      For example, attempting to substitute into a divmod instruction would
      fail because only the div or the mod side would be updated.

      The new version again follows this to some extent for comparison
      purposes (although not exactly).  Again I'd like to drop it as a
      follow-on patch.

      One difference is that if a register occurs in multiple MEM addresses
      in a set, the new version will try to update them all at once.  This is
      what causes the SVE ACLE st4* output to improve.

    Also, the old version didn't naturally guarantee termination (PR79405),
    whereas the new one does.

    gcc/
            * fwprop.c: Rewrite to use the RTL SSA framework.

    gcc/testsuite/
            * gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: Don't
            expect insn updates to be deferred.
            * gcc.target/aarch64/sve/acle/asm/st4_s8.c: Expect the addition
            to be folded into the address.
            * gcc.target/aarch64/sve/acle/asm/st4_u8.c: Likewise.
Created attachment 50320
.s diff to gcc10 .s

The following code is wrong in gcc-11:

-       ldw 12(%r3),%r28
+       ldw 12(%r3),%r5
+       copy %r5,%r28
        bl foo,%r2
        ldo 64(%r3),%r4
-       stw %r28,0(%r4)
+       stw %r5,0(%r4)

In gcc-10, the return value from foo is stored in 0(%r4). In gcc-11, the store reads %r5, which still holds the value loaded before the call, so the value returned in %r28 is never stored.
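For readers who have not met untyped calls: the call to foo above comes from GCC's __builtin_apply/__builtin_return builtins, which forward a function's arguments and return value without knowing their types. The following is a minimal sketch of that pattern, assuming a test of roughly this shape; it is not copied from the testsuite. The 64-byte argument area size and the fallback branch for compilers without these builtins are assumptions made for illustration.

```c
/* Callee whose int result must survive the untyped call. */
int foo(int n)
{
  return n + 1;
}

#if defined(__GNUC__) && !defined(__clang__)
/* Forward bar's own incoming arguments to foo without naming their
   types, then return whatever foo left in the return registers.
   64 is an assumed upper bound on the stack argument area to copy. */
int bar(int n)
{
  (void) n;  /* n is forwarded implicitly via __builtin_apply_args */
  __builtin_return(__builtin_apply((void (*)(void)) foo,
                                   __builtin_apply_args(), 64));
}
#else
/* Assumed fallback for compilers lacking the builtins: a typed call. */
int bar(int n)
{
  return foo(n);
}
#endif
```

With a working compiler, bar(1) should yield foo's result, 2. The miscompiled code instead stored a stale register after the call, so the caller saw something other than foo's result.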
Mine.
The master branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:49e651990a6966936a0273138dd56ac394e57b16

commit r11-8214-g49e651990a6966936a0273138dd56ac394e57b16
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Fri Apr 16 12:38:02 2021 +0100

    Mark untyped calls and handle them specially [PR98689]

    This patch fixes a regression introduced by the rtl-ssa patches.
    It was seen on HPPA but it might be latent elsewhere.

    The problem is that the traditional way of expanding an untyped_call
    is to emit sequences like:

       (call (mem (symbol_ref "foo")))
       (set (reg pseudo1) (reg result1))
       ...
       (set (reg pseudon) (reg resultn))

    The ABI specifies that result1..resultn are clobbered by the call
    but nothing in the RTL indicates that result1..resultn are the results
    of the call.  Normally, using a clobbered value gives undefined results,
    but in this case the results are well-defined and matter for correctness.

    This seems like a niche case, so I think it would be better to mark
    it explicitly rather than try to detect it heuristically.

    Note that in expand_builtin_apply we already have an rtx_insn *,
    so it doesn't matter whether we call emit_call_insn or emit_insn.
    Calling emit_insn seems more natural now that the gen_* call
    has been split out.  It also matches later code in the function.

    gcc/
            PR rtl-optimization/98689
            * reg-notes.def (UNTYPED_CALL): New note.
            * combine.c (distribute_notes): Handle it.
            * emit-rtl.c (try_split): Likewise.
            * rtlanal.c (rtx_properties::try_to_add_insn): Likewise.  Assume
            that calls with the note implicitly set all return value registers.
            * builtins.c (expand_builtin_apply): Add a REG_UNTYPED_CALL
            to untyped_calls.
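The reasoning in the commit message can be illustrated with a toy model. This is not GCC code: the three-instruction IR, the struct and function names, and RESULT_REG are all hypothetical, and the propagation is far simpler than fwprop. It only shows why a call that leaves no visible definition of the result register lets a copy made before the call reach a use after it, and how treating the marked call as an implicit set blocks that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR, not GCC's RTL. */
enum op { OP_COPY, OP_CALL, OP_STORE };

struct insn {
  enum op op;
  int dst;            /* OP_COPY: destination register */
  int src;            /* OP_COPY and OP_STORE: register read */
  bool untyped_call;  /* OP_CALL: stands in for a REG_UNTYPED_CALL note */
};

#define RESULT_REG 28  /* stands in for %r28 */

/* A copy defines its destination.  A call defines RESULT_REG only when
   marked; an unmarked call is assumed to leave nothing worth reading. */
static bool defines(const struct insn *i, int reg)
{
  switch (i->op) {
  case OP_COPY: return i->dst == reg;
  case OP_CALL: return i->untyped_call && reg == RESULT_REG;
  default:      return false;
  }
}

/* Index of the nearest earlier insn defining reg, or -1 if none. */
static int reaching_def(const struct insn *code, int use, int reg)
{
  for (int i = use - 1; i >= 0; i--)
    if (defines(&code[i], reg))
      return i;
  return -1;
}

/* Register the store reads after one round of copy propagation. */
static int propagate_into_store(const struct insn *code, int store)
{
  assert(code[store].op == OP_STORE);
  int reg = code[store].src;
  int def = reaching_def(code, store, reg);
  /* Substitute only if a copy reaches the store and the copy's source
     is not redefined between the copy and the store. */
  if (def >= 0 && code[def].op == OP_COPY
      && reaching_def(code, store, code[def].src) <= def)
    return code[def].src;
  return reg;
}

/* Models: copy %r5,%r28; bl foo; stw %r28,0(%r4). */
int store_source(int mark_untyped)
{
  const struct insn code[] = {
    { .op = OP_COPY, .dst = RESULT_REG, .src = 5 },
    { .op = OP_CALL, .untyped_call = mark_untyped != 0 },
    { .op = OP_STORE, .src = RESULT_REG },
  };
  return propagate_into_store(code, 2);
}
```

In this model store_source(0) returns 5, matching the wrong `stw %r5,0(%r4)` in the gcc-11 output, while store_source(1) returns 28 because the marked call is seen as defining the result register.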
Finally fixed, sorry for the long delay.