void foo(unsigned int *a, unsigned char *b) { *a &= *b; } cc1 -O3 -march=z9-109 t.c -mno-lra foo: llc %r1,0(%r3) n %r1,0(%r2) st %r1,0(%r2) br %r14 cc1 -O3 -march=z9-109 t.c foo: stg %r15,120(%r15) lay %r15,-168(%r15) llc %r1,0(%r3) st %r1,164(%r15) nc 0(4,%r2),164(%r15) lg %r15,288(%r15) br %r14 Problem is present since r199754 which enabled LRA on S/390.
It seems to somehow "optimize" with regard to the and op being unsigned char and thus all upper bits being zero and only the lower ones need to be computed? Before RA (insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ]) (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8]))) t.c:3 1209 {*zero_extendqisi2_extimm} (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ]) (nil))) (insn 8 7 11 2 (parallel [ (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32]) (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32]) (reg:SI 66 [ *b_4(D)+-3 ]))) (clobber (reg:CC 33 %cc)) ]) t.c:3 1473 {*andsi3_zarch} (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ]) (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ]) (expr_list:REG_UNUSED (reg:CC 33 %cc) (nil))))) after (insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ]) (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8]))) t.c:3 1209 {*zero_extendqisi2_extimm} (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ]) (nil))) (insn 8 7 11 2 (parallel [ (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32]) (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32]) (reg:SI 66 [ *b_4(D)+-3 ]))) (clobber (reg:CC 33 %cc)) ]) t.c:3 1473 {*andsi3_zarch} (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ]) (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ]) (expr_list:REG_UNUSED (reg:CC 33 %cc) (nil))))) but then LRA chooses to spill while reload is happy with the above.
The difference I see is that LRA chooses alternative "Q,0,Q" and reload chooses "d,0,R". For the "Q,O,Q" LRA reports: 2 Spill pseudo into memory: reject+=3 alt=11,overall=9,losers=1,rld_nregs=0 For "d,0,R" it reports: 0 Non-pseudo reload: reject+=2 0 Non input pseudo reload: reject++ 1 Dying matched operand reload: reject++ alt=8,overall=10,losers=1 -- refuse So it is 9 vs 10. It would be the same # of insns if we already had a stack frame. Most non-toy functions will have a stack frame. So the problem is not that bad for a real world scenario. I'll look what can I do to fix this. But I should say that it is a very sensitive code of LRA. Fiddling with heuristics might affect many programs and targets and might result in new PRs.
(In reply to Vladimir Makarov from comment #2) Thanks for having a look. I'll experiment a bit with adding a '?' constraint modifier to see what impact it has on benchmarks. In fact it would match the reality a bit better anyway since the mem-mem instructions have some restrictions others don't have.
Is there any new information on this issue?
(In reply to Dominik Vogt from comment #4) > Is there any new information on this issue? Adding the ? constraint modifier was an overall loss. So I did not pursue this any further.
The only solution we found caused other regressions.
(In reply to Andreas Krebbel from comment #6) > The only solution we found caused other regressions. I'll try to change the sensitive LRA code to solve it. It will need to test a few targets. So, if everything is ok, the patch will be probably ready at the end of the week or on the next week.
Author: vmakarov Date: Fri Apr 7 16:01:50 2017 New Revision: 246764 URL: https://gcc.gnu.org/viewcvs?rev=246764&root=gcc&view=rev Log: 2017-04-07 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70478 * lra-constraints.c (process_alt_operands): Disfavor alternative insn memory operands. 2017-04-07 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70478 * gcc.target/s390/pr70478.c: New. Added: trunk/gcc/testsuite/gcc.target/s390/pr70478.c Modified: trunk/gcc/ChangeLog trunk/gcc/lra-constraints.c trunk/gcc/testsuite/ChangeLog
Author: vmakarov Date: Sat Apr 8 19:18:42 2017 New Revision: 246789 URL: https://gcc.gnu.org/viewcvs?rev=246789&root=gcc&view=rev Log: 2017-04-08 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70478 * lra-constraints.c: Reverse the last patch. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-constraints.c
Author: vmakarov Date: Mon Apr 10 14:58:33 2017 New Revision: 246808 URL: https://gcc.gnu.org/viewcvs?rev=246808&root=gcc&view=rev Log: 2017-04-10 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70478 * lra-constraints.c (curr_small_class_check): New. (update_and_check_small_class_inputs): New. (process_alt_operands): Update curr_small_class_check. Disfavor alternative insn memory operands. Check available regs for small class operands. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-constraints.c
Author: vmakarov Date: Tue Apr 11 19:39:59 2017 New Revision: 246854 URL: https://gcc.gnu.org/viewcvs?rev=246854&root=gcc&view=rev Log: 2017-04-11 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70478 * lra-constraints.c (process_alt_operands): Check memory for disfavoring memory insn operand. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-constraints.c