Bug 70478 - [LRA] S/390: Performance regression - superfluous stack frame
Summary: [LRA] S/390: Performance regression - superfluous stack frame
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 6.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: ra
Depends on:
Blocks:
 
Reported: 2016-03-31 11:18 UTC by Andreas Krebbel
Modified: 2018-10-13 04:13 UTC (History)
1 user (show)

See Also:
Host: s390x-ibm-linux
Target: s390x-ibm-linux
Build: s390x-ibm-linux
Known to work: 4.8.5
Known to fail: 4.9.0
Last reconfirmed: 2016-03-31 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Krebbel 2016-03-31 11:18:18 UTC
void foo(unsigned int *a, unsigned char *b)
{
  *a &= *b;
}

cc1 -O3 -march=z9-109 t.c -mno-lra

foo:
        llc     %r1,0(%r3)
        n       %r1,0(%r2)
        st      %r1,0(%r2)
        br      %r14

cc1 -O3 -march=z9-109 t.c

foo:
        stg     %r15,120(%r15)
        lay     %r15,-168(%r15)
        llc     %r1,0(%r3)
        st      %r1,164(%r15)
        nc      0(4,%r2),164(%r15)
        lg      %r15,288(%r15)
        br      %r14

Problem is present since r199754 which enabled LRA on S/390.
Comment 1 Richard Biener 2016-03-31 13:43:23 UTC
It seems to somehow "optimize" with regard to the and op being unsigned char
and thus all upper bits being zero and only the lower ones need to be computed?

Before RA

(insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ])
        (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8]))) t.c:3 1209 {*zero_extendqisi2_extimm}
     (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ])
        (nil)))
(insn 8 7 11 2 (parallel [
            (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (reg:SI 66 [ *b_4(D)+-3 ])))
            (clobber (reg:CC 33 %cc))
        ]) t.c:3 1473 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ])
        (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))

after

(insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ])
        (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8]))) t.c:3 1209 {*zero_extendqisi2_extimm}
     (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ])
        (nil)))
(insn 8 7 11 2 (parallel [
            (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (reg:SI 66 [ *b_4(D)+-3 ])))
            (clobber (reg:CC 33 %cc))
        ]) t.c:3 1473 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ])
        (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))

but then LRA chooses to spill while reload is happy with the above.
Comment 2 Vladimir Makarov 2016-03-31 18:58:32 UTC
The difference I see is that LRA chooses alternative "Q,0,Q" and reload chooses "d,0,R".

For the "Q,O,Q" LRA reports:

          2 Spill pseudo into memory: reject+=3
          alt=11,overall=9,losers=1,rld_nregs=0
 
For "d,0,R" it reports:

            0 Non-pseudo reload: reject+=2
            0 Non input pseudo reload: reject++
            1 Dying matched operand reload: reject++
            alt=8,overall=10,losers=1 -- refuse
 
So it is 9 vs 10.  It would be the same # of insns if we already had a stack frame.  Most non-toy functions will have a stack frame.  So the problem is not that bad for a real world scenario.

I'll look what can I do to fix this.  But I should say that it is a very sensitive code of LRA.  Fiddling with heuristics might affect many programs and targets and might result in new PRs.
Comment 3 Andreas Krebbel 2016-04-01 07:02:28 UTC
(In reply to Vladimir Makarov from comment #2)
Thanks for having a look.  I'll experiment a bit with adding a '?' constraint modifier to see what impact it has on benchmarks. In fact it would match the reality a bit better anyway since the mem-mem instructions have some restrictions others don't have.
Comment 4 Dominik Vogt 2017-02-03 12:40:29 UTC
Is there any new information on this issue?
Comment 5 Andreas Krebbel 2017-02-03 12:50:38 UTC
(In reply to Dominik Vogt from comment #4)
> Is there any new information on this issue?

Adding the ? constraint modifier was an overall loss. So I did not pursue this any further.
Comment 6 Andreas Krebbel 2017-03-30 09:11:51 UTC
The only solution we found caused other regressions.
Comment 7 Vladimir Makarov 2017-04-05 21:04:28 UTC
(In reply to Andreas Krebbel from comment #6)
> The only solution we found caused other regressions.

I'll try to change the sensitive LRA code to solve it.  It will need to test a few targets.  So, if everything is ok, the patch will be probably ready at the end of the week or on the next week.
Comment 8 Vladimir Makarov 2017-04-07 16:02:23 UTC
Author: vmakarov
Date: Fri Apr  7 16:01:50 2017
New Revision: 246764

URL: https://gcc.gnu.org/viewcvs?rev=246764&root=gcc&view=rev
Log:
2017-04-07  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70478
	* lra-constraints.c (process_alt_operands): Disfavor alternative
	insn memory operands.

2017-04-07  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70478
	* gcc.target/s390/pr70478.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/s390/pr70478.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
    trunk/gcc/testsuite/ChangeLog
Comment 9 Vladimir Makarov 2017-04-08 19:19:13 UTC
Author: vmakarov
Date: Sat Apr  8 19:18:42 2017
New Revision: 246789

URL: https://gcc.gnu.org/viewcvs?rev=246789&root=gcc&view=rev
Log:
2017-04-08  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70478
	* lra-constraints.c: Reverse the last patch.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
Comment 10 Vladimir Makarov 2017-04-10 14:59:05 UTC
Author: vmakarov
Date: Mon Apr 10 14:58:33 2017
New Revision: 246808

URL: https://gcc.gnu.org/viewcvs?rev=246808&root=gcc&view=rev
Log:
2017-04-10  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70478
	* lra-constraints.c (curr_small_class_check): New.
	(update_and_check_small_class_inputs): New.
	(process_alt_operands): Update curr_small_class_check.  Disfavor
	alternative insn memory operands.  Check available regs for small
	class operands.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
Comment 11 Vladimir Makarov 2017-04-11 19:40:31 UTC
Author: vmakarov
Date: Tue Apr 11 19:39:59 2017
New Revision: 246854

URL: https://gcc.gnu.org/viewcvs?rev=246854&root=gcc&view=rev
Log:
2017-04-11  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70478
	* lra-constraints.c (process_alt_operands): Check memory for
	disfavoring memory insn operand.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c