Bug 80080 - S390: Issues with emitted cs-instructions for __atomic builtins.
Summary: S390: Issues with emitted cs-instructions for __atomic builtins.
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Andreas Krebbel
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2017-03-17 09:07 UTC by stli@linux.ibm.com
Modified: 2018-12-03 09:49 UTC

See Also:
Host:
Target: S390
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-03-17 00:00:00


Description stli@linux.ibm.com 2017-03-17 09:07:23 UTC
For s390, I am now using the C11 atomic builtins in glibc.
There are some issues with the emitted cs instructions.
If __atomic_compare_exchange_n is used within an if/while condition,
the generated code sometimes does not branch on the condition code directly.
Instead it extracts the condition code into a general register via ipm, compares
that register with further instructions, and then branches on the result of
this comparison.

int foo1 (int *mem)
{
  int val, newval;
  val = __atomic_load_n (mem, __ATOMIC_RELAXED);
  do
    {
      newval = val | 123;
    }
  while (!__atomic_compare_exchange_n (mem, &val, newval, 1, __ATOMIC_ACQUIRE,
				       __ATOMIC_RELAXED));

  /*
    0000000000000000 <foo>:
   0:   58 10 20 00             l       %r1,0(%r2)
   4:   18 31                   lr      %r3,%r1
   6:   a5 3b 00 7b             oill    %r3,123
   a:   ba 13 20 00             cs      %r1,%r3,0(%r2)
   e:   b2 22 00 30             ipm     %r3
  12:   8a 30 00 1c             sra     %r3,28
  16:   ec 36 ff f7 00 7e       cijne   %r3,0,4 <foo+0x4>
  1c:   a7 29 00 00             lghi    %r2,0
  20:   07 fe                   br      %r14
  22:   07 07                   nopr    %r7
  24:   07 07                   nopr    %r7
  26:   07 07                   nopr    %r7
   */
  return 0;
}



For __atomic_exchange_n, s390 has no dedicated instruction, so a cs loop is
used. An extra register is allocated to hold the old value instead of reusing
the "old" register of the cs instruction, which is updated whenever the new
value is not stored to memory. This extra register is reloaded in every loop
iteration.
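In source terms, the cs loop being generated corresponds roughly to the following (a sketch under the builtin's documented semantics, not the exact expander output; the function name is illustrative):

```c
/* Rough C equivalent of the cs loop emitted for __atomic_exchange_n on
   targets that only have compare-and-swap.  On failure the builtin
   writes the value found in memory back into 'expected', so the loop
   could reuse that register directly; the reported problem is that gcc
   kept a second register for the old value and reloaded it on every
   iteration.  */
int exchange_via_cs (int *mem, int newval)
{
  int expected = __atomic_load_n (mem, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n (mem, &expected, newval, 0,
				       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    ; /* 'expected' was refreshed by the failed cs; just retry.  */
  return expected;
}
```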

extern int bar (int *mem);
int foo2 (int *mem)
{
  int old = __atomic_exchange_n (mem, 0, __ATOMIC_ACQUIRE);
  if (old >= 2)
    return bar (mem);

  /*
0000000000000028 <foo2>:
  28:   a7 48 00 00             lhi     %r4,0
  2c:   58 10 20 00             l       %r1,0(%r2)
  30:   18 31                   lr      %r3,%r1
  32:   ba 14 20 00             cs      %r1,%r4,0(%r2)
  36:   a7 74 ff fd             jne     30 <foo2+0x8>
  3a:   ec 3c 00 06 01 7e       cijnh   %r3,1,46 <foo2+0x1e>
  40:   c0 f4 00 00 00 00       jg      40 <foo2+0x18>
                        42: R_390_PC32DBL       bar+0x2
  46:   a7 29 00 00             lghi    %r2,0
  4a:   07 fe                   br      %r14
  4c:   07 07                   nopr    %r7
  4e:   07 07                   nopr    %r7
   */
  return 0;
}

When exchanging in a zero value, as above, the load-and-and instruction
can be used instead. Then only one register is needed for both the new and the
old value, and no loop is required.
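The transformation suggested here can be expressed at the source level: exchanging in a zero is equivalent to atomically ANDing with zero (a sketch; whether lan is actually emitted depends on the -march level and is an assumption here):

```c
/* Storing 0 atomically and returning the previous contents can be done
   with an atomic AND with 0 instead of a cs loop; on z196 and newer
   this can map to the interlocked load-and-and (lan) instruction.  */
int exchange_zero (int *mem)
{
  return __atomic_fetch_and (mem, 0, __ATOMIC_ACQUIRE);
}
```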


If the exchanged memory location is a global variable, its address is loaded
inside the loop instead of before it.

extern int foo3_mem;
int foo3 (void)
{
  return __atomic_exchange_n (&foo3_mem, 5, __ATOMIC_ACQUIRE);

  /*
0000000000000050 <foo3>:
  50:   c4 1d 00 00 00 00       lrl     %r1,50 <foo3>
                        52: R_390_PC32DBL       foo3_mem+0x2
  56:   a7 38 00 05             lhi     %r3,5
  5a:   18 21                   lr      %r2,%r1
  5c:   c0 40 00 00 00 00       larl    %r4,5c <foo3+0xc>
                        5e: R_390_PC32DBL       foo3_mem+0x2
  62:   ba 13 40 00             cs      %r1,%r3,0(%r4)
  66:   a7 74 ff fa             jne     5a <foo3+0xa>
  6a:   b9 14 00 22             lgfr    %r2,%r2
  6e:   07 fe                   br      %r14
   */
}
Comment 1 Richard Biener 2017-03-17 10:13:46 UTC
Is this a duplicate of PR79981 (and thus fixed in trunk)?
Comment 2 Andreas Krebbel 2017-03-17 10:21:41 UTC
(In reply to Richard Biener from comment #1)
> Is this a duplicate of PR79981 (and thus fixed in trunk)?

No it is a different problem. We are working on a fix.
Comment 3 Dominik Vogt 2017-03-21 12:58:56 UTC
Most of this can be fixed in the backend, but all attempts to get rid of the additional load ("lr %r3,%r1") in foo2 within the backend have failed.  It is caused by overly complicated code being generated in expand_compare_and_swap_loop().  I'm currently testing the patch below, which eliminates the superfluous variable cmp_reg (and the register allocated for it).  Is there any chance to get something like this into GCC 7?  The patch could possibly also help other targets.  (Will post on gcc-patches when the tests are complete.)


--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5798,17 +5798,15 @@ find_cc_set (rtx x, const_rtx pat, void *data)
 static bool
 expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
 {
-  machine_mode mode = GET_MODE (mem);
   rtx_code_label *label;
-  rtx cmp_reg, success, oldval;
+  rtx success, oldval;
 
   /* The loop we want to generate looks like
 
-	cmp_reg = mem;
+	old_reg = mem;
       label:
-        old_reg = cmp_reg;
 	seq;
-	(success, cmp_reg) = compare-and-swap(mem, old_reg, new_reg)
+	(success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
 	if (success)
 	  goto label;
 
@@ -5816,23 +5814,21 @@ expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
      iterations use the value loaded by the compare-and-swap pattern.  */
 
   label = gen_label_rtx ();
-  cmp_reg = gen_reg_rtx (mode);
 
-  emit_move_insn (cmp_reg, mem);
+  emit_move_insn (old_reg, mem);
   emit_label (label);
-  emit_move_insn (old_reg, cmp_reg);
   if (seq)
     emit_insn (seq);
 
   success = NULL_RTX;
-  oldval = cmp_reg;
+  oldval = old_reg;
   if (!expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg,
 				       new_reg, false, MEMMODEL_SYNC_SEQ_CST,
 				       MEMMODEL_RELAXED))
     return false;
 
-  if (oldval != cmp_reg)
-    emit_move_insn (cmp_reg, oldval);
+  if (oldval != old_reg)
+    emit_move_insn (old_reg, oldval);
 
   /* Mark this jump predicted not taken.  */
   emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx,
Comment 4 Jakub Jelinek 2017-03-21 14:23:49 UTC
(In reply to Dominik Vogt from comment #3)
> Most of this can be fixed in the backend, but all attempts to get rid of the
> additional load ("lr %r3,%r1") in foo2 within the backend have failed. 
> This is caused by overly complicated code being generated in
> expand_compare_and_swap_loop().  I'm currently testing the patch below which
> eliminates the superfluous variable cmp_reg (and the register allocated for
> it).  Is there any chance to get something like this into Gcc7?  The patch
> could possibly also help other targets.  (Will post on gcc-patches when the
> tests are complete.)
> 
> 
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -5798,17 +5798,15 @@ find_cc_set (rtx x, const_rtx pat, void *data)
>  static bool
>  expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
>  {
> -  machine_mode mode = GET_MODE (mem);
>    rtx_code_label *label;
> -  rtx cmp_reg, success, oldval;
> +  rtx success, oldval;
>  
>    /* The loop we want to generate looks like
>  
> -	cmp_reg = mem;
> +	old_reg = mem;
>        label:
> -        old_reg = cmp_reg;
>  	seq;
> -	(success, cmp_reg) = compare-and-swap(mem, old_reg, new_reg)
> +	(success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
>  	if (success)
>  	  goto label;
>  
> @@ -5816,23 +5814,21 @@ expand_compare_and_swap_loop (rtx mem, rtx old_reg,
> rtx new_reg, rtx seq)
>       iterations use the value loaded by the compare-and-swap pattern.  */
>  
>    label = gen_label_rtx ();
> -  cmp_reg = gen_reg_rtx (mode);
>  
> -  emit_move_insn (cmp_reg, mem);
> +  emit_move_insn (old_reg, mem);
>    emit_label (label);
> -  emit_move_insn (old_reg, cmp_reg);
>    if (seq)
>      emit_insn (seq);
>  
>    success = NULL_RTX;
> -  oldval = cmp_reg;
> +  oldval = old_reg;
>    if (!expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg,
>  				       new_reg, false, MEMMODEL_SYNC_SEQ_CST,
>  				       MEMMODEL_RELAXED))
>      return false;
>  
> -  if (oldval != cmp_reg)
> -    emit_move_insn (cmp_reg, oldval);
> +  if (oldval != old_reg)
> +    emit_move_insn (old_reg, oldval);
>  
>    /* Mark this jump predicted not taken.  */
>    emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx,

That doesn't look correct.  expand_compare_and_swap_loop is called from two places, and at least in the first case it expects old_reg, on success, to contain the value that will be returned as the target.  With your patch it might be something different.
Comment 5 Dominik Vogt 2017-03-21 16:51:45 UTC
What case do you mean?  The

+  if (oldval != old_reg)
+    emit_move_insn (old_reg, oldval);

at the end should make sure that the oldval rtx is either not changed by the call or that its value is copied into old_reg at runtime.  In both cases the value of old_reg should be the old value to be returned (as it hopefully was before the patch).  Stefan and I have spent quite some time proofreading this patch; do you see something we might have overlooked?
Comment 6 Jakub Jelinek 2017-03-21 16:57:07 UTC
I think it depends on what
(success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
sets if success is true.  Is there a guarantee that old_reg will be assigned
whatever has been passed as the second argument in that case on all targets?
If yes, then it can work.  If not, then it will return a different value.
So I think you need to analyze all targets that have HW compare-and-swap and verify that.
Comment 7 Dominik Vogt 2017-03-21 17:37:43 UTC
(In reply to Jakub Jelinek from comment #6)
> I think it depends on what
> (success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
> sets if success is true.  Is there a guarantee that old_reg will be assigned
> whatever has been passed as the second argument in that case on all targets?

Isn't that part of the interface of the atomic_compare_and_swap pattern?

Gccinternals:
> Operand 1 is an output operand which is set to the contents of the memory
> before the operation was attempted. Operand 3 is the value that is
> expected to be in memory.

If the CS instruction on a target does not overwrite the register old_reg with the memory value before swapping, the target's backend must do that manually, I think.
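The same contract is visible at the source level: on failure, __atomic_compare_exchange_n writes the observed memory contents back into the expected object (a small demonstration; the function name is illustrative):

```c
/* Demonstrates the documented interface: when the comparison fails,
   'expected' is overwritten with the value actually found in memory,
   mirroring what the cs instruction does with its "old" register.  */
int demo_cas (int *mem, int *expected, int desired)
{
  return __atomic_compare_exchange_n (mem, expected, desired, 0,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```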
Comment 8 Dominik Vogt 2017-03-22 13:12:49 UTC
The patch causes a performance regression on s390x.

  .L1:
    lr      %r3,%r1
    cs      %r1,%r4,0(%r2)
    jne     .L1

becomes

  .L1:
    cs      %r1,%r3,0(%r2)
    ipm     %r4
    sra     %r4,28
    cijne   %r4,0,.L1

Although this can be fixed on s390/s390x in a follow-up patch, I cannot really prove that something similar won't happen on other targets without testing them.
Comment 9 Andreas Krebbel 2017-04-25 07:38:22 UTC
Author: krebbel
Date: Tue Apr 25 07:37:50 2017
New Revision: 247132

URL: https://gcc.gnu.org/viewcvs?rev=247132&root=gcc&view=rev
Log:
S/390: PR80080: Optimize atomic patterns.

The attached patch optimizes the atomic_exchange and atomic_compare
patterns on s390 and s390x (mostly limited to SImode and DImode).
Along with general optimizations, the changes fix most of the problems
reported in PR 80080:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

gcc/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes as well as CCZ1mode and CCZmode.
	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
	signature of s390_emit_compare_and_swap.
	(s390_expand_cs_hqi): Likewise, make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
	* config/s390/s390.md ("atomic_compare_and_swap<mode>"): Merge the
	patterns for small and large integers.  Forbid symref memory operands.
	Move expander to s390.c.  Require cc register.
	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Use s_operand to forbid
	symref memory operands.  Remove CC mode and call s390_match_ccmode
	instead.
	("atomic_exchange<mode>"): Allow and implement all integer modes.

gcc/testsuite/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.inc: New test.


Added:
    trunk/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
    trunk/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
    trunk/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/s390-protos.h
    trunk/gcc/config/s390/s390.c
    trunk/gcc/config/s390/s390.md
    trunk/gcc/testsuite/ChangeLog
Comment 10 Andreas Krebbel 2017-04-25 11:12:19 UTC
Author: krebbel
Date: Tue Apr 25 11:11:48 2017
New Revision: 247189

URL: https://gcc.gnu.org/viewcvs?rev=247189&root=gcc&view=rev
Log:
S/390: PR80080: Optimize atomic patterns.

The attached patch optimizes the atomic_exchange and atomic_compare
patterns on s390 and s390x (mostly limited to SImode and DImode).
Along with general optimizations, the changes fix most of the problems
reported in PR 80080:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

gcc/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	Backport from mainline
	2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes as well as CCZ1mode and CCZmode.
	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
	signature of s390_emit_compare_and_swap.
	(s390_expand_cs_hqi): Likewise, make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
	* config/s390/s390.md ("atomic_compare_and_swap<mode>"): Merge the
	patterns for small and large integers.  Forbid symref memory operands.
	Move expander to s390.c.  Require cc register.
	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Use s_operand to forbid
	symref memory operands.  Remove CC mode and call s390_match_ccmode
	instead.
	("atomic_exchange<mode>"): Allow and implement all integer modes.

gcc/testsuite/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	Backport from mainline
	2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.inc: New test.


Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
    branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/s390/s390-protos.h
    branches/gcc-7-branch/gcc/config/s390/s390.c
    branches/gcc-7-branch/gcc/config/s390/s390.md
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 11 stli@linux.ibm.com 2018-08-06 09:07:33 UTC
Hi,
I've retested the examples with GCC 7, 8, and trunk from 2018-07-20, but there are still issues:
The examples foo1 and foo2 are okay.

The issue in example foo3 is still present (see the description of this bug report):

00000000000000a0 <foo3>:
  a0:   a7 18 00 05             lhi     %r1,5
  a4:   c4 2d 00 00 00 00       lrl     %r2,a4 <foo3+0x4>
                        a6: R_390_PC32DBL       foo3_mem+0x2

  aa:   c0 30 00 00 00 00       larl    %r3,aa <foo3+0xa>
                        ac: R_390_PC32DBL       foo3_mem+0x2
  b0:   ba 21 30 00             cs      %r2,%r1,0(%r3)
  b4:   a7 74 ff fb             jne     aa <foo3+0xa>

The address of the global variable is still reloaded within the loop. If cs did not swap the value, the jne could jump directly back to the cs instruction instead of to the larl instruction.

  b8:   b9 14 00 22             lgfr    %r2,%r2
  bc:   07 fe                   br      %r14
  be:   07 07                   nopr    %r7

I've found a further issue which is observable with the following two examples.
See the questions in the disassembly:

void foo4(int *mem)
{
  int oldval = 0;
  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
				    1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    {
      bar (mem);
    }
  /*
0000000000000000 <foo4>:
   0:   e3 10 20 00 00 12       lt      %r1,0(%r2)
   6:   a7 74 00 06             jne     12 <foo4+0x12>

Why do we need to jump to 0x12 first instead of directly jumping to 0x18?

   a:   a7 38 00 01             lhi     %r3,1
   e:   ba 13 20 00             cs      %r1,%r3,0(%r2)
  12:   a7 74 00 03             jne     18 <foo4+0x18>
  16:   07 fe                   br      %r14
  18:   c0 f4 00 00 00 00       jg      18 <foo4+0x18>
                        1a: R_390_PC32DBL       bar+0x2
  1e:   07 07                   nopr    %r7
  */
}


void foo5(int *mem)
{
  int oldval = 0;
  __atomic_compare_exchange_n (mem, (void *) &oldval, 1,
			       1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar (mem);
  /*
0000000000000040 <foo5>:
  40:   e3 10 20 00 00 12       lt      %r1,0(%r2)
  46:   a7 74 00 06             jne     52 <foo5+0x12>

This is similar to foo4, but the variable oldval is compared against zero instead of using the return value of __atomic_compare_exchange_n.
Can't we jump directly to 0x5a instead of 0x52?

  4a:   a7 38 00 01             lhi     %r3,1
  4e:   ba 13 20 00             cs      %r1,%r3,0(%r2)
  52:   12 11                   ltr     %r1,%r1
  54:   a7 74 00 03             jne     5a <foo5+0x1a>
  58:   07 fe                   br      %r14
  5a:   c0 f4 00 00 00 00       jg      5a <foo5+0x1a>
                        5c: R_390_PC32DBL       bar+0x2
   */
}
Comment 12 Ilya Leoshkevich 2018-08-22 15:12:46 UTC
I've investigated foo3, foo4 and foo5, and came to the following
conclusions:

When foo3 is compiled with -march=z10 or later, the cprop1 pass propagates
the global's SYMBOL_REF value into UNSPECV_CAS.  On older machines this
does not happen, because the result is rejected by insn_invalid_p ().
Then reload realizes that a SYMBOL_REF cannot be a legitimate UNSPECV_CAS
argument and loads it into a pseudo right before.  The net result is
that the load of the SYMBOL_REF is moved from outside the loop into the
loop.  So we need to somehow inhibit constant propagation for this case.
Jump threading in foo4 does not work, because it's done only during
`jump' pass, at which point there are insns with side-effects in the
basic block of the 2nd jump.  They are later deleted by the `combine'
pass, but we don't request CLEANUP_THREADING after that.  I wonder if we
could introduce it?

In addition, when foo4 is compiled with -O2 or -O3, we don't use
conditional return, because our return sequence contains a PARALLEL,
which is rejected by bb_is_just_return ().  This can also be improved.

Finally, in foo5 `cs' is generated by s390_expand_cs_tdsi (), and
comparison is generated by common expansion logic, so it doesn't look
possible to improve the situation solely in the back-end.  We need to
somehow make gcc aware that (oldval == 0) and (retval != 0) are
equivalent after `cs', but I'm not sure at which point we could and
should do this - in theory doing this on tree rather than RTL level can
help other architectures.
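The equivalence in question can be stated in C (an illustrative check, not part of the original report):

```c
/* With 'oldval' initialized to 0, the builtin's boolean result is true
   exactly when the value left in 'oldval' afterwards is 0: on success
   oldval is untouched (it matched 0), on failure it receives the
   nonzero memory contents.  This is the fact gcc would have to exploit
   to merge the two branches in foo5.  */
int result_matches_oldval (int *mem)
{
  int oldval = 0;
  int ret = __atomic_compare_exchange_n (mem, &oldval, 1, 1,
					 __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  return ret == (oldval == 0);
}
```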
Comment 13 Andreas Krebbel 2018-09-06 07:36:07 UTC
Author: krebbel
Date: Thu Sep  6 07:35:35 2018
New Revision: 264142

URL: https://gcc.gnu.org/viewcvs?rev=264142&root=gcc&view=rev
Log:
S/390: Register pass_s390_early_mach statically

The dump file used to come at the end of the sorted dump file list,
because the pass was registered dynamically. This did not reflect the
order in which passes are executed. Static registration fixes this:

* foo4.c.277r.split2
* foo4.c.281r.early_mach
* foo4.c.282r.pro_and_epilogue

gcc/ChangeLog:

2018-09-06  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* config/s390/s390-passes.def: New file.
	* config/s390/s390-protos.h (class rtl_opt_pass): Add forward
	declaration.
	(make_pass_s390_early_mach): Add declaration.
	* config/s390/s390.c (make_pass_s390_early_mach):
	(s390_option_override): Remove dynamic registration.
	* config/s390/t-s390: Add s390-passes.def.



Added:
    trunk/gcc/config/s390/s390-passes.def
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/s390-protos.h
    trunk/gcc/config/s390/s390.c
    trunk/gcc/config/s390/t-s390
Comment 14 Andreas Krebbel 2018-09-06 07:39:14 UTC
Author: krebbel
Date: Thu Sep  6 07:38:42 2018
New Revision: 264143

URL: https://gcc.gnu.org/viewcvs?rev=264143&root=gcc&view=rev
Log:
S/390: Prohibit SYMBOL_REF in UNSPECV_CAS

Inhibit constant propagation from inlining SYMBOL_REF loads into
UNSPECV_CAS.  Even though reload can later undo it, the resulting
code would be less efficient.

gcc/ChangeLog:

2018-09-06  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* config/s390/predicates.md: Add nonsym_memory_operand.
	* config/s390/s390.c (s390_legitimize_cs_operand): If operand
	contains a SYMBOL_REF, load it into an intermediate pseudo.
	(s390_emit_compare_and_swap): Legitimize operand.
	* config/s390/s390.md: Use the new nonsym_memory_operand
	with UNSPECV_CAS patterns.

gcc/testsuite/ChangeLog:

2018-09-06  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/pr80080-3.c: New test.
	* gcc.target/s390/s390.exp: Make sure the new test passes
	on all optimization levels.



Added:
    trunk/gcc/testsuite/gcc.target/s390/pr80080-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/predicates.md
    trunk/gcc/config/s390/s390.c
    trunk/gcc/config/s390/s390.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/s390/s390.exp
Comment 15 iii 2018-09-24 14:21:35 UTC
Author: iii
Date: Mon Sep 24 14:21:03 2018
New Revision: 264535

URL: https://gcc.gnu.org/viewcvs?rev=264535&root=gcc&view=rev
Log:
S/390: Fix conditional returns on z196+

S/390 epilogue ends with (parallel [(return) (use %r14)]) instead of
the more usual (return) or (simple_return).  This sequence is not
recognized by the conditional return logic in try_optimize_cfg ().

This was introduced for processors older than z196, where it is
sometimes profitable to use call-clobbered register for returning
instead of %r14.  On newer processors we always return via %r14,
for which the fact that it's used is already reflected by
EPILOGUE_USES.  In this case a simple (return) suffices.

This patch changes return_use () to emit simple (return)s when
returning via %r14.  The resulting sequences are recognized by the
conditional return logic in try_optimize_cfg ().

gcc/ChangeLog:

2018-09-24  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* config/s390/s390.c (s390_emit_epilogue): Do not use PARALLEL
	RETURN+USE when returning via %r14.

gcc/testsuite/ChangeLog:

2018-09-24  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/risbg-ll-3.c: Expect conditional returns.
	* gcc.target/s390/zvector/vec-cmp-2.c: Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/s390.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/s390/risbg-ll-3.c
    trunk/gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c
Comment 16 iii 2018-12-03 09:49:35 UTC
Author: iii
Date: Mon Dec  3 09:49:02 2018
New Revision: 266734

URL: https://gcc.gnu.org/viewcvs?rev=266734&root=gcc&view=rev
Log:
Repeat jump threading after combine

Consider the following RTL:

(insn (set (reg 65) (if_then_else (eq %cc 0) 1 0)))
(insn (parallel [(set %cc (compare (reg 65) 0)) (clobber %scratch)]))
(jump_insn (set %pc (if_then_else (ne %cc 0) (label_ref 23) %pc)))

Combine simplifies this into:

(note NOTE_INSN_DELETED)
(note NOTE_INSN_DELETED)
(jump_insn (set %pc (if_then_else (eq %cc 0) (label_ref 23) %pc)))

opening up the possibility to perform jump threading.

gcc/ChangeLog:

2018-12-03  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* cfgcleanup.c (class pass_postreload_jump): New pass.
	(pass_postreload_jump::execute): Likewise.
	(make_pass_postreload_jump): Likewise.
	* passes.def: Add pass_postreload_jump before
	pass_postreload_cse.
	* tree-pass.h (make_pass_postreload_jump): New pass.

gcc/testsuite/ChangeLog:

2018-12-03  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/pr80080-4.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/s390/pr80080-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgcleanup.c
    trunk/gcc/passes.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-pass.h