Bug 71915

Summary: A missed opportunity for SLSR
Product: gcc Reporter: Martin Liška <marxin>
Component: tree-optimizationAssignee: Bill Schmidt <bill.schmidt>
Status: RESOLVED FIXED    
Severity: normal CC: bill.schmidt, danglin, rguenth
Priority: P3    
Version: 7.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2016-10-19 00:00:00
Bug Depends on:    
Bug Blocks: 71490    

Description Martin Liška 2016-07-18 09:11:19 UTC
Hello.

This is a follow-up of PR71490, where we started to FAIL slsr-8.c test-case.
Problem is that starting from r237185, we sink into both branches:

Sinking x3_18 = c_13(D) + _9;
 from bb 2 to bb 3
Sinking _9 = -_8;
 from bb 2 to bb 3
Sinking _8 = _7 * 4;
 from bb 2 to bb 3
Sinking _7 = (long unsigned int) a3_17;
 from bb 2 to bb 3
Sinking a3_17 = s_11(D) * 6;
 from bb 2 to bb 3
Sinking x2_16 = c_13(D) + _6;
 from bb 2 to bb 5
Sinking _6 = -_5;
 from bb 2 to bb 5
Sinking _5 = _4 * 4;
 from bb 2 to bb 5
Sinking _4 = (long unsigned int) a2_15;
 from bb 2 to bb 5
Sinking a2_15 = s_11(D) * 4;
 from bb 2 to bb 5
f (int s, int * c)
{
  int * x3;
  int * x2;
  int * x1;
  int a3;
  int a2;
  int a1;
  long unsigned int _1;
  long unsigned int _2;
  sizetype _3;
  long unsigned int _4;
  long unsigned int _5;
  sizetype _6;
  long unsigned int _7;
  long unsigned int _8;
  sizetype _9;
  int * iftmp.0_10;

  <bb 2>:
  a1_12 = s_11(D) * 2;
  _1 = (long unsigned int) a1_12;
  _2 = _1 * 4;
  _3 = -_2;
  x1_14 = c_13(D) + _3;
  if (x1_14 != 0B)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 5>:
  a2_15 = s_11(D) * 4;
  _4 = (long unsigned int) a2_15;
  _5 = _4 * 4;
  _6 = -_5;
  x2_16 = c_13(D) + _6;
  goto <bb 4>;

  <bb 3>:
  a3_17 = s_11(D) * 6;
  _7 = (long unsigned int) a3_17;
  _8 = _7 * 4;
  _9 = -_8;
  x3_18 = c_13(D) + _9;

  <bb 4>:
  # iftmp.0_10 = PHI <x2_16(5), x3_18(3)>
  return iftmp.0_10;

}

While in time when the test-case was introduced, we sank just to one branch:
Sinking x3_18 = c_7(D) + _17;
 from bb 2 to bb 3
Sinking _17 = -_16;
 from bb 2 to bb 3
Sinking _16 = _15 * 4;
 from bb 2 to bb 3
Sinking _15 = (long unsigned int) a3_14;
 from bb 2 to bb 3
Sinking a3_14 = s_2(D) * 6;
 from bb 2 to bb 3
f (int s, int * c)
{
  int * x3;
  int * x2;
  int * x1;
  int a3;
  int a2;
  int a1;
  int * iftmp.0;
  long unsigned int _4;
  long unsigned int _5;
  sizetype _6;
  long unsigned int _10;
  long unsigned int _11;
  sizetype _12;
  long unsigned int _15;
  long unsigned int _16;
  sizetype _17;

  <bb 2>:
  a1_3 = s_2(D) * 2;
  _4 = (long unsigned int) a1_3;
  _5 = _4 * 4;
  _6 = -_5;
  x1_8 = c_7(D) + _6;
  a2_9 = s_2(D) * 4;
  _10 = (long unsigned int) a2_9;
  _11 = _10 * 4;
  _12 = -_11;
  x2_13 = c_7(D) + _12;
  if (x1_8 != 0B)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 5>:
  goto <bb 4>;

  <bb 3>:
  a3_14 = s_2(D) * 6;
  _15 = (long unsigned int) a3_14;
  _16 = _15 * 4;
  _17 = -_16;
  x3_18 = c_7(D) + _17;

  <bb 4>:
  # iftmp.0_1 = PHI <x2_13(5), x3_18(3)>
  return iftmp.0_1;

}

Because of that, slsr does not cover one opportunity.

Martin
Comment 1 Bill Schmidt 2016-07-24 20:05:14 UTC
I'll have a look soon.
Comment 2 Martin Liška 2016-09-11 12:38:35 UTC
*** Bug 77552 has been marked as a duplicate of this bug. ***
Comment 3 Bill Schmidt 2016-10-19 16:49:43 UTC
Confirmed.
Comment 4 Bill Schmidt 2016-10-19 16:58:54 UTC
Something is causing us to hate the second case now, assigning "infinite" cost to it:

Processing dependency tree rooted at 5.
Using existing initializer: _3 = -_2;

Increment vector:

  0  increment:   -8
     count:       1
     cost:        -24
     initializer: _3

  1  increment:   -16
     count:       1
     cost:        1000
     initializer:
Comment 5 Bill Schmidt 2016-10-19 18:27:01 UTC
This is an example of a known limitation in SLSR.  The explanation is in analyze_increments:

      /* FORNOW: If we need to add an initializer, give up if a cast from       
         the candidate's type to its stride's type can lose precision.          
         This could eventually be handled better by expressly retaining the     
         result of a cast to a wider type in the stride.  Example:              
                                                                                
           short int _1;                                                        
           _2 = (int) _1;                                                       
           _3 = _2 * 10;                                                        
           _4 = x + _3;    ADD: x + (10 * _1) : int                             
           _5 = _2 * 15;                                                        
           _6 = x + _3;    ADD: x + (15 * _1) : int                             
                                                                                
         Right now replacing _6 would cause insertion of an initializer         
         of the form "short int T = _1 * 5;" followed by a cast to              
         int, which could overflow incorrectly.  Had we recorded _2 or          
         (int)_1 as the stride, this wouldn't happen.  However, doing           
         this breaks other opportunities, so this will require some             
         care.  */

In this case, we have "int" rather than "short int," and "int *" rather than "int".  Assuming a 64-bit machine, casting from a 64-bit wrapping type to a 32-bit non-wrapping type isn't legal.

I need to look at the stride-selection logic again.  The various cast-sensitive bits of this are tricky to get right.
Comment 6 Bill Schmidt 2016-10-31 03:05:32 UTC
Author: wschmidt
Date: Mon Oct 31 03:04:59 2016
New Revision: 241695

URL: https://gcc.gnu.org/viewcvs?rev=241695&root=gcc&view=rev
Log:
[gcc]

2016-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/71915
	PR tree-optimization/71490
	* gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
	stride_type field.
	(find_basis_for_base_expr): Require stride types to match when
	seeking a basis.
	(alloc_cand_and_find_basis): Record the stride type.
	(slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
	(backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
	the expressions having those types.
	(slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
	(create_mul_ssa_cand): Likewise.
	(create_mul_imm_cand): Likewise.
	(create_add_ssa_cand): Likewise.
	(create_add_imm_cand): Likewise.
	(legal_cast_p_1): Change interface to accept types rather than the
	expressions having those types.
	(legal_cast_p): Pass types to legal_cast_p_1.
	(slsr_process_cast): Pass stride type to
	alloc_cand_and_find_basis.
	(slsr_process_copy): Likewise.
	(dump_candidate): Display stride type when a cast exists.
	(create_add_on_incoming_edge): Introduce a cast when necessary for
	the stride type.
	(analyze_increments): Change the code checking for invalid casts
	to rely on the stride type, and update the documentation and
	example.  Change the code checking for pointer multiplies to rely
	on the stride type.
	(insert_initializers): Introduce a cast when necessary for the
	stride type.  Use the stride type for the type of the initializer.

[gcc/testsuite]

2016-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/71915
	PR tree-optimization/71490
	* gcc.dg/tree-ssa/pr54245.c: Delete.
	* gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
	document why.



Removed:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimple-ssa-strength-reduction.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
Comment 7 Bill Schmidt 2016-10-31 03:06:47 UTC
Fixed.