Bug 52563 - FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1
Summary: FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: testsuite (show other bugs)
Version: tree-ssa
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 62173
  Show dependency treegraph
 
Reported: 2012-03-12 13:57 UTC by Igor Zamyatin
Modified: 2017-01-17 08:25 UTC (History)
5 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-03-12 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Igor Zamyatin 2012-03-12 13:57:24 UTC
Fails appeared after revision 185129:

2012-03-09 Jiangning Liu <jiangning.liu@arm.com> 

tree-scalar-evolution (interpret_rhs_expr): generate chrec for
array reference and component reference.
(analyze_scalar_evolution_for_address_of): New. 
2012-03-09 Jiangning Liu <jiangning.liu@arm.com> 

gcc.dg/tree-ssa/scev-3.c: New. 
gcc.dg/tree-ssa/scev-4.c: New. 


 Probably some simple misprint in tests - there are 2 "&a" in dumps...
Comment 1 Dominique d'Humieres 2012-03-12 14:00:49 UTC
Also seen on x86_64-apple-darwin10 with -m64.
Comment 2 Richard Biener 2012-03-12 14:19:58 UTC
We now perform store motion for the address computation as expected.  The
question is what the testcase was for (I suppose final-value-replacement
non-optimization) and eventually disable lim on them.
Comment 3 Jiangning Liu 2012-03-13 08:11:40 UTC
First, I tried gcc 4.4.3 on x86-64, and it works for this test case, so it is to some extension a GCC regression.

Second, I tried trunk, and I can reproduce the failure for this test case. It means there are still some other bugs hidden after my scalar evolution improvement on address of array element.

Third, loop ivopt should be able to find &a is the base object of a selected IV candidate after the analysis from simple_iv. After loop ivopt, I see the following dump,

Selected IV set:
candidate 5 (important)
  depends on 1
  var_before i_13
  var_after i_6
  original biv
  type int
  base k_2(D)
  step k_2(D)
candidate 6
  depends on 1
  var_before ivtmp.11_9
  var_after ivtmp.11_1
  incremented before exit test
  type unsigned long
  base (unsigned long) ((int *) &a + (sizetype) k_2(D) * 4)
  step (unsigned long) ((sizetype) k_2(D) * 4)
  base object (void *) &a

So &a is already identified as an IV base object, but somehow only one &a is  hoisted out of loop, while the other one isn't.

<bb 4>:
  D.1728_15 = (sizetype) k_2(D);
  D.1729_16 = D.1728_15 * 4;
  D.1730_17 = (sizetype) k_2(D);
  D.1731_18 = D.1730_17 * 4;
  D.1732_19 = &a + D.1731_18;
  ivtmp.11_20 = (unsigned long) D.1732_19;

<bb 5>:
  # i_13 = PHI <i_6(7), k_2(D)(4)>
  # ivtmp.11_9 = PHI <ivtmp.11_1(7), ivtmp.11_20(4)>
  a_p.0_4 = (int *) ivtmp.11_9;
  MEM[(int *)&a][i_13] = 100;
  i_6 = i_13 + k_2(D);
  ivtmp.11_1 = ivtmp.11_9 + D.1729_16;
  if (i_6 <= 999)
    goto <bb 7>;
  else
    goto <bb 6>;

This statement,

  MEM[(int *)&a][i_13] = 100;

is expected to be like,

  D.4086_21 = (void *) ivtmp.11_9;
  MEM[base: D.4086_21, offset: 0B] = 100;

after loop ivopt.
Comment 4 Jiangning Liu 2012-03-19 04:10:36 UTC
Hi,

  /* In general,
     (TYPE) (BASE + STEP * i) = (TYPE) BASE + (TYPE -- sign extend) STEP * i,
     but we must check some assumptions.

     1) If [BASE, +, STEP] wraps, the equation is not valid when precision
        of CT is smaller than the precision of TYPE.  For example, when we
	cast unsigned char [254, +, 1] to unsigned, the values on left side
	are 254, 255, 0, 1, ..., but those on the right side are
	254, 255, 256, 257, ...
     2) In case that we must also preserve the fact that signed ivs do not
        overflow, we must additionally check that the new iv does not wrap.
	For example, unsigned char [125, +, 1] casted to signed char could
	become a wrapping variable with values 125, 126, 127, -128, -127, ...,
	which would confuse optimizers that assume that this does not
	happen.  */
  must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION (type);

The code above in function convert_affine_scev set must_check_src_overflow to true for 32-bit, while set false for 64-bit code.

This means 64-bit mode fails to unfold the address expression for array element because of the case 1) as listed in comments above.

For the address of array element a[i], "&a + unitsize * i" has different representation for 32-bit and 64-bit. For 32-bit, it is "(32-bit pointer) + (32-bit integer)", while for 64-bit, it is "(64-bit pointer) + (32-bit integer)".

If you try the case scev-5.c as below, you may find the case can pass.

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

int *a_p;
int a[1000];

f(int k)
{
        long long i;

        for (i=k; i<1000; i+=k) {
                a_p = &a[i];
                *a_p = 100;
        }
}

/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */

Any idea to fix this problem?

Thanks,
-Jiangning
Comment 5 Richard Biener 2012-03-19 09:55:15 UTC
This statement,

  MEM[(int *)&a][i_13] = 100;

is expected to be like,

  D.4086_21 = (void *) ivtmp.11_9;
  MEM[base: D.4086_21, offset: 0B] = 100;

after loop ivopt.

If the mem is not changed then we failed to analyze it.

(In reply to comment #4)
> Hi,
> 
>   /* In general,
>      (TYPE) (BASE + STEP * i) = (TYPE) BASE + (TYPE -- sign extend) STEP * i,
>      but we must check some assumptions.
> 
>      1) If [BASE, +, STEP] wraps, the equation is not valid when precision
>         of CT is smaller than the precision of TYPE.  For example, when we
>     cast unsigned char [254, +, 1] to unsigned, the values on left side
>     are 254, 255, 0, 1, ..., but those on the right side are
>     254, 255, 256, 257, ...
>      2) In case that we must also preserve the fact that signed ivs do not
>         overflow, we must additionally check that the new iv does not wrap.
>     For example, unsigned char [125, +, 1] casted to signed char could
>     become a wrapping variable with values 125, 126, 127, -128, -127, ...,
>     which would confuse optimizers that assume that this does not
>     happen.  */
>   must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION (type);
> 
> The code above in function convert_affine_scev set must_check_src_overflow to
> true for 32-bit, while set false for 64-bit code.
> 
> This means 64-bit mode fails to unfold the address expression for array element
> because of the case 1) as listed in comments above.
> 
> For the address of array element a[i], "&a + unitsize * i" has different
> representation for 32-bit and 64-bit. For 32-bit, it is "(32-bit pointer) +
> (32-bit integer)", while for 64-bit, it is "(64-bit pointer) + (32-bit
> integer)".

Of course.  'i' is 'int', thus 32bit on 64bit, so it is
&a + 4 * (sizetype) i

this is the usual issue that we lose overflow knowledge with the current
POINTER_PLUS_EXPR representation (forced unsigned offset, forced sizetype
precision offset).

> If you try the case scev-5.c as below, you may find the case can pass.
> 
> /* { dg-do compile } */
> /* { dg-options "-O2 -fdump-tree-optimized" } */
> 
> int *a_p;
> int a[1000];
> 
> f(int k)
> {
>         long long i;
> 
>         for (i=k; i<1000; i+=k) {
>                 a_p = &a[i];
>                 *a_p = 100;
>         }
> }
> 
> /* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
> /* { dg-final { cleanup-tree-dump "optimized" } } */
> 
> Any idea to fix this problem?

We cannot fix it without relaxing the POINTER_PLUS_EXPR constraints.
I was working on that, but as usual the TYPE_IS_SIZETYPE removal
has priority.

Please consider fixing/XFAILing the testcases as they still FAIL and you
are responsible for this.  You can open a new enhancement PR covering
this.

> Thanks,
> -Jiangning
Comment 6 Jiangning Liu 2012-03-20 02:32:12 UTC
> We cannot fix it without relaxing the POINTER_PLUS_EXPR constraints.
> I was working on that, but as usual the TYPE_IS_SIZETYPE removal
> has priority.

Do you mean you are also working on removing TYPE_IS_SIZETYPE?

> 
> Please consider fixing/XFAILing the testcases as they still FAIL and you
> are responsible for this.  You can open a new enhancement PR covering
> this.
> 

I think 64-bit mode should also have this optimization enabled. XFAIL implies the missing of this optimization is a correct behavior. But I think this is not what I expected. So I don't think we should add XFAIL for this case. Instead I want to add a new test case scev-5.c to cover 64-bit testing.

Thanks,
-Jiangning
Comment 7 rguenther@suse.de 2012-03-20 09:51:00 UTC
On Tue, 20 Mar 2012, liujiangning at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563
> 
> --- Comment #6 from Jiangning Liu <liujiangning at gcc dot gnu.org> 2012-03-20 02:32:12 UTC ---
> > We cannot fix it without relaxing the POINTER_PLUS_EXPR constraints.
> > I was working on that, but as usual the TYPE_IS_SIZETYPE removal
> > has priority.
> 
> Do you mean you are also working on removing TYPE_IS_SIZETYPE?

Yes.

> > 
> > Please consider fixing/XFAILing the testcases as they still FAIL and you
> > are responsible for this.  You can open a new enhancement PR covering
> > this.
> > 
> 
> I think 64-bit mode should also have this optimization enabled. XFAIL implies
> the missing of this optimization is a correct behavior. But I think this is not
> what I expected. So I don't think we should add XFAIL for this case. Instead I
> want to add a new test case scev-5.c to cover 64-bit testing.

XFAIL says it's a known failure.

Richard.
Comment 8 Jiangning Liu 2012-03-22 09:17:51 UTC
Author: liujiangning
Date: Thu Mar 22 09:17:45 2012
New Revision: 185678

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185678
Log:
2012-03-22  Jiangning Liu  <jiangning.liu@arm.com>                                                        

	PR tree-optimization/52563
	* gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
	* gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
	* gcc.dg/tree-ssa/scev-5.c: New.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
Modified:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
Comment 9 bin cheng 2015-06-02 03:34:08 UTC
Author: amker
Date: Tue Jun  2 03:33:35 2015
New Revision: 224009

URL: https://gcc.gnu.org/viewcvs?rev=224009&root=gcc&view=rev
Log:

	PR tree-optimization/52563
	PR tree-optimization/62173
	* tree-ssa-loop-ivopts.c (struct iv): New field.  Reorder fields.
	(alloc_iv, set_iv): New parameter.
	(determine_biv_step): Delete.
	(find_bivs): Inline original determine_biv_step.  Pass new
	argument to set_iv.
	(idx_find_step): Use no_overflow information for conversion.
	* tree-scalar-evolution.c (analyze_scalar_evolution_in_loop): Let
	resolve_mixers handle folded_casts.
	(instantiate_scev_name): Change bool parameter to bool pointer.
	(instantiate_scev_poly, instantiate_scev_binary): Ditto.
	(instantiate_array_ref, instantiate_scev_not): Ditto.
	(instantiate_scev_3, instantiate_scev_2): Ditto.
	(instantiate_scev_1, instantiate_scev_r): Ditto.
	(instantiate_scev_convert, ): Change parameter.  Pass argument
	to chrec_convert_aggressive.
	(instantiate_scev): Change argument.
	(resolve_mixers): New parameter and set it.
	(scev_const_prop): New argument.
	* tree-scalar-evolution.h (resolve_mixers): New parameter.
	* tree-chrec.c (convert_affine_scev): Call chrec_convert instead
	of chrec_conert_1.
	(chrec_convert): New parameter.  Move definition below.
	(chrec_convert_aggressive): New parameter and set it.  Call
	convert_affine_scev.
	* tree-chrec.h (chrec_convert): New parameter.
	(chrec_convert_aggressive): Ditto.

	gcc/testsuite/ChangeLog
	PR tree-optimization/52563
	PR tree-optimization/62173
	* gcc.dg/tree-ssa/scev-3.c: Remove xfail.
	* gcc.dg/tree-ssa/scev-4.c: Ditto.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
    trunk/gcc/tree-chrec.c
    trunk/gcc/tree-chrec.h
    trunk/gcc/tree-scalar-evolution.c
    trunk/gcc/tree-scalar-evolution.h
    trunk/gcc/tree-ssa-loop-ivopts.c
Comment 10 ctice 2016-04-08 17:09:42 UTC
Author: ctice
Date: Fri Apr  8 17:09:09 2016
New Revision: 234832

URL: https://gcc.gnu.org/viewcvs?rev=234832&root=gcc&view=rev
Log:
Unify changes with Android's GCC 4.9 compiler.

Add the following changes from the Android
GCC 4.9 compiler (mostly adding fixes for aarch64):

Fix mingw build breakage
    1) Add missing _GCOV_fopen if !__KERNEL__
    2) Use _fullpath

Backport Cortex-A57's machine description support from trunk

Adjust generic move costs for aarch64. Backport from trunk

Enable C++ exceptions and RTTI by default.

Modify LINK_SPEC to pass --fix-cortex-a53-843419 as default

Rename libstdc++.so to libgnustl_shared.so when enabling bionic libs.

Drop mips64r2 from Android compiler's multilib

Merge "Drop mips64r2 from Android compiler's multilib"

Adjust several costs for AArch64:
  Refactor aarch64_address_costs; add cost tables for Cortex-A7;
  better estimate cost of building a constant; wrap aarch64_rtx_costs
  to dump verbose output; factor out common MULT cases; det default
  costs and handle vector modes; cost memory accesses using address
  costs; better cost logical operations; improve costs for div/mod and
  sign/zero extend operations; cost comparisons, flag setting
  operators and IF_THEN_ELSE; cost more Floating point RTX; cost
  TRUNCATE, SET, SYMBOL_REF, HIGH and LO_SUM; dump a message if we are
  unable to cost an insn; fix typos in cost data structure.


Add several improvements for AArch64 (Backported from GCC 5):
  (spill code - swap order in shr patterns; spill code - swap order in
  shl pattern; fix aarch64_rtx_costs of PLUS/MINUS; cost operand 0 in
  FP compare-with-0.0 case; properly cost FABD pattern; properly
  handle mvn-register and add EON+shift pattern and cost
  appropriately).

Disable inlining of memcpy for x86 with 'rep movs'.
Default to TLS guard for x86 stack-protector.
Change gcc BASE-VER from 4.9.x-google to 4.9.x

Cherry pick the following fixes from trunk: PR bootstrap/66638, 67954
(svn rev 230894, PR tree-optimization/65447,
PR tree-optimization/52563, tree-optimization/62173,
PR tree-optimization/48052, PR 64878, PR65048, PR65177, PR65735.

Port revision 219584 from linaro/gcc-4_9-branch

Fix for arm64 bad code for copysignl.


Added:
    branches/google/gcc-4_9-mobile/gcc/sancov.c
    branches/google/gcc-4_9-mobile/gcc/testsuite/gcc.dg/sancov/
    branches/google/gcc-4_9-mobile/gcc/testsuite/gcc.dg/sancov/asan.c
    branches/google/gcc-4_9-mobile/gcc/testsuite/gcc.dg/sancov/basic0.c
    branches/google/gcc-4_9-mobile/gcc/testsuite/gcc.dg/sancov/basic1.c
    branches/google/gcc-4_9-mobile/gcc/testsuite/gcc.dg/sancov/basic2.c
Modified:
    branches/google/gcc-4_9-mobile/ChangeLog
    branches/google/gcc-4_9-mobile/config/futex.m4
    branches/google/gcc-4_9-mobile/configure
    branches/google/gcc-4_9-mobile/configure.ac
    branches/google/gcc-4_9-mobile/gcc/BASE-VER
    branches/google/gcc-4_9-mobile/gcc/ChangeLog
    branches/google/gcc-4_9-mobile/gcc/Makefile.in
    branches/google/gcc-4_9-mobile/gcc/builtins.def
    branches/google/gcc-4_9-mobile/gcc/cfghooks.c
    branches/google/gcc-4_9-mobile/gcc/cfgloop.c
    branches/google/gcc-4_9-mobile/gcc/cfgloop.h
    branches/google/gcc-4_9-mobile/gcc/common.opt
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64-cores.def
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64-elf-raw.h
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64-linux.h
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64-protos.h
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64-tune.md
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64.c
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64.md
    branches/google/gcc-4_9-mobile/gcc/config/aarch64/aarch64.opt
    branches/google/gcc-4_9-mobile/gcc/config/i386/i386.c
    branches/google/gcc-4_9-mobile/gcc/config/linux-android.h
    branches/google/gcc-4_9-mobile/gcc/config/mips/t-linux-android64
    branches/google/gcc-4_9-mobile/gcc/configure
    branches/google/gcc-4_9-mobile/gcc/doc/invoke.texi
    branches/google/gcc-4_9-mobile/gcc/except.c
    branches/google/gcc-4_9-mobile/gcc/expmed.c
    branches/google/gcc-4_9-mobile/gcc/gcov-io.h
    branches/google/gcc-4_9-mobile/gcc/loop-init.c
    branches/google/gcc-4_9-mobile/gcc/lra-constraints.c
    branches/google/gcc-4_9-mobile/gcc/omp-low.c
    branches/google/gcc-4_9-mobile/gcc/params.def
    branches/google/gcc-4_9-mobile/gcc/passes.def
    branches/google/gcc-4_9-mobile/gcc/sanitizer.def
    branches/google/gcc-4_9-mobile/gcc/testsuite/ChangeLog
    branches/google/gcc-4_9-mobile/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
    branches/google/gcc-4_9-mobile/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
    branches/google/gcc-4_9-mobile/gcc/tree-cfg.c
    branches/google/gcc-4_9-mobile/gcc/tree-cfg.h
    branches/google/gcc-4_9-mobile/gcc/tree-chrec.c
    branches/google/gcc-4_9-mobile/gcc/tree-chrec.h
    branches/google/gcc-4_9-mobile/gcc/tree-pass.h
    branches/google/gcc-4_9-mobile/gcc/tree-scalar-evolution.c
    branches/google/gcc-4_9-mobile/gcc/tree-scalar-evolution.h
    branches/google/gcc-4_9-mobile/gcc/tree-ssa-loop-ivopts.c
    branches/google/gcc-4_9-mobile/gcc/tree-ssa-loop-niter.c
    branches/google/gcc-4_9-mobile/gcc/tree-ssa-loop-niter.h
    branches/google/gcc-4_9-mobile/gcc/tree-ssa-threadedge.c
    branches/google/gcc-4_9-mobile/gcc/tree-ssa-threadupdate.c
    branches/google/gcc-4_9-mobile/gcc/tree-ssa-threadupdate.h
    branches/google/gcc-4_9-mobile/libgcc/libgcov-util.c
    branches/google/gcc-4_9-mobile/libstdc++-v3/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/acinclude.m4
    branches/google/gcc-4_9-mobile/libstdc++-v3/configure
    branches/google/gcc-4_9-mobile/libstdc++-v3/configure.ac
    branches/google/gcc-4_9-mobile/libstdc++-v3/doc/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/include/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/libsupc++/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/po/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/python/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/src/Makefile.am
    branches/google/gcc-4_9-mobile/libstdc++-v3/src/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/src/c++11/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/src/c++98/Makefile.in
    branches/google/gcc-4_9-mobile/libstdc++-v3/testsuite/Makefile.in
Comment 11 Rainer Orth 2016-05-19 14:54:15 UTC
Between 20160513 and 20160519, the following failures started to appear
on sparc*-*-solaris2.*, both 32- and 64-bit:

FAIL: gcc.dg/tree-ssa/scev-3.c scan-tree-dump-times optimized "&a" 1
FAIL: gcc.dg/tree-ssa/scev-4.c scan-tree-dump-times optimized "&a" 1
FAIL: gcc.dg/tree-ssa/scev-5.c scan-tree-dump-times optimized "&a" 1

This is a regression from gcc-6.

  Rainer
Comment 12 Andre Vieira 2016-05-20 12:28:45 UTC
Same regression observed on arm-none-eabi.
Comment 13 rguenther@suse.de 2016-05-20 12:42:37 UTC
On Fri, 20 May 2016, andre.simoesdiasvieira at arm dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563
> 
> Andre Vieira <andre.simoesdiasvieira at arm dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |andre.simoesdiasvieira@arm.
>                    |                            |com
> 
> --- Comment #12 from Andre Vieira <andre.simoesdiasvieira at arm dot com> ---
> Same regression observed on arm-none-eabi.

Note the scanning is for final value replacement and it seems IVO now
has an initial value using &a as well.  So maybe instead scan the
scev-ccp dump.
Comment 14 bin cheng 2016-05-20 15:31:02 UTC
(In reply to rguenther@suse.de from comment #13)
> On Fri, 20 May 2016, andre.simoesdiasvieira at arm dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563
> > 
> > Andre Vieira <andre.simoesdiasvieira at arm dot com> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |andre.simoesdiasvieira@arm.
> >                    |                            |com
> > 
> > --- Comment #12 from Andre Vieira <andre.simoesdiasvieira at arm dot com> ---
> > Same regression observed on arm-none-eabi.
> 
> Note the scanning is for final value replacement and it seems IVO now
> has an initial value using &a as well.  So maybe instead scan the
> scev-ccp dump.

Haven't looked into it, but the use of &a in initial value is intended?  Looks like sccp failed to propagate last value of induction variable to a_p.
Comment 15 Andre Vieira 2016-05-20 15:36:29 UTC
So the code change for sccp moves the following sequence out of the loop:

  _2 = (sizetype) i_30;
  _3 = _2 * 8;
  _10 = _3 + 4;
  _1 = &a + _10;
  a_p = _1;

This is basically: 
*a_p = &a[last_i].y;

I agree that that makes sense, were it not for the fact that sequence is recomputing the address of a[i].y which is already computed inside the loop for:

MEM[(int *)&a][i_11].y = 100;

When IVOPTS comes around it creates a code sequence similar to this one to calculate the address it writes 100 to, and you end up with a needless recomputation of the address.

Now I don't know what phase should be responsible for cleaning this up, whether its sccp's responsibility to realize that the address computation is the same, or whether there should be some sort of common sub-expression elimination step in between or something else.
Comment 16 bin cheng 2016-05-20 15:47:23 UTC
(In reply to Andre Vieira from comment #15)
> So the code change for sccp moves the following sequence out of the loop:
> 
>   _2 = (sizetype) i_30;
>   _3 = _2 * 8;
>   _10 = _3 + 4;
>   _1 = &a + _10;
>   a_p = _1;
> 
> This is basically: 
> *a_p = &a[last_i].y;
> 
> I agree that that makes sense, were it not for the fact that sequence is
> recomputing the address of a[i].y which is already computed inside the loop
> for:
> 
> MEM[(int *)&a][i_11].y = 100;
> 
> When IVOPTS comes around it creates a code sequence similar to this one to
> calculate the address it writes 100 to, and you end up with a needless
> recomputation of the address.
> 
> Now I don't know what phase should be responsible for cleaning this up,
> whether its sccp's responsibility to realize that the address computation is
> the same, or whether there should be some sort of common sub-expression
> elimination step in between or something else.

Passes have been reordered recently.  After that, sink pass runs after lim will sink the computation to loop exit.  also there is no pass cleaning it up.
Before reordering, assignment to a_p isn't lsm out of loop, there is nothing for sink pass to sink.
Comment 17 Andre Vieira 2016-05-20 16:43:24 UTC
Ah yes my bad, its not sccp doing it... got a bit confused there... It is indeed sink that moves that sequence down. Sorry for the noise.

Question remains on how to clean this up though. Ideally you would like to end up with

a_p = <last_computed_base>;
outside of the loop.
Comment 18 rguenther@suse.de 2016-05-23 07:40:31 UTC
On Fri, 20 May 2016, andre.simoesdiasvieira at arm dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563
> 
> --- Comment #17 from Andre Vieira <andre.simoesdiasvieira at arm dot com> ---
> Ah yes my bad, its not sccp doing it... got a bit confused there... It is
> indeed sink that moves that sequence down. Sorry for the noise.
> 
> Question remains on how to clean this up though. Ideally you would like to end
> up with
> 
> a_p = <last_computed_base>;
> outside of the loop.

First of all please open a new bug for the FAILs.  Second, the fix will
be mostly adjusting the testcase expectations (eventually disabling LIM
for example if we want to test SCCP abilities).

As to your question it techincally is a job of CSE though it may be a
tough job to make it figure out the equivalence.

Now, in the case of scev-5.c (the only regression I see on x86_64, with
-m32), SCCP fails to do final value replacement for i_24:

  <bb 4>:
  # i_12 = PHI <i_10(3), i_9(5)>
  MEM[(int *)&a][i_12] = 100;
  i_9 = i_5 + i_12;
  if (i_9 <= 999)
    goto <bb 5>;
  else
    goto <bb 6>;

  <bb 5>:
  goto <bb 4>;

  <bb 6>:
  # i_24 = PHI <i_12(4)>
  _2 = (sizetype) i_24;
  _3 = _2 * 4;
  _1 = &a + _3;
  a_p = _1;

which may or may not be a good thing - this way IVOPTs can see the
extra use of i_12 on the loop exit and it _could_ look for derived
uses of that so it _may_ be able to replace the use of i_24 with
something better.
Comment 19 Andre Vieira 2016-05-23 13:47:28 UTC
> First of all please open a new bug for the FAILs.  Second, the fix will
> be mostly adjusting the testcase expectations (eventually disabling LIM
> for example if we want to test SCCP abilities).

Opened a new ticket for this PR71237, makes sense to continue the discussions there. I also quoted your comment there.
Comment 20 Richard Biener 2017-01-17 08:24:41 UTC
Fixed.
Comment 21 Richard Biener 2017-01-17 08:25:05 UTC
Author: rguenth
Date: Tue Jan 17 08:24:26 2017
New Revision: 244519

URL: https://gcc.gnu.org/viewcvs?rev=244519&root=gcc&view=rev
Log:
2017-01-17  Richard Biener  <rguenther@suse.de>

	PR testsuite/52563
	PR testsuite/71237
	PR testsuite/77737
	* gcc.dg/tree-ssa/scev-3.c: Re-write to a GIMPLE testcase for IVOPTs.
	* gcc.dg/tree-ssa/scev-4.c: Likewise.
	* gcc.dg/tree-ssa/scev-5.c: Likewise.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c