This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

From: Yufeng Zhang <Yufeng dot Zhang at arm dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: Jeff Law <law at redhat dot com>, Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Date: Tue, 03 Dec 2013 15:52:18 +0000
Subject: Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
Authentication-results: sourceware.org; auth=none
References: <5277EA58 dot 5020303 at arm dot com> <1384189786 dot 8213 dot 28 dot camel at gnopaine> <5282ACE5 dot 8020304 at arm dot com> <1384365888 dot 8213 dot 65 dot camel at gnopaine> <5283D3B5 dot 1040300 at arm dot com> <1384373498 dot 8213 dot 76 dot camel at gnopaine> <1384376098 dot 8213 dot 94 dot camel at gnopaine> <52840A69 dot 1020308 at arm dot com> <CAFiYyc189fgJ6Ytov4xfi6tyAacKrhcivYdY38O+M66jTAvbQA at mail dot gmail dot com> <5294B7F4 dot 4020106 at arm dot com> <529CABA5 dot 1040003 at arm dot com> <529D7ED6 dot 6060408 at redhat dot com> <529DD39F dot 9080405 at arm dot com> <CAFiYyc28DNZRn2=LbjYfPbmOxxQy1q6nKrtz-x9LPfXv_j8VCg at mail dot gmail dot com>

On 12/03/13 14:20, Richard Biener wrote:

On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang<Yufeng.Zhang@arm.com>  wrote:

On 12/03/13 06:48, Jeff Law wrote:


On 12/02/13 08:47, Yufeng Zhang wrote:


Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html


Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:


On 11/26/13 12:45, Richard Biener wrote:


On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhang<Yufeng.Zhang@arm.com>     wrote:


On 11/13/13 20:54, Bill Schmidt wrote:


The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it
helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.




Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks for
your prompt
replies and patience in guiding!

Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.


First a note, I need to check on voting for Bill as the slsr maintainer
from the steering committee.   Voting was in progress just before the
close of stage1 development so I haven't tallied the results :-)



Looking forward to some good news! :)


Yes, you are right about the non-trivial 'base' tree are rarely shared.
     The cached is introduced mainly because get_alternative_base () may
be
called twice on the same 'base' tree, once in the
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy
to leave out the cache if you think the benefit is trivial.


Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
    It's a pretty minor implementation detail either way.



I think the affine tree routines are generally expensive; it is worth having
a cache to avoid calling them too many times.  I run the slsr-*.c tests
under gcc.dg/tree-ssa/ and find out that the cache hit rates range from
55.6% to 90%, with 73.5% as the average.  The samples may not well represent
the real world scenario, but they do show the fact that the 'base' tree can
be shared to some extent.  So I'd like to have the cache in the patch.

+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.



As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs
is an effective check.  Alternatively, I can add a follow-up patch to
add some dumping facility in replace_ref () to print out the replacing
actions when -fdump-tree-slsr-details is on.


I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move forward.



I've updated to patch to dump more details when -fdump-tree-slsr-details is
on.  The tests have also been updated to scan for these new dumps instead of
MEMs.


I suggest doing it quickly.  We're well past stage1 close at this point.



The bootstrapping on x86_64 is still running.  OK to commit if it succeeds?


I still don't like it.  It's using the wrong and too expensive tools to do
stuff.  What kind of bases are we ultimately interested in?  Browsing
the code it looks like we're having

   /* Base expression for the chain of candidates:  often, but not
      always, an SSA name.  */
   tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.

'base_expr' is essentially the base address of a handled_component_p,e.g. ARRAY_REF, COMPONENT_REF, etc. In most case, it is the address ofthe object returned by get_inner_reference ().


Given a test case like the following:

typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
  a2[i+10][j] = 1;
  a2[i+10][j+1] = 1;
  a2[i+20][j] = 1;
}

The IR before SLSR is (on x86_64):

  _2 = (long unsigned int) i_1(D);
  _3 = _2 * 80;
  _4 = _3 + 800;
  _6 = a2_5(D) + _4;
  *_6[j_8(D)] = 1;
  _10 = j_8(D) + 1;
  *_6[_10] = 1;
  _12 = _3 + 1600;
  _13 = a2_5(D) + _12;
  *_13[j_8(D)] = 1;

The base_expr for the 1st and 2nd memory reference are the same, i.e._6, while the base_expr for a2[i+20][j] is _13.

_13 is essentially (_6 + 800), so all of the three memory referencesessentially share the same base address. As their strides are also thesame (MULT_EXPR (j, 4)), the three references can all be lowered toMEM_REFs. What this patch does is to use the tree affine tools to helprecognize the underlying base address expression; as it requires lookinginto the definitions of SSA_NAMEs, get_addr_base_and_unit_offset ()won't help here.

Bill has helped me exploit other ways of achieving this in SLSR, but sofar we think this is the best way to proceed. The use of tree affineroutines has been restricted to CAND_REFs only and there is theaforementioned cache facility to help reduce the overhead.


Thanks,
Yufeng

P.S. some more details what the patch does:

The CAND_REF for the three memory references are:

 6  [2] *_6[j_8(D)] = 1;
     REF  : _6 + ((sizetype) j_8(D) * 4) + 0 : int[20] *
     basis: 0  dependent: 8  sibling: 0
     next-interp: 0  dead-savings: 0

  8  [2] *_6[_10] = 1;
     REF  : _6 + ((sizetype) j_8(D) * 4) + 4 : int[20] *
     basis: 6  dependent: 11  sibling: 0
     next-interp: 0  dead-savings: 0

 11  [2] *_13[j_8(D)] = 1;
     REF  : _13 + ((sizetype) j_8(D) * 4) + 0 : int[20] *
     basis: 8  dependent: 0  sibling: 0
     next-interp: 0  dead-savings: 0

Before the patch, the strength reduction candidate chains for the threeCAND_REFs are:


  _6 -> 6 -> 8
  _13 -> 11

i.e. SLSR recognizes the first two references share the same basis,while the last one is on it own.


With the patch, an extra candidate chain can be recognized:

  a2_5(D) + (sizetype) i_1(D) * 80 -> 6 -> 11 -> 8

i.e. all of the three references are found to have the same basis(a2_5(D) + (sizetype) i_1(D) * 80), which is essentially the expanded _6or _13, with the immediate offset removed. The pass is now able tolower all of the three references, instead of the first two only, toMEM_REFs.

Follow-Ups:
- Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
  - From: Jeff Law
- Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
  - From: Richard Biener

References:
- Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
  - From: Yufeng Zhang
- Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
  - From: Jeff Law
- Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
  - From: Yufeng Zhang
- Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
  - From: Richard Biener

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]