This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH]Construct canonical scaled address expression in IVOPT



> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Tuesday, September 24, 2013 7:58 PM
> To: Bin.Cheng
> Cc: Bin Cheng; GCC Patches; Richard Earnshaw
> Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT
> 
> On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng <amker.cheng@gmail.com>
> wrote:
> > On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <bin.cheng@arm.com> wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>
> >> Or even [reg*scale] (not sure about that).  But yes, at least
> >> reg*scale + offset and reg*scale + reg.
> >>
> >>>   Apparently it's infeasible to check every possibility for each
> >>> architecture, is it ok we at least check "index", then "addr" if
> >>> "index" is failed?  By "any kind of addressing modes", I mean modes
> >>> supported by function get_address_cost,  i.e., in form of "[base] +
> >>> [off] + [var] + (reg|reg*scale)".
> >>
> >> I suppose so, but it of course depends on what IVOPTs uses the answer
> >> for in the end.  Appearantly it doesn't distinguish between the
> >> various cases even though TARGET_MEM_REF does support all the
> >> variants in question (reg * scale, reg * scale + reg, reg * scale +
> >> const, reg * scale + reg + const).
> >>
> >> So the better answer may be to teach the costs about the differences?
> > Ideally, IVOPT should be aware whether scaling is allowed in every
> > kind of addressing modes and account cost of multiplier accordingly.
> > For current code, there are two scenarios here
> > 1) If target supports reg*scale+reg, but not reg*scale, in this case,
> > IVOPT considers multiplier is not allowed in any addressing mode and
> > account multiplier with high cost.  This is the problem arm having.
> > 2) If target supports reg*scale, but not some kind of addressing mode
> > (saying reg*scale+reg), in this case, IVOPT still constructs various
> > scaled addressing mode in get_address_cost and depends on address_cost
> > to compute correct cost for that addressing expression.  I think this
> > happens to work even IVOPT doesn't know "reg*scale+reg" is actually
> > not supported.
> >
> >>
> >>>> The above also builds more RTX waste which you can fix by re-using
> >>>> the
> >>> PLUS
> >>>> by building it up-front similar to the multiplication.  You also
> >>>> miss the
> >>> Yes, this can be fixed.
> >>>
> >>>> opportunity to have scale == 1 denote as to whether reg1 + reg2 is
valid.
> >>> I
> >>>> would expect that many targets support reg1 * scale +
> >>>> constant-offset but not many reg1 * scale + reg2.
> >>> I thought scale==1 is unnecessary because the addressing mode
> >>> degrades into "reg" or "reg+reg".  Moreover, calls of
> >>> multiplier_allowed_in_address_p in both get_address_cost and
> get_computation_cost_at have scale other than 1.
> >>
> >> Ok.
> >>
> >>>>
> >>>> So no, the helper now checks sth completely different.  What's the
> >>>> problem with arm supporting reg1 * scale?  Why shouldn't it being
> >>>> able to handle
> >>> the
> >>>> implicit zero offset?
> >>>
> >>> As Richard clarified, ARM does not support scaled addressing mode
> >>> without base register.
> >>
> >> I see.
> >>
> > Also from the newer comments:
> >
> >> Btw, it should be reasonably possible to compute the whole
> >> multiplier_allowed_in_address_p table for all primary and secondary
> >> archs (simply build cross-cc1) and compare the results before / after
> >> a patch candidate.  Querying both reg * scale and reg + reg * scale
> >> if the first fails sounds like a good solution to me.
> > I take this as we should do minimal change by checking reg + reg *
> > scale if reg * scale is failed,  right?
> 
> Yes, you can share a single RTL expression for all this and I think
querying reg
> + reg * scale first makes sense (then fallback to reg * scale for
compatibility).
> 
I updated the patch as discussed.  Please review.
It bootstraps and passes regression test on x86/x86_64, but fails
tree-ssa/scev-4.c on arm cortex-a15.
Hi Richard, I investigated the failure and found out it reveals two other
problems in IVOPT we have discussed.
For scev-4.c like:

typedef struct {
	int x;
	int y;
} S;
int *a_p;
S a[1000];
f(int k)
{
	int i;

	for (i=k; i<1000; i+=k) {
		a_p = &a[i].y;
		*a_p = 100;
        }
}

The iv candidates and uses are like:

use 0
  generic
  in statement a_p.0_5 = &a[k_11].y;

  at position 
  type int *
  base (int *) &a + ((sizetype) k_3(D) * 8 + 4)
  step (sizetype) k_3(D) * 8
  base object (void *) &a
  related candidates 
use 1
  address
  in statement MEM[(int *)&a][k_11].y = 100;

  at position MEM[(int *)&a][k_11].y
  type int *
  base &MEM[(int *)&a][k_3(D)].y
  step (sizetype) k_3(D) * 8
  base object (void *) &a
  related candidates

candidate 4 (important)
  depends on 1
  original biv
  type int
  base k_3(D)
  step k_3(D)

candidate 7
  depends on 1
  var_before ivtmp.12
  var_after ivtmp.12
  incremented before exit test
  type unsigned int
  base (unsigned int) ((int *) &a + (sizetype) k_3(D) * 8)
  step (unsigned int) ((sizetype) k_3(D) * 8)
  base object (void *) &a
Candidate 7 is related to use 0

Problem 1) When computing cost of use 1 using cand 4, the call of
strip_offset (&MEM[(int *)&a][k_3(D)].y, &off1) computes off1 == 0, which is
actually 4.
This can be fixed my previous patch which still in re-working.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01749.html

Problem 2) It allocates iv with not simplified base like &MEM[(int
*)&a][k_3(D)].y,  while cost computation functions like ptr_difference_const
can't handle such complex address properly (also force_expr_to_var_cost
etc.).  In this case, it can't understand that "((int *) &a + (sizetype)
k_3(D) * 8)" and "&MEM[(int *)&a][k_3(D)]" are actually equal.
I think it would substantially simplify computation of cost in IVOPT if we
can allocate iv with simplified base expression.  I have another patch for
this and will send it for review.

BTW, the failure can be fixed by the offset fixation patch.

So is it OK to applied this updated patch?

Thanks.
bin




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]