This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] [RFC] loop index promotion pass

From: Richard Guenther <richard dot guenther at gmail dot com>
To: Richard Guenther <richard dot guenther at gmail dot com>, gcc-patches at gcc dot gnu dot org
Date: Fri, 24 Apr 2009 21:50:49 +0200
Subject: Re: [PATCH] [RFC] loop index promotion pass
References: <20090423161719.GV22588@codesourcery.com> <84fc9c000904240242i2b99b3ecr874798c46db7b5ec@mail.gmail.com> <20090424184303.GZ22588@codesourcery.com>

On Fri, Apr 24, 2009 at 8:43 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On Fri, Apr 24, 2009 at 11:42:19AM +0200, Richard Guenther wrote:
>> On Thu, Apr 23, 2009 at 6:17 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
>> > The pass is placed as early as possible in the pipeline so that as many
>> > optimizations as possible see the promoted indices.
>> >
>> > Bootstrapped on x86_64-unknown-linux-gnu. ?Comments welcome.
>>
>> I wonder if on no-undefined-overflow branch we can simply avoid
>> the promotion/demotion (by teaching fold to recognize the
>> (short)((int)x +/nv (int)y) pattern and fold it to x + y).
>
> Something along those lines would be nice. ?(Although the actual thing
> to recognize would be (short)((unsigned short)x +/nv (unsigned short)
> y).)

Ok, I did that.  It regularizes the code, but with no effect on the generated
code.  It seems that SCEV analysis is confused enough to not be able to
compute the scalar evolution for the loads at all (I didn't yet "fix" the
SCEV code wrt no-undefined-overflow semantics though).

(get_scalar_evolution
  (scalar = j_25)
  (scalar_evolution = {2, +, 1}_1))
(get_scalar_evolution
  (scalar = D.1246_6)
  (scalar_evolution = ))
(get_scalar_evolution
  (scalar = D.1246_32)
  (scalar_evolution = ))

<bb 3>:
  # D.1246_32 = PHI <D.1246_6(4), 1(2)>
  # ivtmp.28_36 = PHI <ivtmp.28_35(4), 0(2)>
  D.1247_14 = D.1246_32 +/nv pretmp.26_38;
  c_15 = (unsigned int) D.1247_14;
  D.1250_18 = c_15 + 481;
  D.1251_20 = in_19(D) +/nv D.1250_18;
  D.1252_21 = *D.1251_20;
  MEM[base: out_5(D), index: ivtmp.28_36]{*out} = D.1252_21;
  D.2016_39 = (short unsigned int) ivtmp.28_36;
  j_40 = D.2016_39 + 2;
  j_25 = j_40;
  D.1246_6 = (int) j_25;
  ivtmp.28_35 = ivtmp.28_36 + 1;
  if (ivtmp.28_35 != 478)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;


for the reduced testcase

void
frob (unsigned char *in, unsigned char *out, unsigned short i)
{
  unsigned short w = 480;
  unsigned short j;

  for (j = 1; j < (w - 1); j++)
    {
      unsigned int c = (i * w) + j;
      unsigned short b = in[c + w + 1];
      *out++ = b;
    }
}

I have to think about this more (if computing a SCEV is possible here).

Thus, the real problem seems to be that of using an induction
variable derived from (int)j with j being unsigned (and thus
possibly wrapping).  But it at least looks like with overflowing
signed arithmetic being available on the branch we should be
able to fix this.

It's definitely an interesting testcase ;)

>> Or maybe I am missing what is exactly confusing the compiler here?
>> Is it that c is unsigned int? ?Or is it that the induction variable
>> arithmetic and loop exist tests are done in int?
>
> IIRC, the IV canonicalization passes get confused by the casts on the IV
> arithmetic, not the casts. ?You're not doing the arithmetic on the
> actual index, you're doing arithmetic on a differently-typed value and
> then stuffing that back into the loop index. ?(This issue affects other
> things, like computing loop trip counts and so forth, not just IV
> canonicalization.) ?If signed arithmetic was represented in the IR as
> actual signed arithmetic and not convert-to-unsigned-and-back, then IV
> canonicalization might do the right thing for free.

Not yet at least.  The issue is that we do not seem to use the fact that
the loop is finite and the induction variable does not wrap (number
of iteration analysis brute-forces this knowledge).

>> Other than that, why did you not implement this in the
>> induction variable canonicalization pass (thus, simply create
>> an alternative induction variable there)?
>
> Because I did not see how to make the loop optimizer bits handle the
> casts correctly. ?I agree that this would be a more elegant solution,
> but it will have to wait for somebody smarter than myself to implement
> it. ?(Or until I get a chance to sit down and receive a serious tutorial
> from the loop optimizer folks...)

Yeah, it's not exactly straight-forward.

Richard.

References:
- [PATCH] [RFC] loop index promotion pass
  - From: Nathan Froyd
- Re: [PATCH] [RFC] loop index promotion pass
  - From: Richard Guenther
- Re: [PATCH] [RFC] loop index promotion pass
  - From: Nathan Froyd

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]