[PATCH] Fix PR79201 (half-way)

Richard Biener <rguenther@suse.de>
Thu May 11 14:06:00 GMT 2017


On Thu, 11 May 2017, Uros Bizjak wrote:

> On Thu, May 11, 2017 at 2:48 PM, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 11 May 2017, Rainer Orth wrote:
> >
> >> Hi Richard,
> >>
> >> > On Mon, 24 Apr 2017, Richard Biener wrote:
> >> >>
> >> >> One issue in PR79201 is that we don't sink pure/const calls, which is
> >> >> what the following simple patch fixes.
> >> >>
> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >> >
> >> > Needed some gimple_assign_lhs -> gimple_get_lhs adjustments and an
> >> > adjustment of gcc.target/i386/pr22152.c, where we now sink the
> >> > assignment out of the pointless loop.  Not sure what the original
> >> > bug was about (well, reg allocation), so I simply disabled sinking
> >> > for it.
> >> >
> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >> >
> >> > Richard.
> >> >
> >> > 2017-04-25  Richard Biener  <rguenther@suse.de>
> >> >
> >> >     PR tree-optimization/79201
> >> >     * tree-ssa-sink.c (statement_sink_location): Handle calls.
> >> >
> >> >     * gcc.dg/tree-ssa/ssa-sink-16.c: New testcase.
> >> >     * gcc.target/i386/pr22152.c: Disable sinking.
> >>
> >> however, gcc.target/i386/pr22152.c FAILs now for 32-bit:
> >>
> >> FAIL: gcc.target/i386/pr22152.c scan-assembler-times movq[ \\\\t]+[^\\n]*%mm 1
> >
> > I remember seeing this and was not able to make sense of the testcase,
> > which was added to fix some backend issue.  Disabling sinking doesn't
> > work (IIRC) because sinking is also required to generate the original code.
> >
> > Uros added the testcase in 2008 -- I think if we want to have a testcase
> > for the original issue we need a different one.  Or simply remove
> > the testcase.
> 
> No, there is something going on in the testcase:
> 
> .L3:
>         movq    (%ecx,%eax,8), %mm1
>         paddq   (%ebx,%eax,8), %mm1
>         addl    $1, %eax
>         movq    %mm1, %mm0
>         cmpl    %eax, %edx
>         jne     .L3
> 
> 
> The compiler should allocate %mm0 for the movq and paddq to avoid the
> %mm1 -> %mm0 move.  These are all movv1di patterns (they shouldn't
> interfere with movdi), and it is not clear to me why the RA allocates
> %mm1 instead of %mm0.
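
(As an aside, on the sinking change itself: a minimal sketch of the kind
of pure call the patched statement_sink_location can now move -- purely
illustrative, not the actual gcc.dg/tree-ssa/ssa-sink-16.c testcase:

int foo (int) __attribute__((pure));

int
bar (int x, int flag)
{
  int tmp = foo (x);  /* pure call, no side effects */
  if (flag)
    return tmp;       /* only use of tmp is on this path */
  return 0;
}

Previously such a call would not be sunk; with calls handled in
statement_sink_location it can be moved into the 'if (flag)' arm, so foo
is not evaluated when flag is false.)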

In any case the testcase no longer tests what it originally tested, since
the input to the RA is now different.  The testcase doesn't make much sense:

__m64
unsigned_add3 (const __m64 * a, const __m64 * b, unsigned int count)
{
  __m64 sum;
  unsigned int i;

  for (i = 1; i < count; i++)
    sum = _mm_add_si64 (a[i], b[i]);

  return sum;
}

that's equivalent to

__m64
unsigned_add3 (const __m64 * a, const __m64 * b, unsigned int count)
{
  __m64 sum;
  unsigned int i;
   
  if (1 < count)
    sum = _mm_add_si64 (a[count-1], b[count-1]);

  return sum;
}

which means sum may be used uninitialized, and the loop itself is pointless.
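
If we want to keep a variant that exercises the original issue, something
along these lines (a hypothetical, untested sketch, assuming the same
-msse2/intrinsics setup as the existing test) would at least initialize
sum and keep the loop meaningful:

__m64
unsigned_add3 (const __m64 * a, const __m64 * b, unsigned int count)
{
  __m64 sum = _mm_setzero_si64 ();
  unsigned int i;

  for (i = 0; i < count; i++)
    sum = _mm_add_si64 (sum, _mm_add_si64 (a[i], b[i]));

  return sum;
}

Whether that still produces the movq/paddq sequence the
scan-assembler-times pattern expects would of course need checking.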

Richard.


