
Bug 59642 - [5/6 Regression] Performance regression with -ftree-loop-distribute-patterns
Summary: [5/6 Regression] Performance regression with -ftree-loop-distribute-patterns
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.8.0
Importance: P2 normal
Target Milestone: 5.5
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2013-12-30 21:03 UTC by olle
Modified: 2017-04-04 09:59 UTC
CC List: 0 users

See Also:
Host:
Target:
Build:
Known to work: 4.7.3, 7.0.1
Known to fail: 5.4.0, 6.3.0
Last reconfirmed: 2017-04-04 00:00:00


Attachments

Description olle 2013-12-30 21:03:34 UTC
t.c:
void foo(int *v) {
  int *p;

  for(p = v; p < v + 2; ++p) *p = 0;
  
  for(p = v; p < v + 2; ++p)
    if(*p) *p = 1;
}

4.8.0/4.8.2 (fedora 19, x86_64):
% gcc -O3 -S t.c
        movq    $0, (%rdi)
        movl    4(%rdi), %eax
        testl   %eax, %eax
        je      .L1
        movl    $1, 4(%rdi)
.L1:
        rep ret

4.7.3:
        movl    $0, (%rdi)
        movl    $0, 4(%rdi)
        ret

With -fno-tree-loop-distribute-patterns, 4.7 and 4.8 generate the same code.
Comment 1 Marc Glisse 2013-12-30 22:58:52 UTC
I've noticed the same in other PRs: normally we manage to track the actual value of *p, but not when *p was written by __builtin_mem*, which should still be doable:
int f(int*p){
  __builtin_memset(p,0,4);
  return *p;
}

gives the following .optimized:

  __builtin_memset (p_2(D), 0, 4);
  _4 = *p_2(D);
  return _4;

(RTL fixes things later in this simple case)
Comment 2 Marc Glisse 2014-01-01 19:15:15 UTC
(In reply to Marc Glisse from comment #1)
> I've noticed the same in other PRs, normally we manage to track the actual
> value of *p, but we don't manage that when *p was written by __builtin_mem*,
> which should still be doable:

PR 58483 has an example with memcpy.
Comment 3 Andrew Pinski 2016-11-26 21:44:00 UTC
This works for me on the trunk on aarch64-linux-gnu.
Comment 4 Richard Biener 2017-04-04 09:59:06 UTC
We now unroll early enough not to regress compared to 4.8 anymore.  Improvements to niter analysis make us unroll in the cunrolli pass.

Note that if you adjust the testcase to make sure we don't unroll, the code with loop distribution is better:

void foo(int *v)
{
  int *p;

  for(p = v; p < v + 18; ++p) *p = 0;

  for(p = v; p < v + 18; ++p)
    if(*p) *p = 1;
}

Still, we don't optimize the second loop for the following testcase:

void foo(int *v, int n)
{
  int *p;

  __builtin_memset (v, 0, n * sizeof (int));

  for(p = v; p < v + n; ++p)
    if(*p) *p = 1;
}

because value-numbering isn't clever enough here (a constant 'n' would be moderately easier to handle).