t.c:

void foo(int *v)
{
  int *p;
  for (p = v; p < v + 2; ++p)
    *p = 0;
  for (p = v; p < v + 2; ++p)
    if (*p)
      *p = 1;
}

4.8.0/4.8.2 (Fedora 19, x86_64):

% gcc -O3 -S t.c

	movq	$0, (%rdi)
	movl	4(%rdi), %eax
	testl	%eax, %eax
	je	.L1
	movl	$1, 4(%rdi)
.L1:
	rep ret

4.7.3:

	movl	$0, (%rdi)
	movl	$0, 4(%rdi)
	ret

With -fno-tree-loop-distribute-patterns, 4.7 and 4.8 generate the same code.
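For reference, the transform that -ftree-loop-distribute-patterns performs on the first loop can be sketched in source form (a hand-written equivalent, not GCC's internal representation; the function name foo_distributed is made up for illustration):

```c
#include <string.h>
#include <assert.h>

/* The zeroing loop is recognized as a memset pattern and replaced by
   a library call; the second loop is left in place, and the question
   in this PR is why its loads are not then folded against the memset. */
void foo_distributed(int *v)
{
  int *p;
  memset(v, 0, 2 * sizeof(int));  /* was: for (p = v; p < v + 2; ++p) *p = 0; */
  for (p = v; p < v + 2; ++p)     /* provably a no-op after the memset */
    if (*p)
      *p = 1;
}
```

Semantically both versions leave the two ints zeroed, so the whole second loop is dead code once the memset's effect on *p is known.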
I've noticed the same in other PRs: normally we manage to track the actual value of *p, but we don't manage that when *p was written by __builtin_mem*, which should still be doable:

int f(int *p)
{
  __builtin_memset(p, 0, 4);
  return *p;
}

gives the following .optimized:

  __builtin_memset (p_2(D), 0, 4);
  _4 = *p_2(D);
  return _4;

(RTL fixes things later in this simple case.)
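A minimal sketch of the missed fact, using plain memset so it is self-contained (GCC treats memset and __builtin_memset the same at this point):

```c
#include <string.h>

/* After the memset, the four bytes at p are known to be zero, so a
   value-numbering pass that understood the builtin's effect could
   fold this function to "memset(p, 0, 4); return 0;".  GIMPLE VN
   currently keeps the load, per the .optimized dump above. */
int f(int *p)
{
  memset(p, 0, sizeof(int));
  return *p;
}
```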
(In reply to Marc Glisse from comment #1)
> I've noticed the same in other PRs, normally we manage to track the actual
> value of *p, but we don't manage that when *p was written by __builtin_mem*,
> which should still be doable:

PR 58483 has an example with memcpy.
This works for me on the trunk on aarch64-linux-gnu.
We now unroll early enough to not regress compared to 4.8 anymore; some niter analysis improvements make us unroll in the cunrolli pass. Note that if you adjust the testcase so that we don't unroll, the code with loop distribution is better:

void foo(int *v)
{
  int *p;
  for (p = v; p < v + 18; ++p)
    *p = 0;
  for (p = v; p < v + 18; ++p)
    if (*p)
      *p = 1;
}

Still, GCC doesn't optimize the second loop for the following testcase:

void foo(int *v, int n)
{
  int *p;
  __builtin_memset (v, 0, n * sizeof (int));
  for (p = v; p < v + n; ++p)
    if (*p)
      *p = 1;
}

because value-numbering isn't clever enough here (a constant 'n' would be moderately easier to handle).
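For the variable-length case, the result that a smarter value numbering would produce can be sketched by hand (the name foo_opt is made up for illustration; this is the desired output, not what GCC emits today):

```c
#include <string.h>

/* If VN could prove that every *p loaded in the loop lies inside the
   region zeroed by the memset, the test "if (*p)" would fold to false
   and the whole second loop would be deleted as dead, leaving: */
void foo_opt(int *v, int n)
{
  memset(v, 0, (size_t)n * sizeof(int));
  /* was: for (p = v; p < v + n; ++p) if (*p) *p = 1;  -- a no-op */
}
```

The difficulty with a variable 'n' is that the memset length and the loop bound must be proven equal symbolically; with a constant 'n' both are known byte counts, which is the moderately easier case mentioned above.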