Bug 27440 - [4.4 regression] code quality regression due to ivopts
Summary: [4.4 regression] code quality regression due to ivopts
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.0.3
: P2 minor
Target Milestone: 4.5.0
Assignee: Zdenek Dvorak
URL:
Keywords: missed-optimization
Depends on:
Blocks: 39201
  Show dependency treegraph
 
Reported: 2006-05-04 22:50 UTC by Dan Nicolaescu
Modified: 2012-03-13 13:00 UTC (History)
6 users (show)

See Also:
Host:
Target: i686-pc-linux-gnu
Build:
Known to work: 3.4.6, 4.5.0
Known to fail: 4.0.4
Last reconfirmed: 2009-02-03 16:25:29


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Nicolaescu 2006-05-04 22:50:42 UTC
Compiling this code with 3.4.6
void fill2 (unsigned int *arr,  unsigned int val, unsigned int start, unsigned int limit)
{
  unsigned int i;
  for (i = start; i < start + limit; i++)
    arr[i] = val;
}
generates: 
.L10:
        movl    %ecx, (%ebx,%eax,4)
        incl    %eax
.L8:
        cmpl    %eax, %edx
        ja      .L10
4.0/4.1/4.2 -O2 generate:

.L4:
        incl    %edx
        movl    %esi, (%eax)
        addl    $4, %eax
        cmpl    %ecx, %edx
        jne     .L4
which is both slower and bigger. 

using -O2 -fno-ivopts the result is much better:
.L4:
        movl    %ecx, (%ebx,%eax,4)
        incl    %eax
        cmpl    %edx, %eax
        jb      .L4

The difference in the .final_cleanup dump with and without ivopts is obvious:
With ivopts: 

  void * ivtmp.29;
  unsigned int ivtmp.26;
  unsigned int D.1290;

<bb 0>:
  D.1290 = start + limit;
  if (start < D.1290) goto <L6>; else goto <L2>;

<L6>:;
  ivtmp.29 = arr + (unsigned int *) (start * 4);
  ivtmp.26 = 0;

<L0>:;
  MEM[base: (unsigned int *) ivtmp.29] = val;
  ivtmp.26 = ivtmp.26 + 1;
  ivtmp.29 = ivtmp.29 + 4B;
  if (ivtmp.26 != D.1290 - start) goto <L0>; else goto <L2>;

<L2>:;
  return;

Without ivopts:
  unsigned int i;
  unsigned int D.1290;

<bb 0>:
  D.1290 = start + limit;
  if (start < D.1290) goto <L11>; else goto <L2>;

<L11>:;
  i = start;

<L0>:;
  *((unsigned int *) (i * 4) + arr) = val;
  i = i + 1;
  if (i < D.1290) goto <L0>; else goto <L2>;

<L2>:;
  return;


The   "void * ivtmp.29" is created by the ivopts pass. Why is it
a void* when it is known to be assigned to a unsigned int* ? 

Note that loops like the one in this example are quite common. For example in the assembly for PR8361 there are about 37 "fill" functions with very similar code (they are intantiations of 2 different templates, but still...)
Comment 1 Andrew Pinski 2006-05-04 23:04:26 UTC
IV-OPTs just gets info from the target.  Now if the target says weird addressing mode is the same as cheap ones, what do you think will happen?
Comment 2 Dan Nicolaescu 2006-05-04 23:09:33 UTC
(In reply to comment #1)
> IV-OPTs just gets info from the target.  Now if the target says weird
> addressing mode is the same as cheap ones, what do you think will happen?

Does IV-OPTs also take into consideration the cost of having 2 IVs instead of 1? 
Comment 3 Richard Biener 2006-05-05 13:10:46 UTC
Yes it does.  Not that the outcome is optimal here...
Comment 4 Wolfgang Bangerth 2006-09-13 03:32:35 UTC
With today's 4.1.x snapshot and on x86_64, I get this at -O2:
----------------
.L4:
        mov     %edx, %eax
        incl    %edx
        cmpl    %edx, %ecx
        movl    %esi, (%rdi,%rax,4)
        jne     .L4
----------------
and this at -O2 -fno-ivopts
----------------
.L4:
        mov     %edx, %eax
        incl    %edx
        cmpl    %ecx, %edx
        movl    %esi, (%rdi,%rax,4)
        jb      .L4
----------------

That means we get the same bad code there in both cases :-(

W.
Comment 5 Andrew Pinski 2006-09-13 04:25:56 UTC
(In reply to comment #4)
> That means we get the same bad code there in both cases :-(

Actually that is better code than was produced before.  Only the extra move is there now.  Except I don't get your results for either the mainline or 4.1.2.
Comment 6 Richard Biener 2006-10-10 14:06:28 UTC
Confirmed (as in comment #1).  With -Os instead of -O2 we even produce

.L3:
        movl    %ebx, -4(%edx)
        incl    %eax
.L2:
        addl    $4, %edx
        cmpl    %ecx, %eax
        jb      .L3

(because loop header copying is disabled there).
Comment 7 Uroš Bizjak 2006-10-10 14:48:30 UTC
(In reply to comment #6)
> Confirmed (as in comment #1).  With -Os instead of -O2 we even produce
> 
> .L3:
>         movl    %ebx, -4(%edx)

The -4(...) part comes from PR 24669.
Comment 8 Gabriel Dos Reis 2007-02-03 16:55:25 UTC
Won't fix in GCC-4.0.x.  Adjusting milestone.
Comment 9 Steven Bosscher 2007-11-19 10:07:26 UTC
What does GCC 4.3 do with the test case of this bug?
Comment 10 Uroš Bizjak 2007-11-20 07:36:22 UTC
(In reply to comment #9)
> What does GCC 4.3 do with the test case of this bug?

gcc -Os:

.L3:
        movl    %ebx, -4(%edx)
        incl    %eax
.L2:
        addl    $4, %edx
        cmpl    %ecx, %eax
        jb      .L3

And the -4(...) is due to PR 26726, look at comment #18.

Comment 11 Joseph S. Myers 2008-07-04 21:21:58 UTC
Closing 4.1 branch.
Comment 12 Paolo Bonzini 2009-02-03 16:25:29 UTC
When generating 64-bit code we produce good induction variables either with or without ivopts, but with an extra mov.

For 32-bit code the situation is the same as reported originally.
Comment 13 Joseph S. Myers 2009-03-31 19:36:37 UTC
Closing 4.2 branch.
Comment 14 Richard Biener 2009-08-04 12:27:45 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 15 Steven Bosscher 2009-12-30 16:39:49 UTC
With "GCC: (GNU) 4.5.0 20091228 (experimental) [trunk revision 155486]" I get identical code at -O2 with and without -fno-ivopts for i686:

.L3:
	movl	%ebx, (%ecx,%eax,4)
	addl	$1, %eax
	cmpl	%edx, %eax
	jb	.L3


For x86_64 at -O2, there is still the extra mov. The extra mov is not there with -Os (with and without -fno-ivopts).

Since the mov on x86_64 is essentially free, I consider this fixed for GCC 4.5.0.
Comment 16 Richard Biener 2010-05-22 18:11:06 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 17 Richard Biener 2011-06-27 12:12:28 UTC
4.3 branch is being closed, moving to 4.4.7 target.
Comment 18 Jakub Jelinek 2012-03-13 13:00:16 UTC
Fixed in 4.5+, 4.4 is no longer supported.