42216 – [4.5 Regression] changes in scheduling regress 464.h264ref 20%

Bug 42216 - [4.5 Regression] changes in scheduling regress 464.h264ref 20%

Summary: [4.5 Regression] changes in scheduling regress 464.h264ref 20%

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.5.0

Importance:	P2 normal
Target Milestone:	4.5.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2009-11-29 13:39 UTC by Richard Biener
Modified:	2010-03-18 17:59 UTC (History)
CC List:	6 users (show)

See Also:
Host:
Target:	x86_64-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2009-12-01 09:58:01

Attachments
Test patch to attempt to narrow down the problem (2.37 KB, patch) 2009-12-01 17:25 UTC, Bernd Schmidt	Details \| Diff
basic-block that is rejected (3.31 KB, text/plain) 2009-12-02 13:16 UTC, Richard Biener	Details
Patch to make it less conservative when accepting matching constraints with different modes (697 bytes, patch) 2009-12-02 16:28 UTC, Bernd Schmidt	Details \| Diff
Alternative fix for 42220 (2.16 KB, patch) 2010-03-08 23:04 UTC, Bernd Schmidt	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Richard Biener 2009-11-29 13:39:04 UTC

See http://gcc.opensuse.org/SPEC/CINT/sb-balakirew-head-64-2006/recent.html

First bad rev. is 154713, last good is 154686.

It's quite obvious that either the regrename.c changes have code generation
differences or that removing the vec_interleave_* expanders caused this
regression.

Richard - you removed all vec_interleave_* expanders but the vectorizer
still checks for optab_for_tree_code (VEC_INTERLEAVE_*_EXPR) which ends
up looking at vec_interleave_*_optab.  Either this code is all dead now
or you caused the vectorization no longer to apply.

I'll check if reverting your revision gets back performance.

Comment 1 Richard Henderson 2009-11-29 17:58:20 UTC

The vec_interleave_*_optab should still be populated.  It's just
that what was once "sse2_punpcklwd" is now "vec_interleave_lowv8hi"
directly.  If this patch *is* attributable to a regression, then
perhaps there's a typo somewhere in the substitutions...

Comment 2 rguenther@suse.de 2009-11-30 09:58:12 UTC

Subject: Re:  [4.5 Regression] 464.h265ref peak
 regressed 20%

On Sun, 29 Nov 2009, rth at gcc dot gnu dot org wrote:

> ------- Comment #1 from rth at gcc dot gnu dot org  2009-11-29 17:58 -------
> The vec_interleave_*_optab should still be populated.  It's just
> that what was once "sse2_punpcklwd" is now "vec_interleave_lowv8hi"
> directly.  If this patch *is* attributable to a regression, then
> perhaps there's a typo somewhere in the substitutions...

Indeed it wasn't your patch.  Checking with Bernds patch reverted now.

Richard.

Comment 3 H.J. Lu 2009-11-30 10:54:04 UTC

This may be related to revision 154688, which has caused PR 42202.

Comment 4 Richard Biener 2009-12-01 09:58:00 UTC

It was indeed Bernds patch.  Bernd - the patch discription suggested it was
a compile-time performance fix and not going to change code?  There is quite
big effect (including this negative one) on SPEC CPU 2006 on x86_64 ...

Comment 5 Bernd Schmidt 2009-12-01 15:26:24 UTC

Code generation changes are expected for two reasons - the code became less conservative when determining conflicts with other registers, so we can usually rename in more cases.  There are also a few cases where we have to give up on renaming due to multiword hard register overlaps we can't track.

Unfortunately I don't have access to spec; I'll have to think about how to track this down.

Comment 6 rguenther@suse.de 2009-12-01 15:35:48 UTC

Subject: Re:  [4.5 Regression] rev 15458[78]
 regress 464.h264ref peak 20%

On Tue, 1 Dec 2009, bernds_cb1 at t-online dot de wrote:

> ------- Comment #5 from bernds_cb1 at t-online dot de  2009-12-01 15:26 -------
> Code generation changes are expected for two reasons - the code became less
> conservative when determining conflicts with other registers, so we can usually
> rename in more cases.  There are also a few cases where we have to give up on
> renaming due to multiword hard register overlaps we can't track.
> 
> Unfortunately I don't have access to spec; I'll have to think about how to
> track this down.

I will try to come up with a testcase for you.  I suppose it might
be a backend issue as regrename should mainly effect the 2nd
scheduling pass, right?

Richard.

Comment 7 Richard Biener 2009-12-01 16:46:34 UTC

Just reverting rev. 154688 and using the training set gets us from

464.h264ref        --        228         -- S

to

464.h264ref        --        170         -- S                                  

at -O3 -ffast-math -funroll-loops -march-native (-march=k8-sse3 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=k8).

After the patch the oprofile looks like

CPU: AMD64 processors, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
2241755  51.1739  SetupFastFullPelSearch
442008   10.0900  BlockMotionSearch
271429    6.1961  SubPelBlockMotionSearch
269337    6.1483  FastPelY_14
170961    3.9026  UMVLine16Y_11
159311    3.6367  SetupLargerBlocks
155556    3.5510  SATD
127230    2.9044  FastLine16Y_11
72328     1.6511  dct_luma
69728     1.5917  getNonAffNeighbour

All but the *Y_1[14] functions are inside mv-search.c, just re-compiling
that file is enough to reproduce the issue.

after reverting it

CPU: AMD64 processors, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1223269  36.8289  SetupFastFullPelSearch
450306   13.5573  BlockMotionSearch
320395    9.6461  SubPelBlockMotionSearch
245175    7.3815  FastPelY_14
160694    4.8380  SetupLargerBlocks
155353    4.6772  SATD
153292    4.6152  UMVLine16Y_11
79531     2.3944  FastLine16Y_11
72455     2.1814  dct_luma
70733     2.1296  getNonAffNeighbour
42828     1.2894  UMVPelY_14
34396     1.0356  UnifiedOneForthPix

Comment 8 Richard Biener 2009-12-01 17:03:31 UTC

The hot loop is mv-search.c:SetupFastFullPelSearch

  for (pos = 0; pos < max_pos; pos++)
  {
    abs_y = offset_y + spiral_search_y[pos];
    abs_x = offset_x + spiral_search_x[pos];

    if (range_partly_outside)
    {
      if (abs_y >= 0 && abs_y <= max_height &&
          abs_x >= 0 && abs_x <= max_width    )
      {
        PelYline_11 = FastLine16Y_11;
      }
      else
      {
        PelYline_11 = UMVLine16Y_11;
      }
    }

    orgptr = orig_blocks;
    bindex = 0;
    for (blky = 0; blky < 4; blky++)
    {
      LineSadBlk0 = LineSadBlk1 = LineSadBlk2 = LineSadBlk3 = 0;
      for (y = 0; y < 4; y++)
      {
        refptr = PelYline_11 (ref_pic, abs_y++, abs_x, img_height, img_width);

        LineSadBlk0 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk0 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk0 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk0 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk1 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk1 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk1 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk1 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk2 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk2 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk2 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk2 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk3 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk3 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk3 += byte_abs [*refptr++ - *orgptr++];
        LineSadBlk3 += byte_abs [*refptr++ - *orgptr++];
      }
      block_sad[bindex++][pos] = LineSadBlk0;
      block_sad[bindex++][pos] = LineSadBlk1;
      block_sad[bindex++][pos] = LineSadBlk2;
      block_sad[bindex++][pos] = LineSadBlk3;
    }
  }

good assembly of the innermost loop:

.L1422:
        leal    1(%rsi), %r9d
        movl    64(%rsp), %r8d
        movl    68(%rsp), %ecx
        movl    52(%rsp), %edx
        movq    72(%rsp), %rdi
        movl    %r9d, 32(%rsp)
        call    *%rax
        movzwl  (%rbx), %ecx
        movzwl  (%rax), %edx
        movzwl  2(%rbx), %r10d
        movzwl  2(%rax), %r11d
        movq    byte_abs(%rip), %r9
        movzwl  4(%rbx), %esi
        subl    %ecx, %edx
        movzwl  4(%rax), %ecx
        movslq  %edx, %r8
        subl    %r10d, %r11d
        movzwl  6(%rax), %r10d
        addl    (%r9,%r8,4), %r14d
        movzwl  6(%rbx), %r8d
        movslq  %r11d, %rdi
        addl    (%r9,%rdi,4), %r14d
        movzwl  8(%rbx), %edi
        subl    %esi, %ecx
        movslq  %ecx, %rdx
        movzwl  8(%rax), %ecx
        subl    %r8d, %r10d
        addl    (%r9,%rdx,4), %r14d
        movzwl  10(%rax), %r8d
        movzwl  10(%rbx), %edx
        movslq  %r10d, %r11
        addl    (%r9,%r11,4), %r14d
        movzwl  12(%rbx), %r11d
        subl    %edi, %ecx
        movzwl  12(%rax), %edi
        movslq  %ecx, %rsi
        subl    %edx, %r8d
        addl    (%r9,%rsi,4), %r13d
        movzwl  14(%rax), %edx
        movzwl  14(%rbx), %esi
        movslq  %r8d, %r10
        subl    %r11d, %edi
        addl    (%r9,%r10,4), %r13d
        movzwl  16(%rax), %r11d
        movzwl  16(%rbx), %r10d
        movslq  %edi, %rcx
        addl    (%r9,%rcx,4), %r13d
        movzwl  18(%rbx), %ecx
        subl    %esi, %edx
        movslq  %edx, %r8
        movzwl  18(%rax), %edx
        subl    %r10d, %r11d
        addl    (%r9,%r8,4), %r13d
        movzwl  20(%rax), %r10d
        movzwl  20(%rbx), %r8d
        movslq  %r11d, %rdi
        addl    (%r9,%rdi,4), %ebp
        movzwl  22(%rbx), %edi
        subl    %ecx, %edx
        movzwl  22(%rax), %ecx
        movslq  %edx, %rsi
        subl    %r8d, %r10d
        addl    (%r9,%rsi,4), %ebp
        movzwl  24(%rax), %r8d
        movzwl  24(%rbx), %esi
        movslq  %r10d, %r11
        subl    %edi, %ecx
        addl    (%r9,%r11,4), %ebp
        movzwl  26(%rax), %edi
        movslq  %ecx, %rdx
        movzwl  26(%rbx), %r11d
        addl    (%r9,%rdx,4), %ebp
        movzwl  28(%rbx), %edx
        subl    %esi, %r8d
        movslq  %r8d, %r10
        movzwl  28(%rax), %r8d
        movzwl  30(%rax), %eax
        addl    (%r9,%r10,4), %r12d
        movzwl  30(%rbx), %r10d
        subl    %r11d, %edi
        movslq  %edi, %rcx
        addl    (%r9,%rcx,4), %r12d
        subl    %edx, %r8d
        subl    %r10d, %eax
        movslq  %r8d, %rsi
        addl    (%r9,%rsi,4), %r12d
        cltq
        addl    (%r9,%rax,4), %r12d
        addq    $32, %rbx
        addq    $32, %r15
        cmpq    $128, %r15
        je      .L1600
        movq    PelYline_11(%rip), %rax
        movl    32(%rsp), %esi
        jmp     .L1422

bad assembly:

.L1422:
        leal    1(%rsi), %edx
        movl    68(%rsp), %ecx
        movl    64(%rsp), %r8d
        movq    72(%rsp), %rdi
        movl    %edx, 32(%rsp)
        movl    52(%rsp), %edx
        call    *%rax
        movzwl  (%rbx), %esi
        movzwl  (%rax), %ecx
        movq    byte_abs(%rip), %rdx
        subl    %esi, %ecx
        movzwl  2(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r14d
        movzwl  2(%rax), %ecx
        subl    %esi, %ecx
        movzwl  4(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r14d
        movzwl  4(%rax), %ecx
        subl    %esi, %ecx
        movzwl  6(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r14d
        movzwl  6(%rax), %ecx
        subl    %esi, %ecx
        movzwl  8(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r14d
        movzwl  8(%rax), %ecx
        subl    %esi, %ecx
        movzwl  10(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r13d
        movzwl  10(%rax), %ecx
        subl    %esi, %ecx
        movzwl  12(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r13d
        movzwl  12(%rax), %ecx
        subl    %esi, %ecx
        movzwl  14(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r13d
        movzwl  14(%rax), %ecx
        subl    %esi, %ecx
        movzwl  16(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r13d
        movzwl  16(%rax), %ecx
        subl    %esi, %ecx
        movzwl  18(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %ebp
        movzwl  18(%rax), %ecx
        subl    %esi, %ecx
        movzwl  20(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %ebp
        movzwl  20(%rax), %ecx
        subl    %esi, %ecx
        movzwl  22(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %ebp
        movzwl  22(%rax), %ecx
        subl    %esi, %ecx
        movzwl  24(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %ebp
        movzwl  24(%rax), %ecx
        subl    %esi, %ecx
        movzwl  26(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r12d
        movzwl  26(%rax), %ecx
        subl    %esi, %ecx
        movzwl  28(%rbx), %esi
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r12d
        movzwl  28(%rax), %ecx
        movzwl  30(%rax), %eax
        subl    %esi, %ecx
        movslq  %ecx, %rcx
        addl    (%rdx,%rcx,4), %r12d
        movzwl  30(%rbx), %ecx
        subl    %ecx, %eax
        cltq
        addl    (%rdx,%rax,4), %r12d
        addq    $32, %rbx
        addq    $32, %r15
        cmpq    $128, %r15
        je      .L1600
        movq    PelYline_11(%rip), %rax
        movl    32(%rsp), %esi
        jmp     .L1422

seems to be really only scheduling differences...

Comment 9 Bernd Schmidt 2009-12-01 17:25:54 UTC

Created attachment 19202 [details]
Test patch to attempt to narrow down the problem

Comment 10 Bernd Schmidt 2009-12-01 17:27:21 UTC

Yes, regrename should only affect the second scheduling pass.

I'm attaching a stab-in-the-dark patch.  It contains a fix for the ia64 regressions (testing in progress), it adds a few warnings which could be useful to spot whether the testcase is producing the corner cases we refuse to handle, and it makes some changes to the register selection.  The idea is to prefer registers which aren't used in the basic block yet, which should give the best freedom to the scheduler.

I'd be glad if you could play with this and let me know the results.

Comment 11 Richard Biener 2009-12-02 13:16:05 UTC

The patch didn't change performance.  For the relevant functions I see

mv-search.c:452:1: warning: failing block due to bad pattern

(the location seems to be always the end of the function though).

The insns are most of the time

(insn 333 1737 334 39 mv-search.c:542 (set (reg:DI 0 ax [orig:1365 block_c.pos_x ] [1365])
        (sign_extend:DI (reg:SI 0 ax))) 126 {extendsidi2_rex64} (nil))

or similar.

For the hot function with the regression I see:

(insn 287 2237 289 9 mv-search.c:327 (set (reg:DI 0 ax [803])
        (sign_extend:DI (reg:SI 0 ax [802]))) 126 {extendsidi2_rex64} (nil))

(insn 437 436 438 27 mv-search.c:305 (parallel [
            (set (reg:DI 5 di [853])
                (zero_extend:DI (plus:SI (reg:SI 5 di [orig:737 ivtmp.1996 ] [737])
                        (const_int -1 [0xffffffffffffffff]))))
            (clobber (reg:CC 17 flags))
        ]) 253 {*addsi_1_zext} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 663 662 664 55 mv-search.c:436 (set (reg:DI 0 ax [940])
        (sign_extend:DI (reg:SI 0 ax [939]))) 126 {extendsidi2_rex64} (nil))

(insn 719 2224 721 62 mv-search.c:327 (set (reg:DI 0 ax [948])
        (sign_extend:DI (reg:SI 0 ax [947]))) 126 {extendsidi2_rex64} (nil))

line 436 is inside the hot loop in question, I'll attach a dump of the
BB in question we skip due to it.  After the above insn only the following
extra insns appear:

(insn 664 663 665 55 mv-search.c:436 (parallel [
            (set (reg/v:SI 41 r12 [orig:394 LineSadBlk3 ] [394])
                (plus:SI (reg/v:SI 41 r12 [orig:394 LineSadBlk3 ] [394])
                    (mem:SI (plus:DI (mult:DI (reg:DI 0 ax [940])
                                (const_int 4 [0x4]))
                            (reg/f:DI 1 dx [orig:479 byte_abs.393 ] [479])) [2 S4 A32])))  
            (clobber (reg:CC 17 flags))
        ]) 251 {*addsi_1} (expr_list:REG_DEAD (reg/f:DI 1 dx [orig:479 byte_abs.393 ] [479])
        (expr_list:REG_DEAD (reg:DI 0 ax [940])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(note 665 664 666 55 NOTE_INSN_DELETED)
(insn 666 665 667 55 mv-search.c:305 (parallel [
            (set (reg/v/f:DI 3 bx [orig:404 orgptr ] [404])
                (plus:DI (reg/v/f:DI 3 bx [orig:404 orgptr ] [404])
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) 252 {*adddi_1} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 667 666 669 55 mv-search.c:305 (parallel [
            (set (reg:DI 44 r15 [orig:398 ivtmp.1894 ] [398])
                (plus:DI (reg:DI 44 r15 [orig:398 ivtmp.1894 ] [398])
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) 252 {*adddi_1} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 669 667 670 55 mv-search.c:417 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:DI 44 r15 [orig:398 ivtmp.1894 ] [398])
            (const_int 128 [0x80]))) 7 {*cmpdi_1} (nil))
(jump_insn 670 669 671 55 mv-search.c:417 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 668)
            (pc))) 619 {*jcc_1} (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (expr_list:REG_BR_PROB (const_int 7500 [0x1d4c])
            (nil)))
 -> 668)
;; lr  out       3 [bx] 6 [bp] 7 [sp] 41 [r12] 42 [r13] 43 [r14] 44 [r15]
;; live  out     3 [bx] 6 [bp] 7 [sp] 41 [r12] 42 [r13] 43 [r14] 44 [r15]


They all look like they'd be handled just fine.

Comment 12 Richard Biener 2009-12-02 13:16:44 UTC

Created attachment 19208 [details]
basic-block that is rejected

Comment 13 Bernd Schmidt 2009-12-02 16:28:22 UTC

Created attachment 19211 [details]
Patch to make it less conservative when accepting matching constraints with different modes

Comment 14 Bernd Schmidt 2009-12-02 16:29:18 UTC

Would you mind trying another patch, both for testing performance, and if that fails, for interesting warnings?

Comment 15 Richard Biener 2009-12-02 16:57:04 UTC

With the second patch there are no warnings when building 464.h264ref (I have
reverted the first patch before installing the 2nd, it didn't apply cleanly
otherwise).  Performance is back to normal with the patch applied.

Comment 16 Bernd Schmidt 2009-12-02 17:06:27 UTC

Thanks for testing.  This means the regression is fixed by the patch?  I'll do a full test run then.

Comment 17 rguenther@suse.de 2009-12-02 17:07:39 UTC

Subject: Re:  [4.5 Regression] changes in scheduling
 regress 464.h264ref 20%

On Wed, 2 Dec 2009, bernds_cb1 at t-online dot de wrote:

> ------- Comment #16 from bernds_cb1 at t-online dot de  2009-12-02 17:06 -------
> Thanks for testing.  This means the regression is fixed by the patch?  I'll do
> a full test run then.

Yes, the regression is fixed by the second patch.

Richard.

Comment 18 Bernd Schmidt 2009-12-04 14:26:17 UTC

Unfortunately it causes failures.  Tracking these mismatches really is quite tricky.

I'll try to fix it.

Comment 19 rguenther@suse.de 2009-12-04 14:34:44 UTC

Subject: Re:  [4.5 Regression] changes in scheduling
 regress 464.h264ref 20%

On Fri, 4 Dec 2009, bernds_cb1 at t-online dot de wrote:

> ------- Comment #18 from bernds_cb1 at t-online dot de  2009-12-04 14:26 -------
> Unfortunately it causes failures.  Tracking these mismatches really is quite
> tricky.
> 
> I'll try to fix it.

Thanks.

Richard.

Comment 20 Bernd Schmidt 2009-12-10 18:03:23 UTC

Subject: Bug 42216

Author: bernds
Date: Thu Dec 10 18:03:05 2009
New Revision: 155134

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155134
Log:
	PR rtl-opt/42216
	* regrename.c: Error out if MAX_RECOG_OPERANDS is larger than
	HOST_BITS_PER_WIDE_INT.
	(verify_reg_in_set): New function, broken out of verify_reg_tracked.
	(verify_reg_tracked): Use it.
	(scan_rtx_reg): When seeing a use involving a superset of the
	registers in an existing chain, enlarge that chain.  Otherwise,
	allow subsets and set fail_current_block for all other kinds of
	overlap.
	(hide_operands): New argument UNTRACKED_OPERANDS; callers changed.
	Do not modify operands when the bit with the corresponding number
	is set in that bitmap.
	(build_def_use): When we see matching operands with different
	modes, don't set fail_current_block, but keep track of such
	operands in a bitmap if their registers aren't already tracked
	in an open chain.  Pass this bitmap to all hide_operands calls.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/regrename.c

Comment 21 Richard Biener 2009-12-11 21:39:05 UTC

All is well again.

Comment 22 Richard Biener 2010-03-08 11:48:31 UTC

All is bad again after the fix for PR42220:

2010-03-07  Bernd Schmidt  <bernd.schmidt@analog.com>

        PR rtl-optimization/42220
        * regrename.c (scan_rtx) <case STRICT_LOW_PART, ZERO_EXTRACT>:
        Use verify_reg_tracked to determine if we should use OP_OUT rather
        than OP_INOUT.
        (build_def_use): If we see an in-out operand for a register that we
        know nothing about, treat is an output if possible, fail the block if
        not.

Comment 23 Bernd Schmidt 2010-03-08 23:04:31 UTC

Created attachment 20048 [details]
Alternative fix for 42220

If you wouldn't mind, please test the attached patch which should be an alternative fix for 42220 and avoids the need to set fail_block for some cases.

Comment 24 Dominique d'Humieres 2010-03-09 08:35:36 UTC

> If you wouldn't mind, please test the attached patch which should be an
> alternative fix for 42220 and avoids the need to set fail_block for some cases.

I cannot test if the patch in comment #23 fixes the regression for SPEC, but a partial test indicates that it does not reintroduce pr42220 (patch applied on top of r157293 with the patch from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43287#c10 and a simple update). More testing during the coming nights).

Comment 25 rguenther@suse.de 2010-03-09 10:31:01 UTC

Subject: Re:  [4.5 Regression] changes in scheduling
 regress 464.h264ref 20%

On Mon, 8 Mar 2010, bernds_cb1 at t-online dot de wrote:

> ------- Comment #23 from bernds_cb1 at t-online dot de  2010-03-08 23:04 -------
> Created an attachment (id=20048)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20048&action=view)
> Alternative fix for 42220
> 
> If you wouldn't mind, please test the attached patch which should be an
> alternative fix for 42220 and avoids the need to set fail_block for some cases.

The patch fixes the regression for me.

Thanks,
Richard.

Comment 26 Dominique d'Humieres 2010-03-14 09:33:49 UTC

I have posted a full regression testing with the patch in comment #23 at

http://gcc.gnu.org/ml/gcc-testresults/2010-03/msg01182.html

The failures are the usual ones and pr42220 is still fixed.

Comment 27 Bernd Schmidt 2010-03-17 09:25:47 UTC

Subject: Bug 42216

Author: bernds
Date: Wed Mar 17 09:25:35 2010
New Revision: 157511

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157511
Log:
	PR rtl-optimization/42216
	* regrename.c (create_new_chain): New function, broken out from...
	(scan_rtx_reg): ... here.  Call it.  Handle the case where we are
	appending a use to an empty chain.
	(build_def_use): Remove previous changes that convert OP_INOUT to
	OP_OUT operands; instead detect the case where an OP_INOUT operand
	uses a previously untracked register and create an empty chain for
	it.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/regrename.c

Comment 28 Richard Biener 2010-03-18 17:59:28 UTC

All fine again. Fixed.