Bug 45312 - [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel
Summary: [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.4.4
: P1 normal
Target Milestone: 4.4.5
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2010-08-17 18:12 UTC by Artem S. Tashkinov
Modified: 2010-09-14 15:46 UTC (History)
10 users (show)

See Also:
Host:
Target: i686-pc-linux-gnu
Build:
Known to work: 4.6.0 4.5.2 4.4.5
Known to fail: 4.5.1 4.4.4 4.3.5
Last reconfirmed: 2010-08-24 10:30:58


Attachments
Potential test cases (851.81 KB, application/x-gzip)
2010-08-23 16:00 UTC, H. Peter Anvin
Details
pageattr.i (137.17 KB, text/plain)
2010-08-24 10:18 UTC, Jakub Jelinek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Artem S. Tashkinov 2010-08-17 18:12:28 UTC
This bug is described here: https://bugzilla.kernel.org/show_bug.cgi?id=16612

In two words: this patch: http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.35.y.git;a=commitdiff;h=568132624386f53e87575195d868db9afb2e9316

makes kernel 2.6.35.2 unusable on my PC.

The particular file that gets miscompiled is this one: http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.35.y.git;a=blob;f=arch/x86/include/asm/cmpxchg_32.h;h=c1cf59d72f096e3825010681889eb6625c662d16;hb=HEAD

GCC 4.5.1 is not affected by this bug.
Comment 1 Jakub Jelinek 2010-08-17 18:24:35 UTC
That is not a proper bug report, see http://gcc.gnu.org/bugs.html for what needs to be provided.
Comment 2 Jakub Jelinek 2010-08-17 19:07:57 UTC
In particular, first bisect between object files compiled without/with the patch to find one compilation unit where the problem is, provide preprocessed source for it (both without and with the patch) and gcc command line options used to compile that.
Comment 3 Artem S. Tashkinov 2010-08-18 04:05:01 UTC
OK, then I'm closing this bug report, since I don't have enough experience to actually find a file that is being miscompiled. (And according to Linus it can be any out of dozens).
Comment 4 Serge Belyshev 2010-08-18 04:40:57 UTC
It is not ok to mark suspected wrong-code bugs as WONTFIX until they are confirmed or rejected by a maintainer.

Artem, it would be helpful if you at least tried to trim your kernel config until the bug disappears. Please also check GCC 4.3.5.  Also try to trigger the crash with init=/bin/sh kernel command line option.
Comment 5 Artem S. Tashkinov 2010-08-18 05:11:44 UTC
This crash occurs upon loading modules, so it's *very* difficult to trace/pinpoint it.

Most likely no kernel options have any effect on it.
Comment 6 Jakub Jelinek 2010-08-18 08:55:11 UTC
Doing binary search shouldn't be too hard, just build two separate kernel trees, one with the problematic patch applied, one without, with make V=1 output put into some log file in each case.
Then just start with the final link line and replace half of the objects/libraries there with matching objects/libraries from the second build and vice versa, test that kernel and depending on the results continue narrowing it down to half of the object files/libraries etc., repeat until you have e.g. working kernel with all objects with the patch applied except one (or non-working kernel with all objects without the patch applied except one).
Or, if you suspect particular object files, try to change just those immediately.
Comment 7 Artem S. Tashkinov 2010-08-18 20:00:32 UTC
I bet this bug can be triggered in a VM (e.g. in VirtualBox) too, so I'll try to debug it later.
Comment 8 H. Peter Anvin 2010-08-23 16:00:47 UTC
Created attachment 21549 [details]
Potential test cases

These are the four modules which this change seem to affect, from *my* build... please note that I do not personally have a reproduction of the failure.
Comment 9 Jakub Jelinek 2010-08-24 10:18:55 UTC
Created attachment 21554 [details]
pageattr.i

Compiling this on x86_64-linux (and likely i?86-linux) with 4.4 branch (r163492) using
./cc1 -fno-strict-aliasing -fno-common -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=core2 -ffreestanding -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow -fconserve-stack

results in broken second cmpxchg8b arguments:
.L74:
        movl    -108(%ebp), %eax
        xorl    %edx, %edx
        subl    mem_map, %eax
        movl    __supported_pte_mask, %ecx
        sarl    $5, %eax
        andl    $99, %ecx
        shldl   $12, %eax, %edx
        movl    -100(%ebp), %edi
        sall    $12, %eax
        movl    %edx, %ebx
        orl     %eax, %ecx
        movl    %edx, -28(%ebp)
        movl    %ecx, -32(%ebp)
        movl    %edx, %ecx
#APP
# 72 "/mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h" 1

1:      movl (%edi), %eax
        movl 4(%edi), %edx
        .section .smp_locks,"a"
.balign 4
.long 671f - .
.previous
671:
        lock; cmpxchg8b (%edi)
        jnz 1b
# 0 "" 2
#NO_APP

*.optimized has:
<bb 77>:
  D.29689 = (long long unsigned int) (long unsigned int) (((int) base - (int) mem_map) /[ex] 32) << 12 | __supported_pte_mask & 99;
  D.29693 = {};
  D.29693.pte = D.29689;
  D.29686 = D.29693;
  D.29718 = D.29686;
  pte = D.29718;
  pte = pte;
  pte = pte;
  ptep.249 = (long long unsigned int *) kpte;
  value = pte.pte;
  value.59 = (unsigned int *) &value;
  __asm__ __volatile__("
1:      movl (%1), %%eax
        movl 4(%1), %%edx
        .section .smp_locks,"a"
.balign 4
.long 671f - .
.previous
671:
        lock; cmpxchg8b (%1)
        jnz 1b" : "=m" *ptep.249 : "D" ptep.249 : "b" *value.59 : "c" *(value.59 + 4) : "memory", "dx", "ax", "dx", "ax", "ax");

so ebx should contain lower 32 bits of D.29689 (i.e. the same as -32(%ebp)) and
ecx should contain upper 32 bits of D.29689 (i.e. the same as -28(%ebp)).
But as can be seen in the assembly, that's not the case, %ebx instead contains the same value as %ecx, i.e. the upper 32 bits of D.29689.
Comment 10 Jakub Jelinek 2010-08-24 10:30:57 UTC
It seems things go wrong during RA.  In *.asmcons we still have the correct:
...
(insn 672 304 837 3 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:96 (parallel [
            (set (reg/f:SI 558 [ value.59 ])
                (plus:SI (reg/f:SI 20 frame)
                    (const_int -20 [0xffffffffffffffec])))
            (clobber (reg:CC 17 flags))
        ]) 285 {*addsi_1} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
;; End of basic block 3 -> ( 4)
...
(insn 1253 1125 1254 79 /mnt/b1/src/linux/set64/arch/x86/include/asm/pgtable-3level.h:36 (set (mem/c/i:SI (plus:SI (reg/f:SI 20 frame)
                (const_int -20 [0xffffffffffffffec])) [0 value+0 S4 A32])
        (reg:SI 600 [ D.29693 ])) 47 {*movsi_1} (nil))

(insn 1254 1253 673 79 /mnt/b1/src/linux/set64/arch/x86/include/asm/pgtable-3level.h:36 (set (mem/c/i:SI (plus:SI (reg/f:SI 20 frame)
                (const_int -16 [0xfffffffffffffff0])) [0 value+4 S4 A32])
        (reg:SI 601 [ D.29693+4 ])) 47 {*movsi_1} (nil))
            
(note 673 1254 674 79 NOTE_INSN_DELETED)

(insn 674 673 675 79 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 356)
        (mem:SI (plus:SI (reg/f:SI 558 [ value.59 ])
                (const_int 4 [0x4])) [0 S4 A32])) 47 {*movsi_1} (nil))

(insn 675 674 677 79 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (parallel [
            (set (mem:DI (reg/v/f:SI 132 [ kpte ]) [0 S8 A64])
                (asm_operands/v:DI ("
1:      movl (%1), %%eax
        movl 4(%1), %%edx
        .section .smp_locks,"a"
.balign 4
.long 671f - .
.previous
671:
        lock; cmpxchg8b (%1)
        jnz 1b") ("=m") 0 [
                        (reg/v/f:SI 132 [ kpte ])
                        (reg:SI 600 [ D.29693 ])
                        (reg:SI 356)
                    ]
                     [
                        (asm_input:SI ("D") 0)
                        (asm_input:SI ("b") 0)
                        (asm_input:SI ("c") 0)
                    ] 347175))
...
i.e. it stores the correct values into the DImode stack slot and then as ebx uses pseudo 600 and as as ecx reads from the stack the upper part (which is equal to pseudo 601).  But in *.ira it is wrong already:
(insn 1253 1125 1254 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/pgtable-3level.h:36 (set (mem/c/i:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -32 [0xffffffffffffffe0])) [0 value+0 S4 A32])
        (reg:SI 2 cx [501])) 47 {*movsi_1} (nil))

(insn 1254 1253 673 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/pgtable-3level.h:36 (set (mem/c/i:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -28 [0xffffffffffffffe4])) [0 value+4 S4 A32])
        (reg:SI 3 bx [orig:601 D.29693+4 ] [601])) 47 {*movsi_1} (nil))

(note 673 1254 674 78 NOTE_INSN_DELETED)

(note 674 673 1405 78 NOTE_INSN_DELETED)

(insn 1405 674 1407 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 5 di)
        (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -100 [0xffffffffffffff9c])) [0 %sfp+-88 S4 A32])) 47 {*movsi_1} (nil))

(insn 1407 1405 1406 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 2 cx)
        (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -28 [0xffffffffffffffe4])) [0 S4 A32])) 47 {*movsi_1} (nil))

(insn 1406 1407 675 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 3 bx)
        (reg:SI 2 cx [501])) 47 {*movsi_1} (nil))

If insn 1406 came right before insn 1407, it would be still correct.
Comment 11 Jakub Jelinek 2010-08-24 10:43:03 UTC
Reloads for insn # 675
Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 1: DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 2 cx [501])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 600 [ D.29693 ])
        reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_OTHER (opnum = 3)
        reload_in_reg: (reg:SI 356)
        reload_reg_rtx: (reg:SI 2 cx)
Comment 12 Vladimir Makarov 2010-09-01 18:06:26 UTC
(In reply to comment #10)

> (insn 1407 1405 1406 78
> /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 2 cx)
>         (mem/c:SI (plus:SI (reg/f:SI 6 bp)
>                 (const_int -28 [0xffffffffffffffe4])) [0 S4 A32])) 47
> {*movsi_1} (nil))
> 
> (insn 1406 1407 675 78
> /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 3 bx)
>         (reg:SI 2 cx [501])) 47 {*movsi_1} (nil))
> 
> If insn 1406 came right before insn 1407, it would be still correct.
> 

Yes, it would but I think the reload should still generate the right code in
this particular order of insns.  IMHO, fixing the order of insn is not the right thing to do because there might be situation of cycle (e.g. value of p600 is inherited from 2 but should be reloaded into 3 and p356 is inherited from 3 and should be reloaded into 2). 

The problem is definitely in reload inheritance.  Reg_last_reload_reg is not invalidated by insn #1407 which is generated by another reload of insn #675.

Reload inheritence bug fixes result in either big code degradation or possibility to induce new bugs.  It could be ok to fix such problem on the trunk but fixing it on release brach might be dangerous.

Looking through all patches for reload after gcc4.4 I don't think the bug is fixed on the trunk (or in gcc 4.5).  We probably are lucky that it did not occur there.
Comment 13 Jakub Jelinek 2010-09-01 18:32:38 UTC
Yeah, if the bug is still present on the trunk, it definitely should be fixed on the trunk first, and whether it is actually backportable or not is to be decided afterwards.  We just happen to have a reproducer only for 4.4 branch (not even 4.4-RH can reproduce it).
Comment 14 Ulrich Weigand 2010-09-03 18:30:19 UTC
(In reply to comment #12)
> Yes, it would but I think the reload should still generate the right code in
> this particular order of insns.  IMHO, fixing the order of insn is not the
> right thing to do because there might be situation of cycle (e.g. value of p600
> is inherited from 2 but should be reloaded into 3 and p356 is inherited from 3
> and should be reloaded into 2). 
> 
> The problem is definitely in reload inheritance.  Reg_last_reload_reg is not
> invalidated by insn #1407 which is generated by another reload of insn #675.

Actually, I think this really is a reload insn ordering problem. Note that reload inheritance tracking via reg_last_reload_reg works on the whole reloaded insn including all the generated reload insns as a whole. Conflicts between different reloads of the same original insn should be treated specially, taking into account reload insn ordering (this is done in free_for_value_p).

In this particular case, what happens is this.  After initial reload register selection, we have this set of reloads:

Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 1: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 588 [ D.29684 ])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 588 [ D.29684 ])
        reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_FOR_INPUT (opnum = 3)
        reload_in_reg: (reg:SI 356)
        reload_reg_rtx: (reg:SI 2 cx)

Reload inheritance notes that reload_in of reload 2 (reg:SI 588) is already available in CX at this point.  While this cannot be used as reload register due to the BREG class, it is chosen as "override input", i.e. reload_in is set to (reg:SI 2 cx) instead of of (reg:SI 588).

Code near the end of choose_reload_regs then verifies whether this causes any problems due to re-use of the CX register, and comes to the (correct) decision that it does not, since it knows the reload insn for reload 3 (a RELOAD_FOR_INPUT for operand 3) which clobbers CX will certainly be generated *after* the override-input reload insn for reload 2 (a RELOAD_FOR_INPUT for operand 2).

Unfortunately, some time *after* this decision has been made, the when_needed field of reload 3 is changed from RELOAD_FOR_INPUT to RELOAD_OTHER.  This causes the sequence of generated reload insns to change (RELOAD_OTHER reloads are emitted before RELOAD_FOR_INPUT reloads) -- this breaks the assumptions the reload inheritance code made, and causes wrong code to be generated.

Now the question is, why does this change to RELOAD_OTHER happen.  This is done in merge_assigned_reloads, which is called because i386 is a target with SMALL_REGISTER_CLASSES.  This routine notices that reloads 0 and 1 share a reload register DI, and decides to merge them, making reload 0 a RELOAD_OTHER reload (and cancelling reload 1).  It then goes through all other reloads:

          /* If this is now RELOAD_OTHER, look for any reloads that
             load parts of this operand and set them to
             RELOAD_FOR_OTHER_ADDRESS if they were for inputs,
             RELOAD_OTHER for outputs.  Note that this test is
             equivalent to looking for reloads for this operand
             number.

This loop now appears to detect reload 3 as a reload that "loads part of" the operand 0.  This happens because 
               reg_overlap_mentioned_for_reload_p (rld[j].in,
                                                   rld[i].in))
returns true since both reload-in values are stack slots, and reg_overlap_mentioned_for_reload_p conservatively assumes that all memory accesses conflict.

Therefore, when_needed for reload 3 is also set to RELOAD_OTHER.  This is not only unnecessary, but in turn causes the breakage.

Now in this particular case, it might be possible to fix the problem by using a better detection of which additional reloads need to be modified.  (The comment says, "Note that this test is equivalent to looking for reloads for this operand number." -- which is not true, but could be implemented instead ...)

However, the problem seems to be more fundamental.  If *any* change, even necessary changes, to when_needed flags are made at this point, *any* decision on reload inheritance or input overrides that was based on reload insn ordering may now be incorrect.

I guess a conservative fix could be to check whether any reload inheritance was indeed or input override was indeed performed on this insn, and if so, refuse to perform the merge ...

Comment 15 Vladimir Makarov 2010-09-03 20:45:28 UTC
(In reply to comment #14)
Ulrih, I've just wanted to post the following when I found that you already posted analogous conclusion.  I should have been on CC to see your comment right away.  The problem is really fundamental.  Code for merge_assigned_reloads ignores inheritance (and dependencies between reloads because of inheritance) at all.  Here is my post wanted to add.

After thorough examining code for inheritance in
reload1.c::choose_reload_regs, I can not find where it can be wrong
for this test case.  After this function, we have the following
reloads:

Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
Reload 1: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
Reload 2: reload_in (SI) = (reg:SI 600 [ D.29693 ])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 600 [ D.29693 ])
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_FOR_INPUT (opnum = 3)
        reload_in_reg: (reg:SI 356)

Function reload1.c::merge_assigned_reload called after
reload1.c::choose_reload_regs for targets with SMALL_REGISTER_CLASSES
(i686 case) merges 0th and 1st reloads (merging results in nullifying
reload_in in 1st the reload and changing 0th to RELOAD_OTHER)
producing

Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 1: DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 2 cx [501])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 600 [ D.29693 ])
        reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_FOR_INPUT (opnum = 3)
        reload_in_reg: (reg:SI 356)
        reload_reg_rtx: (reg:SI 2 cx)

So far everything is ok.  But after that, it changes 3rd reload to
RELOAD_OTHER which means that it will be issued before 2nd reload
instead of after it as it was before.  Changing to RELOAD_OTHER is
done because the code assumes (on function
reg_overlap_mentioned_for_reload_p) that changing 3rd reload will
affect 0th reload.  In this unfortunate case pseudo 132 (from 0th
reload) and pseudo 356 (from 3rd reload) have equivalent memory and
reg_overlap_mentioned_for_reload_p is a simplified code which in this
case decides that changing equivalent memory of p356 affects
equivalent memory of p132.

Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 1: DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 2 cx [501])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 600 [ D.29693 ])
        reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_OTHER (opnum = 3)
        reload_in_reg: (reg:SI 356)
        reload_reg_rtx: (reg:SI 2 cx)

I don't see a good and simple fix for general case (just fixing reg_overlap_mentioned_for_reload_p would wrong and dangerous) for this
code when inheritance is used and there are dependencies for reload 2
and 3 in this case.
Comment 16 Ulrich Weigand 2010-09-06 16:57:50 UTC
(In reply to comment #15)
> Ulrih, I've just wanted to post the following when I found that you already
> posted analogous conclusion.  I should have been on CC to see your comment
> right away.  The problem is really fundamental.  Code for
> merge_assigned_reloads ignores inheritance (and dependencies between reloads
> because of inheritance) at all.  Here is my post wanted to add.

I just noticed that even in the complete absence of reload inheritance, the allocate_reload_reg routine performs free_for_value_p checks, and therefore implicitly takes reload ordering into account.  This seems to imply that even if we'd do merge_assigned_reloads only if no inheritance has taken place, we'd still have a problem.

Does anybody have any idea how much merge_assigned_reloads actually contributes to performance on i386, in particular now that we have a bit more post-reload optimizers that potentially clear up duplicate code of the type generated by unmerged reloads?
Comment 17 Vladimir Makarov 2010-09-07 18:03:28 UTC
(In reply to comment #16)
> 
> 
> I just noticed that even in the complete absence of reload inheritance, the
> allocate_reload_reg routine performs free_for_value_p checks, and therefore
> implicitly takes reload ordering into account.  This seems to imply that even
> if we'd do merge_assigned_reloads only if no inheritance has taken place, we'd
> still have a problem.
> 
> Does anybody have any idea how much merge_assigned_reloads actually contributes
> to performance on i386, in particular now that we have a bit more post-reload
> optimizers that potentially clear up duplicate code of the type generated by
> unmerged reloads?
> 

I am thinking in the same direction.  merge_assign_reloads is dated by 1993.  Since then it was not practically changed.  I guess postreload can remove unecessary loads if it is generated without merge_assigned_reload.

I've tried to compile SPEC2000 by gcc-4.4 with and without merge_assigned_reloads.  I did not find any code difference.  I've tried a lot of other programs with the same result.  The single difference in code I found exists on this test case.

So I'd remove merge_assigned_reloads at all as it became obsolete long ago.

Comment 18 Ulrich Weigand 2010-09-07 19:18:14 UTC
(In reply to comment #17)
> I am thinking in the same direction.  merge_assign_reloads is dated by 1993. 
> Since then it was not practically changed.  I guess postreload can remove
> unecessary loads if it is generated without merge_assigned_reload.
> 
> I've tried to compile SPEC2000 by gcc-4.4 with and without
> merge_assigned_reloads.  I did not find any code difference.  I've tried a lot
> of other programs with the same result.  The single difference in code I found
> exists on this test case.

Thanks, that's certainly good to know!

> So I'd remove merge_assigned_reloads at all as it became obsolete long ago.

I agree, this seems the best way forward.

Comment 19 Vladimir Makarov 2010-09-09 18:36:45 UTC
Subject: Bug 45312

Author: vmakarov
Date: Thu Sep  9 18:36:26 2010
New Revision: 164116

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164116
Log:
2010-09-09  Vladimir Makarov  <vmakarov@redhat.com>

	PR middle-end/45312
	* reload1.c (merge_assigned_reloads): Remove.
	(reload_as_needed): Don't call it.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/reload1.c

Comment 20 Vladimir Makarov 2010-09-09 18:37:28 UTC
Subject: Bug 45312

Author: vmakarov
Date: Thu Sep  9 18:37:17 2010
New Revision: 164117

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164117
Log:
2010-09-09  Vladimir Makarov  <vmakarov@redhat.com>

	PR middle-end/45312
	* reload1.c (merge_assigned_reloads): Remove.
	(reload_as_needed): Don't call it.


Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/reload1.c

Comment 21 Vladimir Makarov 2010-09-09 18:38:11 UTC
Subject: Bug 45312

Author: vmakarov
Date: Thu Sep  9 18:37:58 2010
New Revision: 164118

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164118
Log:
2010-09-09  Vladimir Makarov  <vmakarov@redhat.com>

	PR middle-end/45312
	* reload1.c (merge_assigned_reloads): Remove.
	(reload_as_needed): Don't call it.


Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/reload1.c

Comment 22 Serge Belyshev 2010-09-13 17:54:27 UTC
Fixed everywhere but on 4.3 branch.

Maybe commit the patch there too?
Comment 23 Vladimir Makarov 2010-09-14 15:46:08 UTC
(In reply to comment #22)
> Fixed everywhere but on 4.3 branch.
> 
> Maybe commit the patch there too?
> 

I think there is a smaller probability that this bug occurs in gcc4.3 because it is based on the old RA.  IRA uses hard registers more effectively and frequently than the old RA and therefore it stresses the reload pass more and as the result reload bugs occur more frequently with IRA.

But if it is present in gcc4.3, the patch should be applied too.  Even more I guess that the patch is pretty safe and could be applied to gcc4.3 in any case.

If you want you could apply it to gcc4.3-branch.