Bug 43494

Summary: [4.4/4.5/4.6 Regression] Overlooked dependency causes wrong scheduling, wrong code
Product: gcc Reporter: Kaveh Ghazi <ghazi>
Component: rtl-optimizationAssignee: Alexandre Oliva <aoliva>
Status: RESOLVED FIXED    
Severity: normal CC: amonakov, aoliva, bernds, ebotcazou, gcc-bugs, jakub, rguenth, sje, steven
Priority: P2 Keywords: alias, wrong-code
Version: 4.4.4   
Target Milestone: 4.4.6   
Host: Target: ia64-unknown-linux-gnu
Build: Known to work: 4.3.5
Known to fail: 4.4.4, 4.5.0, 4.6.0 Last reconfirmed: 2010-07-13 09:43:15
Attachments: debug log
Updated patch, still awaiting review

Description Kaveh Ghazi 2010-03-23 17:34:38 UTC
When running the testsuite on ia64-unknown-linux-gnu with extra passes containing -fpic/-fPIC I get the following additional error on the 4.4 branch:

FAIL: gcc.c-torture/execute/vector-2.c execution,  -O2
FAIL: gcc.c-torture/execute/vector-2.c execution,  -Os

on 4.5 trunk I get:

FAIL: gcc.c-torture/execute/vector-2.c execution,  -O2
Comment 1 Kaveh Ghazi 2010-03-23 17:36:53 UTC
4.4.4 ia64 results with error:
http://gcc.gnu.org/ml/gcc-testresults/2010-03/msg01631.html

4.5.0 ia64 results with error:
http://gcc.gnu.org/ml/gcc-testresults/2010-03/msg01997.html
Comment 2 Andrew Pinski 2010-03-23 18:22:57 UTC
This sounds like an aliasing issue which is only exposed on ia64.
Comment 3 Steven Bosscher 2010-07-13 09:43:15 UTC
I'll have a look.
Comment 4 Steven Bosscher 2010-07-17 23:27:59 UTC
Confirmed with r162278. Scheduling after reload (which is bundling for the ia64 backend) seems to cause this:

$ ./xgcc -B. -O2  vector-2.c -fno-inline -fPIC ; echo Testing ; ./a.out 
Testing
Aborted
$ ./xgcc -B. -O2 -fno-schedule-insns2 vector-2.c -fno-inline -fPIC -fschedule-insns ; echo Testing ; ./a.out
Testing
$ 

Goes away with selective scheduling:
$ ./xgcc -B. -O2 -fselective-scheduling2  vector-2.c -fno-inline -fPIC ; echo Testing ; ./a.out
Testing
$ 

Makes this a scheduler bug.
Comment 5 Steven Bosscher 2010-07-18 17:21:35 UTC
It looks like a store is scheduled wrong.

Slightly reduced test case:

---------------------- 8< --------------------#define vector __attribute__((vector_size(16) ))

vector int f1(vector int t, int a)
{
 ((int*)&t)[1] = a;
 return t;
}
int main(void)
{
  vector int a = {0, 0, 0, 0};
  vector int c = {0, 1, 0, 0};
  vector int a0;
  a0 = f1(a, 1);
  if (memcmp (&a0, &c, sizeof(a0)))
    __builtin_abort ();
  return 0;
}
---------------------- 8< --------------------


Compiled at "-O2 -fno-inline -fpic" this will abort. The assembler output for f1 is this:

     6          .global f1#
     7          .type   f1#, @function
     8          .proc f1#
     9  f1:
    10          .prologue
    11          .body
    12          .mmi
    13          mov r15 = r12
    14          nop 0
    15          mov r14 = r12
    16          ;;
    17          .mmi
    18          st8 [r15] = r32, 8
    19          ;;
    20          st8 [r15] = r33, -4
    21          nop 0
    22          .mii
    23          ld8 r8 = [r14], 8
    24          nop 0
    25          ;;
    26          nop 0
    27          .mmb
    28          ld8 r9 = [r14]
    29          st4 [r15] = r34
    30          br.ret.sptk.many b0
    31          .endp f1#


The store on line 29 looked suspicious because the three stores (lines 18, 20, and 29) are together in the unscheduled version (i.e. with -fno-schedule-insns2, lines 15, 17, and 19):

     6          .global f1#
     7          .type   f1#, @function
     8          .proc f1#
     9  f1:
    10          .prologue
    11          .body
    12          mov r15 = r12
    13          mov r14 = r12
    14          ;;
    15          st8 [r15] = r32, 8
    16          ;;
    17          st8 [r15] = r33, -4
    18          ;;
    19          st4 [r15] = r34
    20          ld8 r8 = [r14], 8
    21          ;;
    22          ld8 r9 = [r14]
    23          br.ret.sptk.many b0
    24          ;;
    25          .endp f1#


The abort goes away if I hack the assembly manually with an extra bundle to move the three stores back together:

        .global f1#                             .global f1#
        .type   f1#, @function                  .type   f1#, @function
        .proc f1#                               .proc f1#
f1:                                     f1:
        .prologue                               .prologue
        .body                                   .body   
        .mmi                                    .mmi
        mov r15 = r12                           mov r15 = r12
        nop 0                                   nop 0   
        mov r14 = r12                           mov r14 = r12
        ;;                                      ;;
        .mmi                                    .mmi
        st8 [r15] = r32, 8                      st8 [r15] = r32, 8
        ;;                                      ;;
        st8 [r15] = r33, -4                     st8 [r15] = r33, -4 
        nop 0                                   nop 0   
                                      >         ;;      
                                      >         .mii    
                                      >         st4 [r15] = r34
                                      >         nop 0   
                                      >         nop 0   
        .mii                                    .mii
        ld8 r8 = [r14], 8                       ld8 r8 = [r14], 8
        nop 0                                   nop 0   
        ;;                                      ;;
        nop 0                                   nop 0   
        .mmb                                    .mmb
        ld8 r9 = [r14]                          ld8 r9 = [r14] 
        st4 [r15] = r34               |         nop 0   
        br.ret.sptk.many b0                     br.ret.sptk.many b0
        .endp f1#                               .endp f1#


I am not an ia64 expert, but I see no reason why moving the store is a bad idea. Will have to look at the RTL, and if that doesn't help I'll leave this to a target specialist.
Comment 6 Steven Bosscher 2010-07-18 17:29:47 UTC
Smaller hand-changed assembly without new bundle (left aborts, right does not):

        .global f1#                             .global f1#
        .type   f1#, @function                  .type   f1#, @function
        .proc f1#                               .proc f1#
f1:                                     f1:
        .prologue                               .prologue
        .body                                   .body
        .mmi                                    .mmi
        mov r15 = r12                           mov r15 = r12
        nop 0                                   nop 0
        mov r14 = r12                           mov r14 = r12
        ;;                                      ;;
        .mmi                                    .mmi
        st8 [r15] = r32, 8                      st8 [r15] = r32, 8
        ;;                                      ;;
        st8 [r15] = r33, -4                     st8 [r15] = r33, -4
        nop 0                                   nop 0
        .mii                          |         ;;
                                      >         .mmb
                                      >         st4 [r15] = r34
                                      >         ;;
        ld8 r8 = [r14], 8                       ld8 r8 = [r14], 8
        nop 0                         <
        ;;                                      ;;
        nop 0                                   nop 0
        .mmb                                    .mmb
        ld8 r9 = [r14]                          ld8 r9 = [r14]
        st4 [r15] = r34               |         nop 0
        br.ret.sptk.many b0                     br.ret.sptk.many b0
        .endp f1#                               .endp f1#

The only thing that seems to matter, is that "st4 [r15] = r34" comes before "ld8 r9 = [r14]".
Comment 7 Steven Bosscher 2010-07-18 17:40:36 UTC
Ah, and since both r14 and r15 are initially copies of r12, they point to the same memory area (modulo auto-increments/decrements). So indeed an alias thing.
Comment 8 Steven Bosscher 2010-07-19 19:04:21 UTC
Offending insns that are scheduled in the wrong order:

(insn:TI 28 48 9 2 vector-2.c:7 (set (reg:DI 9 r9 [+8 ])
        (mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64])) 5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 14 r14 [351])
        (nil)))

(insn 9 28 18 2 vector-2.c:5 (set (mem/s/j/c:SI (reg/f:DI 15 r15 [343]) [2 t+4 S4 A32])
        (reg:SI 114 r34 [ a ])) 4 {movsi_internal} (expr_list:REG_DEAD (reg:SI 114 r34 [ a ])
        (expr_list:REG_DEAD (reg/f:DI 15 r15 [343])
            (nil))))


So the MEMs are:
load from "(mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64])"
store to  "(mem/s/j/c:SI (reg/f:DI 15 r15 [343]) [2 t+4 S4 A32])"

There is no dependency of insn 28 on insn 9, even though this is a rather obvious read-after-write dependency.

;;   ======================================================
;;   -- basic block 2 from 30 to 39 -- after reload
;;   ======================================================

;;   --------------- forward dependences: ------------

;;   --- EBB Dependences --- from bb2 to bb2
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;       30     5     2     0     3     1   2_A : 39 9 36 35
;;       29     5     2     0     2     1   2_A : 39 28 21
;;       35     5     2     1     2     1   2_M_only_um23       : 39 21 9 36
;;       36     5     2     2     1     1   2_M_only_um23       : 39 28 9
;;        9     4     2     3     0     1   2_M_only_um23       : 39
;;       21     5     2     2     1     1   2_M_only_um01       : 39 18 28
;;       28     5     2     3     0     1   2_M_only_um01       : 39 18
;;       18    -1     2     2     0     0   nothing     : 39
;;       39   334     2     8     0     0   2_B :

Comment 9 Alexander Monakov 2010-07-20 08:25:34 UTC
It's probably worth noting that the disambiguated MEMs are of different widths:

> load from "(mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64])"
> store to  "(mem/s/j/c:SI (reg/f:DI 15 r15 [343]) [2 t+4 S4 A32])"

(btw, IIUC the above mems do not alias indeed, and the problem is in disambiguating the above store and a load from
(mem/c/i:DI (post_inc:DI (reg/f:DI 14 r14 [351])) [2 t+0 S8 A128]) )

Also, with -fno-strict-aliasing the failure vanishes.
Comment 10 Steven Bosscher 2010-07-20 13:20:08 UTC
Re. comment 9: Well, the order of *this* store and *this* load is the difference between the test case failing or passing. So I do not think the problem is between this load and another store.

Before sched2 (or actually, .mach on ia64):

"vector-2.c.213r.compgotos" (notes, NOPs, bundle markers removed):
;; Function f1 (f1)

   30 r15:DI=r12:DI
   29 r14:DI=r12:DI
   35 [r15:DI++]=r32:DI
   36 [post r15:DI+=0xfffffffffffffffc]=r33:DI
    9 [r15:DI]=r34:SI
   21 r8:DI=[r14:DI++]
   28 r9:DI=[r14:DI]
   18 use r8:TI
   39 {return;use b0:DI;}

After .sched2:
"vector-2.c.215r.mach" (notes, NOPs, bundle markers removed):
   30 r15:DI=r12:DI
   29 r14:DI=r12:DI
   35 [r15:DI++]=r32:DI
   36 [post r15:DI+=0xfffffffffffffffc]=r33:DI
   21 r8:DI=[r14:DI++]
   28 r9:DI=[r14:DI]
    9 [r15:DI]=r34:SI
   18 use r8:TI
   39 {return;use b0:DI;}

Note that the only real change in the scheduling is that insn 9 is moved after insn 21 and insn 28. insn 9 is the store "[r15:DI]=r34:SI" that later expands to the st4 instruction.
Comment 11 Steven Bosscher 2010-07-20 13:22:46 UTC
insn 28 and insn 9 in non-slim form, with assembler output:

//(insn:TI 28 48 9 2 vector-2.c:7 (set (reg:DI 9 r9 [+8 ])
//        (mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64]))
//        5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 14 r14 [351])
//        (nil)))
        ld8 r9 = [r14]  // 28   movdi_internal/5
//(insn 9 28 18 2 vector-2.c:5 (set (mem/s/j/c:SI (reg/f:DI 15 r15 [343])
//                                   [2 t+4 S4 A32])
//        (reg:SI 114 r34 [ a ])) 4 {movsi_internal}
//        (expr_list:REG_DEAD (reg:SI 114 r34 [ a ])
//          (expr_list:REG_DEAD (reg/f:DI 15 r15 [343])
//            (nil))))
        st4 [r15] = r34 // 9    movsi_internal/6

Comment 12 Steven Bosscher 2010-07-20 13:57:39 UTC
Bug 35658 is another case where a store is scheduled after a load that depends on the store. It may well be that bug 35658 and this bug are the same issue.
Comment 13 Alexander Monakov 2010-07-20 14:13:54 UTC
(In reply to comment #10)
> Re. comment 9: Well, the order of *this* store and *this* load is the
> difference between the test case failing or passing. So I do not think the
> problem is between this load and another store.

To clarify, I'm saying that the problem is in moving insn 9 (the store) past insn 21, not past insn 28.  Your simplified dumps in comment #10 support that (r15 is incremented and decremented, r14 is post-modified; thus, both insn 21 and insn 9 touch memory[r12]).  Insn 28 is not relevant to the miscompilation.
Comment 14 Steven Bosscher 2010-07-20 17:56:31 UTC
The options "-O2 -fno-inline" is enough to make the test case of comment #5 fail. This bug is not specific to -fpic / -fPIC, inlining of f1 just hides the bug for the non-PIC case.
Comment 15 Steven Bosscher 2010-07-20 18:08:17 UTC
Works with "gcc-4.3 (Debian 4.3.5-1) 4.3.5", both with -fpic and with -fno-inline.
That makes this bug a regression.  CC:RM - who happens to know a thing or two about the alias analysis code also.
Comment 16 Steven Bosscher 2010-07-21 08:22:39 UTC
OK, I think I finally understand what Alexander tried to explain, and I've annotated the code. Alexander, does this look right to you?

f1:                             // vector int f1(vector int t)
        .mmi
        mov r15 = r12           // 30: r12 = @temp1
        mov r14 = r12           // 29: r14 = @temp1
        addl r16 = 1, r0        // 34: r16 = 1
        ;;
        .mmi
        st8 [r15] = r32, 8      // 36: temp1[0:1] = t[0:1], r15=@temp[2]
        ;;
        st8 [r15] = r33, -4     // 37: temp1[2:3] = t[2:3], r15=@temp[1]
        nop 0
        .mii
        ld8 r8 = [r14], 8       // 21: r8 = temp[0:1]
        nop 0
        ;;
        nop 0
        .mmb
        ld8 r9 = [r14]          // 28: r9 = temp[2:3]
        st4 [r15] = r16         //  9: temp[1] = 1
        br.ret.sptk.many b0     // 40: return r8:r9
        .endp f1#
Comment 17 Alexander Monakov 2010-07-21 08:32:00 UTC
(In reply to comment #16)
> OK, I think I finally understand what Alexander tried to explain, and I've
> annotated the code. Alexander, does this look right to you?

Yes, thanks.
Comment 18 Steven Bosscher 2010-07-21 09:27:59 UTC
Created attachment 21277 [details]
debug log

I think the problem is that fixed_scalar_and_varying_struct_p is called with a VALUE address, i.e. not expanded. See attached debug log.
Comment 19 Steven Bosscher 2010-07-21 09:33:42 UTC
x_addr is a VALUE that has no locs:

Breakpoint 4, true_dependence (mem=0x20000000005ddf68, mem_mode=VOIDmode, x=0x20000000005ddfb0, varies=0x2000000000496720) at ../../trunk/gcc/alias.c:2330
2330      if (MEM_VOLATILE_P (x) && MEM_VOLATILE_P (mem))
(gdb) cont
Continuing.

Breakpoint 11, get_addr (x=0x6000000000316e28) at ../../trunk/gcc/alias.c:1726
1726      if (GET_CODE (x) != VALUE)
(gdb) up
#1  0x400000000036eb50 in true_dependence (mem=0x20000000005ddf68, mem_mode=SImode, x=0x20000000005ddfb0, varies=0x2000000000496720) at ../../trunk/gcc/alias.c:2367
2367          x_addr = get_addr (x_addr);
(gdb) down
#0  get_addr (x=0x6000000000316e28) at ../../trunk/gcc/alias.c:1726
1726      if (GET_CODE (x) != VALUE)
(gdb) next
1728      v = CSELIB_VAL_PTR (x);
(gdb)
1729      if (v)
(gdb)
1731          for (l = v->locs; l; l = l->next)
(gdb) p v->locs
$72 = (struct elt_loc_list *) 0x0
(gdb) p debug_rtx(x)
(value:DI 10:10 @0x6000000000316e28/0x6000000000316d10)
$73 = void
(gdb)

So get_addr just returns the VALUE.

Is it *ever* OK for get_addr to return a VALUE rtx?  It seems to me this should never happen, and we should assert that.
Comment 20 Steven Bosscher 2010-07-21 09:49:16 UTC
Since this bug only triggers if cselib is used, the bug affects schedule_ebbs only. The other schedulers are !use_cselib schedulers.

(Even sel-sched apparently does not use cselib, that's surprising!)

OTOH, this bug can probably be triggered with -fsched2-use-superblocks on any target.
Comment 21 Alexander Monakov 2010-07-21 10:07:26 UTC
(In reply to comment #20)
> (Even sel-sched apparently does not use cselib, that's surprising!)

Offtopic: yes, using cselib in sel-sched is not quite straightforward, since we need it to work on arbitrary regions (as I understand cselib is designed to work on EBBs).
Comment 22 Bernd Schmidt 2010-08-17 13:14:19 UTC
(In reply to comment #19)
> x_addr is a VALUE that has no locs:

That happens because it's an autoincrement, and cselib_subst_to_values just creates an empty value.

It seems to me that we simply need to add a VALUE case to rtx_varies_p.  Can you test that?
Comment 23 Steven Bosscher 2010-08-18 10:50:14 UTC
So the scheduler uses cselib to get a better view of the address, but cselib doesn't actually give a better address. And the solution is to just give up in that case? It seems to me that if cselib doesn't give a better address than XEXP(MEM,0) then the original address should be used.
Comment 24 Bernd Schmidt 2010-08-18 14:36:45 UTC
It should be possible to do better in cselib_subst_to_values - for POST_* we could look up the value of the inner expression, and for PRE_* we could probably construct a PLUS of some kind.  That would be an enhancement, but it's unrelated to the bug.
Comment 25 Jakub Jelinek 2010-08-18 19:00:14 UTC
Alex Oliva posted some patches to make cselib handle autoinc stuff.
No idea whether http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01038.html
is the latest version or if he has a newer one.
Comment 26 Bernd Schmidt 2010-08-19 13:38:47 UTC
(In reply to comment #25)
> Alex Oliva posted some patches to make cselib handle autoinc stuff.
> No idea whether http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01038.html
> is the latest version or if he has a newer one.

Doesn't look bad, but I dislike the name change to _addr variants for functions that look up not just addresses.  I think we should just fix up all the callers to pass in a mode.  I'm also not sure we need a global cselib_record_autoinc flag - why not just do that always?

Shouldn't we do something else when hashing PRE_MODIFY?
Comment 27 Alexandre Oliva 2010-09-10 23:16:00 UTC
> Shouldn't we do something else when hashing PRE_MODIFY?

I don't know, what else do you have in mind?

I'm posting an updated patch that implements your other suggestions momentarily.
Comment 28 Alexandre Oliva 2011-01-13 10:31:21 UTC
Created attachment 22957 [details]
Updated patch, still awaiting review

The patch to fix this bug, posted on Sept 21, is still awaiting review :-(

This updated version fixes just a whitespace difference that caused it to fail to apply.
Comment 29 rguenther@suse.de 2011-01-13 10:55:01 UTC
On Thu, 13 Jan 2011, aoliva at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43494
> 
> --- Comment #28 from Alexandre Oliva <aoliva at gcc dot gnu.org> 2011-01-13 10:31:21 UTC ---
> Created attachment 22957 [details]
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22957
> Updated patch, still awaiting review
> 
> The patch to fix this bug, posted on Sept 21, is still awaiting review :-(
> 
> This updated version fixes just a whitespace difference that caused it to fail
> to apply.

I suppose pinging it on the ML is better than pinging it here.
Comment 30 Eric Botcazou 2011-01-27 10:46:35 UTC
I'll review Alexandre's patch later today.
Comment 31 Alexandre Oliva 2011-02-03 06:04:07 UTC
Author: aoliva
Date: Thu Feb  3 06:04:04 2011
New Revision: 169782

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169782
Log:
PR debug/43092
PR rtl-optimization/43494
* rtl.h (for_each_inc_dec_fn): New type.
(for_each_inc_dec): Declare.
* rtlanal.c (struct for_each_inc_dec_ops): New type.
(for_each_inc_dec_find_inc_dec): New fn.
(for_each_inc_dec_find_mem): New fn.
(for_each_inc_dec): New fn.
* dse.c (struct insn_size): Remove.
(replace_inc_dec, replace_inc_dec_mem): Remove.
(emit_inc_dec_insn_before): New fn.
(check_for_inc_dec): Use it, along with for_each_inc_dec.
(canon_address): Pass mem modes to cselib_lookup.
* cselib.h (cselib_lookup): Add memmode argument.  Adjust callers.
(cselib_lookup_from_insn): Likewise.
(cselib_subst_to_values): Likewise.
* cselib.c (find_slot_memmode): New var.
(cselib_find_slot): New fn.  Use it instead of
htab_find_slot_with_hash everywhere.
(entry_and_rtx_equal_p): Use find_slot_memmode.
(autoinc_split): New fn.
(rtx_equal_for_cselib_p): Rename and implement in terms of...
(rtx_equal_for_cselib_1): ... this.  Take memmode, pass it on.
Deal with autoinc.  Special-case recursion into MEMs.
(cselib_hash_rtx): Likewise.
(cselib_lookup_mem): Infer pmode from address mode.  Distinguish
address and MEM modes.
(cselib_subst_to_values): Add memmode, pass it on.
Deal with autoinc.
(cselib_lookup): Add memmode argument, pass it on.
(cselib_lookup_from_insn): Add memmode.
(cselib_invalidate_rtx): Discard obsolete push_operand handling.
(struct cselib_record_autoinc_data): New.
(cselib_record_autoinc_cb): New fn.
(cselib_record_sets): Use it, along with for_each_inc_dec.  Pass MEM
mode to cselib_lookup.  Reset autoinced REGs here instead of...
(cselib_process_insn): ... here.
* var-tracking.c (replace_expr_with_values, use_type): Pass MEM mode
to cselib_lookup.
(add_uses): Likewise, also to cselib_subst_to_values.
(add_stores): Likewise.
* sched-deps.c 	(add_insn_mem_dependence): Pass mode to
cselib_subst_to_values.
(sched_analyze_1, sched_analyze_2): Likewise.  Adjusted.
* gcse.c (do_local_cprop): Adjusted.
* postreload.c (reload_cse_simplify_set): Adjusted.
(reload_cse_simplify_operands): Adjusted.
* sel-sched-dump (debug_mem_addr_value): Pass mode.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cselib.c
    trunk/gcc/cselib.h
    trunk/gcc/dse.c
    trunk/gcc/gcse.c
    trunk/gcc/postreload.c
    trunk/gcc/rtl.h
    trunk/gcc/rtlanal.c
    trunk/gcc/sched-deps.c
    trunk/gcc/sel-sched-dump.c
    trunk/gcc/var-tracking.c
Comment 32 Alexandre Oliva 2011-02-03 06:07:58 UTC
Fixed
Comment 33 Steve Ellcey 2011-02-09 22:44:20 UTC
I am reopening this bug because I am getting a failure of gcc.dg/torture/vector-2.c that started when this patch was checked in.  I think this is
the same test that used to be gcc.c-torture/execute/vector-2.c.

I have a cut down test case that fails (prints 'one') when compiled with -O1 -fno-guess-branch-probability starting at r169782.  The -fno-guess-branch-probability is needed to generate code that uses auto-inc and trigger the bug.
This fails on IA64 Linux and HP-UX.  It does not require the -fpic flag.

Let me know if I should open a new bug instead of re-opening this one.

extern int printf(const char *, ...);
extern void exit(int);
struct __attribute__((packed)) B { unsigned short j : 1, k : 11; };
struct B sB;

void testB (void) {
        int i;
        unsigned int mask, v, a, r;
        struct B x;
        sB.k = 750;
        r = 750;
        x.j = sB.j = 0;
        printf("%d %d\n", (int) x.j, (int) sB.j);
        printf("%d %d\n", (int) sB.k, (int) r);
        if (x.j != sB.j || sB.k != r) printf ("one\n");
}
int
main (void)
{
  testB ();
  exit (0);
}
Comment 34 Jakub Jelinek 2011-02-09 22:51:37 UTC
Maybe you're hitting PR47614?
Comment 35 Steve Ellcey 2011-02-09 23:42:57 UTC
I think it is PR47614, I tried the first patch in that defect report and it fixed my test case.
Comment 36 Eric Botcazou 2011-02-10 06:25:05 UTC
.