The attached testcase, from gcc's own gimplify.c, is optimized poorly at the tree stage. Initial RTL has

;; t_1->gsbase.plf = D.2014_8;

(insn 8 6 9 (set (reg:QI 65)
        (mem/s:QI (plus:SI (reg/v/f:SI 58 [ t ])
                (const_int 1 [0x1])) [0+1 S1 A8])) gimplify.i:48 -1
     (nil))

(insn 9 8 10 (parallel [
            (set (reg:QI 64)
                (lshiftrt:QI (reg:QI 65)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) gimplify.i:48 -1
     (expr_list:REG_EQUAL (lshiftrt:QI (mem/s:QI (plus:SI (reg/v/f:SI 58 [ t ])
                    (const_int 1 [0x1])) [0+1 S1 A8])
            (const_int 3 [0x3]))
        (nil)))

(insn 10 9 11 (parallel [
            (set (reg:QI 66)
                (and:QI (reg:QI 64)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) gimplify.i:48 -1
     (nil))

(insn 11 10 13 (parallel [
            (set (reg:QI 67)
                (ior:QI (reg:QI 66)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) gimplify.i:48 -1
     (nil))

(insn 13 11 14 (parallel [
            (set (reg:QI 69)
                (and:QI (reg:QI 67)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) gimplify.i:48 -1
     (nil))

(insn 14 13 15 (parallel [
            (set (reg:QI 70)
                (ashift:QI (reg:QI 69)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) gimplify.i:48 -1
     (nil))

(insn 15 14 16 (set (reg:QI 71)
        (mem/s/j:QI (plus:SI (reg/v/f:SI 58 [ t ])
                (const_int 1 [0x1])) [0+1 S1 A8])) gimplify.i:48 -1
     (nil))

(insn 16 15 17 (parallel [
            (set (reg:QI 72)
                (and:QI (reg:QI 71)
                    (const_int -25 [0xffffffe7])))
            (clobber (reg:CC 17 flags))
        ]) gimplify.i:48 -1
     (nil))

(insn 17 16 18 (parallel [
            (set (reg:QI 73)
                (ior:QI (reg:QI 72)
                    (reg:QI 70)))
            (clobber (reg:CC 17 flags))
        ]) gimplify.i:48 -1
     (nil))

(insn 18 17 0 (set (mem/s/j:QI (plus:SI (reg/v/f:SI 58 [ t ])
                (const_int 1 [0x1])) [0+1 S1 A8])
        (reg:QI 73)) gimplify.i:48 -1
     (nil))

This is not optimized by anything unless the combiner is extended to handle four insns. This PR should stay open even if the combiner is improved, until the tree optimizers handle this better.
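To make the redundancy in the dump concrete, here is a minimal C sketch that replays insns 8-18 on a single byte and compares the result with a single OR of bit 3. The function names are hypothetical and not part of the testcase; the sketch assumes only what the dump shows, namely a 2-bit field at bits 3-4 of the byte at offset 1 (shift count 3, mask 3, clearing mask -25 = 0xe7).

```c
#include <assert.h>
#include <stdint.h>

/* Replays the initial RTL one insn at a time on the loaded byte b.
   Variable names follow the pseudo register numbers in the dump. */
static uint8_t store_plf_initial_rtl(uint8_t b)
{
    uint8_t r64 = (uint8_t)(b >> 3);     /* insn 9:  lshiftrt by 3         */
    uint8_t r66 = (uint8_t)(r64 & 3);    /* insn 10: extract the field     */
    uint8_t r67 = (uint8_t)(r66 | 1);    /* insn 11: the actual operation  */
    uint8_t r69 = (uint8_t)(r67 & 3);    /* insn 13: mask again (no-op)    */
    uint8_t r70 = (uint8_t)(r69 << 3);   /* insn 14: move back into place  */
    uint8_t r72 = (uint8_t)(b & 0xe7);   /* insn 16: clear bits 3-4        */
    return (uint8_t)(r72 | r70);         /* insn 17: merge                 */
}

/* What the whole sequence amounts to: set bit 3 of the byte. */
static uint8_t store_plf_single_or(uint8_t b)
{
    return (uint8_t)(b | (1u << 3));
}

/* Exhaustively checks that both forms agree for every byte value. */
static void check_all_bytes(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(store_plf_initial_rtl((uint8_t)v) ==
               store_plf_single_or((uint8_t)v));
}
```

Calling check_all_bytes() passes for all 256 inputs, which is why a single read-modify-write OR on the byte is sufficient here.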
Created attachment 21427
A testcase which shows the problem.
Confirmed.
D.2047_5 = t_1->gsbase.plf;
D.2048_6 = (unsigned char) D.2047_5;
D.2049_7 = D.2048_6 | 1;
D.2050_8 = (<unnamed-unsigned:2>) D.2049_7;
t_1->gsbase.plf = D.2050_8;

It could be optimized to just:

D.2047_5 = t_1->gsbase.plf;
D.2050_6 = D.2047_5 | 1;
t_1->gsbase.plf = D.2050_6;

But I will note that on mips64-linux-gnu we get pretty good RTL at the beginning due to zero_extract:

(insn 9 8 10 t.c:48 (set (reg:SI 201)
        (mem/s:SI (reg/v/f:SI 193 [ t ]) [0+0 S4 A32])) -1
     (nil))

(insn 10 9 11 t.c:48 (set (reg:DI 203)
        (zero_extract:DI (subreg:DI (reg:SI 201) 0)
            (const_int 2 [0x2])
            (const_int 19 [0x13]))) -1
     (nil))

(insn 11 10 12 t.c:48 (set (reg:QI 204)
        (truncate:QI (reg:DI 203))) -1
     (nil))

(insn 12 11 13 t.c:48 (set (reg:SI 205)
        (ior:SI (subreg:SI (reg:QI 204) 0)
            (const_int 1 [0x1]))) -1
     (nil))

(insn 13 12 14 t.c:48 (set (reg:SI 206)
        (mem/s/j:SI (reg/v/f:SI 193 [ t ]) [0+0 S4 A32])) -1
     (nil))

(insn 14 13 15 t.c:48 (set (reg:DI 207)
        (subreg:DI (reg:SI 206) 0)) -1
     (nil))

(insn 15 14 16 t.c:48 (set (zero_extract:DI (reg:DI 207)
            (const_int 2 [0x2])
            (const_int 19 [0x13]))
        (subreg:DI (reg:SI 205) 0)) -1
     (nil))

(insn 16 15 17 t.c:48 (set (reg:SI 206)
        (truncate:SI (reg:DI 207))) -1
     (nil))

(insn 17 16 0 t.c:48 (set (mem/s/j:SI (reg/v/f:SI 193 [ t ]) [0+0 S4 A32])
        (reg:SI 206)) -1
     (nil))
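The GIMPLE simplification proposed in this comment relies on the widening to unsigned char and the narrowing back to the 2-bit type cancelling out around the OR. A small C sketch of that claim follows. The struct and function names are hypothetical stand-ins; only the 2-bit width of plf is taken from the dump, and nothing else about the real structure layout is assumed.

```c
#include <assert.h>

/* Hypothetical stand-in: a lone 2-bit unsigned bit-field named plf. */
struct flags_sketch {
    unsigned int plf : 2;
};

/* The unoptimized GIMPLE: widen, OR, then truncate to 2 bits. */
static unsigned or_via_unsigned_char(unsigned plf)
{
    unsigned char wide = (unsigned char)plf;         /* D.2048_6 */
    unsigned char ored = (unsigned char)(wide | 1);  /* D.2049_7 */
    return ored & 3u;                                /* D.2050_8 */
}

/* The proposed form: OR directly in the 2-bit field type. */
static unsigned or_in_field_type(unsigned plf)
{
    return (plf | 1u) & 3u;
}

/* Checks both forms, and a real bit-field update, for all four values. */
static void check_all_field_values(void)
{
    for (unsigned v = 0; v < 4; v++) {
        struct flags_sketch s;
        s.plf = v;
        s.plf |= 1;
        assert(or_via_unsigned_char(v) == or_in_field_type(v));
        assert(s.plf == or_in_field_type(v));
    }
}
```

Because the constant 1 fits in the field and OR never carries into higher bits, the intermediate conversions have no effect on the stored value.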
;; t_1->gsbase.plf = D.2722_6;

(insn 7 6 8 (set (reg:QI 63)
        (const_int 1 [0x1])) t.c:48 -1
     (nil))

(insn 8 7 9 (parallel [
            (set (reg:QI 62)
                (ashift:QI (reg:QI 63)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) t.c:48 -1
     (nil))

(insn 9 8 0 (parallel [
            (set (mem/s/j:QI (plus:SI (reg/v/f:SI 59 [ t ])
                        (const_int 1 [0x1])) [0+1 S1 A8])
                (ior:QI (mem/s/j:QI (plus:SI (reg/v/f:SI 59 [ t ])
                            (const_int 1 [0x1])) [0+1 S1 A8])
                    (reg:QI 62)))
            (clobber (reg:CC 17 flags))
        ]) t.c:48 -1
     (nil))