This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Do not use a REG_EQUAL value for gcse/cprop if SRC is a REG
On Sun, May 25, 2008 at 1:14 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sun, May 25, 2008 at 11:59 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>>> Bootstrapped&tested on powerpc-linux-gnu (32 and 64 bit). OK for trunk?
>>
>> This is ok.
>
> Thanks. Could you commit it for me?
Will do later.
> Of course I have also looked at the remaining things that a second PRE
> GCSE pass still catches. One simple test case (again for powerpc) is
> this:
>
> ======================
> struct cpp_dir
> {
> struct cpp_dir *next;
> };
>
> static struct cpp_dir *tails[4];
>
> void
> add_cpp_dir_path (struct cpp_dir *p, int chain)
> {
> if (tails[chain])
> tails[chain]->next = p;
> tails[chain] = p;
> }
> ======================
>
>
> After the GIMPLE optimizations, the function looks like this (from the
> final_cleanup dump):
> ======================
> ;; Function add_cpp_dir_path (add_cpp_dir_path)
>
> add_cpp_dir_path (struct cpp_dir * p, int chain)
> {
> struct cpp_dir * D.1256;
>
> <bb 2>:
> D.1256 = tails[chain];
> if (D.1256 != 0B)
> goto <bb 3>;
> else
> goto <bb 4>;
>
> <bb 3>:
> D.1256->next = p;
>
> <bb 4>:
> tails[chain] = p;
> return;
>
> }
> ======================
>
> The store "tails[chain] = p;" shows the missed optimization. Before
> the first gcse pass, the RTL looks like this:
>
> ======================
> 5 NOTE_INSN_BASIC_BLOCK
> 2 r120:SI=%3:SI
> REG_DEAD: %3:SI
> 3 r121:SI=%4:SI
> REG_DEAD: %4:SI
> 4 NOTE_INSN_FUNCTION_BEG
> 7 r123:SI=high(`*.LANCHOR0')
> 8 r122:SI=r123:SI+low(`*.LANCHOR0')
> REG_DEAD: r123:SI
> REG_EQUAL: `*.LANCHOR0'
> 9 r124:SI=r121:SI<<0x2
> 10 r125:SI=r122:SI+r124:SI
> REG_DEAD: r124:SI
> REG_DEAD: r122:SI
> 11 r119:SI=[r125:SI]
> REG_DEAD: r125:SI
> 12 r126:CC=cmp(r119:SI,0x0)
> 13 pc={(r126:CC==0x0)?L16:pc}
> REG_DEAD: r126:CC
> REG_BR_PROB: 0x3f6
> 14 NOTE_INSN_BASIC_BLOCK
> 15 [r119:SI]=r120:SI
> REG_DEAD: r119:SI
> L16:
> 17 NOTE_INSN_BASIC_BLOCK
> 18 r128:SI=high(`*.LANCHOR0')
> 19 r127:SI=r128:SI+low(`*.LANCHOR0')
> REG_DEAD: r128:SI
> REG_EQUAL: `*.LANCHOR0'
> 20 r129:SI=r121:SI<<0x2
> REG_DEAD: r121:SI
> 21 r130:SI=r127:SI+r129:SI
> REG_DEAD: r129:SI
> REG_DEAD: r127:SI
> 22 [r130:SI]=r120:SI
> REG_DEAD: r130:SI
> REG_DEAD: r120:SI
> ======================
>
> Insns 18 and 19 are fully redundant. Note that insn 18 actually
> wouldn't have to be eliminated if insn 19 is eliminated. But for insn
> 19 PRE GCSE performs PRE on the REG_EQUAL note. Had insns 18 and 19
> been partially redundant, the value of insn 18 would have been
> inserted twice (once for insn 18 itself and once when multiple insns
> are generated from the REG_EQUAL note on insn 19). Anyway, insn 19 is
> the interesting case, because it exposes the "secondary" redundant
> expression for the second PRE GCSE pass. After the first PRE GCSE
> pass, the RTL looks as below. The value of insn 19 is loaded into r133
> in insn 8 :
>
> ======================
> 5 NOTE_INSN_BASIC_BLOCK
> 2 r120:SI=%3:SI
> REG_DEAD: %3:SI
> 3 r121:SI=%4:SI
> REG_DEAD: %4:SI
> 4 NOTE_INSN_FUNCTION_BEG
> 7 r131:SI=high(`*.LANCHOR0')
> 8 r133:SI=r131:SI+low(`*.LANCHOR0')
> REG_DEAD: r131:SI
> REG_EQUAL: `*.LANCHOR0'
> 9 r132:SI=r121:SI<<0x2
> REG_DEAD: r121:SI
> 10 r125:SI=r133:SI+r132:SI
> 11 r119:SI=[r125:SI]
> REG_DEAD: r125:SI
> 12 r126:CC=cmp(r119:SI,0x0)
> 13 pc={(r126:CC==0x0)?L16:pc}
> REG_DEAD: r126:CC
> REG_BR_PROB: 0x3f6
> 14 NOTE_INSN_BASIC_BLOCK
> 15 [r119:SI]=r120:SI
> REG_DEAD: r119:SI
> L16:
> 17 NOTE_INSN_BASIC_BLOCK
> 21 r130:SI=r133:SI+r132:SI
> REG_DEAD: r133:SI
> REG_DEAD: r132:SI
> REG_EQUAL: r132:SI+`*.LANCHOR0'
> 22 [r130:SI]=r120:SI
> REG_DEAD: r130:SI
> REG_DEAD: r120:SI
> ======================
>
> Now, the expression in insn 21 is also fully redundant. It was
> already fully redundant before, from a values point of view, but since
> the expressions in insn 10 and insn 21 are not lexically equivalent
> before the first PRE GCSE pass, PRE GCSE cannot detect this
> redundancy. That's why I call it a "secondary redundancy": It is only
> exposed after eliminating other redundancies. After the first PRE
> GCSE pass and the cleanups that CPROP/CSE perform, the redundancy is
> eliminated in the second PRE GCSE pass, resulting in the following
> RTL:
>
> ======================
> 5 NOTE_INSN_BASIC_BLOCK
> 2 r120:SI=%3:SI
> REG_DEAD: %3:SI
> 3 r121:SI=%4:SI
> REG_DEAD: %4:SI
> 4 NOTE_INSN_FUNCTION_BEG
> 7 r131:SI=high(`*.LANCHOR0')
> 8 r133:SI=r131:SI+low(`*.LANCHOR0')
> REG_DEAD: r131:SI
> REG_EQUAL: `*.LANCHOR0'
> 9 r132:SI=r121:SI<<0x2
> REG_DEAD: r121:SI
> 10 r134:SI=r133:SI+r132:SI
> REG_DEAD: r133:SI
> REG_DEAD: r132:SI
> 11 r119:SI=[r134:SI]
> 12 r126:CC=cmp(r119:SI,0x0)
> 13 pc={(r126:CC==0x0)?L16:pc}
> REG_DEAD: r126:CC
> REG_BR_PROB: 0x3f6
> 14 NOTE_INSN_BASIC_BLOCK
> 15 [r119:SI]=r120:SI
> REG_DEAD: r119:SI
> L16:
> 17 NOTE_INSN_BASIC_BLOCK
> 22 [r134:SI]=r120:SI
> REG_DEAD: r134:SI
> REG_DEAD: r120:SI
> ======================
>
> GCC as-is (unpatched) does not fully optimize the test case.
>
> Can you think of a way to expose this redundancy at the tree level
> somehow? It'd be nice if we could just CSE the address of
> "tails[chain]" on GIMPLE...
We would need to lower the memory accesses exposing address arithmetic. On
the memref branch you see initial IL like
chain.0 = chain;
D.1187 = IDX <0 + chain.0 * 4>;
D.1183 = MEM <struct cpp_dir * {3}, &tails + D.1187>;
if (D.1183 != 0B) goto <D.1185>; else goto <D.1186>;
<D.1185>:;
chain.0 = chain;
D.1188 = IDX <0 + chain.0 * 4>;
D.1183 = MEM <struct cpp_dir * {3}, &tails + D.1188>;
IMEM <struct cpp_dir * {3}, D.1183> = p;
<D.1186>:;
chain.0 = chain;
D.1189 = IDX <0 + chain.0 * 4>;
MEM <struct cpp_dir * {3}, &tails + D.1189> = p;
with the optimized IL
<bb 2>:
D.1187_3 = IDX <0 + chain_1(D) * 4>;
D.1183_4 = MEM <struct cpp_dir * {3}, &tails + D.1187_3>;
if (D.1183_4 != 0B)
goto <bb 3>;
else
goto <bb 5>;
<bb 5>:
goto <bb 4>;
<bb 3>:
IMEM <struct cpp_dir * {3}, D.1183_4> = p_8(D);
<bb 4>:
MEM <struct cpp_dir * {3}, &tails + D.1187_3> = p_8(D);
and i686 asm
add_cpp_dir_path:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %edx
movl 8(%ebp), %ecx
movl tails(,%edx,4), %eax
testl %eax, %eax
je .L2
movl %ecx, (%eax)
.L2:
movl %ecx, tails(,%edx,4)
popl %ebp
ret
which is the same as without the address lowering. I don't know if it is worth
CSEing tails(,%edx,4) - but at least RTL CSE doesn't seem to decompose
the
(insn 17 16 0 4 t.i:13 (set (mem/s/f/j:SI (plus:SI (reg/f:SI 63)
(reg:SI 58 [ D.1187 ])) [3 tails S4 A32])
(reg/v/f:SI 60 [ p ])) 41 {*movsi_1} (nil))
to CSE the plus therein. (it interestingly also doesn't CSE the symbol_ref)
Richard.