This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Do not use a REG_EQUAL value for gcse/cprop if SRC is a REG


On Sun, May 25, 2008 at 1:14 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sun, May 25, 2008 at 11:59 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>>> Bootstrapped&tested on powerpc-linux-gnu (32 and 64 bit).  OK for trunk?
>>
>> This is ok.
>
> Thanks.  Could you commit it for me?

Will do later.

> Of course I have also looked at the remaining things that a second PRE
> GCSE pass still catches.  One simple test case (again for powerpc) is
> this:
>
> ======================
> struct cpp_dir
> {
>  struct cpp_dir *next;
> };
>
> static struct cpp_dir *tails[4];
>
> void
> add_cpp_dir_path (struct cpp_dir *p, int chain)
> {
>  if (tails[chain])
>    tails[chain]->next = p;
>  tails[chain] = p;
> }
> ======================
>
>
> After the GIMPLE optimizations, the function looks like this (from the
> final_cleanup dump):
> ======================
> ;; Function add_cpp_dir_path (add_cpp_dir_path)
>
> add_cpp_dir_path (struct cpp_dir * p, int chain)
> {
>  struct cpp_dir * D.1256;
>
> <bb 2>:
>  D.1256 = tails[chain];
>  if (D.1256 != 0B)
>    goto <bb 3>;
>  else
>    goto <bb 4>;
>
> <bb 3>:
>  D.1256->next = p;
>
> <bb 4>:
>  tails[chain] = p;
>  return;
>
> }
> ======================
>
> The store "tails[chain] = p;"  shows the missed optimization.  Before
> the first gcse pass, the RTL looks like this:
>
> ======================
>    5 NOTE_INSN_BASIC_BLOCK
>    2 r120:SI=%3:SI
>      REG_DEAD: %3:SI
>    3 r121:SI=%4:SI
>      REG_DEAD: %4:SI
>    4 NOTE_INSN_FUNCTION_BEG
>    7 r123:SI=high(`*.LANCHOR0')
>    8 r122:SI=r123:SI+low(`*.LANCHOR0')
>      REG_DEAD: r123:SI
>      REG_EQUAL: `*.LANCHOR0'
>    9 r124:SI=r121:SI<<0x2
>   10 r125:SI=r122:SI+r124:SI
>      REG_DEAD: r124:SI
>      REG_DEAD: r122:SI
>   11 r119:SI=[r125:SI]
>      REG_DEAD: r125:SI
>   12 r126:CC=cmp(r119:SI,0x0)
>   13 pc={(r126:CC==0x0)?L16:pc}
>      REG_DEAD: r126:CC
>      REG_BR_PROB: 0x3f6
>   14 NOTE_INSN_BASIC_BLOCK
>   15 [r119:SI]=r120:SI
>      REG_DEAD: r119:SI
> L16:
>   17 NOTE_INSN_BASIC_BLOCK
>   18 r128:SI=high(`*.LANCHOR0')
>   19 r127:SI=r128:SI+low(`*.LANCHOR0')
>      REG_DEAD: r128:SI
>      REG_EQUAL: `*.LANCHOR0'
>   20 r129:SI=r121:SI<<0x2
>      REG_DEAD: r121:SI
>   21 r130:SI=r127:SI+r129:SI
>      REG_DEAD: r129:SI
>      REG_DEAD: r127:SI
>   22 [r130:SI]=r120:SI
>      REG_DEAD: r130:SI
>      REG_DEAD: r120:SI
> ======================
>
> Insns 18 and 19 are fully redundant.   Note that insn 18 actually
> wouldn't have to be eliminated if insn 19 is eliminated.  But for insn
> 19 PRE GCSE performs PRE on the REG_EQUAL note.  Had insns 18 and 19
> been partially redundant, the value of insn 18 would have been
> inserted twice (once for insn 18 itself and once when multiple insns
> are generated from the REG_EQUAL note on insn 19).  Anyway, insn 19 is
> the interesting case, because it exposes the "secondary" redundant
> expression for the second PRE GCSE pass.  After the first PRE GCSE
> pass, the RTL looks as below. The value of insn 19 is loaded into r133
> in insn 8 :
>
> ======================
>    5 NOTE_INSN_BASIC_BLOCK
>    2 r120:SI=%3:SI
>      REG_DEAD: %3:SI
>    3 r121:SI=%4:SI
>      REG_DEAD: %4:SI
>    4 NOTE_INSN_FUNCTION_BEG
>    7 r131:SI=high(`*.LANCHOR0')
>    8 r133:SI=r131:SI+low(`*.LANCHOR0')
>      REG_DEAD: r131:SI
>      REG_EQUAL: `*.LANCHOR0'
>    9 r132:SI=r121:SI<<0x2
>      REG_DEAD: r121:SI
>   10 r125:SI=r133:SI+r132:SI
>   11 r119:SI=[r125:SI]
>      REG_DEAD: r125:SI
>   12 r126:CC=cmp(r119:SI,0x0)
>   13 pc={(r126:CC==0x0)?L16:pc}
>      REG_DEAD: r126:CC
>      REG_BR_PROB: 0x3f6
>   14 NOTE_INSN_BASIC_BLOCK
>   15 [r119:SI]=r120:SI
>      REG_DEAD: r119:SI
> L16:
>   17 NOTE_INSN_BASIC_BLOCK
>   21 r130:SI=r133:SI+r132:SI
>      REG_DEAD: r133:SI
>      REG_DEAD: r132:SI
>      REG_EQUAL: r132:SI+`*.LANCHOR0'
>   22 [r130:SI]=r120:SI
>      REG_DEAD: r130:SI
>      REG_DEAD: r120:SI
> ======================
>
> Now, the expression in insn 21 is also fully redundant.  It was
> already fully redundant before, from a values point of view, but since
> the expressions in insn 10 and insn 21 are not lexically equivalent
> before the first PRE GCSE pass, PRE GCSE cannot detect this
> redundancy.  That's why I call it a "secondary redundancy": It is only
> exposed after eliminating other redundancies.  After the first PRE
> GCSE pass and the cleanups that CPROP/CSE perform, the redundancy is
> eliminated in the second PRE GCSE pass, resulting in the following
> RTL:
>
> ======================
>    5 NOTE_INSN_BASIC_BLOCK
>    2 r120:SI=%3:SI
>      REG_DEAD: %3:SI
>    3 r121:SI=%4:SI
>      REG_DEAD: %4:SI
>    4 NOTE_INSN_FUNCTION_BEG
>    7 r131:SI=high(`*.LANCHOR0')
>    8 r133:SI=r131:SI+low(`*.LANCHOR0')
>      REG_DEAD: r131:SI
>      REG_EQUAL: `*.LANCHOR0'
>    9 r132:SI=r121:SI<<0x2
>      REG_DEAD: r121:SI
>   10 r134:SI=r133:SI+r132:SI
>      REG_DEAD: r133:SI
>      REG_DEAD: r132:SI
>   11 r119:SI=[r134:SI]
>   12 r126:CC=cmp(r119:SI,0x0)
>   13 pc={(r126:CC==0x0)?L16:pc}
>      REG_DEAD: r126:CC
>      REG_BR_PROB: 0x3f6
>   14 NOTE_INSN_BASIC_BLOCK
>   15 [r119:SI]=r120:SI
>      REG_DEAD: r119:SI
> L16:
>   17 NOTE_INSN_BASIC_BLOCK
>   22 [r134:SI]=r120:SI
>      REG_DEAD: r134:SI
>      REG_DEAD: r120:SI
> ======================
>
> GCC as-is (unpatched) does not fully optimize the test case.
>
> Can you think of a way to expose this redundancy at the tree level
> somehow?  It'd be nice if we could just CSE the address of
> "tails[chain]" on GIMPLE...

We would need to lower the memory accesses exposing address arithmetic.  On
the memref branch you see initial IL like

  chain.0 = chain;
  D.1187 = IDX <0 + chain.0 * 4>;
  D.1183 = MEM <struct cpp_dir * {3}, &tails + D.1187>;
  if (D.1183 != 0B) goto <D.1185>; else goto <D.1186>;
  <D.1185>:;
  chain.0 = chain;
  D.1188 = IDX <0 + chain.0 * 4>;
  D.1183 = MEM <struct cpp_dir * {3}, &tails + D.1188>;
  IMEM <struct cpp_dir * {3}, D.1183> = p;
  <D.1186>:;
  chain.0 = chain;
  D.1189 = IDX <0 + chain.0 * 4>;
  MEM <struct cpp_dir * {3}, &tails + D.1189> = p;

with the optimized IL

<bb 2>:
  D.1187_3 = IDX <0 + chain_1(D) * 4>;
  D.1183_4 = MEM <struct cpp_dir * {3}, &tails + D.1187_3>;
  if (D.1183_4 != 0B)
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  goto <bb 4>;

<bb 3>:
  IMEM <struct cpp_dir * {3}, D.1183_4> = p_8(D);

<bb 4>:
  MEM <struct cpp_dir * {3}, &tails + D.1187_3> = p_8(D);

and i686 asm

add_cpp_dir_path:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %edx
        movl    8(%ebp), %ecx
        movl    tails(,%edx,4), %eax
        testl   %eax, %eax
        je      .L2
        movl    %ecx, (%eax)
.L2:
        movl    %ecx, tails(,%edx,4)
        popl    %ebp
        ret

which is the same as without the address lowering.  I don't know if it is worth
CSEing tails(,%edx,4) - but at least RTL CSE doesn't seem to decompose
the

(insn 17 16 0 4 t.i:13 (set (mem/s/f/j:SI (plus:SI (reg/f:SI 63)
                (reg:SI 58 [ D.1187 ])) [3 tails S4 A32])
        (reg/v/f:SI 60 [ p ])) 41 {*movsi_1} (nil))

to CSE the plus therein.  (it interestingly also doesn't CSE the symbol_ref)

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]