[PATCH, PR43864] Gimple level duplicate block cleanup.
Richard Guenther
richard.guenther@gmail.com
Wed Jun 8 10:09:00 GMT 2011
On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vries@codesourcery.com> wrote:
> Hi Richard,
>
> I have a patch for PR43864. The patch adds a gimple level duplicate block
> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the following
> table (%, lower is better).
>
> none pic
> thumb1 thumb2 thumb1 thumb2
> spec2000 99.9 99.9 99.8 99.8
>
> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that the
> optimizations proposed in PR20070 would fix this PR.
>
> The problem in this PR is that when compiling with -O2, the example below should
> only have one call to free. The original problem is formulated in terms of -Os,
> but currently we generate one call to free with -Os, although still not the
> smallest code possible. I'll show here the -O2 case, since that's similar to the
> original PR.
>
> #include <stdio.h>
> void foo (char*, FILE*);
> char* hprofStartupp(char *outputFileName, char *ctx)
> {
> char fileName[1000];
> FILE *fp;
> sprintf(fileName, outputFileName);
> if (access(fileName, 1) == 0) {
> free(ctx);
> return 0;
> }
>
> fp = fopen(fileName, 0);
> if (fp == 0) {
> free(ctx);
> return 0;
> }
>
> foo(outputFileName, fp);
>
> return ctx;
> }
>
> AFAIU, there are 2 complementary methods of rtl optimizations proposed in PR20070.
> - Merging 2 blocks which are identical expect for input registers, by using a
> conditional move to choose between the different input registers.
> - Merging 2 blocks which have different local registers, by ignoring those
> differences
>
> Blocks .L6 and.L7 have no difference in local registers, but they have a
> difference in input registers: r3 and r1. Replacing the move to r5 by a
> conditional move would probably be benificial in terms of size, but it's not
> clear what condition the conditional move should be using. Calculating such a
> condition would add in size and increase the execution path.
>
> gcc -O2 -march=armv7-a -mthumb pr43864.c -S:
> ...
> push {r4, r5, lr}
> mov r4, r0
> sub sp, sp, #1004
> mov r5, r1
> mov r0, sp
> mov r1, r4
> bl sprintf
> mov r0, sp
> movs r1, #1
> bl access
> mov r3, r0
> cbz r0, .L6
> movs r1, #0
> mov r0, sp
> bl fopen
> mov r1, r0
> cbz r0, .L7
> mov r0, r4
> bl foo
> .L3:
> mov r0, r5
> add sp, sp, #1004
> pop {r4, r5, pc}
> .L6:
> mov r0, r5
> mov r5, r3
> bl free
> b .L3
> .L7:
> mov r0, r5
> mov r5, r1
> bl free
> b .L3
> ...
>
> The proposed patch solved the problem by dealing with the 2 blocks at a level
> when they are still identical: at gimple level. It detect that the 2 blocks are
> identical, and removes one of them.
>
> The following table shows the impact of the patch on the example in terms of
> size for -march=armv7-a:
>
> without with delta
> Os : 108 104 -4
> O2 : 120 104 -16
> Os thumb: 68 64 -4
> O2 thumb: 76 64 -12
>
> The gain in size for -O2 is that of removing the entire block, plus the
> replacement of 2 moves by a constant set, which also decreases the execution
> path. The patch ensures optimal code for both -O2 and -Os.
>
>
> By keeping track of equivalent definitions in the 2 blocks, we can ignore those
> differences in comparison. Without this feature, we would only match blocks with
> resultless operations, due the the ssa-nature of gimples.
> For example, with this feature, we reduce the following function to its minimum
> at gimple level, rather than at rtl level.
>
> int f(int c, int b, int d)
> {
> int r, e;
>
> if (c)
> r = b + d;
> else
> {
> e = b + d;
> r = e;
> }
>
> return r;
> }
>
> ;; Function f (f)
>
> f (int c, int b, int d)
> {
> int e;
>
> <bb 2>:
> e_6 = b_3(D) + d_4(D);
> return e_6;
>
> }
>
> I'll send the patch with the testcases in a separate email.
>
> OK for trunk?
I don't like that you hook this into cleanup_tree_cfg - that is called
_way_ too often.
This also duplicates the literal matching done on the RTL level - instead
I think this optimization would be more related to value-numbering
(either that of SCCVN/FRE/PRE or that of DOM which also does
jump-threading).
Richard.
> Thanks,
> - Tom
>
> 2011-06-08 Tom de Vries <tom@codesourcery.com>
>
> PR middle-end/43864
> * tree-cfgcleanup.c (int_int_splay_lookup, int_int_splay_insert)
> (int_int_splay_node_contained_in, int_int_splay_contained_in)
> (equiv_lookup, equiv_insert, equiv_contained_in, equiv_init)
> (equiv_delete, gimple_base_equal_p, pt_solution_equal_p, gimple_equal_p)
> (bb_gimple_equal_p, update_debug_stmts, cleanup_duplicate_preds_1)
> (same_or_local_phi_alternatives, cleanup_duplicate_preds): New function.
> (cleanup_tree_cfg_bb): Use cleanup_duplicate_preds.
>
More information about the Gcc-patches
mailing list