This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, PR43864] Gimple level duplicate block cleanup.


On Wed, Jun 8, 2011 at 11:55 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vries@codesourcery.com> wrote:
>> Hi Richard,
>>
>> I have a patch for PR43864. The patch adds a gimple level duplicate block
>> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
>> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the following
>> table (%, lower is better).
>>
>> ? ? ? ? ? ? ? ? ? ? none ? ? ? ? ? ?pic
>> ? ? ? ? ? ? ? ?thumb1 ?thumb2 ?thumb1 thumb2
>> spec2000 ? ? ? ? ?99.9 ? ?99.9 ? ?99.8 ? 99.8
>>
>> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that the
>> optimizations proposed in PR20070 would fix this PR.
>>
>> The problem in this PR is that when compiling with -O2, the example below should
>> only have one call to free. The original problem is formulated in terms of -Os,
>> but currently we generate one call to free with -Os, although still not the
>> smallest code possible. I'll show here the -O2 case, since that's similar to the
>> original PR.
>>
>> #include <stdio.h>
>> void foo (char*, FILE*);
>> char* hprofStartupp(char *outputFileName, char *ctx)
>> {
>> ? ?char fileName[1000];
>> ? ?FILE *fp;
>> ? ?sprintf(fileName, outputFileName);
>> ? ?if (access(fileName, 1) == 0) {
>> ? ? ? ?free(ctx);
>> ? ? ? ?return 0;
>> ? ?}
>>
>> ? ?fp = fopen(fileName, 0);
>> ? ?if (fp == 0) {
>> ? ? ? ?free(ctx);
>> ? ? ? ?return 0;
>> ? ?}
>>
>> ? ?foo(outputFileName, fp);
>>
>> ? ?return ctx;
>> }
>>
>> AFAIU, there are 2 complementary methods of rtl optimizations proposed in PR20070.
>> - Merging 2 blocks which are identical expect for input registers, by using a
>> ?conditional move to choose between the different input registers.
>> - Merging 2 blocks which have different local registers, by ignoring those
>> ?differences
>>
>> Blocks .L6 and.L7 have no difference in local registers, but they have a
>> difference in input registers: r3 and r1. Replacing the move to r5 by a
>> conditional move would probably be benificial in terms of size, but it's not
>> clear what condition the conditional move should be using. Calculating such a
>> condition would add in size and increase the execution path.
>>
>> gcc -O2 -march=armv7-a -mthumb pr43864.c -S:
>> ...
>> ? ? ? ?push ? ?{r4, r5, lr}
>> ? ? ? ?mov ? ? r4, r0
>> ? ? ? ?sub ? ? sp, sp, #1004
>> ? ? ? ?mov ? ? r5, r1
>> ? ? ? ?mov ? ? r0, sp
>> ? ? ? ?mov ? ? r1, r4
>> ? ? ? ?bl ? ? ?sprintf
>> ? ? ? ?mov ? ? r0, sp
>> ? ? ? ?movs ? ?r1, #1
>> ? ? ? ?bl ? ? ?access
>> ? ? ? ?mov ? ? r3, r0
>> ? ? ? ?cbz ? ? r0, .L6
>> ? ? ? ?movs ? ?r1, #0
>> ? ? ? ?mov ? ? r0, sp
>> ? ? ? ?bl ? ? ?fopen
>> ? ? ? ?mov ? ? r1, r0
>> ? ? ? ?cbz ? ? r0, .L7
>> ? ? ? ?mov ? ? r0, r4
>> ? ? ? ?bl ? ? ?foo
>> .L3:
>> ? ? ? ?mov ? ? r0, r5
>> ? ? ? ?add ? ? sp, sp, #1004
>> ? ? ? ?pop ? ? {r4, r5, pc}
>> .L6:
>> ? ? ? ?mov ? ? r0, r5
>> ? ? ? ?mov ? ? r5, r3
>> ? ? ? ?bl ? ? ?free
>> ? ? ? ?b ? ? ? .L3
>> .L7:
>> ? ? ? ?mov ? ? r0, r5
>> ? ? ? ?mov ? ? r5, r1
>> ? ? ? ?bl ? ? ?free
>> ? ? ? ?b ? ? ? .L3
>> ...
>>
>> The proposed patch solved the problem by dealing with the 2 blocks at a level
>> when they are still identical: at gimple level. It detect that the 2 blocks are
>> identical, and removes one of them.
>>
>> The following table shows the impact of the patch on the example in terms of
>> size for -march=armv7-a:
>>
>> ? ? ? ? ?without ? ? with ? ?delta
>> Os ? ? ?: ? ? 108 ? ? ?104 ? ? ? -4
>> O2 ? ? ?: ? ? 120 ? ? ?104 ? ? ?-16
>> Os thumb: ? ? ?68 ? ? ? 64 ? ? ? -4
>> O2 thumb: ? ? ?76 ? ? ? 64 ? ? ?-12
>>
>> The gain in size for -O2 is that of removing the entire block, plus the
>> replacement of 2 moves by a constant set, which also decreases the execution
>> path. The patch ensures optimal code for both -O2 and -Os.
>>
>>
>> By keeping track of equivalent definitions in the 2 blocks, we can ignore those
>> differences in comparison. Without this feature, we would only match blocks with
>> resultless operations, due the the ssa-nature of gimples.
>> For example, with this feature, we reduce the following function to its minimum
>> at gimple level, rather than at rtl level.
>>
>> int f(int c, int b, int d)
>> {
>> ?int r, e;
>>
>> ?if (c)
>> ? ?r = b + d;
>> ?else
>> ? ?{
>> ? ? ?e = b + d;
>> ? ? ?r = e;
>> ? ?}
>>
>> ?return r;
>> }
>>
>> ;; Function f (f)
>>
>> f (int c, int b, int d)
>> {
>> ?int e;
>>
>> <bb 2>:
>> ?e_6 = b_3(D) + d_4(D);
>> ?return e_6;
>>
>> }
>>
>> I'll send the patch with the testcases in a separate email.
>>
>> OK for trunk?
>
> I don't like that you hook this into cleanup_tree_cfg - that is called
> _way_ too often.
>
> This also duplicates the literal matching done on the RTL level - instead
> I think this optimization would be more related to value-numbering
> (either that of SCCVN/FRE/PRE or that of DOM which also does
> jump-threading).

And while at it: Put it in a separate file ;-)

Ciao!
Steven


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]