This is the mail archive of the
mailing list for the GCC project.
Re: conditional lim
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Evgeniya Maenkova <evgeniya dot maenkova at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 26 May 2015 12:31:18 +0200
- Subject: Re: conditional lim
- Authentication-results: sourceware.org; auth=none
- References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw at mail dot gmail dot com>
On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
> Could you please review my patch for predicated lim?
> Let me note some details about it:
> 1) Phi statements are still moved only if they have 1 or 2
> arguments. However, phi statements could be move under conditions (as
> itâs done for the other statements). Probably, phi statement motion
> with 3 + arguments could be implemented in the next patch after
> predicated lim.
> 2) Patch has limitations/features like (it was ok to me to
> implement it such way, maybe Iâm not correct. ):
> a) Loop1
> If (a)
> Stmt - Invariant for Loop1
> In this case Stmt will be moved only out of Loop2, because of if (a).
> b) Or
> If (cond1)
> If (cond2)
> If (cond3)
> Stmt will be moved out only if cond1 is always executed in Loop1.
> c) It took me a long time to write all of these code, so there
> might be other peculiarities which I forgot to mention. :)
> Letâs discuss these ones as you will review my patch.
> 3) Patch consists of 9 files:
> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â changed tests:
> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as
> predicated lim moves 2 more statements out of the loop;
> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â with conditional
> lim recip optimization in this test doesnât work (the corresponding
> value is below threshold as I could see in the code for recip, 1<3).
> So to have recip working in this test I changed test a little bit.
> b) gcc/tree-ssa-loop-im.c â the patched lim per se
> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
> the tests for conditional lim.
> 4) Patch testing:
> a) make âk check (no difference in results for me for the clean
> build and the patched one,
> - Revision: 222849,
> - uname -a
> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
> b) Bootstrap.
> It goes well now, however to fix it I have made a temporary hack in
> the lim code. And with this fix patch definitely shouldnât be
> I did so, as I would like to discuss this issue first.
> The problem is: I got stage2-stage3 comparison failure on the single
> file (tree-vect-data-refs.o). After some investigation I understood
> that tree-vect-data-refs.o differs being compiled with and without
> â-gâ option (yes, more exactly on stage 2 this is â-g âO2 âgtoggleâ,
> and for stage 3 this is â-g âO2â. But to simplify things I can
> reproduce this difference on the same build (even not bootstrapped),
> with option â âgâ and without it).
> Of course, the file compiled with âg option will contain debug
> information and will differ from the corresponding file without debug
> information. I mean there is the difference reported by
> contrib/compare-debug (I mean we talk about stripped files).
> Then I compared assemblers and lim dump files and made a conclusion
> the difference is:
> There is statement _484=something, which is conditionally moved out of
> loop X. In non debug cases no usages of _484 are left inside the loop
> X. In debug case, there is the single use in debug statement. So for
> debug case my patch generates phi statement for _484 to make it
> available inside the loop X. For non debug case, no such phi statement
> generated as there is no uses of _484.
> There is one more statement with lhs=_493, with exactly this pattern
> of use. But this is not important for our discussion.
> So, in my opinion itâs ok to generate additional phi node for debug
> case. But Iâm not a compiler expert and maybe there is some
> requirement that debug and non-debug versions should differ only by
> debug statements, I donât know.
As Andi said code generation needs to be the same.
> My temporary hack fix is just removing of such debug statements (no
> debug statements, no problems J).
That's fine and generally what is done in other passes.
> The alternatives I see are:
> - Move debug statements outside the loop (not good in my opinion);
> - Somehow reset values in debug statements (not good in my
> opinion, if I could provide correct information for them);
You could re-set them via gimple_debug_bind_reset_value which
will tell the user that at this point the value is optimized out
better than removing the debug stmts).
> - Generate phi for debug statements and fix bootstrap (say,
> letâs have some list of files, which we have special check for. I mean
> for my case, the difference is not in stage2 and stage3, itâs in debug
> and non-debug). I like this variant. However, as I donât see such list
> of special files in the whole gcc, I think I might be missing
The policy ;)
> What do you think?
Now about the patch. Generally there seem to be spurious whitespace-only
or formatting differences - try to avoid these.
All new functiuons need a comment explaining them and their function
+static cond_data_map *get_cond_data_map (struct loop * level)
per GCC coding conventions this should be formatted like
static cond_data_map *
get_cond_data_map (struct loop * level)
+#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\
+ BB->aux = (void *) (XCNEW (struct move_cond));\
+ BB_MOVE_COND (BB)->type=ALWAYS_EXECUTED_IN;\
+ BB_MOVE_COND (BB)->loop=VAL;
this kind of side-effects in macros are bad - better turn them into
- if (flag_unswitch_loops
- && gimple_code (stmt) == GIMPLE_COND)
- /* If we perform unswitching, force the operands of the invariant
- condition to be moved out of the loop. */
- return MOVE_POSSIBLE;
+ if (gimple_code (stmt) == GIMPLE_COND)
+ return MOVE_POSSIBLE;
any reason you removed the guard on flag_unswitch_loops? We may
want to add a new flag, certainly this transform might not be suitable
for -O[12s] for example. Like if not all stmts inside a conditional can
be moved the transform will increase code-size.
You are doing a (lot of?) refactoring which makes review of the patch
harder. For example
+bool calculate_tgt_level (gimple stmt, struct loop *outermost,
+ bool cond_executed, basic_block bb)
+ struct lim_aux_data *lim_dat
+ if (lim_data->cost >= LIM_EXPENSIVE)
+ set_profitable_level (stmt);
+ return true;
- lim_data = init_lim_data (stmt);
- lim_data->always_executed_in = outermost;
- if (!determine_max_movement (stmt, false))
- lim_data->max_loop = NULL;
- if (dump_file && (dump_flags & TDF_DETAILS))
- print_gimple_stmt (dump_file, stmt, 2, 0);
- fprintf (dump_file, " invariant up to level %d, cost %d.\n\n",
- loop_depth (lim_data->max_loop),
- if (lim_data->cost >= LIM_EXPENSIVE)
- set_profitable_level (stmt);
+ if (!calculate_tgt_level (stmt, outermost, cond_executed, bb))
where it is not obvious why moving set_profitable_level should be
in a function named calculate_tgt_level. In fact determine_max_movement
was supposed to be the abstraction already.
Without yet getting an idea of the overall algorithm (I'm really missing such
a thing...) it seems to me that the current handling of doing invariant motion
of PHIs should handle the analysis phase just fine (you might be adding
more advanced cases in this patch, but I'd rather do that as a followup for
The code-gen part looks somewhat ad-hoc to me. In principle if we have
a list of PHIs to hoist we can materialize the needed part of the original
basic-block structure on the pre-header edge together with the controlling
conditionals. The loop pre-header would be the always-executed BB
and the needed part would always start with the loop header and be a
single-entry multiple-exit region. So ideally we could use sth like
gimple_duplicate_sese_region with the BB copying worker copy_bbs
refactored/replaced with something only copying controlling conditionals
and statements/PHIs that a callback identifies (tree-loop-distribution.c
would also like sth like that - currently it copies whole loops and removes
stmts it didn't intend to copy...).
So basically invariantness_dom_walker would record a vector of PHIs
to hoist (or we'd construct that at move_computations time).
If there is any control-flow to hoist then we'd do an alternative
move_computations by "copying" the needed CFG and covered stmts.
In theory doing it the loop-distribution way (simply copy the whole
loop body and then remove anything not needed) would also be possible
though that wouldn't be the very best idea obviously ;)