[PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)
Wed Aug 19 18:18:02 GMT 2020
Segher Boessenkool <email@example.com> writes:
> On Wed, Aug 19, 2020 at 01:10:36PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool <firstname.lastname@example.org> writes:
>> > When the compgotos pass copies the tail of blocks ending in an indirect
>> > jump, there is a micro-optimization to not copy the last one, since the
>> > original block will then just be deleted. This does not work properly
>> > if cleanup_cfg does not merge all pairs of blocks we expect it to.
>> > v2: This also deletes the other use of single_pred_p, which has the same
>> > problem in principle, I just never have triggered it so far.
>> Could you go into more details? I thought the reason you wanted to
>> remove the test in the loop was that each successful iteration of the
>> loop removes one incoming edge, meaning that if all previous iterations
>> succeed, the final iteration is bound to see a single predecessor.
>> And there's no guarantee in that case that the jump in the final
>> block will be removed. I.e. the loop is unnecessarily leaving
>> cfgcleanup to finish off the last step for it. So I agree that the
>> loop part looks good. (I assume that was v1, but I missed it, sorry.)
> I didn't Cc: you, I should have :-)
> Yes, that is part of the problem. The case where it was noticed however
> was some release of GCC (some 9 release iirc) that did something wrong
> with cfg_cleanup apparenetly, so not everything was threaded optimally,
> making duplicate_computed_gotos run into jumps to jumps and stopping.
> Since it is such an important pass (for performance, for things with
> bigger jump tables or with threaded FSMs etc.), it seemed a good idea to
> make it more robust instead of slightly faster.
>> But it looks like the point of the initial test is to avoid “duplicating”
>> the block if there would still only be one copy of the code. That test
>> still seems reasonable since the case it rejects should be handled by
>> other cfg cleanups. If for some reason it isn't, there doesn't seem
>> to be any reason to apply the max_size limit for a single predecessor,
>> since no code duplication will take place. So ISTM that removing the
>> first condition is really a separate thing that should be its own patch.
> I have never seen the second case misfiring in practice, only the first
Shucks, I guessed the wrong way round :-)
I'd argue that the first check isn't a micro-optimisation though.
It's testing whether the duplication really is a duplication
(rather than a move).
The second test does look like a micro-optimisation.
> The original patch is at
> btw. I'll prepare a testcase for that (it doesn't trigger with recent
> GCC versions though :-/ )
Hmm, OK. If the first check is the important one, then I think it'd
be better to make the max_size stuff conditional on !single_pred_p
rather than drop the test entirely.
More information about the Gcc-patches