[PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

Richard Sandiford richard.sandiford@arm.com
Wed Aug 19 18:18:02 GMT 2020


Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Wed, Aug 19, 2020 at 01:10:36PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > When the compgotos pass copies the tail of blocks ending in an indirect
>> > jump, there is a micro-optimization to not copy the last one, since the
>> > original block will then just be deleted.  This does not work properly
>> > if cleanup_cfg does not merge all pairs of blocks we expect it to.
>> >
>> >
>> > v2: This also deletes the other use of single_pred_p, which has the same
>> > problem in principle, I just never have triggered it so far.
>> 
>> Could you go into more details?  I thought the reason you wanted to
>> remove the test in the loop was that each successful iteration of the
>> loop removes one incoming edge, meaning that if all previous iterations
>> succeed, the final iteration is bound to see a single predecessor.
>> And there's no guarantee in that case that the jump in the final
>> block will be removed.  I.e. the loop is unnecessarily leaving
>> cfgcleanup to finish off the last step for it.  So I agree that the
>> loop part looks good.  (I assume that was v1, but I missed it, sorry.)
>
> I didn't Cc: you, I should have :-)
>
> Yes, that is part of the problem.  The case where it was noticed however
> was some release of GCC (some 9 release iirc) that did something wrong
> with cfg_cleanup apparenetly, so not everything was threaded optimally,
> making duplicate_computed_gotos run into jumps to jumps and stopping.
> Since it is such an important pass (for performance, for things with
> bigger jump tables or with threaded FSMs etc.), it seemed a good idea to
> make it more robust instead of slightly faster.
>
>> But it looks like the point of the initial test is to avoid “duplicating”
>> the block if there would still only be one copy of the code.  That test
>> still seems reasonable since the case it rejects should be handled by
>> other cfg cleanups.  If for some reason it isn't, there doesn't seem
>> to be any reason to apply the max_size limit for a single predecessor,
>> since no code duplication will take place.  So ISTM that removing the
>> first condition is really a separate thing that should be its own patch.
>
> I have never seen the second case misfiring in practice, only the first
> one!

Shucks, I guessed the wrong way round :-)

I'd argue that the first check isn't a micro-optimisation though.
It's testing whether the duplication really is a duplication
(rather than a move).

The second test does look like a micro-optimisation.

> The original patch is at
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551560.html
> btw.  I'll prepare a testcase for that (it doesn't trigger with recent
> GCC versions though :-/ )

Hmm, OK.  If the first check is the important one, then I think it'd
be better to make the max_size stuff conditional on !single_pred_p
rather than drop the test entirely.

Thanks,
Richard


More information about the Gcc-patches mailing list