[PATCH take 3] ivopts: Improve code generated for very simple loops.

Roger Sayle roger@nextmovesoftware.com
Thu Nov 25 18:17:18 GMT 2021


On Tue, Nov 23, 2021 at 12:46PM Richard Biener < richard.guenther@gmail.com> wrote:
> On Thu, Nov 18, 2021 at 4:18 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> > > The patch doesn't add any testcase.
> >
> > The three new attached tests check that the critical invariants have a
> > simpler form, and hopefully shouldn't be affected by whether the
> > optimizer and/or backend costs actually decide to perform this iv substitution
> or not.
> 
> The testcases might depend on lp64 though, did you test them with -m32?
> IMHO it's fine to require lp64 here.

Great catch.  You're right that when the loop index has the same precision as the
target's pointer, that fold is (already) able to simplify the ((EXPR)-1)+1, so that with
-m32 my new tests ivopts-[567].c fail.  I've added "require lp64" to those tests, but
I've also added two more tests, using char and unsigned char for the loop expression,
which are optimized on both ilp32 and lp64.

For example, with -O2 -m32, we see the following improvements in ivopts-8.c:
diff ivopts-8.old.s ivopts-8.new.s
14,16c14,15
<       subl    $1, %ecx
<       movzbl  %cl, %ecx
<       leal    4(%eax,%ecx,4), %ecx
---
>       movsbl  %cl, %ecx
>       leal    (%eax,%ecx,4), %ecx

This might also explain why GCC currently generates sub-optimal code.  Back when
ivopts was written, most folks were on i686, so the generated code was optimal.
But with the transition to x86_64, the code is correct, just slightly less efficient.

> I'm a bit unsure about adding this special-casing in cand_value_at in general - it
> does seem that we're doing sth wrong elsewhere - either by not simplifying even
> though enough knowledge is there or by throwing away knowledge earlier
> (during niter analysis?).

I agree this approach is a bit ugly.  Conceptually, an alternative might be to avoid
throwing away knowledge earlier, during niter analysis, by adding an extra tree field
to the tree_niter_desc structure, so that it returns both niter0 (the iteration count
at the top of the loop) and niter1 (the iteration count at the bottom of the loop),
so that later passes (cand_value_at) can use the tree that's relevant.  Alas, this too is
ugly, and inefficient as we're creating/folding trees that may never be used/useful.
A compromise might be to add an enum field describing how the niter was
calculated to tree_niter_desc, and this can be inspected/used by cand_value_at.
The current patch figures this out by examining the other fields already in
tree_niter_desc.


> Anyway, the patch does look quite safe - can you show some statistics in how
> many time there's extra simplification this way during say bootstrap?

Certainly.  During stage2 and stage3 of a bootstrap on x86_64-pc-linux-gnu,
cand_value_at is called 500657 times.  The majority of calls,
447607 (89.4%), request the value at the end of the loop (after_adjust),
while 53050 (10.6%) request the value at the start of the loop.
102437 calls (20.5%) are optimized by clause 1 [0..N loops]
27939 calls (5.6%) are optimized by clause 2 [beg..end loops]

Looking for opportunities to improve things further, I see that
319608 calls (63.8%) have a LT_EXPR exit test.
160965 calls (32.2%) have a NE_EXPR exit test.
20084 calls (4.0%) have a GT_EXPR exit test.
so handling descending loops wouldn’t be a big win.
I'll investigate whether (constant) step sizes other than 1 are
(i) sufficiently common and (ii) benefit from improved folding.


This revised patch has been test on x86_64-pc-linux-gnu with a
make bootstrap and make -k check, both with and without
--target-board=unix{-m32}, with no new failures.
Ok for mainline?

2021-11-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
	tree_niter_desc* argument instead of just a tree for NITER.
	If we require the iv candidate value at the end of the final
	loop iteration, try using the original loop bound as the
	NITER for sufficiently simple loops.
	(may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite
	* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
	* gcc.dg/tree-ssa/ivopts-5.c: New test case.
	* gcc.dg/tree-ssa/ivopts-6.c: New test case.
	* gcc.dg/tree-ssa/ivopts-7.c: New test case.
	* gcc.dg/tree-ssa/ivopts-8.c: New test case.
	* gcc.dg/tree-ssa/ivopts-9.c: New test case.


Roger
--

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patchn3.txt
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20211125/b2956d0b/attachment.txt>


More information about the Gcc-patches mailing list