This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Avoid mutual recursion between BIT_{IOR,AND}_EXPR optimizations (PR middle-end/34337)
- From: "Richard Guenther" <richard dot guenther at gmail dot com>
- To: "Jakub Jelinek" <jakub at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org, "Roger Sayle" <roger at eyesopen dot com>
- Date: Wed, 5 Dec 2007 10:45:41 +0100
- Subject: Re: [PATCH] Avoid mutual recursion between BIT_{IOR,AND}_EXPR optimizations (PR middle-end/34337)
- References: <20071205090617.GI16835@devserv.devel.redhat.com>
On Dec 5, 2007 10:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> The (X << C3) & C1 optimization I've recently added apparently can mutually
> recurse with (X & C1) | C2 optimization Roger added almost two years ago.
> While the justification for the other PR23670 transformations is obvious, I
> wonder what the rationale is for trying to remove some bits from a constant
> when it won't be optimized into 0 anyway.
> The following patch just fixes the mutual recursion, but I wonder if we in
> case fold_binary doesn't simplify it (or, to avoid mutual recursion, doesn't
> simplify it back to the original X & C1) at least shouldn't decide which
> of the two constants (C1 vs. C1 & ~C2) is better for further optimizations.
> Say on:
> int foo (int x) { return (x & 65535) | 255; }
> with -Os on x86_64-linux we now generate:
> andl $65280, %edi
> orb $-1, %dil
> movl %edi, %eax
> ret
> while without this transformation we would generate:
> movzwl %di,%eax
> orb $-1, %al
> ret
> which is shorter. But I'm not exactly sure what criteria for better should
> we use. We can certainly just don't do this transformation at all if
> fold_binary has not simplified it, or e.g. (C1 & (C1 + 1)) == 0 constants
> can be considered better, or those which additionally have 8, 16, 32, 64
> bits set, something else?
I believe Rogers goal was to produce a canonical tree form which
enables for example operand_equal_p to recognize two forms as equal
that are not literally the same in their original form. So I do think this
transformation is worthwhile (though we can probably change what is
considered canonical, or undo / change the form later on RTL for
optimization purposes).
I think the problem I saw is that fold_binary is called repeatedly with
the same trees - without any change, which is something we do not
want to allow. Is there extra oscillation if you fix that?
Note there is also the parallel folding of (X | C1) & C2 to (X & C2) | (C1 & C2)
which might need similar fixing up.
Thanks,
Richard.