This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: Failure to combine SHIFT with ZERO_EXTEND
Hi Jeff,
Many thanks for the pointers. I will make the changes and attach the
patch to the bugzilla soon.
Cheers,
Rahul
-----Original Message-----
From: Jeff Law [mailto:law@redhat.com]
Sent: 09 February 2010 00:45
To: Rahul Kharche
Cc: gcc@gcc.gnu.org; sdkteam-gnu
Subject: Re: Failure to combine SHIFT with ZERO_EXTEND
On 02/04/10 08:39, Rahul Kharche wrote:
> Hi All,
>
> On our private port of GCC 4.4.1 we fail to combine successive SHIFT
> operations like in the following case
>
> #include<stdlib.h>
> #include<stdio.h>
>
> void f1 ()
> {
> unsigned short t1;
> unsigned short t2;
>
> t1 = rand();
> t2 = rand();
>
> t1<<= 1; t2<<= 1;
> t1<<= 1; t2<<= 1;
> t1<<= 1; t2<<= 1;
> t1<<= 1; t2<<= 1;
> t1<<= 1; t2<<= 1;
> t1<<= 1; t2<<= 1;
>
> printf("%d\n", (t1+t2));
> }
>
> This is a ZERO_EXTEND problem, because combining SHIFTs with whole
> integers works correctly, so do signed values. The problem seems to
> arise in the RTL combiner which combines the ZERO_EXTEND with the
> SHIFT to generate a SHIFT and an AND. Our architecture does not
> support AND with large constants and hence do not have a matching
> insn pattern (we prefer not doing this, because of large constants
> remain hanging at the end of all RTL optimisations and cause needless
> reloads).
>
> Fixing the combiner to convert masking AND operations to ZERO_EXTRACT
> fixes this issue without any obvious regressions. I'm adding the
> patch here against GCC 4.4.1 for any comments and/or suggestions.
>
Good catch. However, note we are a regression bugfix only phase of
development right now in preparation for branching for GCC 4.5. As a
result the patch can't be checked in at this time; I would recommend you
update the patch to the current sources & attach it to bug #41998 which
contains queued patches for after GCC 4.5 branches.
> Cheers,
> Rahul
>
>
> --- combine.c 2009-04-01 21:47:37.000000000 +0100
> +++ combine.c 2010-02-04 15:04:41.000000000 +0000
> @@ -446,6 +446,7 @@
> static void record_truncated_values (rtx *, void *);
> static bool reg_truncated_to_mode (enum machine_mode, const_rtx);
> static rtx gen_lowpart_or_truncate (enum machine_mode, rtx);
> +static bool can_zero_extract_p (rtx, rtx, enum machine_mode);
>
>
>
> /* It is not safe to use ordinary gen_lowpart in combine.
> @@ -6973,6 +6974,16 @@
> make_compound_operation (XEXP (x, 0),
> next_code),
> i, NULL_RTX, 1, 1, 0, 1);
> + else if (can_zero_extract_p (XEXP (x, 0), XEXP (x, 1), mode))
> + {
> + unsigned HOST_WIDE_INT len = HOST_BITS_PER_WIDE_INT
> + - CLZ_HWI (UINTVAL (XEXP (x,
> 1)));
> + new_rtx = make_extraction (mode,
> + make_compound_operation (XEXP (x, 0),
> + next_code),
> + 0, NULL_RTX, len, 1, 0,
> + in_code == COMPARE);
>
There should be a comment prior to this code fragment describing the
transformation being performed. Something like:
/* Convert (and (shift X Y) MASK) into ... when ... */
That will make it clear in the future when your transformation applies
rather than forcing someone scanning the code to read it in detail.
> + }
>
> break;
>
> @@ -7245,6 +7256,25 @@
> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> }
>
> +static bool
> +can_zero_extract_p (rtx x, rtx mask_rtx, enum machine_mode mode)
>
There should be a comment prior to this function which briefly describes
what the function does, the parameters & return value. Use comments
prior to other functions to guide you.
> @@ -8957,7 +8987,6 @@
> op0 = UNKNOWN;
>
> *pop0 = op0;
> -
> /* ??? Slightly redundant with the above mask, but not entirely.
> Moving this above means we'd have to sign-extend the mode mask
> for the final test. */
>
Useless diff fragment. Remove this change as it's completely unrelated
and useless.
You should also write a ChangeLog entry for your patch. ChangeLogs
describe what changed, not why something changed. So a suitable entry
might look something like:
<date> Your Name <youremail@somewhere>
* combine.c (make_compound_operation): Convert shifts and masks
into zero_extract in certain cases.
(can_zero_extract_p): New function.
If you could make those changes and attach the result to PR 41998 they
should be able to go in once 4.5 branches from the mainline.
Jeff