This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFA: DSE replace_read revisited


> A few weeks ago, I posted a patch to fix some problems in the
> new RTL DSE replace_read code:
>
>     http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01316.html
>
> However, I found that the testcase added there still fails for some
> MIPS targets, and investigating that led me to some wrong-code and
> pessimisation issues too.

Thanks for following up on this.  While I agree that fixing the wrong code 
bugs is necessary, I'm less sure about the pessimisation issues for now.

> There seem to be several problems: 
>
>   (1) As I complained about in the message above, find_shift_sequence
>       does the shift in the mode class of the stored value, rather than
>       MODE_INT.  So if we have a V4HI store and a V2HI read (say), we'd
>       use a vector shift rather than an integer shift.  We'd therefore
>       shift the individual HI elements right, rather than shift the
>       whole bit pattern.

That seems indeed wrong.

>   (2) When reading the low end of a stored value, the original code used
>       gen_lowpart to do the mode punning:
>
>         emit_move_insn (read_reg, gen_lowpart (read_mode, store_info->rhs));
>
>       This ICEd when a DFmode store was followed by an SFmode move, as in:
>
>         union u { double d; float f[2]; };
>
>         float
>         foo (union u *u, double d)
>         {
>           u->d = d;
>           return u->f[1];
>         }
>
>       for big-endian targets, because SFmode subregs of DFmode values
>       are not allowed.  My attempt to fix the integer cases for MIPS
>       meant that we'd now convert the DFmode value into SFmode instead,
>       which is of course also wrong.
>
>       (Although this particular example isn't very realistic,
>       the equivalent TFmode/DFmode one matters for targets with
>       IEEE long doubles.)

Can't we simply punt if we cannot form the lowpart?

>   (4) The code requires the store and read to have the same mode class:
>
>         if (GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode))
>           return false;
>
>       This seems unnecessarily restrictive.  I guess it was meant to
>       be a simplifying assumption, but as per (1) and (2) above,
>       it isn't really.

However let's keep it for 4.3 I'd think.

>   (5) The shift case can't deal with floats, even though it would be
>       profitable to do so on some soft-float targets.

Let's keep this limitation for now too.

>   (6) The code only deals with REGs:
>
>         if (store_info->is_set
>             /* No place to keep the value after ra.  */
>             && !reload_completed
>             /* The careful reviewer may wish to comment my checking that
> the rhs of a store is always a reg.  */
>             && REG_P (SET_SRC (body))
>             /* Sometimes the store and reload is used for truncation and
>               rounding.  */
>             && !(FLOAT_MODE_P (GET_MODE (mem)) && (flag_float_store)))
>
>       This does indeed seem unnecessarily restrictive, and is the cause of
>       the dse-1.c failures that led me here originally.  The code could
>       easily handle subregs and constants as well.

Yes, I think that we can try and lift this one.

> Addressing (1)-(5) together, I think part of the problem is that the
> code has too many special exceptions.  From a high-level perspective,
> I think we need two things:
>
>   (a) a way of telling whether a mode change is profitable
>
>   (b) a way of extracting the low bits of a value of one arbitrary mode
>       into a value of another arbitrary mode
>
> (a) we already have in the form of MODES_TIEABLE_P.  So rather than
> check whether the two modes are the same class, I think we should check
> whether each individual mode change we make is profitable accordingly to
> MODES_TIEABLE_P.
>
> (b) we don't have (AFAIK), so I added a new routine called
> extract_low_bits. Quoting the function's comment:
>
>    Try to read the low bits of SRC as an rvalue of mode MODE, preserving
>    the bit pattern.  SRC_MODE is the mode of SRC; if this is smaller than
>    MODE, fill the upper bits with zeros.  Fail if the layout of either
>    mode is unknown (as for CC modes) or if the extraction would involve
>    unprofitable mode punning.  Return the value on success, otherwise
>    return null.
>
>    This is different from gen_lowpart* in these respects:
>
>      - the returned value must always be considered an rvalue
>
>      - when MODE is wider than SRC_MODE, the extraction involves
>        a zero extension
>
>      - when MODE is smaller than SRC_MODE, the extraction involves
>        a truncation (and is thus subject to TRULY_NOOP_TRUNCATION).
>
>    In other words, this routine performs a computation, whereas the
>    gen_lowpart* routines are conceptually lvalue or rvalue subreg
>    operations.

I'm a little reluctant to introduce this machinery at this point.  So I'd try 
to fix the wrong-code code as long as extract_low_bits is not required; if it 
is, I'd punt for now.

What do you think?

-- 
Eric Botcazou


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]