This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: RFA: DSE replace_read revisited
- From: Eric Botcazou <ebotcazou at libertysurf dot fr>
- To: Richard Sandiford <rsandifo at nildram dot co dot uk>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Wed, 7 Nov 2007 11:01:28 +0100
- Subject: Re: RFA: DSE replace_read revisited
- References: <87ejff7bhs.fsf@firetop.home>
> A few weeks ago, I posted a patch to fix some problems in the
> new RTL DSE replace_read code:
>
> http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01316.html
>
> However, I found that the testcase added there still fails for some
> MIPS targets, and investigating that led me to some wrong-code and
> pessimisation issues too.
Thanks for following up on this. While I agree that fixing the wrong code
bugs is necessary, I'm less sure about the pessimisation issues for now.
> There seem to be several problems:
>
> (1) As I complained about in the message above, find_shift_sequence
> does the shift in the mode class of the stored value, rather than
> MODE_INT. So if we have a V4HI store and a V2HI read (say), we'd
> use a vector shift rather than an integer shift. We'd therefore
> shift the individual HI elements right, rather than shift the
> whole bit pattern.
That seems indeed wrong.
> (2) When reading the low end of a stored value, the original code used
> gen_lowpart to do the mode punning:
>
> emit_move_insn (read_reg, gen_lowpart (read_mode, store_info->rhs));
>
> This ICEd when a DFmode store was followed by an SFmode move, as in:
>
> union u { double d; float f[2]; };
>
> float
> foo (union u *u, double d)
> {
> u->d = d;
> return u->f[1];
> }
>
> for big-endian targets, because SFmode subregs of DFmode values
> are not allowed. My attempt to fix the integer cases for MIPS
> meant that we'd now convert the DFmode value into SFmode instead,
> which is of course also wrong.
>
> (Although this particular example isn't very realistic,
> the equivalent TFmode/DFmode one matters for targets with
> IEEE long doubles.)
Can't we simply punt if we cannot form the lowpart?
> (4) The code requires the store and read to have the same mode class:
>
> if (GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode))
> return false;
>
> This seems unnecessarily restrictive. I guess it was meant to
> be a simplifying assumption, but as per (1) and (2) above,
> it isn't really.
However let's keep it for 4.3 I'd think.
> (5) The shift case can't deal with floats, even though it would be
> profitable to do so on some soft-float targets.
Let's keep this limitation for now too.
> (6) The code only deals with REGs:
>
> if (store_info->is_set
> /* No place to keep the value after ra. */
> && !reload_completed
> /* The careful reviewer may wish to comment my checking that
> the rhs of a store is always a reg. */
> && REG_P (SET_SRC (body))
> /* Sometimes the store and reload is used for truncation and
> rounding. */
> && !(FLOAT_MODE_P (GET_MODE (mem)) && (flag_float_store)))
>
> This does indeed seem unnecessarily restrictive, and is the cause of
> the dse-1.c failures that led me here originally. The code could
> easily handle subregs and constants as well.
Yes, I think that we can try and lift this one.
> Addressing (1)-(5) together, I think part of the problem is that the
> code has too many special exceptions. From a high-level perspective,
> I think we need two things:
>
> (a) a way of telling whether a mode change is profitable
>
> (b) a way of extracting the low bits of a value of one arbitrary mode
> into a value of another arbitrary mode
>
> (a) we already have in the form of MODES_TIEABLE_P. So rather than
> check whether the two modes are the same class, I think we should check
> whether each individual mode change we make is profitable accordingly to
> MODES_TIEABLE_P.
>
> (b) we don't have (AFAIK), so I added a new routine called
> extract_low_bits. Quoting the function's comment:
>
> Try to read the low bits of SRC as an rvalue of mode MODE, preserving
> the bit pattern. SRC_MODE is the mode of SRC; if this is smaller than
> MODE, fill the upper bits with zeros. Fail if the layout of either
> mode is unknown (as for CC modes) or if the extraction would involve
> unprofitable mode punning. Return the value on success, otherwise
> return null.
>
> This is different from gen_lowpart* in these respects:
>
> - the returned value must always be considered an rvalue
>
> - when MODE is wider than SRC_MODE, the extraction involves
> a zero extension
>
> - when MODE is smaller than SRC_MODE, the extraction involves
> a truncation (and is thus subject to TRULY_NOOP_TRUNCATION).
>
> In other words, this routine performs a computation, whereas the
> gen_lowpart* routines are conceptually lvalue or rvalue subreg
> operations.
I'm a little reluctant to introduce this machinery at this point. So I'd try
to fix the wrong-code code as long as extract_low_bits is not required; if it
is, I'd punt for now.
What do you think?
--
Eric Botcazou