[RFC] Elimination of zext/sext - type promotion pass

Richard Biener richard.guenther@gmail.com
Mon Nov 10 12:56:00 GMT 2014


On Mon, Nov 10, 2014 at 12:29 AM, Kugan
<kugan.vivekanandarajah@linaro.org> wrote:
>
>> Well - the best way would be to expose the target specifics to GIMPLE
>> at some point in the optimization pipeline.  My guess would be that it's
>> appropriate after loop optimizations (but maybe before induction variable
>> optimization).
>>
>> That is, have a pass that applies register promotion to all SSA names
>> in the function, inserting appropriate truncations and extensions.  That
>> way you'd never see (set (subreg...) on RTL.  The VRP and DOM
>> passes running after that pass would then be able to aggressively
>> optimize redundant truncations and extensions.
>>
>> Effects on debug information are to be considered.  You can change
>> the type of SSA names in-place but you don't want to do that for
>> user DECLs (and we can't have the SSA name type and its DECL
>> type differ - and not sure if we might want to lift that restriction).
>
>
> Thanks for the comments. Here is a prototype patch that implements a
> type promotion pass. This pass records SSA variables that will have
> values in higher bits (than the original type precision) if promoted and
> uses this information in inserting appropriate truncations and
> extensions. This pass also classifies some of the stmts that set SSA
> names as unsafe to promote. Here is a gimple diff for the type
> promotion pass as compared to the previous dump for a testcase.

Note that while GIMPLE has a way to zero-extend (using BIT_AND_EXPR)
it has no convenient way to sign-extend other than truncating to a signed
(non-promoted) type and then extending to the promoted type.  Thus
I think such a pass should be accompanied by a new tree code,
SEXT_EXPR.  Otherwise we end up with "spurious" un-promoted
signed types which later optimizations may be confused about.

Not sure if that is the actual issue though.
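For illustration, the asymmetry at the source level looks roughly like
this (a made-up sketch, not from the patch; 8-bit values promoted to
32-bit registers assumed):

unsigned int zext8 (unsigned int x)
{
  return x & 0xff;              /* zero-extension: a single BIT_AND_EXPR */
}

int sext8 (int x)
{
  return (int) (signed char) x; /* sign-extension: truncate to the narrow
                                   signed type, then extend again */
}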

Instead of "prmt" and "prmtn" I'd spell out "promote", and tree-type-prmtn
should be gimple-ssa-type-promote.c.  In the end all targets with
non-trivial PROMOTE_MODE should run the pass as a lowering step,
so it should be enabled even at -O0 (and it should not be possible
to disable it).

I'd definitely run the pass _after_ pass_lower_vector_ssa (and in the
end I'd like to run it before IVOPTs ... which means moving IVOPTs
later, after VRP which should be the pass optimizing away some of
the extensions).

In get_promoted_type I don't understand why you preserve qualifiers.
Also even for targets without PROMOTE_MODE it may be
beneficial to expose truncations required by expanding bit-precision
arithmetic earlier (that is, if !PROMOTE_MODE at least promote
to GET_MODE_PRECISION (TYPE_MODE (type))).  A testcase
for that is for example

struct { long i : 33; long j : 33; } a;
long f (void) { return a.i + a.j; }

where bitfields of type > int do not promote, so you get a
33-bit add which we expand to a 64-bit add plus a sign-extension
(and usually nothing optimizes that away later).
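Roughly what the expander is forced to produce for that, written out in
C (a sketch only; add33 is a made-up name, 64-bit long and arithmetic
right shift assumed):

long add33 (long i, long j)
{
  long sum = i + j;                                 /* full 64-bit add */
  return (long) ((unsigned long) sum << 31) >> 31;  /* re-sign-extend from bit 32 */
}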

insert_next_bb sounds like you want to use insert_on_edge
somewhere.
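That is, something along the lines of (a fragment/sketch; e and
fixup_stmt stand for the edge and the compensation stmt in question):

  gsi_insert_on_edge (e, fixup_stmt);
  /* ... and once, after all edges have been processed ... */
  gsi_commit_edge_inserts ();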

In assign_rhs_promotable_p you handle comparisons specially,
but the ternary COND_EXPR and VEC_COND_EXPR can have
comparisons embedded in their first operand.  The comment
confuses me though - with proper sign- or zero-extensions inserted
you should be able to promote them anyway?
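For example (made-up testcase), after phiopt/if-conversion this usually
ends up as a single assignment with the comparison embedded in the
COND_EXPR's first operand rather than as a GIMPLE_COND:

short sel (short a, short b, short c, short d)
{
  /* Usually becomes something like
       _1 = a_2(D) < b_3(D) ? c_4(D) : d_5(D);
     so the comparison sits in operand 0 of the COND_EXPR.  */
  return a < b ? c : d;
}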

You seem to miss that a GIMPLE_ASSIGN can have 3 operands
in promote_cst_in_stmt as well.

In promote_assign_stmt_use I consider a default: case that ends
up doing nothing dangerous ;)  Please either use gcc_unreachable ()
or do the safe thing (fix = true;?).  You seem to be working with
a lattice of some kind - fixing up stmt uses the way you do - walking
over immediate uses - is not very cache friendly.  Why not use
a lattice for this - record promoted vars to be used for old SSA names
and walk over all stmts instead, replacing SSA uses on them?
Btw, you don't need to call update_stmt if you SET_USE and do not
replace an SSA name with a constant.
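Something like the following (a rough sketch only; promoted_copy_of is
a stand-in for the lattice lookup mapping an old SSA name to its
promoted replacement, and PHI arguments would need the same treatment):

basic_block bb;
FOR_EACH_BB_FN (bb, cfun)
  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
       gsi_next (&gsi))
    {
      gimple stmt = gsi_stmt (gsi);
      use_operand_p use_p;
      ssa_op_iter iter;
      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
        {
          tree repl = promoted_copy_of (USE_FROM_PTR (use_p));
          if (repl)
            SET_USE (use_p, repl);  /* no update_stmt needed for an SSA name */
        }
    }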

You seem to "fix" with a single stmt but I don't see where you insert
zero- or sign-extensions for ssa_overflows_p cases?

Note that at least for SSA names with !SSA_NAME_VAR (thus
anonymous vars) you want to do a cheaper promotion by not
allocating a new SSA name but simply "fixing" its type by
assigning to its TREE_TYPE.   For SSA names with SSA_NAME_VAR
there is of course debug-info to consider and thus doing what you
do is better (but probably still will wreck debuginfo?).
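That is, for the anonymous case something as simple as (sketch;
promoted_type is whatever get_promoted_type computed):

  if (!SSA_NAME_VAR (name))
    TREE_TYPE (name) = promoted_type;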

GIMPLE_NOPs are not only used for parameters but also uninitialized
uses - for non-parameters you should simply adjust their type.  No
need to fixup their value.

The pass needs more comments.

It looks like you are not promoting all variables but only those
where compensation code (zero-/sign-extensions) is not necessary?

Thanks for trying to work on this.
Richard.

>  crc2 (short unsigned int crc, unsigned char data)
>  {
>    unsigned char carry;
>    unsigned char x16;
>    unsigned char i;
> -  unsigned char ivtmp_5;
> +  unsigned int _2;
> +  unsigned char _3;
> +  unsigned int _4;
> +  unsigned int _5;
>    unsigned char _9;
> -  unsigned char _10;
> -  unsigned char ivtmp_18;
> +  unsigned int _10;
> +  unsigned int _11;
> +  unsigned int _12;
> +  unsigned int _13;
> +  unsigned int _15;
> +  unsigned int _16;
> +  unsigned int _18;
> +  unsigned int _21;
> +  unsigned int _22;
> +  unsigned int _24;
> +  short unsigned int _26;
> +  unsigned char _27;
> +  unsigned int _28;
> +  unsigned int _29;
> +  unsigned int _30;
>
>    <bb 2>:
> +  _12 = (unsigned int) data_8(D);
> +  _2 = (unsigned int) crc_7(D);
>
>    <bb 3>:
> -  # crc_28 = PHI <crc_2(5), crc_7(D)(2)>
> -  # data_29 = PHI <data_12(5), data_8(D)(2)>
> -  # ivtmp_18 = PHI <ivtmp_5(5), 8(2)>
> -  _9 = (unsigned char) crc_28;
> -  _10 = _9 ^ data_29;
> -  x16_11 = _10 & 1;
> -  data_12 = data_29 >> 1;
> -  if (x16_11 == 1)
> +  # _30 = PHI <_28(5), _2(2)>
> +  # _16 = PHI <_29(5), _12(2)>
> +  # _4 = PHI <_18(5), 8(2)>
> +  _9 = (unsigned char) _30;
> +  _5 = (unsigned int) _9;
> +  _22 = _5 ^ _16;
> +  _10 = _22 & 1;
> +  _29 = _16 >> 1;
> +  _27 = (unsigned char) _10;
> +  if (_27 == 1)
>      goto <bb 4>;
>    else
>      goto <bb 7>;
>
>    <bb 4>:
> -  crc_13 = crc_28 ^ 16386;
> -  crc_24 = crc_13 >> 1;
> -  crc_15 = crc_24 | 32768;
> +  _11 = _30 ^ 16386;
> +  _13 = _11 >> 1;
> +  _24 = _13 | 32768;
>
>    <bb 5>:
> -  # crc_2 = PHI <crc_15(4), crc_21(7)>
> -  ivtmp_5 = ivtmp_18 - 1;
> -  if (ivtmp_5 != 0)
> +  # _28 = PHI <_24(4), _15(7)>
> +  _18 = _4 - 1;
> +  _3 = (unsigned char) _18;
> +  if (_3 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 6>;
>
>    <bb 6>:
> -  # crc_19 = PHI <crc_2(5)>
> -  return crc_19;
> +  # _21 = PHI <_28(5)>
> +  _26 = (short unsigned int) _21;
> +  return _26;
>
>    <bb 7>:
> -  crc_21 = crc_28 >> 1;
> +  _15 = _30 >> 1;
>    goto <bb 5>;
>
>  }
>
>
> I experimented with a few simple test cases and the results so far are
> mixed.  It also seems that subsequent passes are not always optimising
> as expected.  I haven't looked in detail but will look into it based on
> the feedback.
>
> Please also note that this pass doesn't handle debug instructions yet
> and there are a couple of regression failures for ARM.
>
> Thanks,
> Kugan
>
>
