[0/7] Type promotion pass and elimination of zext/sext

Kugan kugan.vivekanandarajah@linaro.org
Mon Nov 2 09:17:00 GMT 2015



On 29/10/15 02:45, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 1:50 AM, kugan
> <kugan.vivekanandarajah@linaro.org> wrote:
>>
>>
>> On 23/10/15 01:23, Richard Biener wrote:
>>>
>>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan
>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 21/10/15 23:45, Richard Biener wrote:
>>>>>
>>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07/09/15 12:53, Kugan wrote:
>>>>>>>
>>>>>>>
>>>>>>> This a new version of the patch posted in
>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
>>>>>>> more testing and spitted the patch to make it more easier to review.
>>>>>>> There are still couple of issues to be addressed and I am working on
>>>>>>> them.
>>>>>>>
>>>>>>> 1. AARCH64 bootstrap now fails with the commit
>>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is
>>>>>>> mis-compiled
>>>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a
>>>>>>> latent
>>>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64
>>>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the
>>>>>>> time being, I am using  patch
>>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
>>>>>>> workaround. This meeds to be fixed before the patches are ready to be
>>>>>>> committed.
>>>>>>>
>>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It
>>>>>>> works
>>>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as
>>>>>>> well.
>>>>>>
>>>>>>
>>>>>> Hi Richard,
>>>>>>
>>>>>> Now that stage 1 is going to close, I would like to get these patches
>>>>>> accepted for stage1. I will try my best to address your review comments
>>>>>> ASAP.
>>>>>
>>>>>
>>>>> Ok, can you make the whole patch series available so I can poke at the
>>>>> implementation a bit?  Please state the revision it was rebased on
>>>>> (or point me to a git/svn branch the work resides on).
>>>>>
>>>>
>>>> Thanks. Please find the patched rebated against trunk@229156. I have
>>>> skipped the test-case readjustment patches.
>>>
>>>
>>> Some quick observations.  On x86_64 when building
>>
>>
>> Hi Richard,
>>
>> Thanks for the review.
>>
>>>
>>> short bar (short y);
>>> int foo (short x)
>>> {
>>>    short y = bar (x) + 15;
>>>    return y;
>>> }
>>>
>>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
>>> I get
>>>
>>>    <bb 2>:
>>>    _1 = (int) x_10(D);
>>>    _2 = (_1) sext (16);
>>>    _11 = bar (_2);
>>>    _5 = (int) _11;
>>>    _12 = (unsigned int) _5;
>>>    _6 = _12 & 65535;
>>>    _7 = _6 + 15;
>>>    _13 = (int) _7;
>>>    _8 = (_13) sext (16);
>>>    _9 = (_8) sext (16);
>>>    return _9;
>>>
>>> which looks fine but the VRP optimization doesn't trigger for the
>>> redundant sext
>>> (ranges are computed correctly but the 2nd extension is not removed).

Thanks for the comments. Please fond the attached patches with which I
am now getting
cat .192t.optimized

;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0,
symbol_order=0)

foo (short int x)
{
  signed int _1;
  int _2;
  signed int _5;
  unsigned int _6;
  unsigned int _7;
  signed int _8;
  int _9;
  short int _11;
  unsigned int _12;
  signed int _13;

  <bb 2>:
  _1 = (signed int) x_10(D);
  _2 = _1;
  _11 = bar (_2);
  _5 = (signed int) _11;
  _12 = (unsigned int) _11;
  _6 = _12 & 65535;
  _7 = _6 + 15;
  _13 = (signed int) _7;
  _8 = (_13) sext (16);
  _9 = _8;
  return _9;

}


There are still some redundancies. The asm difference after RTL
optimizations is

-	addl	$15, %eax
+	addw	$15, %ax


>>>
>>> This also makes me notice trivial match.pd patterns are missing, like
>>> for example
>>>
>>> (simplify
>>>   (sext (sext@2 @0 @1) @3)
>>>   (if (tree_int_cst_compare (@1, @3) <= 0)
>>>    @2
>>>    (sext @0 @3)))
>>>
>>> as VRP doesn't run at -O1 we must rely on those to remove rendudant
>>> extensions,
>>> otherwise generated code might get worse compared to without the pass(?)
>>
>>
>> Do you think that we should enable this pass only when vrp is enabled.
>> Otherwise, even when we do the simple optimizations you mentioned below, we
>> might not be able to remove all the redundancies.
>>
>>>
>>> I also notice that the 'short' argument does not get it's sign-extension
>>> removed
>>> as redundand either even though we have
>>>
>>> _1 = (int) x_8(D);
>>> Found new range for _1: [-32768, 32767]
>>>
>>
>> I am looking into it.
>>
>>> In the end I suspect that keeping track of the "simple" cases in the
>>> promotion
>>> pass itself (by keeping a lattice) might be a good idea (after we fix VRP
>>> to do
>>> its work).  In some way whether the ABI guarantees promoted argument
>>> registers might need some other target hook queries.

I tried adding it in the attached patch with record_visit_stmt to track
whether an ssa would have value overflow or properly zero/sign extended
in promoted mode. We can use this to eliminate some of the zero/sign
extension at gimple level. As it is, it doesn't do much. If this is what
you had in mind, I will extend it based on your feedback.


>>>
>>> Now onto the 0002 patch.
>>>
>>> +static bool
>>> +type_precision_ok (tree type)
>>> +{
>>> +  return (TYPE_PRECISION (type)  == 8
>>> +         || TYPE_PRECISION (type) == 16
>>> +         || TYPE_PRECISION (type) == 32);
>>> +}
>>>
>>> that's a weird function to me.  You probably want
>>> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>>> here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?
>>>
>>
>> I will change this. (I have a patch which I am testing with other changes
>> you have asked for)
>>
>>
>>> +/* Return the promoted type for TYPE.  */
>>> +static tree
>>> +get_promoted_type (tree type)
>>> +{
>>> +  tree promoted_type;
>>> +  enum machine_mode mode;
>>> +  int uns;
>>> +  if (POINTER_TYPE_P (type)
>>> +      || !INTEGRAL_TYPE_P (type)
>>> +      || !type_precision_ok (type))
>>> +    return type;
>>> +
>>> +  mode = TYPE_MODE (type);
>>> +#ifdef PROMOTE_MODE
>>> +  uns = TYPE_SIGN (type);
>>> +  PROMOTE_MODE (mode, uns, type);
>>> +#endif
>>> +  uns = TYPE_SIGN (type);
>>> +  promoted_type = lang_hooks.types.type_for_mode (mode, uns);
>>> +  if (promoted_type
>>> +      && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))
>>> +    type = promoted_type;
>>>
>>> I think what you want to verify is that TYPE_PRECISION (promoted_type)
>>> == GET_MODE_PRECISION (mode).
>>> And to not even bother with this simply use
>>>
>>> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
>>> uns);
>>>
>>
>> I am changing this too.
>>
>>> You use a domwalk but also might create new basic-blocks during it
>>> (insert_on_edge_immediate), that's a
>>> no-no, commit edge inserts after the domwalk.
>>
>>
>> I am sorry, I dont understand "commit edge inserts after the domwalk" Is
>> there a way to do this in the current implementation?
> 
> Yes, simply use gsi_insert_on_edge () and after the domwalk is done do
> gsi_commit_edge_inserts ().
> 
>>> ssa_sets_higher_bits_bitmap looks unused and
>>> we generally don't free dominance info, so please don't do that.
>>>
>>> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc
>>> with
>>>
>>> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/
>>> -B/usr/local/powerpc64-unknown-linux-gnu/bin/
>>> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem
>>> /usr/local/powerpc64-unknown-linux-gnu/include -isystem
>>> /usr/local/powerpc64-unknown-linux-gnu/sys-include    -g -O2 -O2  -g
>>> -O2 -DIN_GCC    -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual
>>> -Wno-format -Wstrict-prototypes -Wmissing-prototypes
>>> -Wold-style-definition  -isystem ./include   -fPIC -mlong-double-128
>>> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc
>>> -fno-stack-protector   -fPIC -mlong-double-128 -mno-minimal-toc -I.
>>> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/.
>>> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include
>>> -I../../../trunk/libgcc/../libdecnumber/dpd
>>> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS  -o _divdi3.o
>>> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c
>>> ../../../trunk/libgcc/libgcc2.c \
>>>            -fexceptions -fnon-call-exceptions -fvisibility=hidden
>>> -DHIDE_EXPORTS
>>> In file included from ../../../trunk/libgcc/libgcc2.c:56:0:
>>> ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’:
>>> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in
>>> expand_debug_locations, at cfgexpand.c:5277
>>>

With the attached patch, now I am running into Bootstrap comparison
failure. I am looking into it. Please review this version so that I can
address them while fixing this issue.

Thanks,
Kugan

>>
>> I am testing on gcc computefarm. I will get it to bootstrap and will do the
>> regression testing before posting the next version.
>>
>>> as hinted at above a bootstrap on i?86 (yes, 32bit) with
>>> --with-tune=pentiumpro might be another good testing candidate.
>>>
>>> +      FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE |
>>> SSA_OP_DEF)
>>> +       promote_def_and_uses (def);
>>>
>>> it looks like you are doing some redundant work by walking both defs
>>> and uses of each stmt.  I'd say you should separate
>>> def and use processing and use
>>>
>>>    FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>>      promote_use (use);
>>>    FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
>>>      promote_def (def);
>>>
>>
>> Name promote_def_and_uses in my implementation is a bit confusing. It is
>> promoting the SSA_NAMEs. We only have to do that for the definitions if we
>> can do the SSA_NAMEs defined by parameters.
>>
>> I also have a bitmap to see if we have promoted a variable and avoid doing
>> it again. I will try to improve this.
>>
>>
>>
>>> this should make processing more efficient (memory local) compared to
>>> doing the split handling
>>> in promote_def_and_uses.
>>>
>>> I think it will be convenient to have a SSA name info structure where
>>> you can remember the original
>>> type a name was promoted from as well as whether it was promoted or
>>> not.  This way adjusting
>>> debug uses should be "trivial":
>>>
>>> +static unsigned int
>>> +fixup_uses (tree use, tree promoted_type, tree old_type)
>>> +{
>>> +  gimple *stmt;
>>> +  imm_use_iterator ui;
>>> +  gimple_stmt_iterator gsi;
>>> +  use_operand_p op;
>>> +
>>> +  FOR_EACH_IMM_USE_STMT (stmt, ui, use)
>>> +    {
>>> +      bool do_not_promote = false;
>>> +      switch (gimple_code (stmt))
>>> +       {
>>> +       case GIMPLE_DEBUG:
>>> +           {
>>> +             gsi = gsi_for_stmt (stmt);
>>> +             gsi_remove (&gsi, true);
>>>
>>> rather than doing the above you'd do sth like
>>>
>>>    SET_USE (use, fold_convert (old_type, new_def));
>>>    update_stmt (stmt);
>>>
>>
>> We do have these information (original type a name was promoted from as well
>> as whether it was promoted or not). To make it easy to review, in the patch
>> that adds the pass,I am removing these debug stmts. But in patch 4, I am
>> trying to handle this properly. Maybe  I should combine them.
> 
> Yeah, it's a bit confusing otherwise.
> 
>>> note that while you may not be able to use promoted regs at all uses
>>> (like calls or asms) you can promote all defs, if only with a compensation
>>> statement after the original def.  The SSA name info struct can be used
>>> to note down the actual SSA name holding the promoted def.
>>>
>>> The pass looks a lot better than last time (it's way smaller!) but
>>> still needs some
>>> improvements.  There are some more fishy details with respect to how you
>>> allocate/change SSA names but I think those can be dealt with once the
>>> basic structure looks how I like it to be.
>>>
>>
>> I will post an updated patch in a day or two.
> 
> Thanks,
> Richard.
> 
>> Thanks again,
>> Kugan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Optimize-ZEXT_EXPR-with-tree-vrp.patch
Type: text/x-diff
Size: 3520 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151102/ce9a88ac/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-type-promotion-pass.patch
Type: text/x-diff
Size: 33012 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151102/ce9a88ac/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-new-SEXT_EXPR-tree-code.patch
Type: text/x-diff
Size: 5068 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151102/ce9a88ac/attachment-0002.bin>


More information about the Gcc-patches mailing list