[0/7] Type promotion pass and elimination of zext/sext
Kugan
kugan.vivekanandarajah@linaro.org
Mon Nov 2 09:17:00 GMT 2015
On 29/10/15 02:45, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 1:50 AM, kugan
> <kugan.vivekanandarajah@linaro.org> wrote:
>>
>>
>> On 23/10/15 01:23, Richard Biener wrote:
>>>
>>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan
>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 21/10/15 23:45, Richard Biener wrote:
>>>>>
>>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07/09/15 12:53, Kugan wrote:
>>>>>>>
>>>>>>>
>>>>>>> This a new version of the patch posted in
>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
>>>>>>> more testing and spitted the patch to make it more easier to review.
>>>>>>> There are still couple of issues to be addressed and I am working on
>>>>>>> them.
>>>>>>>
>>>>>>> 1. AARCH64 bootstrap now fails with the commit
>>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is
>>>>>>> mis-compiled
>>>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a
>>>>>>> latent
>>>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64
>>>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the
>>>>>>> time being, I am using patch
>>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
>>>>>>> workaround. This meeds to be fixed before the patches are ready to be
>>>>>>> committed.
>>>>>>>
>>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It
>>>>>>> works
>>>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as
>>>>>>> well.
>>>>>>
>>>>>>
>>>>>> Hi Richard,
>>>>>>
>>>>>> Now that stage 1 is going to close, I would like to get these patches
>>>>>> accepted for stage1. I will try my best to address your review comments
>>>>>> ASAP.
>>>>>
>>>>>
>>>>> Ok, can you make the whole patch series available so I can poke at the
>>>>> implementation a bit? Please state the revision it was rebased on
>>>>> (or point me to a git/svn branch the work resides on).
>>>>>
>>>>
>>>> Thanks. Please find the patched rebated against trunk@229156. I have
>>>> skipped the test-case readjustment patches.
>>>
>>>
>>> Some quick observations. On x86_64 when building
>>
>>
>> Hi Richard,
>>
>> Thanks for the review.
>>
>>>
>>> short bar (short y);
>>> int foo (short x)
>>> {
>>> short y = bar (x) + 15;
>>> return y;
>>> }
>>>
>>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
>>> I get
>>>
>>> <bb 2>:
>>> _1 = (int) x_10(D);
>>> _2 = (_1) sext (16);
>>> _11 = bar (_2);
>>> _5 = (int) _11;
>>> _12 = (unsigned int) _5;
>>> _6 = _12 & 65535;
>>> _7 = _6 + 15;
>>> _13 = (int) _7;
>>> _8 = (_13) sext (16);
>>> _9 = (_8) sext (16);
>>> return _9;
>>>
>>> which looks fine but the VRP optimization doesn't trigger for the
>>> redundant sext
>>> (ranges are computed correctly but the 2nd extension is not removed).
Thanks for the comments. Please fond the attached patches with which I
am now getting
cat .192t.optimized
;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0,
symbol_order=0)
foo (short int x)
{
signed int _1;
int _2;
signed int _5;
unsigned int _6;
unsigned int _7;
signed int _8;
int _9;
short int _11;
unsigned int _12;
signed int _13;
<bb 2>:
_1 = (signed int) x_10(D);
_2 = _1;
_11 = bar (_2);
_5 = (signed int) _11;
_12 = (unsigned int) _11;
_6 = _12 & 65535;
_7 = _6 + 15;
_13 = (signed int) _7;
_8 = (_13) sext (16);
_9 = _8;
return _9;
}
There are still some redundancies. The asm difference after RTL
optimizations is
- addl $15, %eax
+ addw $15, %ax
>>>
>>> This also makes me notice trivial match.pd patterns are missing, like
>>> for example
>>>
>>> (simplify
>>> (sext (sext@2 @0 @1) @3)
>>> (if (tree_int_cst_compare (@1, @3) <= 0)
>>> @2
>>> (sext @0 @3)))
>>>
>>> as VRP doesn't run at -O1 we must rely on those to remove rendudant
>>> extensions,
>>> otherwise generated code might get worse compared to without the pass(?)
>>
>>
>> Do you think that we should enable this pass only when vrp is enabled.
>> Otherwise, even when we do the simple optimizations you mentioned below, we
>> might not be able to remove all the redundancies.
>>
>>>
>>> I also notice that the 'short' argument does not get it's sign-extension
>>> removed
>>> as redundand either even though we have
>>>
>>> _1 = (int) x_8(D);
>>> Found new range for _1: [-32768, 32767]
>>>
>>
>> I am looking into it.
>>
>>> In the end I suspect that keeping track of the "simple" cases in the
>>> promotion
>>> pass itself (by keeping a lattice) might be a good idea (after we fix VRP
>>> to do
>>> its work). In some way whether the ABI guarantees promoted argument
>>> registers might need some other target hook queries.
I tried adding it in the attached patch with record_visit_stmt to track
whether an ssa would have value overflow or properly zero/sign extended
in promoted mode. We can use this to eliminate some of the zero/sign
extension at gimple level. As it is, it doesn't do much. If this is what
you had in mind, I will extend it based on your feedback.
>>>
>>> Now onto the 0002 patch.
>>>
>>> +static bool
>>> +type_precision_ok (tree type)
>>> +{
>>> + return (TYPE_PRECISION (type) == 8
>>> + || TYPE_PRECISION (type) == 16
>>> + || TYPE_PRECISION (type) == 32);
>>> +}
>>>
>>> that's a weird function to me. You probably want
>>> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>>> here? And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?
>>>
>>
>> I will change this. (I have a patch which I am testing with other changes
>> you have asked for)
>>
>>
>>> +/* Return the promoted type for TYPE. */
>>> +static tree
>>> +get_promoted_type (tree type)
>>> +{
>>> + tree promoted_type;
>>> + enum machine_mode mode;
>>> + int uns;
>>> + if (POINTER_TYPE_P (type)
>>> + || !INTEGRAL_TYPE_P (type)
>>> + || !type_precision_ok (type))
>>> + return type;
>>> +
>>> + mode = TYPE_MODE (type);
>>> +#ifdef PROMOTE_MODE
>>> + uns = TYPE_SIGN (type);
>>> + PROMOTE_MODE (mode, uns, type);
>>> +#endif
>>> + uns = TYPE_SIGN (type);
>>> + promoted_type = lang_hooks.types.type_for_mode (mode, uns);
>>> + if (promoted_type
>>> + && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))
>>> + type = promoted_type;
>>>
>>> I think what you want to verify is that TYPE_PRECISION (promoted_type)
>>> == GET_MODE_PRECISION (mode).
>>> And to not even bother with this simply use
>>>
>>> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
>>> uns);
>>>
>>
>> I am changing this too.
>>
>>> You use a domwalk but also might create new basic-blocks during it
>>> (insert_on_edge_immediate), that's a
>>> no-no, commit edge inserts after the domwalk.
>>
>>
>> I am sorry, I dont understand "commit edge inserts after the domwalk" Is
>> there a way to do this in the current implementation?
>
> Yes, simply use gsi_insert_on_edge () and after the domwalk is done do
> gsi_commit_edge_inserts ().
>
>>> ssa_sets_higher_bits_bitmap looks unused and
>>> we generally don't free dominance info, so please don't do that.
>>>
>>> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc
>>> with
>>>
>>> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/
>>> -B/usr/local/powerpc64-unknown-linux-gnu/bin/
>>> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem
>>> /usr/local/powerpc64-unknown-linux-gnu/include -isystem
>>> /usr/local/powerpc64-unknown-linux-gnu/sys-include -g -O2 -O2 -g
>>> -O2 -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual
>>> -Wno-format -Wstrict-prototypes -Wmissing-prototypes
>>> -Wold-style-definition -isystem ./include -fPIC -mlong-double-128
>>> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc
>>> -fno-stack-protector -fPIC -mlong-double-128 -mno-minimal-toc -I.
>>> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/.
>>> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include
>>> -I../../../trunk/libgcc/../libdecnumber/dpd
>>> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS -o _divdi3.o
>>> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c
>>> ../../../trunk/libgcc/libgcc2.c \
>>> -fexceptions -fnon-call-exceptions -fvisibility=hidden
>>> -DHIDE_EXPORTS
>>> In file included from ../../../trunk/libgcc/libgcc2.c:56:0:
>>> ../../../trunk/libgcc/libgcc2.c: In function â__divti3â:
>>> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in
>>> expand_debug_locations, at cfgexpand.c:5277
>>>
With the attached patch, now I am running into Bootstrap comparison
failure. I am looking into it. Please review this version so that I can
address them while fixing this issue.
Thanks,
Kugan
>>
>> I am testing on gcc computefarm. I will get it to bootstrap and will do the
>> regression testing before posting the next version.
>>
>>> as hinted at above a bootstrap on i?86 (yes, 32bit) with
>>> --with-tune=pentiumpro might be another good testing candidate.
>>>
>>> + FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE |
>>> SSA_OP_DEF)
>>> + promote_def_and_uses (def);
>>>
>>> it looks like you are doing some redundant work by walking both defs
>>> and uses of each stmt. I'd say you should separate
>>> def and use processing and use
>>>
>>> FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>> promote_use (use);
>>> FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
>>> promote_def (def);
>>>
>>
>> Name promote_def_and_uses in my implementation is a bit confusing. It is
>> promoting the SSA_NAMEs. We only have to do that for the definitions if we
>> can do the SSA_NAMEs defined by parameters.
>>
>> I also have a bitmap to see if we have promoted a variable and avoid doing
>> it again. I will try to improve this.
>>
>>
>>
>>> this should make processing more efficient (memory local) compared to
>>> doing the split handling
>>> in promote_def_and_uses.
>>>
>>> I think it will be convenient to have a SSA name info structure where
>>> you can remember the original
>>> type a name was promoted from as well as whether it was promoted or
>>> not. This way adjusting
>>> debug uses should be "trivial":
>>>
>>> +static unsigned int
>>> +fixup_uses (tree use, tree promoted_type, tree old_type)
>>> +{
>>> + gimple *stmt;
>>> + imm_use_iterator ui;
>>> + gimple_stmt_iterator gsi;
>>> + use_operand_p op;
>>> +
>>> + FOR_EACH_IMM_USE_STMT (stmt, ui, use)
>>> + {
>>> + bool do_not_promote = false;
>>> + switch (gimple_code (stmt))
>>> + {
>>> + case GIMPLE_DEBUG:
>>> + {
>>> + gsi = gsi_for_stmt (stmt);
>>> + gsi_remove (&gsi, true);
>>>
>>> rather than doing the above you'd do sth like
>>>
>>> SET_USE (use, fold_convert (old_type, new_def));
>>> update_stmt (stmt);
>>>
>>
>> We do have these information (original type a name was promoted from as well
>> as whether it was promoted or not). To make it easy to review, in the patch
>> that adds the pass,I am removing these debug stmts. But in patch 4, I am
>> trying to handle this properly. Maybe I should combine them.
>
> Yeah, it's a bit confusing otherwise.
>
>>> note that while you may not be able to use promoted regs at all uses
>>> (like calls or asms) you can promote all defs, if only with a compensation
>>> statement after the original def. The SSA name info struct can be used
>>> to note down the actual SSA name holding the promoted def.
>>>
>>> The pass looks a lot better than last time (it's way smaller!) but
>>> still needs some
>>> improvements. There are some more fishy details with respect to how you
>>> allocate/change SSA names but I think those can be dealt with once the
>>> basic structure looks how I like it to be.
>>>
>>
>> I will post an updated patch in a day or two.
>
> Thanks,
> Richard.
>
>> Thanks again,
>> Kugan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Optimize-ZEXT_EXPR-with-tree-vrp.patch
Type: text/x-diff
Size: 3520 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151102/ce9a88ac/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-type-promotion-pass.patch
Type: text/x-diff
Size: 33012 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151102/ce9a88ac/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-new-SEXT_EXPR-tree-code.patch
Type: text/x-diff
Size: 5068 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151102/ce9a88ac/attachment-0002.bin>
More information about the Gcc-patches
mailing list