This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [0/7] Type promotion pass and elimination of zext/sext




On 23/10/15 01:23, Richard Biener wrote:
On Thu, Oct 22, 2015 at 12:50 PM, Kugan
<kugan.vivekanandarajah@linaro.org> wrote:


On 21/10/15 23:45, Richard Biener wrote:
On Tue, Oct 20, 2015 at 10:03 PM, Kugan
<kugan.vivekanandarajah@linaro.org> wrote:


On 07/09/15 12:53, Kugan wrote:

This a new version of the patch posted in
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
more testing and spitted the patch to make it more easier to review.
There are still couple of issues to be addressed and I am working on them.

1. AARCH64 bootstrap now fails with the commit
94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is mis-compiled
in stage2 and fwprop.c is failing. It looks to me that there is a latent
issue which gets exposed my patch. I can also reproduce this in x86_64
if I use the same PROMOTE_MODE which is used in aarch64 port. For the
time being, I am using  patch
0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
workaround. This meeds to be fixed before the patches are ready to be
committed.

2. vector-compare-1.c from c-c++-common/torture fails to assemble with
-O3 -g Error: unaligned opcodes detected in executable segment. It works
fine if I remove the -g. I am looking into it and needs to be fixed as well.

Hi Richard,

Now that stage 1 is going to close, I would like to get these patches
accepted for stage1. I will try my best to address your review comments
ASAP.

Ok, can you make the whole patch series available so I can poke at the
implementation a bit?  Please state the revision it was rebased on
(or point me to a git/svn branch the work resides on).


Thanks. Please find the patched rebated against trunk@229156. I have
skipped the test-case readjustment patches.

Some quick observations.  On x86_64 when building

Hi Richard,

Thanks for the review.

short bar (short y);
int foo (short x)
{
   short y = bar (x) + 15;
   return y;
}

with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
I get

   <bb 2>:
   _1 = (int) x_10(D);
   _2 = (_1) sext (16);
   _11 = bar (_2);
   _5 = (int) _11;
   _12 = (unsigned int) _5;
   _6 = _12 & 65535;
   _7 = _6 + 15;
   _13 = (int) _7;
   _8 = (_13) sext (16);
   _9 = (_8) sext (16);
   return _9;

which looks fine but the VRP optimization doesn't trigger for the redundant sext
(ranges are computed correctly but the 2nd extension is not removed).

This also makes me notice trivial match.pd patterns are missing, like
for example

(simplify
  (sext (sext@2 @0 @1) @3)
  (if (tree_int_cst_compare (@1, @3) <= 0)
   @2
   (sext @0 @3)))

as VRP doesn't run at -O1 we must rely on those to remove rendudant extensions,
otherwise generated code might get worse compared to without the pass(?)

Do you think that we should enable this pass only when vrp is enabled. Otherwise, even when we do the simple optimizations you mentioned below, we might not be able to remove all the redundancies.


I also notice that the 'short' argument does not get it's sign-extension removed
as redundand either even though we have

_1 = (int) x_8(D);
Found new range for _1: [-32768, 32767]


I am looking into it.

In the end I suspect that keeping track of the "simple" cases in the promotion
pass itself (by keeping a lattice) might be a good idea (after we fix VRP to do
its work).  In some way whether the ABI guarantees promoted argument
registers might need some other target hook queries.

Now onto the 0002 patch.

+static bool
+type_precision_ok (tree type)
+{
+  return (TYPE_PRECISION (type)  == 8
+         || TYPE_PRECISION (type) == 16
+         || TYPE_PRECISION (type) == 32);
+}

that's a weird function to me.  You probably want
TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?


I will change this. (I have a patch which I am testing with other changes you have asked for)

+/* Return the promoted type for TYPE.  */
+static tree
+get_promoted_type (tree type)
+{
+  tree promoted_type;
+  enum machine_mode mode;
+  int uns;
+  if (POINTER_TYPE_P (type)
+      || !INTEGRAL_TYPE_P (type)
+      || !type_precision_ok (type))
+    return type;
+
+  mode = TYPE_MODE (type);
+#ifdef PROMOTE_MODE
+  uns = TYPE_SIGN (type);
+  PROMOTE_MODE (mode, uns, type);
+#endif
+  uns = TYPE_SIGN (type);
+  promoted_type = lang_hooks.types.type_for_mode (mode, uns);
+  if (promoted_type
+      && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))
+    type = promoted_type;

I think what you want to verify is that TYPE_PRECISION (promoted_type)
== GET_MODE_PRECISION (mode).
And to not even bother with this simply use

promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), uns);


I am changing this too.

You use a domwalk but also might create new basic-blocks during it
(insert_on_edge_immediate), that's a
no-no, commit edge inserts after the domwalk.

I am sorry, I dont understand "commit edge inserts after the domwalk" Is there a way to do this in the current implementation?

ssa_sets_higher_bits_bitmap looks unused and
we generally don't free dominance info, so please don't do that.

I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc with

/abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/
-B/usr/local/powerpc64-unknown-linux-gnu/bin/
-B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem
/usr/local/powerpc64-unknown-linux-gnu/include -isystem
/usr/local/powerpc64-unknown-linux-gnu/sys-include    -g -O2 -O2  -g
-O2 -DIN_GCC    -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual
-Wno-format -Wstrict-prototypes -Wmissing-prototypes
-Wold-style-definition  -isystem ./include   -fPIC -mlong-double-128
-mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc
-fno-stack-protector   -fPIC -mlong-double-128 -mno-minimal-toc -I.
-I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/.
-I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include
-I../../../trunk/libgcc/../libdecnumber/dpd
-I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS  -o _divdi3.o
-MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c
../../../trunk/libgcc/libgcc2.c \
           -fexceptions -fnon-call-exceptions -fvisibility=hidden -DHIDE_EXPORTS
In file included from ../../../trunk/libgcc/libgcc2.c:56:0:
../../../trunk/libgcc/libgcc2.c: In function â__divti3â:
../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in
expand_debug_locations, at cfgexpand.c:5277


I am testing on gcc computefarm. I will get it to bootstrap and will do the regression testing before posting the next version.

as hinted at above a bootstrap on i?86 (yes, 32bit) with
--with-tune=pentiumpro might be another good testing candidate.

+      FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE | SSA_OP_DEF)
+       promote_def_and_uses (def);

it looks like you are doing some redundant work by walking both defs
and uses of each stmt.  I'd say you should separate
def and use processing and use

   FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)
     promote_use (use);
   FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
     promote_def (def);


Name promote_def_and_uses in my implementation is a bit confusing. It is promoting the SSA_NAMEs. We only have to do that for the definitions if we can do the SSA_NAMEs defined by parameters.

I also have a bitmap to see if we have promoted a variable and avoid doing it again. I will try to improve this.


this should make processing more efficient (memory local) compared to
doing the split handling
in promote_def_and_uses.

I think it will be convenient to have a SSA name info structure where
you can remember the original
type a name was promoted from as well as whether it was promoted or
not.  This way adjusting
debug uses should be "trivial":

+static unsigned int
+fixup_uses (tree use, tree promoted_type, tree old_type)
+{
+  gimple *stmt;
+  imm_use_iterator ui;
+  gimple_stmt_iterator gsi;
+  use_operand_p op;
+
+  FOR_EACH_IMM_USE_STMT (stmt, ui, use)
+    {
+      bool do_not_promote = false;
+      switch (gimple_code (stmt))
+       {
+       case GIMPLE_DEBUG:
+           {
+             gsi = gsi_for_stmt (stmt);
+             gsi_remove (&gsi, true);

rather than doing the above you'd do sth like

   SET_USE (use, fold_convert (old_type, new_def));
   update_stmt (stmt);


We do have these information (original type a name was promoted from as well as whether it was promoted or not). To make it easy to review, in the patch that adds the pass,I am removing these debug stmts. But in patch 4, I am trying to handle this properly. Maybe I should combine them.

note that while you may not be able to use promoted regs at all uses
(like calls or asms) you can promote all defs, if only with a compensation
statement after the original def.  The SSA name info struct can be used
to note down the actual SSA name holding the promoted def.

The pass looks a lot better than last time (it's way smaller!) but
still needs some
improvements.  There are some more fishy details with respect to how you
allocate/change SSA names but I think those can be dealt with once the
basic structure looks how I like it to be.


I will post an updated patch in a day or two.

Thanks again,
Kugan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]