new sign/zero extension elimination pass

Kenneth Zadeck zadeck@naturalbridge.com
Thu Jul 12 12:05:00 GMT 2012


sorry about the two messages. i mis spelled the gcc-patches on the first 
try.

you are on the right track with the example but combine will not get 
this unless everything is in the same bb.
the whole point of having a separate pass for doing extension 
elimination is that it needs to be done over the entire function.

my example is also a little more complex because, since we are talking 
about induction vars, you have an initial assignment outside of a loop, 
and increment inside the loop and the test you describe at the bottom of 
the loop.

I would point out that with respect to speed optimizations, the case i 
am describing is in fact very important because getting code out of 
loops is were the important gains are.   I believe that the ppc has a 
some significant performance issues because of this kind of thing.

kenny


On 07/12/2012 05:20 AM, Tom de Vries wrote:
> Kenneth,
>
> I see I replied to your original message that had the wrong CC, I'm now CC-ing
> gcc-patches@gcc.gnu.org.
>
> Thanks,
> - Tom
>
> On 12/07/12 11:05, Tom de Vries wrote:
>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>> Tom,
>>>
>>> I have a problem with the approach that you have taken here.   I believe
>>> that this could be a very useful addition to gcc so I am in general very
>>> supportive, but i think you are missing an important case.
>>>
>>> My problem is that it the pass does not actually look at the target and
>>> make any decisions based on that target.
>>>
>>> for instance, we have a llp64 target.   As with many targets, the target
>>> has a rich set of compare and branch instructions.  In particular, it
>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>> upstream optimizations that take int (SI mode) index variables generate
>>> extension operations before doing 64 bit compare and branch
>>> instructions, even though there are 32 bit comparison and branches on
>>> the machine.     There are a lot of machines that can do more than one
>>> size of comparison.
>>>
>> 	 This optimization pass, as it is currently written will not remove those
>>> extensions because it believes that the length of the destination is the
>>> "final answer" unless it is wrapped in an explicit truncation.
>>> Instead it needs to ask the port if there is a shorted compare and
>>> branch instruction that does not cost more. in that case, those
>>> instructions should be rewritten to use the shorted compare and branch.
>>>
>>> There are many operations other than compare and branch where the pass
>>> should be asking "can i shorten the target for free and therefore get
>>> rid of the extension?"
>> Kenneth,
>>
>> I'm not sure I understand the optimization you're talking about, in particular
>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>> is the same.
>>
>> Assuming it's the same, my understanding is that you're talking about an example
>> like this:
>> ...
>>    (insn (set (reg:DI 5)
>>               (zero_extend:DI (reg:SI 4))))
>>
>>    (jump_insn (set (pc)
>>                    (if_then_else (eq (reg:DI 5)
>>                                      (const_int 0))
>>                                  (label_ref:DI 62)
>>                                  (pc))))
>>
>>    ->
>>
>>    (jump_insn (set (pc)
>>                    (if_then_else (eq (reg:SI 4)
>>                                      (const_int 0))
>>                                  (label_ref:DI 62)
>>                                  (pc))))
>>
>> ...
>> I would expect combine to optimize this.
>>
>> In case I got the example all backwards or it is a too simple one, please
>> provide an rtl example that illustrates the optimization.
>>
>> Thanks,
>> - Tom
>>
>>
>>>   right shifts, rotates, and stores are not in
>>> this class, but left shifts are as are all comparisons, compare and
>>> branches, conditional moves.   There may even be machines that have this
>>> for divide, but i do not know of any off the top of my head.
>>>
>>> What i am suggesting moves this pass into the target specific set of
>>> optimizations rather than target independent set, but at where this pass
>>> is to be put this is completely appropriate.    Any dest instruction
>>> where all of the operands have been extended should be checked to see if
>>> it was really necessary to use the longer form before doing the
>>> propagation pass.
>>>
>>> kenny
>>>
>>>
>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>> I profiled the pass on spec2000:
>>>>>>
>>>>>>                      -mabi=32     -mabi=64
>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>> total   (usr time):   919.30       879.26
>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>
>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>
>>>>>> Is it necessary to improve the runtime of this pass?
>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>> currently written, yours doesn't.
>>>>>
>>>> Eric,
>>>>
>>>> I've done the following:
>>>> - refactored the pass such that it now scans at most twice over all
>>>>    instructions.
>>>> - updated the patch to be applicable to current trunk
>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>    this thread), and added that one as test-case.
>>>> - added a part in the header comment illustrating the working of the pass
>>>>    on the motivating example.
>>>>
>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>
>>>> build and reg-tested on mips, mips64, and arm.
>>>>
>>>> OK for trunk?
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>
>>>> 	* ee.c: New file.
>>>> 	* tree-pass.h (pass_ee): Declare.
>>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>>> 	* timevar.def (TV_EE): New timevar.
>>>> 	* common.opt (fextension-elimination): New option.
>>>> 	* Makefile.in (ee.o): New rule.
>>>> 	* passes.c (pass_ee): Add it.
>>>>
>>>> 	* gcc.dg/extend-1.c: New test.
>>>> 	* gcc.dg/extend-2.c: Same.
>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>> 	* gcc.dg/extend-3.c: Same.
>>>> 	* gcc.dg/extend-4.c: Same.
>>>> 	* gcc.dg/extend-5.c: Same.
>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>> Index: gcc/tree-pass.h
>>>> ===================================================================
>>>> --- gcc/tree-pass.h (revision 189409)
>>>> +++ gcc/tree-pass.h (working copy)
>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>
>>>> extern struct rtl_opt_pass pass_expand;
>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>> +extern struct rtl_opt_pass pass_ee;
>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>> extern struct rtl_opt_pass pass_jump;
>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>> @@ -5,19 +5,19 @@
>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>
>>>> -NOMIPS16 int
>>>> -f (int n, int i)
>>>> +NOMIPS16 long int
>>>> +f (long int n, long int i)
>>>> {
>>>> -  int s = 0;
>>>> +  long int s = 0;
>>>>     for (; i & 1; i++)
>>>>       s += i;
>>>>     return s;
>>>> }
>>>>
>>>> -NOMIPS16 int
>>>> -g (int n, int i)
>>>> +NOMIPS16 long int
>>>> +g (long int n, long int i)
>>>> {
>>>> -  int s = 0;
>>>> +  long int s = 0;
>>>>     for (i = 0; i < n; i++)
>>>>       s += i;
>>>>     return s;
>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>> @@ -0,0 +1,16 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +unsigned char f(unsigned int a, int c)
>>>> +{
>>>> +  unsigned int b = a;
>>>> +  if (c)
>>>> +    b = a & 0x10ff;
>>>> +  return b;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>> +{
>>>> +  if (c)
>>>> +    *z = 0;
>>>> +  *p ^= (unsigned char)s;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +void f (short d[2][2])
>>>> +{
>>>> +  int d0 = d[0][0] + d[0][1];
>>>> +  int d1 = d[1][0] + d[1][1];
>>>> +  d[0][0] = d0 + d1;
>>>> +      d[0][1] = d0 - d1;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>> @@ -0,0 +1,20 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +/* { dg-require-effective-target ilp32 } */
>>>> +
>>>> +void f(unsigned char * p, short *s, int c)
>>>> +{
>>>> +  short or = 0;
>>>> +  while (c)
>>>> +    {
>>>> +      or = or | s[c];
>>>> +      c --;
>>>> +    }
>>>> +  *p = (unsigned char)or;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>> @@ -0,0 +1,20 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>> +/* { dg-require-effective-target mips64 } */
>>>> +
>>>> +void f(unsigned char * p, short *s, int c)
>>>> +{
>>>> +  short or = 0;
>>>> +  while (c)
>>>> +    {
>>>> +      or = or | s[c];
>>>> +      c --;
>>>> +    }
>>>> +  *p = (unsigned char)or;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>> +/* { dg-require-effective-target mips64 } */
>>>> +
>>>> +unsigned int f(unsigned char byte)
>>>> +{
>>>> +  return byte << 25;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/opts.c
>>>> ===================================================================
>>>> --- gcc/opts.c (revision 189409)
>>>> +++ gcc/opts.c (working copy)
>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>       { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>       { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>       { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>
>>>>       /* -O3 optimizations.  */
>>>>       { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>> Index: gcc/timevar.def
>>>> ===================================================================
>>>> --- gcc/timevar.def (revision 189409)
>>>> +++ gcc/timevar.def (working copy)
>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>> Index: gcc/ee.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/ee.c (revision 0)
>>>> @@ -0,0 +1,1190 @@
>>>> +/* Redundant extension elimination.
>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>> +the terms of the GNU General Public License as published by the Free
>>>> +Software Foundation; either version 3, or (at your option) any later
>>>> +version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>> +for more details.
>>>> +
>>>> +You should have received a copy of the GNU General Public License
>>>> +along with GCC; see the file COPYING3.  If not see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +/*
>>>> +
>>>> +  MOTIVATING EXAMPLE
>>>> +
>>>> +  The motivating example for this pass is the example from PR 40893:
>>>> +
>>>> +    void f (short d[2][2])
>>>> +    {
>>>> +      int d0 = d[0][0] + d[0][1];
>>>> +      int d1 = d[1][0] + d[1][1];
>>>> +      d[0][0] = d0 + d1;
>>>> +      d[0][1] = d0 - d1;
>>>> +    }
>>>> +
>>>> +  For MIPS, compilation results in the following insns.
>>>> +
>>>> +    (set (reg:SI 204)
>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>> +
>>>> +    (set (reg:SI 205)
>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>> +
>>>> +    (set (reg:SI 217)
>>>> +         (plus:SI (reg:SI 205)
>>>> +                  (reg:SI 204)))
>>>> +
>>>> +    (set (reg:SI 218)
>>>> +         (minus:SI (reg:SI 204)
>>>> +                   (reg:SI 205)))
>>>> +
>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>> +         (subreg:HI (reg:SI 217) 2))
>>>> +
>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>> +                 (const_int 2 [0x2])))
>>>> +         (subreg:HI (reg:SI 218) 2))
>>>> +
>>>> +
>>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>> +  204 and 205, so the zero_extends are redundant.
>>>> +
>>>> +
>>>> +  INTENDED EFFECT
>>>> +
>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>> +  than the extension.
>>>> +
>>>> +
>>>> +  IMPLEMENTATION
>>>> +
>>>> +  The pass scans at most two times over all instructions.
>>>> +
>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>> +  done.
>>>> +
>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>> +  operands of the insns defining the pseudo.
>>>> +
>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>> +  of each reg, which allows us to find redundant extensions.
>>>> +
>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>> +  which we may find more redundant extensions.
>>>> +
>>>> +  Finally, redundant extensions are deleted or replaced.
>>>> +
>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>> +  of a paradoxical subreg instead.
>>>> +
>>>> +
>>>> +  ILLUSTRATION OF PASS
>>>> +
>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>> +
>>>> +  We find the 2 extensions:
>>>> +    found extension with preserved size 16 defining reg 204
>>>> +    found extension with preserved size 16 defining reg 205
>>>> +
>>>> +  We calculate the biggests uses of a register:
>>>> +    biggest_use
>>>> +    reg 204: size 32
>>>> +    reg 205: size 32
>>>> +    reg 217: size 16
>>>> +    reg 218: size 16
>>>> +
>>>> +  We propagate the biggest uses where possible:
>>>> +    propagations
>>>> +    205: 32 -> 16
>>>> +    204: 32 -> 16
>>>> +    214: 32 -> 16
>>>> +    215: 32 -> 16
>>>> +
>>>> +  We conclude that the extensions are redundant:
>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>> +
>>>> +  And we replace them with regcopies:
>>>> +    (set (reg:SI 204)
>>>> +        (reg:SI 213))
>>>> +
>>>> +    (set (reg:SI 205)
>>>> +        (reg:SI 216))
>>>> +
>>>> +
>>>> +  LIMITATIONS
>>>> +
>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>> +  done.
>>>> +
>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>> +  account, also the ones that are not uses of the extension.
>>>> +  The consideration is that using use-def chains will give a more precise
>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>> +
>>>> +#include "config.h"
>>>> +#include "system.h"
>>>> +#include "coretypes.h"
>>>> +#include "tm.h"
>>>> +#include "rtl.h"
>>>> +#include "tree.h"
>>>> +#include "tm_p.h"
>>>> +#include "flags.h"
>>>> +#include "regs.h"
>>>> +#include "hard-reg-set.h"
>>>> +#include "basic-block.h"
>>>> +#include "insn-config.h"
>>>> +#include "function.h"
>>>> +#include "expr.h"
>>>> +#include "insn-attr.h"
>>>> +#include "recog.h"
>>>> +#include "toplev.h"
>>>> +#include "target.h"
>>>> +#include "timevar.h"
>>>> +#include "optabs.h"
>>>> +#include "insn-codes.h"
>>>> +#include "rtlhooks-def.h"
>>>> +#include "output.h"
>>>> +#include "params.h"
>>>> +#include "timevar.h"
>>>> +#include "tree-pass.h"
>>>> +#include "cgraph.h"
>>>> +#include "vec.h"
>>>> +
>>>> +#define SKIP_REG (-1)
>>>> +#define NONE (-1)
>>>> +
>>>> +/* Number of registers at start of pass.  */
>>>> +
>>>> +static int n_regs;
>>>> +
>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>> +
>>>> +static int *biggest_use;
>>>> +
>>>> +/* Array to register the promoted subregs.  */
>>>> +
>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>> +
>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>> +
>>>> +static int *propagated_size;
>>>> +
>>>> +typedef struct use
>>>> +{
>>>> +  int regno;
>>>> +  int size;
>>>> +  int offset;
>>>> +  rtx *use;
>>>> +} use_type;
>>>> +
>>>> +DEF_VEC_O(use_type);
>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>> +
>>>> +/* Vector to register the uses.  */
>>>> +
>>>> +static VEC (use_type,heap) **uses;
>>>> +
>>>> +typedef struct prop
>>>> +{
>>>> +  rtx set;
>>>> +  int uses_regno;
>>>> +  int uses_index;
>>>> +} prop_type;
>>>> +
>>>> +DEF_VEC_O(prop_type);
>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>> +
>>>> +/* Vector to register the propagations.  */
>>>> +
>>>> +static VEC (prop_type,heap) **props;
>>>> +
>>>> +/* Work list for propragation.  */
>>>> +
>>>> +static VEC (int,heap) *wl;
>>>> +
>>>> +/* Array to register what regs are in the work list.  */
>>>> +
>>>> +static bool *in_wl;
>>>> +
>>>> +/* Vector that contains the extensions in the function.  */
>>>> +
>>>> +static VEC (rtx,heap) *extensions;
>>>> +
>>>> +/* Vector that contains the extensions in the function that are going to be
>>>> +   removed or replaced.  */
>>>> +
>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>> +
>>>> +/* Forward declaration.  */
>>>> +
>>>> +static void note_use (rtx *x, void *data);
>>>> +static bool skip_reg_p (int regno);
>>>> +static void register_prop (rtx set, use_type *use);
>>>> +
>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>> +
>>>> +static bool
>>>> +promoted_subreg_p (rtx subreg)
>>>> +{
>>>> +  return (GET_CODE (subreg) == SUBREG
>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>> +}
>>>> +
>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>> +   promotion.  */
>>>> +
>>>> +static bool
>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>> +{
>>>> +  int mre;
>>>> +
>>>> +  if (!promoted_subreg_p (subreg))
>>>> +    return false;
>>>> +
>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>> +  return mre != UNKNOWN;
>>>> +}
>>>> +
>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>> +   OFFSET.  Return true if successful.  */
>>>> +
>>>> +static bool
>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>> +{
>>>> +  rtx reg;
>>>> +
>>>> +  if (REG_P (use))
>>>> +    {
>>>> +      *regno = REGNO (use);
>>>> +      *offset = 0;
>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>> +      return true;
>>>> +    }
>>>> +  else if (GET_CODE (use) == SUBREG)
>>>> +    {
>>>> +      reg = SUBREG_REG (use);
>>>> +
>>>> +      if (!REG_P (reg))
>>>> +	return false;
>>>> +
>>>> +      *regno = REGNO (reg);
>>>> +
>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>> +	{
>>>> +	  *offset = 0;
>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  *offset = subreg_lsb (use);
>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>> +	}
>>>> +
>>>> +      return true;
>>>> +    }
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>> +
>>>> +static use_type *
>>>> +new_use (unsigned int regno)
>>>> +{
>>>> +  if (uses[regno] == NULL)
>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>> +
>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>> +
>>>> +  return VEC_last (use_type, uses[regno]);
>>>> +}
>>>> +
>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>> +
>>>> +static use_type *
>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>> +{
>>>> +  int *current;
>>>> +  use_type *p;
>>>> +
>>>> +  gcc_assert (size >= 0);
>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>> +
>>>> +  if (skip_reg_p (regno))
>>>> +    return NULL;
>>>> +
>>>> +  p = new_use (regno);
>>>> +  p->regno = regno;
>>>> +  p->size = size;
>>>> +  p->offset = offset;
>>>> +  p->use = use;
>>>> +
>>>> +  /* Update the bigest use.  */
>>>> +  current = &biggest_use[regno];
>>>> +  *current = MAX (*current, size);
>>>> +
>>>> +  return p;
>>>> +}
>>>> +
>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>> +
>>>> +static void
>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>> +{
>>>> +  const char *format_ptr;
>>>> +  int i, j;
>>>> +
>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>> +    if (format_ptr[i] == 'e')
>>>> +      note_use (&XEXP (use, i), pattern);
>>>> +    else if (format_ptr[i] == 'E')
>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>> +}
>>>> +
>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>> +
>>>> +static rtx
>>>> +get_set (rtx use, rtx pattern)
>>>> +{
>>>> +  rtx sub;
>>>> +  int i;
>>>> +
>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>> +    return pattern;
>>>> +
>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>> +      {
>>>> +	sub = XVECEXP (pattern, 0, i);
>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>> +	  return sub;
>>>> +      }
>>>> +
>>>> +  return NULL_RTX;
>>>> +}
>>>> +
>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>> +   an operand influences only the same bit or more significant bits in the
>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>> +
>>>> +static void
>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>> +{
>>>> +  unsigned int i, smallest;
>>>> +  int operand_size[2];
>>>> +  int operand_offset[2];
>>>> +  int used_size;
>>>> +  unsigned int operand_regno[2];
>>>> +  bool operand_reg[2];
>>>> +  bool operand_ignore[2];
>>>> +  use_type *p;
>>>> +
>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>> +  for (i = 0; i < nr_operands; ++i)
>>>> +    {
>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>> +      operand_ignore[i] = false;
>>>> +    }
>>>> +
>>>> +  /* Handle case of reg and-masked with const.  */
>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>> +    {
>>>> +      used_size =
>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>> +    }
>>>> +
>>>> +  /* Handle case of reg or-masked with const.  */
>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>> +    {
>>>> +      used_size =
>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>> +    }
>>>> +
>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>> +    for (i = 0; i < nr_operands; ++i)
>>>> +      operand_ignore[i] = (operand_reg[i]
>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>> +
>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>> +      && operand_size[0] != operand_size[1])
>>>> +    {
>>>> +      smallest = operand_size[0] > operand_size[1];
>>>> +
>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>> +    }
>>>> +
>>>> +  /* Register the operand use, if necessary.  */
>>>> +  for (i = 0; i < nr_operands; ++i)
>>>> +    if (!operand_reg[i])
>>>> +      note_use (&XEXP (use, i), pattern);
>>>> +    else if (!operand_ignore[i])
>>>> +      {
>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>> +			  &XEXP (use, i));
>>>> +	register_prop (set, p);
>>>> +      }
>>>> +}
>>>> +
>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>> +
>>>> +static void
>>>> +register_promoted_subreg (rtx subreg)
>>>> +{
>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>> +
>>>> +  if (promoted_subreg[index] == NULL)
>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>> +
>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>> +}
>>>> +
>>>> +/* Note promoted subregs in X.  */
>>>> +
>>>> +static int
>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>> +{
>>>> +  rtx subreg = *x;
>>>> +
>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>> +    register_promoted_subreg (subreg);
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>> +
>>>> +static void
>>>> +note_use (rtx *x, void *data)
>>>> +{
>>>> +  rtx use = *x;
>>>> +  rtx pattern = (rtx)data;
>>>> +  int use_size, use_offset;
>>>> +  unsigned int use_regno;
>>>> +  rtx set;
>>>> +  use_type *p;
>>>> +
>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>> +
>>>> +  set = get_set (use, pattern);
>>>> +
>>>> +  switch (GET_CODE (use))
>>>> +    {
>>>> +    case REG:
>>>> +    case SUBREG:
>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>> +      register_prop (set, p);
>>>> +      return;
>>>> +    case SIGN_EXTEND:
>>>> +    case ZERO_EXTEND:
>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>> +      register_prop (set, p);
>>>> +      return;
>>>> +    case IOR:
>>>> +    case AND:
>>>> +    case XOR:
>>>> +    case PLUS:
>>>> +    case MINUS:
>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>> +      return;
>>>> +    case NOT:
>>>> +    case NEG:
>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>> +      return;
>>>> +    case ASHIFT:
>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>> +			  use_offset, x);
>>>> +      return;
>>>> +    default:
>>>> +      note_embedded_uses (use, pattern);
>>>> +      return;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>> +
>>>> +static bool
>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>> +{
>>>> +#ifdef EPILOGUE_USES
>>>> +  if (EPILOGUE_USES (regno))
>>>> +    return true;
>>>> +#endif
>>>> +
>>>> +#ifdef EH_USES
>>>> +  if (EH_USES (regno))
>>>> +    return true;
>>>> +#endif
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>> +
>>>> +static bool
>>>> +skip_reg_p (int regno)
>>>> +{
>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>> +     We don't handle that properly.  */
>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>> +}
>>>> +
>>>> +/* Note the uses of argument registers in call INSN.  */
>>>> +
>>>> +static void
>>>> +note_call_uses (rtx insn)
>>>> +{
>>>> +  rtx link, link_expr;
>>>> +
>>>> +  if (!CALL_P (insn))
>>>> +    return;
>>>> +
>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>> +    {
>>>> +      link_expr = XEXP (link, 0);
>>>> +
>>>> +      if (GET_CODE (link_expr) == USE)
>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Dump the biggest uses found.  */
>>>> +
>>>> +static void
>>>> +dump_biggest_use (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  if (!dump_file)
>>>> +    return;
>>>> +
>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>> +
>>>> +  for (i = 0; i < n_regs; i++)
>>>> +    if (biggest_use[i] > 0)
>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>> +
>>>> +  fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Calculate the biggest use mode for all regs.  */
>>>> +
>>>> +static void
>>>> +calculate_biggest_use (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  rtx insn;
>>>> +
>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>> +  FOR_EACH_BB (bb)
>>>> +    FOR_BB_INSNS (bb, insn)
>>>> +      {
>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>> +	  continue;
>>>> +
>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>> +
>>>> +	if (CALL_P (insn))
>>>> +	  note_call_uses (insn);
>>>> +      }
>>>> +
>>>> +  dump_biggest_use ();
>>>> +}
>>>> +
>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>> +
>>>> +static void
>>>> +register_prop (rtx set, use_type *use)
>>>> +{
>>>> +  prop_type *p;
>>>> +  int regno;
>>>> +
>>>> +  if (set == NULL_RTX || use == NULL)
>>>> +    return;
>>>> +
>>>> +  if (!REG_P (SET_DEST (set)))
>>>> +    return;
>>>> +
>>>> +  regno = REGNO (SET_DEST (set));
>>>> +
>>>> +  if (skip_reg_p (regno))
>>>> +    return;
>>>> +
>>>> +  if (props[regno] == NULL)
>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>> +
>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>> +  p = VEC_last (prop_type, props[regno]);
>>>> +  p->set = set;
>>>> +  p->uses_regno = use->regno;
>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>> +}
>>>> +
>>>> +/* Add REGNO to the worklist.  */
>>>> +
>>>> +static void
>>>> +add_to_wl (int regno)
>>>> +{
>>>> +  if (in_wl[regno])
>>>> +    return;
>>>> +
>>>> +  if (biggest_use[regno] > 0
>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>> +    return;
>>>> +
>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>> +    return;
>>>> +
>>>> +  if (propagated_size[regno] != NONE
>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>> +    return;
>>>> +
>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>> +  in_wl[regno] = true;
>>>> +}
>>>> +
>>>> +/* Pop a reg from the worklist and return it.  */
>>>> +
>>>> +static int
>>>> +pop_wl (void)
>>>> +{
>>>> +  int regno = VEC_pop (int, wl);
>>>> +  in_wl[regno] = false;
>>>> +  return regno;
>>>> +}
>>>> +
>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>> +
>>>> +static int
>>>> +propagate_size (int dest_size, use_type *p)
>>>> +{
>>>> +  if (dest_size == 0)
>>>> +    return 0;
>>>> +
>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>> +}
>>>> +
>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>> +
>>>> +static int
>>>> +get_biggest_use (unsigned int regno)
>>>> +{
>>>> +  int ix;
>>>> +  use_type *p;
>>>> +  int max = 0;
>>>> +
>>>> +  gcc_assert (uses[regno] != NULL);
>>>> +
>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>> +    max = MAX (max, p->size);
>>>> +
>>>> +  return max;
>>>> +}
>>>> +
>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>> +
>>>> +static void
>>>> +propagate_to_use (int dest_size, use_type *use)
>>>> +{
>>>> +  int new_use_size;
>>>> +  int prev_biggest_use;
>>>> +  int *current;
>>>> +
>>>> +  new_use_size = propagate_size (dest_size, use);
>>>> +
>>>> +  if (new_use_size >= use->size)
>>>> +    return;
>>>> +
>>>> +  use->size = new_use_size;
>>>> +
>>>> +  current = &biggest_use[use->regno];
>>>> +
>>>> +  prev_biggest_use = *current;
>>>> +  *current = get_biggest_use (use->regno);
>>>> +
>>>> +  if (*current >= prev_biggest_use)
>>>> +    return;
>>>> +
>>>> +  add_to_wl (use->regno);
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>> +	     *current);
>>>> +
>>>> +}
>>>> +
>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>> +   propagations in NR_PROPAGATIONS.  */
>>>> +
>>>> +static void
>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>> +{
>>>> +  int ix;
>>>> +  prop_type *p;
>>>> +
>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>> +
>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>> +    {
>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>> +      propagate_to_use (biggest_use[regno], use);
>>>> +      ++(*nr_propagations);
>>>> +    }
>>>> +
>>>> +  propagated_size[regno] = biggest_use[regno];
>>>> +}
>>>> +
>>>> +/* Improve biggest_use array iteratively.  */
>>>> +
>>>> +static void
>>>> +propagate (void)
>>>> +{
>>>> +  int i;
>>>> +  int nr_propagations = 0;
>>>> +
>>>> +  /* Initialize work list.  */
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    add_to_wl (i);
>>>> +
>>>> +  /* Work the work list.  */
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "propagations: \n");
>>>> +  while (!VEC_empty (int, wl))
>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>> +}
>>>> +
>>>> +/* Check whether this is a sign/zero extension.  */
>>>> +
>>>> +static bool
>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>> +{
>>>> +  rtx src, op0;
>>>> +
>>>> +  /* Detect set of reg.  */
>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>> +    return false;
>>>> +
>>>> +  src = SET_SRC (PATTERN (insn));
>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>> +
>>>> +  if (!REG_P (*dest))
>>>> +    return false;
>>>> +
>>>> +  /* Detect sign or zero extension.  */
>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>> +    {
>>>> +      op0 = XEXP (src, 0);
>>>> +
>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>> +      if (GET_CODE (src) == AND)
>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>> +      else
>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>> +
>>>> +      if (GET_CODE (op0) == SUBREG)
>>>> +	{
>>>> +	  if (subreg_lsb (op0) != 0)
>>>> +	    return false;
>>>> +
>>>> +	  *inner = SUBREG_REG (op0);
>>>> +
>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>> +	    return false;
>>>> +
>>>> +	  return true;
>>>> +	}
>>>> +      else if (REG_P (op0))
>>>> +	{
>>>> +	  *inner = op0;
>>>> +
>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>> +	    return false;
>>>> +
>>>> +	  return true;
>>>> +	}
>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>> +	{
>>>> +	  *inner = XEXP (op0, 0);
>>>> +	  return true;
>>>> +	}
>>>> +    }
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Find extensions and store them in the extensions vector.  */
>>>> +
>>>> +static bool
>>>> +find_extensions (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  rtx insn, dest, inner;
>>>> +  int preserved_size;
>>>> +
>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>> +  FOR_EACH_BB (bb)
>>>> +    FOR_BB_INSNS (bb, insn)
>>>> +      {
>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>> +	  continue;
>>>> +
>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>> +	  continue;
>>>> +
>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>> +
>>>> +	if (dump_file)
>>>> +	  fprintf (dump_file,
>>>> +		   "found extension %u with preserved size %d defining"
>>>> +		   " reg %d\n",
>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>> +      }
>>>> +
>>>> +  if (dump_file)
>>>> +    {
>>>> +      if (!VEC_empty (rtx, extensions))
>>>> +	fprintf (dump_file, "\n");
>>>> +      else
>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>> +    }
>>>> +
>>>> +  return !VEC_empty (rtx, extensions);
>>>> +}
>>>> +
>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>> +
>>>> +static bool
>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>> +{
>>>> +  int biggest_dest_use;
>>>> +
>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>> +    gcc_unreachable ();
>>>> +
>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>> +
>>>> +  if (biggest_dest_use == SKIP_REG)
>>>> +    return false;
>>>> +
>>>> +  if (*preserved_size < biggest_dest_use)
>>>> +    return false;
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>> +   redundant_extensions vector.  */
>>>> +
>>>> +static void
>>>> +find_redundant_extensions (void)
>>>> +{
>>>> +  rtx insn, dest, inner;
>>>> +  int ix;
>>>> +  bool found = false;
>>>> +  int preserved_size;
>>>> +
>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>> +      {
>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>> +
>>>> +	if (dump_file)
>>>> +	  fprintf (dump_file,
>>>> +		   "found redundant extension %u with preserved size %d"
>>>> +		   " defining reg %d\n",
>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>> +	found = true;
>>>> +      }
>>>> +
>>>> +  if (dump_file && found)
>>>> +    fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Reset promotion of subregs or REG.  */
>>>> +
>>>> +static void
>>>> +reset_promoted_subreg (rtx reg)
>>>> +{
>>>> +  int ix;
>>>> +  rtx subreg;
>>>> +
>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>> +    return;
>>>> +
>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>> +    {
>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>> +    }
>>>> +
>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>> +}
>>>> +
>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>> +   writes to DEST.  */
>>>> +
>>>> +static void
>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>> +{
>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>> +
>>>> +  /* Check whether replacement is needed.  */
>>>> +  if (dest != inner)
>>>> +    {
>>>> +      start_sequence ();
>>>> +
>>>> +      /* Determine the proper replacement operation.  */
>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>> +	{
>>>> +	  cp_src = inner;
>>>> +	  cp_dest = dest;
>>>> +	}
>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>> +	{
>>>> +	  emit_clobber (dest);
>>>> +	  cp_src = inner;
>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>> +	  cp_dest = dest;
>>>> +	}
>>>> +
>>>> +      emit_move_insn (cp_dest, cp_src);
>>>> +
>>>> +      seq = get_insns ();
>>>> +      end_sequence ();
>>>> +
>>>> +      /* If the replacement is not supported, bail out.  */
>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>> +	  return;
>>>> +
>>>> +      /* Insert the replacement.  */
>>>> +      emit_insn_before (seq, insn);
>>>> +    }
>>>> +
>>>> +  /* Note replacement/removal in the dump.  */
>>>> +  if (dump_file)
>>>> +    {
>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>> +      if (dest != inner)
>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>> +      else
>>>> +	fprintf (dump_file, "removed\n");
>>>> +    }
>>>> +
>>>> +  /* Remove the extension.  */
>>>> +  delete_insn (insn);
>>>> +
>>>> +  reset_promoted_subreg (dest);
>>>> +}
>>>> +
>>>> +/* Setup the variables at the start of the pass.  */
>>>> +
>>>> +static void
>>>> +init_pass (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>> +
>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>> +  for (i = 0; i < n_regs; i++)
>>>> +    {
>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>> +      propagated_size[i] = NONE;
>>>> +    }
>>>> +
>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>> +
>>>> +  wl = VEC_alloc (int, heap, 50);
>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>> +
>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    {
>>>> +      uses[i] = NULL;
>>>> +      props[i] = NULL;
>>>> +      in_wl[i] = false;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>> +
>>>> +static void
>>>> +remove_redundant_extensions (void)
>>>> +{
>>>> +  rtx insn, dest, inner;
>>>> +  int preserved_size;
>>>> +  int ix;
>>>> +
>>>> +  if (!find_extensions ())
>>>> +    return;
>>>> +
>>>> +  calculate_biggest_use ();
>>>> +
>>>> +  find_redundant_extensions ();
>>>> +
>>>> +  if (!VEC_empty (rtx, extensions))
>>>> +    {
>>>> +      propagate ();
>>>> +
>>>> +      find_redundant_extensions ();
>>>> +    }
>>>> +
>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>> +
>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>> +    {
>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>> +    }
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Free the variables at the end of the pass.  */
>>>> +
>>>> +static void
>>>> +finish_pass (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  XDELETEVEC (propagated_size);
>>>> +
>>>> +  VEC_free (rtx, heap, extensions);
>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>> +
>>>> +  VEC_free (int, heap, wl);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    {
>>>> +      if (uses[i] != NULL)
>>>> +	VEC_free (use_type, heap, uses[i]);
>>>> +
>>>> +      if (props[i] != NULL)
>>>> +	VEC_free (prop_type, heap, props[i]);
>>>> +    }
>>>> +
>>>> +  XDELETEVEC (uses);
>>>> +  XDELETEVEC (props);
>>>> +  XDELETEVEC (biggest_use);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    if (promoted_subreg[i] != NULL)
>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>> +  XDELETEVEC (promoted_subreg);
>>>> +}
>>>> +
>>>> +/* Remove redundant extensions.  */
>>>> +
>>>> +static unsigned int
>>>> +rest_of_handle_ee (void)
>>>> +{
>>>> +  n_regs = max_reg_num ();
>>>> +
>>>> +  init_pass ();
>>>> +  remove_redundant_extensions ();
>>>> +  finish_pass ();
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>> +
>>>> +static bool
>>>> +gate_handle_ee (void)
>>>> +{
>>>> +  return (optimize > 0 && flag_ee);
>>>> +}
>>>> +
>>>> +struct rtl_opt_pass pass_ee =
>>>> +{
>>>> + {
>>>> +  RTL_PASS,
>>>> +  "ee",                                 /* name */
>>>> +  gate_handle_ee,                       /* gate */
>>>> +  rest_of_handle_ee,                    /* execute */
>>>> +  NULL,                                 /* sub */
>>>> +  NULL,                                 /* next */
>>>> +  0,                                    /* static_pass_number */
>>>> +  TV_EE,                                /* tv_id */
>>>> +  0,                                    /* properties_required */
>>>> +  0,                                    /* properties_provided */
>>>> +  0,                                    /* properties_destroyed */
>>>> +  0,                                    /* todo_flags_start */
>>>> +  TODO_ggc_collect |
>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>> + }
>>>> +};
>>>> Index: gcc/common.opt
>>>> ===================================================================
>>>> --- gcc/common.opt (revision 189409)
>>>> +++ gcc/common.opt (working copy)
>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>> Perform DWARF2 duplicate elimination
>>>>
>>>> +fextension-elimination
>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>> +Perform extension elimination
>>>> +
>>>> fipa-sra
>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>> Perform interprocedural reduction of aggregates
>>>> Index: gcc/Makefile.in
>>>> ===================================================================
>>>> --- gcc/Makefile.in (revision 189409)
>>>> +++ gcc/Makefile.in (working copy)
>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>> 	dwarf2asm.o \
>>>> 	dwarf2cfi.o \
>>>> 	dwarf2out.o \
>>>> +	ee.o \
>>>> 	ebitmap.o \
>>>> 	emit-rtl.o \
>>>> 	et-forest.o \
>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>      $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>      intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>      $(DF_H) $(CFGLOOP_H)
>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>      $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>      $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>> Index: gcc/passes.c
>>>> ===================================================================
>>>> --- gcc/passes.c (revision 189409)
>>>> +++ gcc/passes.c (working copy)
>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>         NEXT_PASS (pass_initialize_regs);
>>>>         NEXT_PASS (pass_ud_rtl_dce);
>>>>         NEXT_PASS (pass_combine);
>>>> +      NEXT_PASS (pass_ee);
>>>>         NEXT_PASS (pass_if_after_combine);
>>>>         NEXT_PASS (pass_partition_blocks);
>>>>         NEXT_PASS (pass_regmove);
>>>
>>
>




More information about the Gcc-patches mailing list