This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[tree-ssa] A pass to remove a large number of casts in C++ code


This new pass does a couple of things dealing with casts.
From the comments in the new file, which explain what it tries to do:
It tries to convert (OTHER_INT_TYPE)(x) CMP CONST into a true or false if the
CONST is outside of min/max of OTHER_INT_TYPE.
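
As an illustration only (this is not taken from the patch or its testcases), the idea can be sketched in plain C. The function names are made up, and the constants assume an 8-bit unsigned char:

```c
#include <assert.h>

/* Before: the cast narrows x to 0..255 (assuming an 8-bit unsigned char),
   so the comparison against 300 can never be true. */
int narrowed_eq_before(int x) { return (unsigned char)x == 300; }

/* After: the whole comparison is replaced by the constant false. */
int narrowed_eq_after(int x) { (void)x; return 0; }

/* Before: every unsigned char value is below 300, so this is always true. */
int narrowed_lt_before(int x) { return (unsigned char)x < 300; }

/* After: the whole comparison is replaced by the constant true. */
int narrowed_lt_after(int x) { (void)x; return 1; }

/* Checks that both forms agree over a range of inputs. */
void check_range_fold(void)
{
  for (int x = -1000; x <= 1000; x++)
    {
      assert(narrowed_eq_before(x) == narrowed_eq_after(x));
      assert(narrowed_lt_before(x) == narrowed_lt_after(x));
    }
}
```

A constant that lies inside the range of the narrower type, such as 44, cannot be folded this way, because the result then depends on x.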


It also tries to remove the casts after an ADDR_EXPR, i.e. (type &)&a is transformed
into &a where the resulting type is type&.


It also tries to remove the casts after an ADDR_EXPR of a field reference aka
(type&)&a->b. Note this should not happen until we disable the lowering in the
C and C++ front-ends.


It changes (INT_TYPE)!a into a == 0 iff a is an INT_TYPE. The reason we do not
change it into !a, with the ! expression itself given type INT_TYPE, is that fold converts
(int)(bool)!a into !(int)(bool)a, which is wrong. I have not figured out how to get
around this problem (it might be fixed by the second transformation after this one,
but I have not tried it since adding that one).
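
A small C sketch of the equivalence this relies on, for illustration only. The names are made up, and in C source the result of ! is already int, so the cast is there only to mirror the tree form:

```c
#include <assert.h>

/* Before: logical not followed by a cast to the operand's integer type. */
int not_cast_before(int a) { return (int)!a; }

/* After: the same value expressed as a comparison against zero. */
int not_cast_after(int a) { return a == 0; }

/* Checks that both forms agree on a few representative values. */
void check_not_cast(void)
{
  static const int samples[] = { -7, -1, 0, 1, 2, 256, 65536 };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(not_cast_before(samples[i]) == not_cast_after(samples[i]));
}
```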


It transforms (INT_TYPE)(x CMP y) into x CMP y with a type of INT_TYPE
(INT_TYPE can and will also be a boolean type).


It transforms (bool)x into x != 0 iff x is of integer type, which helps out jump
threading; it also sometimes removes the need for a temporary variable.
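
To illustrate why a conversion to bool is a test against zero and not a truncation, here is a hedged C99 sketch. The names are hypothetical, and the contrast case assumes an 8-bit unsigned char:

```c
#include <assert.h>
#include <stdbool.h>

/* Before: conversion of an integer to bool. */
bool to_bool_before(int x) { return (bool)x; }

/* After: the equivalent explicit comparison. */
bool to_bool_after(int x) { return x != 0; }

/* Contrast: a cast to a narrow integer type truncates instead. */
unsigned char truncate_to_uchar(int x) { return (unsigned char)x; }

/* Checks the equivalence and shows where truncation would differ. */
void check_to_bool(void)
{
  static const int samples[] = { -256, -1, 0, 1, 2, 256, 65536 };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(to_bool_before(samples[i]) == to_bool_after(samples[i]));

  assert(to_bool_before(256) == true);   /* nonzero, so true */
  assert(truncate_to_uchar(256) == 0);   /* low 8 bits are zero */
}
```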


It transforms (type1)(type2)a to (type1)a or just a where a is of type1
and type2's precision is larger than or equal to type1's precision.
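
The precision condition can be illustrated with integer types in C. This is only a sketch with made-up names, and the counter-example assumes an 8-bit unsigned char:

```c
#include <assert.h>
#include <limits.h>

/* (type1)(type2)a -> (type1)a: long is at least as wide as unsigned char,
   so the intermediate cast does not change the result. */
unsigned char via_long_before(int a) { return (unsigned char)(long)a; }
unsigned char via_long_after(int a) { return (unsigned char)a; }

/* (type1)(type2)a -> a: a already has type1 and int is at least as wide. */
short round_trip_before(short a) { return (short)(int)a; }
short round_trip_after(short a) { return a; }

/* Counter-example: the intermediate type is narrower than the outer type,
   so the inner cast loses bits and must be kept. */
int through_narrow(int a) { return (int)(unsigned char)a; }

/* Checks both equivalences over every short value. */
void check_double_cast(void)
{
  for (int v = SHRT_MIN; v <= SHRT_MAX; v++)
    {
      assert(via_long_before(v) == via_long_after(v));
      assert(round_trip_before((short)v) == round_trip_after((short)v));
    }
  assert(through_narrow(300) != 300);
}
```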


This patch fixes the following PRs:
14729 semi-high abstraction penalty due to address expressions with a cast to a reference type.
14820 missed jump threading case due to boolean OR creating a BB and then a cast. (This shows up in GCC.)
Part of this is also done at the RTL level, but not always; the example I give is not done at the RTL level.
14728 missed jump threading due to functions which are inlined and return bool types being cast first to
int and then cast back to bool by the inliner.
And the last case of PR 14753, which talks about turning (_Bool)a into a != 0, eliminating a temporary variable.
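
For illustration, the bool -> int -> bool round trip described for PR 14728 has roughly the following shape when written out by hand in C99. The predicate is_four is a hypothetical stand-in, not code from the PR:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical small predicate of the kind that gets inlined. */
static inline bool is_four(int v) { return v == 4; }

/* Before: the bool result is widened to int and then narrowed back. */
bool test_before(int v) { return (bool)(int)is_four(v); }

/* After: with the casts removed, only the comparison remains. */
bool test_after(int v) { return v == 4; }

/* Checks that the round trip never changes the value. */
void check_round_trip(void)
{
  for (int v = -16; v <= 16; v++)
    assert(test_before(v) == test_after(v));
}
```

Once only the comparison is left, the branch tests it directly, which is the form jump threading can work with.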


As is evident from fixing PR 14729, this patch fixes a large number of C++ performance problems.
It also helps DOM remove some basic blocks that were added because gimplification creates them
when doing something like "bool temp = t || g || h;" (which shows up in GCC itself).


There is one regression, gfortran.fortran-torture/compile/arrayio.f90, but it is not really caused by this
pass. The bug is introduced by a later pass (DOM1), which combines the following trees:
T.24_36 = &a - 24;
T.25_37 = T.23_35 + T.24_36;
into
T.25_37 = (int4 *)T.23_35 + &a - 24B;
which is wrong.
There are two PRs related to this bug, PR 14468 and PR 14672; both are caused
before getting to DOM, but both look very much related.


Tested on powerpc-apple-darwin with no regressions except for the above one.
Also tested on i686-pc-linux-gnu with no regressions except for the above one.
Also tested on x86_64 by Steven B. with no regressions except for the above one.


Even though tree-ssa is in a freeze, this improves C++ so much that Diego asked me to submit it.

For PR 8361 at -O3, the pass itself takes 0.1 seconds (and that is with checking turned on) out of
66.57 seconds. Without this pass the compile takes 65.56 seconds (still with checking), about a second
less, but with the pass the tree-ssa verifier (which is not enabled without checking) takes an extra
second (3.49 to 4.57 seconds). So with checking disabled this pass should not take any extra time.


OK?

Thanks,
Andrew Pinski


ChangeLog:


	* tree-ssa-cast.c: New file.
	* Makefile (OBJS-common): Add tree-ssa-cast.o.
	* tree-optimize.c (init_tree_optimization_passes):
	Add pass_cast four times.
	* tree-pass.h (pass_cast): Declare.
	* timevar.def (TV_TREE_CAST): Define.

testsuite/ChangeLog:

	* gcc.dg/tree-ssa/cast-1.c: New test.
	* gcc.dg/tree-ssa/cast-2.c: New test.
	* gcc.dg/tree-ssa/cast-3.c: New test.
	* gcc.dg/tree-ssa/cast-4.c: New test.
	* gcc.dg/tree-ssa/cast-5.c: New test.
	* gcc.dg/tree-ssa/cast-6.c: New test.
	* gcc.dg/tree-ssa/cast-7.c: New test.
	* gcc.dg/tree-ssa/cast-8.c: New test.

Patch:

Attachment: cast.patch.txt





PS here is the best example of what this patch can do for PR 8361:
<L34>:;
- <D193585> = (ptrdiff_t)index;
- __n = (const ptrdiff_t &)&<D193585>;
- <D193593> = (const TYPE *)this.14553->_M_start;
- __i = (const TYPE * const &)&<D193593>;
- <D193587>._M_current = *__i;
- this = (struct __normal_iterator<constPREDICATE_NAMES::TYPE*,std::vector<PREDICATE_NAMES::TYPE, std::allocator<PREDICATE_NAMES::TYPE> > > * const)&<D193587>;
- <D193606> = this->_M_current + (const TYPE *)((unsigned int)*__n * 4);
- __i = (const TYPE * const &)&<D193606>;
- <D193584>._M_current = *__i;
- this = (struct __normal_iterator<constPREDICATE_NAMES::TYPE*,std::vector<PREDICATE_NAMES::TYPE, std::allocator<PREDICATE_NAMES::TYPE> > > * const)&<D193584>;
- if ((bool)(int)(bool)((int)*(const TYPE &)this->_M_current == 4)) goto <L370>; else goto <L42>;
-
-<L370>:;
- <D193499> = 1;
- goto <bb 11> (<L53>);
+ <D193469> = (const TYPE *)this.14553->_M_start;
+ if (*(const TYPE &)(<D193469> + (const TYPE *)(index * 4)) == 4) goto <L309>; else goto <L42>;



Another good example:
<L227>:;
- this = (struct __normal_iterator<constTERM*,std::vector<TERM, std::allocator<TERM> > > * const)&i;
- this = (struct TERM *)(struct TERM &)this->_M_current;
+ this = i._M_current;
T.897 = this->type;
switch ((int)T.897)


And another:
- <D196011> = (ptrdiff_t)index;
- __n = (const ptrdiff_t &)&<D196011>;
- <D196019> = (const TYPE *)this.14553->_M_start;
- __i = (const TYPE * const &)&<D196019>;
- <D196013>._M_current = *__i;
- this = (struct __normal_iterator<constPREDICATE_NAMES::TYPE*,std::vector<PREDICATE_NAMES::TYPE, std::allocator<PREDICATE_NAMES::TYPE> > > * const)&<D196013>;
- <D196032> = this->_M_current + (const TYPE *)((unsigned int)*__n * 4);
- __i = (const TYPE * const &)&<D196032>;
- <D196010>._M_current = *__i;
- this = (struct __normal_iterator<constPREDICATE_NAMES::TYPE*,std::vector<PREDICATE_NAMES::TYPE, std::allocator<PREDICATE_NAMES::TYPE> > > * const)&<D196010>;
- if ((bool)(int)(bool)((int)*(const TYPE &)this->_M_current == 4) == 0) goto <L20>; else goto <L21>;
+ if ((int)*(const TYPE &)((const TYPE *)this.14553->_M_start + (const TYPE *)(index * 4)) != 4) goto <L20>; else goto <L21>;


And the dump is about half the size with the pass:
-rw-r--r-- 1 pinskia tension 3.7M Apr 2 22:56 pr8361.ii.cast
-rw-r--r-- 1 pinskia tension 7.4M Apr 2 22:58 pr8361.ii.noncast
