This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [tree-ssa] A pass to remove a large number of casts in C++ code
- From: Andrew Pinski <pinskia at physics dot uc dot edu>
- To: Diego Novillo <dnovillo at redhat dot com>
- Cc: "gcc-patches at gcc dot gnu dot org Patches" <gcc-patches at gcc dot gnu dot org>, steven at gcc dot gnu dot org, Kazu Hirata <kazu at cs dot umass dot edu>, Andrew Pinski <pinskia at physics dot uc dot edu>
- Date: Fri, 9 Apr 2004 13:46:28 -0400
- Subject: Re: [tree-ssa] A pass to remove a large number of casts in C++ code
- References: <13D4C492-8529-11D8-BD72-000393A6D2F2@physics.uc.edu> <1081139414.3172.118.camel@localhost.localdomain>
On Apr 5, 2004, at 00:30, Diego Novillo wrote:
> Andrew,
>
> Good stuff. Thanks. A couple of things that I'd like to address
> before we commit to this:
>
> 1. The patch is poorly formatted and contains several typos and
>    grammar problems. Please format everything so that it fits on
>    80 columns. Functions need to have documentation about each of
>    their arguments and you'll need to add some spacing to make
>    things more clear.
I figured when I submitted it that I had not taken my time writing the
comments; I am cleaning that up right now. I also had one big function
with almost no comments at the point when you asked me to submit it, so
it was much worse then. I had forgotten about documenting the arguments;
I need more practice at commenting code :).
> 2. Why did you implement it as a separate pass? The
>    transformations use almost no data flow information. Wouldn't
>    it be better to implement these routines as subroutines of
>    fold_stmt()? I want to understand what led you to choose this
>    route. It doesn't seem to take long, but it does require a full
>    IL scan, and since all the transformations are related to
>    "folding", perhaps they belong there?
It cannot be in fold_stmt because later on I would like to be able to
create PHI nodes where the variable has a different type than the
original variable had, so that we can optimize PR 14820 and PR 14843.
Both have to do with selecting the right type in the first place, either
by the coder (14843) or by the gimplifier (14820).

The pass is run multiple times, right after PHI-OPT. We still catch cast
eliminations, mainly in C++ code, even on the third and sometimes even
on the fourth and final pass.
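To give a rough idea of what I mean by cast elimination in C++ code,
here is a made-up example (my own illustration, not code from the patch)
of the kind of conversion chain that shows up after lowering:

/* Hypothetical illustration, not from the patch: a value is narrowed
   to _Bool and immediately widened back to int, and the final test is
   done on the widened copy.  Removing the casts lets the whole thing
   fold down to the original comparison, (flags & 4) != 0.  */
int has_flag (int flags)
{
  _Bool b;
  int t;

  b = (flags & 4) != 0;   /* comparison result, as a _Bool */
  t = (int) b;            /* widened back to int */
  return t != 0;          /* equivalent to just (flags & 4) != 0 */
}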
Also, after the tree-ssa branch gets merged into the mainline, I will
submit a patch for PHI-OPT which handles cases like the following:
int a(int i)
{
  int t;
  if (i > 10)
    t = 1;
  else
    t = 0;
  return t;
}
To turn it into:
int a(int i)
{
  int t;
  _Bool t1;
  t1 = i > 10;
  t = (int) t1;
  return t;
}
But it is not handled right now because PHI-OPT does not handle
different types yet. So we will want to clean up after PHI-OPT and other
passes too. Cleaning up after PHI-OPT will also help because jump
threading will then not be confused (there is a sketch of that after the
next example). Note that with the cast pass the previous case then gets
turned into the following; this is PR 14466:
int a(int i)
{
  int t;
  t = i > 10;
  return t;
}
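To illustrate the jump-threading point above, here is a hypothetical
example of my own (not from any PR): once t is just the comparison
result, the second test repeats the first condition, so jump threading
has nothing left to be confused by:

/* Hypothetical example: after PHI-OPT plus the cast pass, t becomes
   "i > 10" directly, so the later "if (t)" test repeats the same
   condition and the branch can be threaded.  With a leftover
   (int) cast of the _Bool in the way, that is harder to see.  */
int b (int i)
{
  int t;

  if (i > 10)
    t = 1;
  else
    t = 0;

  if (t)
    return 42;
  return 0;
}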
> 3. If we decide to have it as a separate pass, it should be
>    documented in passes.texi (I think there are other tree-ssa
>    passes missing from passes.texi that we will need to add before
>    the merge).
Okay, I will document it then (and also my other passes which I will add
in the future, plus the ones I already added to the lno branch). I can
also add more documentation for the PHI node optimizations.
> I did some tests over the weekend and it looks pretty decent,
> particularly for some C++ codes:
>
> * For DLV, code size was reduced by 1.8% and compile time reduced
>   by 2.7%.
> * For cc1-i-files, there was a 0.2% code reduction and almost no
>   reduction in compile time (less than a second).
> * For tramp3d-v3.cpp (compiled with -O2) I noticed no change in
>   compile time, code size was reduced by 0.4% and run time was
>   reduced by 1.5% (from 6.8s/it to 6.7s/it).
Thanks for testing the patch and seeing that it improves code generation
and compile time; I figured it would, but I had hardly done any timings
myself. Also, I noticed that for DLV the *.vars dump file was half the
size with my patch. I did not look at the first RTL dump file, but I
would assume it was much smaller as well, and I noticed there were fewer
basic blocks in the dump file.
> SPEC2000 results are within the usual values: gzip, perlbmk and twolf
> are the best performers, but we lose some in crafty and eon. Overall,
> the scores are very similar, though.
>                          Estimated                       Estimated
>                Base      Base      Base      Peak      Peak      Peak
> Benchmarks     Ref Time  Run Time  Ratio     Ref Time  Run Time  Ratio
> ------------   --------  --------  --------  --------  --------  --------
> 164.gzip           1400       220      637*      1400       218      643*
> 175.vpr            1400       336      417*      1400       332      422*
> 176.gcc                                   X                             X
> 181.mcf            1800       412      437*      1800       429      435*
> 186.crafty         1000       153      652*      1000       155      645*
> 197.parser         1800       314      573*      1800       319      564*
> 252.eon            1300       228      569*      1300       242      537*
> 253.perlbmk        1800       242      743*      1800       231      778*
> 254.gap            1100       152      725*      1100       149      737*
> 255.vortex         1900       223      852*      1900       224      850*
> 256.bzip2          1500       284      528*      1500       284      528*
> 300.twolf          3000       561      534*      3000       555      540*
> Est. SPECint_base2000                  593
> Est. SPECint2000                                                    593
The eon result is about 6% worse, which means it needs to be looked
into, but when I looked at the dump files they were about the same; I
need to look again. This might also just be normal variation that I did
not know about before.

I will add a flag to make my life easier and to make it possible to
disable the pass if it causes a regression that I do not know about at
this point.
> I also tested the patch on ia64, alpha, ia32e and x86-64. No problems
> on any arch, modulo the fortran regressions which I think we should
> address by removing support for MINUS_EXPR in is_gimple_min_invariant.
This regression has now been fixed, and I can now run the Fortran
testsuite more reliably on powerpc-apple-darwin as well.
Thanks,
Andrew Pinski