This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch, fortran] Fix PR 42131, improvement in do loops
- From: Janne Blomqvist <blomqvist dot janne at gmail dot com>
- To: Jerry DeLisle <jvdelisle at verizon dot net>
- Cc: Thomas Koenig <tkoenig at netcologne dot de>, Richard Henderson <rth at redhat dot com>, fortran at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
- Date: Tue, 1 Dec 2009 19:01:19 +0200
- Subject: Re: [patch, fortran] Fix PR 42131, improvement in do loops
- References: <1259608967.3212.6.camel@meiner.onlinehome.de> <4B14425B.3090202@redhat.com> <1259651906.4639.3.camel@meiner.onlinehome.de> <4B151F7F.9090209@verizon.net>
On Tue, Dec 1, 2009 at 15:51, Jerry DeLisle <jvdelisle@verizon.net> wrote:
> On 11/30/2009 11:18 PM, Thomas Koenig wrote:
>>
>> On Mon, 2009-11-30 at 14:08 -0800, Richard Henderson wrote:
>>>
>>> On 11/30/2009 11:22 AM, Thomas Koenig wrote:
>>>>
>>>> P.S: Richard, if you have a suggestion along the lines of what
>>>> you proposed in http://gcc.gnu.org/bugzilla/process_bug.cgi#c22 ,
>>>> please don't hesitate to say so.
>>>
>>> Richi had meant
>>>
>>>    step_sign = fold_build3 (COND_EXPR, type,
>>>                fold_build2 (LT_EXPR, boolean_type_node, step,
>>>                             build_int_cst (type, 0)),
>>>                  build_int_cst (type, -1), build_int_cst (type, 1));
>>>
>>> I.e. "step_size = (step < 0 ? -1 : 1)".
>>
>> That would have worked as well, also for folding, I see.  I am a bit
>> surprised because the version with the if didn't work.
>>
>> If anybody shows that this version is better than what I committed, this
>> is a trivial enough change that can be done easily.
>>
>>         Thomas
>>
>>
> Thomas, you are approved to change this on trunk.  It does seem cleaner,
> simpler.
>
> Jerry
>
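For reference, here is a standalone C sketch of the two ways of computing SIGN (1,step) discussed in this thread: the shift/multiply/add sequence and the conditional form Richard quotes above. It is plain C on `int`, not GCC tree-building code. The function names are made up for illustration, `sizeof (int) * CHAR_BIT` stands in for TYPE_PRECISION (type), and the shift form assumes that `>>` on a negative `int` is an arithmetic shift (implementation-defined in ISO C, but what GCC does).

```c
#include <assert.h>
#include <limits.h>

/* Shift-based form: (step >> (precision - 1)) * 2 + 1.
   Assumes >> on a negative int is an arithmetic shift, so the
   shifted value is -1 for negative steps and 0 otherwise.  */
static int
step_sign_shift (int step)
{
  /* Stand-in for TYPE_PRECISION (type).  */
  int precision = (int) (sizeof (int) * CHAR_BIT);
  int tmp = step >> (precision - 1);   /* -1 if step < 0, else 0 */
  return tmp * 2 + 1;                  /* -1 or 1 */
}

/* Conditional form: step < 0 ? -1 : 1.  */
static int
step_sign_cond (int step)
{
  return step < 0 ? -1 : 1;
}

/* Hypothetical self-check: both forms agree on a few
   representative step values, including the extremes.  */
static void
check_step_sign (void)
{
  static const int steps[] = { INT_MIN, -7, -1, 0, 1, 7, INT_MAX };
  for (unsigned i = 0; i < sizeof steps / sizeof steps[0]; i++)
    assert (step_sign_shift (steps[i]) == step_sign_cond (steps[i]));
}
```

Calling `check_step_sign ()` from a test driver exercises both forms; note that both map a step of 0 to 1.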
With the patch below,

diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index e9f76a0..32c6efc 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -1028,17 +1028,13 @@ gfc_trans_do (gfc_code * code)
       {
         tree pos, neg, step_sign, to2, from2, step2;
-        /* Calculate SIGN (1,step) */
+        /* Calculate SIGN (1,step), as (step < 0 ? -1 : 1) */
-        tmp = fold_build2 (RSHIFT_EXPR, type, step,
-                           build_int_cst (type,
-                                          TYPE_PRECISION (type) - 1));
-
-        tmp = fold_build2 (MULT_EXPR, type, tmp,
-                           build_int_cst (type, 2));
-
-        step_sign = fold_build2 (PLUS_EXPR, type, tmp,
-                                 fold_convert (type, integer_one_node));
+        tmp = fold_build2 (LT_EXPR, boolean_type_node, step,
+                           build_int_cst (TREE_TYPE (step), 0));
+        step_sign = fold_build3 (COND_EXPR, type, tmp,
+                                 build_int_cst (type, -1),
+                                 build_int_cst (type, 1));
         tmp = fold_build2 (LT_EXPR, boolean_type_node, to, from);
         pos = fold_build3 (COND_EXPR, void_type_node, tmp,

it seems that the trunk version is actually slightly faster. Then
again, the difference is probably not statistically significant,
especially since I had a desktop session running at the same time as
the benchmark. Today's trunk:
Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            1.68       42786     9.87        2  0.0882
air           3.63       77325     7.29        5  0.8470
aermod       57.03     1254397    33.25        5  0.3303
doduc         7.84      186762    27.88        2  0.0074
linpk         0.97       36360    13.20        2  0.0614
mdbx          2.32       75747    12.38        5  0.0588
tfft          0.71       26748     4.51        2  0.0188
capacita      2.74       79064    51.75        3  0.1411
channel       0.99       33835     3.57        4  0.1531
fatigue       3.34       85028     7.11        5  0.2978
gas_dyn       5.36      121347     5.37        2  0.1917
induct        6.88      179878    16.81        2  0.0791
nf            3.17       76270    14.00        2  0.1832
protein       7.99      122053    35.37        2  0.1787
rnflow        9.66      179982    25.25        2  0.1952
test_fpu      7.29      152184     8.79        5  2.4118
Geometric Mean Execution Time = 12.97 seconds
With the patch above:
Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            1.67       42786     9.85        2  0.1563
air           3.80       77325     7.28        5  0.5034
aermod       54.77     1254397    33.48        5  0.5369
doduc         7.86      186762    28.22        5  0.6129
linpk         0.96       36360    13.29        3  0.1782
mdbx          2.31       75747    12.36        2  0.1597
tfft          0.74       26620     4.61        2  0.0813
capacita      2.73       79064    51.54        4  0.1646
channel       0.94       33835     3.61        2  0.0582
fatigue       3.33       85028     7.08        5  0.7254
gas_dyn       5.07      121347     5.25        5  0.6386
induct        6.48      179878    16.86        2  0.0486
nf            3.37       76270    14.05        2  0.0246
protein       7.61      122053    35.42        2  0.1489
rnflow        9.66      179982    25.69        5  1.1166
test_fpu      7.46      152184     9.54        2  0.0817
Geometric Mean Execution Time = 13.08 seconds
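To make the two tables easier to compare, here is a small C sketch that computes the per-benchmark percent change in "Ave Run" between trunk and the patched build. The run times are copied from the tables above. The struct and function names are hypothetical, and this is only a convenience for reading the numbers, not part of the benchmark harness.

```c
#include <assert.h>
#include <stdio.h>

/* One row per benchmark: average run time in seconds, taken from
   the "Ave Run" column of the two tables.  */
struct bench_time
{
  const char *name;
  double trunk;
  double patched;
};

static const struct bench_time times[] = {
  { "ac",        9.87,  9.85 }, { "air",       7.29,  7.28 },
  { "aermod",   33.25, 33.48 }, { "doduc",    27.88, 28.22 },
  { "linpk",    13.20, 13.29 }, { "mdbx",     12.38, 12.36 },
  { "tfft",      4.51,  4.61 }, { "capacita", 51.75, 51.54 },
  { "channel",   3.57,  3.61 }, { "fatigue",   7.11,  7.08 },
  { "gas_dyn",   5.37,  5.25 }, { "induct",   16.81, 16.86 },
  { "nf",       14.00, 14.05 }, { "protein",  35.37, 35.42 },
  { "rnflow",   25.25, 25.69 }, { "test_fpu",  8.79,  9.54 },
};

/* Percent change of the patched time relative to trunk;
   positive means the patched build took longer.  */
static double
percent_change (double trunk, double patched)
{
  assert (trunk > 0.0);
  return 100.0 * (patched - trunk) / trunk;
}

/* Print one line per benchmark: name, both times, percent change.  */
static void
print_changes (void)
{
  for (size_t i = 0; i < sizeof times / sizeof times[0]; i++)
    printf ("%-9s %7.2f %7.2f %+6.2f%%\n", times[i].name,
            times[i].trunk, times[i].patched,
            percent_change (times[i].trunk, times[i].patched));
}
```

Calling `print_changes ()` lists all sixteen benchmarks; the same helper applied to the two geometric means, `percent_change (12.97, 13.08)`, gives a change of a little under one percent.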
Also, Salvatore's benchmark from PR 42108 on trunk:
$ time ./eval-trunk <<EOF
> 40000
> EOF
real 0m23.177s
user 0m23.150s
sys 0m0.020s
And with the patch:
$ time ./a.out << EOF
40000
EOF
real 0m23.173s
user 0m23.170s
sys 0m0.010s
Again, not statistically significant.
So, any preferences?
--
Janne Blomqvist