This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch, fortran] Fix PR 42131, improvement in do loops


On Tue, Dec 1, 2009 at 15:51, Jerry DeLisle <jvdelisle@verizon.net> wrote:
> On 11/30/2009 11:18 PM, Thomas Koenig wrote:
>>
>> On Mon, 2009-11-30 at 14:08 -0800, Richard Henderson wrote:
>>>
>>> On 11/30/2009 11:22 AM, Thomas Koenig wrote:
>>>>
>>>> P.S: Richard, if you have a suggestion along the lines of what
>>>> you proposed in http://gcc.gnu.org/bugzilla/process_bug.cgi#c22 ,
>>>> please don't hesitate to say so.
>>>
>>> Richi had meant
>>>
>>>    step_sign = fold_build3 (COND_EXPR, type,
>>>                 fold_build2 (LT_EXPR, boolean_type_node, step,
>>>                              build_int_cst (type, 0)),
>>>                 build_int_cst (type, -1), build_int_cst (type, 1));
>>>
>>> I.e. "step_sign = (step < 0 ? -1 : 1)".
>>
>> That would have worked as well, also for folding, I see.  I am a bit
>> surprised because the version with the if didn't work.
>>
>> If anybody shows that this version is better than what I committed, this
>> is a trivial enough change that can be done easily.
>>
>>         Thomas
>>
>>
> Thomas, you are approved to change this on trunk.  It does seem cleaner,
> simpler.
>
> Jerry
>

With the patch below,

diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index e9f76a0..32c6efc 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -1028,17 +1028,13 @@ gfc_trans_do (gfc_code * code)
     {
       tree pos, neg, step_sign, to2, from2, step2;

-      /* Calculate SIGN (1,step) */
+      /* Calculate SIGN (1,step), as (step < 0 ? -1 : 1)  */

-      tmp = fold_build2 (RSHIFT_EXPR, type, step,
-                        build_int_cst (type,
-                                       TYPE_PRECISION (type) - 1));
-
-      tmp = fold_build2 (MULT_EXPR, type, tmp,
-                        build_int_cst (type, 2));
-
-      step_sign = fold_build2 (PLUS_EXPR, type, tmp,
-                              fold_convert (type, integer_one_node));
+      tmp = fold_build2 (LT_EXPR, boolean_type_node, step,
+                        build_int_cst (TREE_TYPE (step), 0));
+      step_sign = fold_build3 (COND_EXPR, type, tmp,
+                              build_int_cst (type, -1),
+                              build_int_cst (type, 1));

       tmp = fold_build2 (LT_EXPR, boolean_type_node, to, from);
       pos = fold_build3 (COND_EXPR, void_type_node, tmp,

it seems that the trunk version is actually slightly faster. Then
again, the difference is probably not statistically significant,
especially since I had a desktop session running at the same time as
the benchmark. Today's trunk:

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      1.68       42786      9.87       2  0.0882
         air      3.63       77325      7.29       5  0.8470
      aermod     57.03     1254397     33.25       5  0.3303
       doduc      7.84      186762     27.88       2  0.0074
       linpk      0.97       36360     13.20       2  0.0614
        mdbx      2.32       75747     12.38       5  0.0588
        tfft      0.71       26748      4.51       2  0.0188
    capacita      2.74       79064     51.75       3  0.1411
     channel      0.99       33835      3.57       4  0.1531
     fatigue      3.34       85028      7.11       5  0.2978
     gas_dyn      5.36      121347      5.37       2  0.1917
      induct      6.88      179878     16.81       2  0.0791
          nf      3.17       76270     14.00       2  0.1832
     protein      7.99      122053     35.37       2  0.1787
      rnflow      9.66      179982     25.25       2  0.1952
    test_fpu      7.29      152184      8.79       5  2.4118

Geometric Mean Execution Time =      12.97 seconds

With the patch above:

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      1.67       42786      9.85       2  0.1563
         air      3.80       77325      7.28       5  0.5034
      aermod     54.77     1254397     33.48       5  0.5369
       doduc      7.86      186762     28.22       5  0.6129
       linpk      0.96       36360     13.29       3  0.1782
        mdbx      2.31       75747     12.36       2  0.1597
        tfft      0.74       26620      4.61       2  0.0813
    capacita      2.73       79064     51.54       4  0.1646
     channel      0.94       33835      3.61       2  0.0582
     fatigue      3.33       85028      7.08       5  0.7254
     gas_dyn      5.07      121347      5.25       5  0.6386
      induct      6.48      179878     16.86       2  0.0486
          nf      3.37       76270     14.05       2  0.0246
     protein      7.61      122053     35.42       2  0.1489
      rnflow      9.66      179982     25.69       5  1.1166
    test_fpu      7.46      152184      9.54       2  0.0817

Geometric Mean Execution Time =      13.08 seconds

Also, Salvatore's benchmark from PR 42108 on trunk:

$ time ./eval-trunk <<EOF
> 40000
> EOF

real    0m23.177s
user    0m23.150s
sys     0m0.020s

And with the patch:

$ time ./a.out << EOF
40000
EOF


real    0m23.173s
user    0m23.170s
sys     0m0.010s


Again, not statistically significant.

So, any preferences?

-- 
Janne Blomqvist

