Bug 30115 - allocate() interface pessimizes aliasing
Summary: allocate() interface pessimizes aliasing
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.3.0
: P3 enhancement
Target Milestone: 4.2.0
Assignee: Richard Biener
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks: 30038
  Show dependency treegraph
 
Reported: 2006-12-07 20:54 UTC by Richard Biener
Modified: 2006-12-15 15:28 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.3.0 4.2.0
Known to fail:
Last reconfirmed: 2006-12-13 10:29:12


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2006-12-07 20:54:01 UTC
The fortran allocate() interface which is

  void allocate (void **, int)

to the middle-end does not allow aliasing to create a separate alias tag
for the allocated memory.  The middle-end only supports malloc style
interfaces for this, it looks like internal_malloc which the fortran
frontend defines uses this.

Why do we generate calls to the above inferior interface at all?
Comment 1 Richard Biener 2006-12-07 21:54:08 UTC
A little playing around and looking at the allocate() implementation in libgfortran suggests that it is possible to fix this.

Index: trans-array.c
===================================================================
--- trans-array.c       (revision 119622)
+++ trans-array.c       (working copy)
@@ -3355,31 +3355,29 @@ gfc_array_allocate (gfc_se * se, gfc_exp
                              lower, upper, &se->pre);

   /* Allocate memory to store the data.  */
-  tmp = gfc_conv_descriptor_data_addr (se->expr);
-  pointer = gfc_evaluate_now (tmp, &se->pre);
+  pointer = gfc_conv_descriptor_data_get (se->expr);
+  STRIP_NOPS (pointer);

   if (TYPE_PRECISION (gfc_array_index_type) == 32)
     {
       if (allocatable_array)
-       allocate = gfor_fndecl_allocate_array;
+       allocate = gfor_fndecl_internal_malloc;
       else
-       allocate = gfor_fndecl_allocate;
+       allocate = gfor_fndecl_internal_malloc;
     }
   else if (TYPE_PRECISION (gfc_array_index_type) == 64)
     {
       if (allocatable_array)
-       allocate = gfor_fndecl_allocate64_array;
+       allocate = gfor_fndecl_internal_malloc64;
       else
-       allocate = gfor_fndecl_allocate64;
+       allocate = gfor_fndecl_internal_malloc64;
     }
   else
     gcc_unreachable ();

-  tmp = gfc_chainon_list (NULL_TREE, pointer);
-  tmp = gfc_chainon_list (tmp, size);
-  tmp = gfc_chainon_list (tmp, pstat);
+  tmp = gfc_chainon_list (NULL_TREE, size);
   tmp = build_function_call_expr (allocate, tmp);
-  gfc_add_expr_to_block (&se->pre, tmp);
+  gfc_add_expr_to_block (&se->pre, build2 (MODIFY_EXPR, void_type_node, pointer, tmp));

   tmp = gfc_conv_descriptor_offset (se->expr);
   gfc_add_modify_expr (&se->pre, tmp, offset);


this generates 

  D.3240_65 = _gfortran_internal_malloc64 (size.310_298);
  #   SFT.762_6097 = V_MUST_DEF <SFT.762_9320>;
  strain_tensor.data = D.3240_65;

which then produces a HEAP_VAR and enables sincos transformation in PR30038.
Comment 2 Richard Biener 2006-12-07 22:00:40 UTC
That patch^Whack improves fatigue runtime from 27.3s to 9.5s (with the sincos patches applied).
Comment 3 Jerry DeLisle 2006-12-08 01:25:38 UTC
It's Magic!
Comment 4 Richard Biener 2006-12-08 07:38:52 UTC
Magic will not help here ;)
Comment 5 Thomas Koenig 2006-12-08 22:54:02 UTC
(In reply to comment #1)

>    if (TYPE_PRECISION (gfc_array_index_type) == 32)
>      {
>        if (allocatable_array)
> -       allocate = gfor_fndecl_allocate_array;
> +       allocate = gfor_fndecl_internal_malloc;
>        else
> -       allocate = gfor_fndecl_allocate;
> +       allocate = gfor_fndecl_internal_malloc;

This patch, as-is, will very likely violate the Fortran standard.

See, for example, multiple_allocation_1.f90:

  allocate(a(4))
  ! This should set the stat code and change the size.
  allocate(a(3),stat=i)
  if (i == 0) call abort
  if (.not. allocated(a)) call abort
  if (size(a) /= 3) call abort
  ! It's OK to allocate pointers twice (even though this causes
  ! a memory leak)
  allocate(b(4))
  allocate(b(4))

The check wether size(a) is three isn't required by the standard.
The logic that is currently within the library would need to
be moved into the front end.
Comment 6 Thomas Koenig 2006-12-08 23:35:36 UTC
I forgot

    integer, allocatable :: a(:)
    integer, pointer :: b(:)

:-)

>   allocate(a(4))
>   ! This should set the stat code and change the size.
>   allocate(a(3),stat=i)
>   if (i == 0) call abort
>   if (.not. allocated(a)) call abort
>   if (size(a) /= 3) call abort
>   ! It's OK to allocate pointers twice (even though this causes
>   ! a memory leak)
>   allocate(b(4))
>   allocate(b(4))
> 
> The check wether size(a) is three isn't required by the standard.
> The logic that is currently within the library would need to
> be moved into the front end.
> 

Comment 7 Richard Biener 2006-12-09 10:23:27 UTC
I guess it may work to only change

  void allocate (void **, size_t, int *)

to

  void *allocate (void *, size_t, int *)

and use it like

  descriptor.data = allocate (descriptor.data, size, pstat)

the point is that aliasing needs a SSA_NAME def for the newly allocated
memory to assign it a unique alias set.  With the current interface we
get only a clobber of the descriptor.data (a VDEF) from which we cannot
do any alias set modification.

I believe this interface change could work without violating any parts
of the standard (as it really doesn't change semantics)?

(confirming this, the middle-end is really pessimized by the current
interface)
Comment 8 Thomas Koenig 2006-12-09 19:03:21 UTC
(In reply to comment #7)
> I guess it may work to only change
> 
>   void allocate (void **, size_t, int *)

> to
> 
>   void *allocate (void *, size_t, int *)

> and use it like

>   descriptor.data = allocate (descriptor.data, size, pstat)

Sounds good.  Who'll write the patch? :-)
Comment 9 Paul Thomas 2006-12-10 09:02:30 UTC
Alternatively, how about something along the lines:

$ svn diff -x -cp gcc/fortran/trans-array.c gcc/fortran/trans-stmt.c
Index: gcc/fortran/trans-array.c
===================================================================
*** gcc/fortran/trans-array.c   (révision 119281)
--- gcc/fortran/trans-array.c   (copie de travail)
*************** gfc_array_init_size (tree descriptor, in
*** 3264,3270 ****
 /*GCC ARRAYS*/

 bool
! gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree pstat)
 {
   tree tmp;
   tree pointer;
--- 3288,3294 ----
 /*GCC ARRAYS*/

 bool
! gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree stat)
 {
   tree tmp;
   tree pointer;
*************** gfc_array_allocate (gfc_se * se, gfc_exp
*** 3323,3353 ****
                             lower, upper, &se->pre);

   /* Allocate memory to store the data.  */
!   tmp = gfc_conv_descriptor_data_addr (se->expr);
!   pointer = gfc_evaluate_now (tmp, &se->pre);

   if (TYPE_PRECISION (gfc_array_index_type) == 32)
     {
       if (allocatable_array)
!       allocate = gfor_fndecl_allocate_array;
       else
!       allocate = gfor_fndecl_allocate;
     }
   else if (TYPE_PRECISION (gfc_array_index_type) == 64)
     {
       if (allocatable_array)
!       allocate = gfor_fndecl_allocate64_array;
       else
!       allocate = gfor_fndecl_allocate64;
     }
   else
     gcc_unreachable ();

!   tmp = gfc_chainon_list (NULL_TREE, pointer);
!   tmp = gfc_chainon_list (tmp, size);
!   tmp = gfc_chainon_list (tmp, pstat);
   tmp = build_function_call_expr (allocate, tmp);
!   gfc_add_expr_to_block (&se->pre, tmp);

   tmp = gfc_conv_descriptor_offset (se->expr);
   gfc_add_modify_expr (&se->pre, tmp, offset);
--- 3347,3389 ----
                             lower, upper, &se->pre);

   /* Allocate memory to store the data.  */
!   pointer = gfc_conv_descriptor_data_get (se->expr);
!   STRIP_NOPS (pointer);

   if (TYPE_PRECISION (gfc_array_index_type) == 32)
     {
       if (allocatable_array)
!       allocate = gfor_fndecl_internal_malloc;
       else
!       allocate = gfor_fndecl_internal_malloc;
     }
   else if (TYPE_PRECISION (gfc_array_index_type) == 64)
     {
       if (allocatable_array)
!       allocate = gfor_fndecl_internal_malloc64;
       else
!       allocate = gfor_fndecl_internal_malloc64;
     }
   else
     gcc_unreachable ();

!   tmp = gfc_chainon_list (NULL_TREE, size);
   tmp = build_function_call_expr (allocate, tmp);
!   gfc_add_expr_to_block (&se->pre, build2 (MODIFY_EXPR, void_type_node,
!                        pointer, tmp));
!
!   tmp = build2 (EQ_EXPR, boolean_type_node, pointer,
!               convert (TREE_TYPE (pointer), integer_zero_node));
!
!   if (stat != NULL_TREE)
!     {
!       tmp = build3 (COND_EXPR, gfc_array_index_type, tmp,
!                   gfc_index_one_node, gfc_index_zero_node);
!       gfc_add_modify_expr (&se->pre, stat, tmp);
!     }
!   else
!     gfc_trans_runtime_check (tmp, "ALLOCATE: Out of memory.",
!                            &se->pre, &expr->where);

   tmp = gfc_conv_descriptor_offset (se->expr);
   gfc_add_modify_expr (&se->pre, tmp, offset);
Index: gcc/fortran/trans-stmt.c
===================================================================
*** gcc/fortran/trans-stmt.c    (révision 119281)
--- gcc/fortran/trans-stmt.c    (copie de travail)
*************** gfc_trans_allocate (gfc_code * code)
*** 3568,3574 ****
       se.descriptor_only = 1;
       gfc_conv_expr (&se, expr);

!       if (!gfc_array_allocate (&se, expr, pstat))
       {
         /* A scalar or derived type.  */
         tree val;
--- 3568,3574 ----
       se.descriptor_only = 1;
       gfc_conv_expr (&se, expr);

!       if (!gfc_array_allocate (&se, expr, stat))
       {
         /* A scalar or derived type.  */
         tree val;

etc., etc.

Paul



Comment 10 Richard Biener 2006-12-10 12:17:54 UTC
Sure, that would work - it's basically inlining the library implementation
without the reallocation that isn't required by the standard.

For the non-array variant doing the inlining might be too big - I don't know.

Inlining also get's rid of the two pointer arguments which otherwise cause
clobbers of all memory if we don't know what they point to.
Comment 11 Richard Biener 2006-12-10 12:36:35 UTC
Looking at trans-decl.c I wonder where the pstat argument is for the function
decl?  It seems we don't check for excess arguments to function calls...

I have a patch for the interface change to the allocate_array variant that I'll
post shortly.  It will of course change the library ABI ... but the interface
change still looks worthwhile even if we decide to inline the implementation
in cases where it does not increase code size (like if the stat argument is
not present).
Comment 12 Richard Biener 2006-12-13 10:29:12 UTC
Fixed on the mainline, backport to 4.2 pending.
Comment 13 Richard Biener 2006-12-15 15:28:22 UTC
Fixed.