Bug 57742 - memset(malloc(n),0,n) -> calloc(n,1)
Summary: memset(malloc(n),0,n) -> calloc(n,1)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.0
: P3 enhancement
Target Milestone: 5.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-27 22:28 UTC by Marc Glisse
Modified: 2015-09-17 19:32 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-10-14 00:00:00


Attachments
basic patch (1.39 KB, patch)
2013-10-11 16:44 UTC, Marc Glisse
Details | Diff
walk_aliased_vdefs experiment (1.77 KB, patch)
2013-10-14 20:53 UTC, Marc Glisse
Details | Diff
New patch (2.00 KB, patch)
2014-02-23 18:46 UTC, Marc Glisse
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2013-06-27 22:28:37 UTC
Hello,

calloc can sometimes be significantly faster than malloc+bzero because it has special knowledge that some memory is already zero. When other optimizations simplify some code to malloc+memset(0), it would thus be nice to replace it with calloc. Sadly, I don't think there is a way to do a similar optimization in C++ with new, which is where such code most easily appears (creating std::vector<int>(10000) for instance). And there would also be the complication there that the size of the memset would be a bit smaller than that of the malloc (using calloc would still be fine, but it gets harder to know if it is an improvement).
Comment 1 Marc Glisse 2013-10-11 16:44:24 UTC
Created attachment 30981 [details]
basic patch

This is a very limited version of this optimization. It is in simplify_builtin_call, so only triggers if malloc/calloc is SSA_NAME_DEF_STMT(gimple_vuse(memset_stmt)). However, generalizing it means we would need plenty of tests protecting against cases where the transformation would be wrong. Note that this transforms:
p=malloc(n);
if(cond)memset(p,0,n);
into:
p=calloc(n,1);
cond;
which is good if cond is p!=0 but may not always be so great otherwise.

I won't post this to gcc-patches, I think we want something more general (dereferencing a double* between the 2 statements shouldn't break it) or nothing.
Comment 2 Richard Biener 2013-10-14 08:55:15 UTC
(In reply to Marc Glisse from comment #1)
> Created attachment 30981 [details]
> basic patch
> 
> This is a very limited version of this optimization. It is in
> simplify_builtin_call, so only triggers if malloc/calloc is
> SSA_NAME_DEF_STMT(gimple_vuse(memset_stmt)). However, generalizing it means
> we would need plenty of tests protecting against cases where the
> transformation would be wrong. Note that this transforms:
> p=malloc(n);
> if(cond)memset(p,0,n);
> into:
> p=calloc(n,1);
> cond;
> which is good if cond is p!=0 but may not always be so great otherwise.

;)  post-dominator tests (or simply tests whether both calls are in the
same basic-block ...).

Also you can transform

p = malloc (n);
if (p)
  memset (p, 0, n);

which might be a common-enough case to optimize for.

> I won't post this to gcc-patches, I think we want something more general
> (dereferencing a double* between the 2 statements shouldn't break it) or
> nothing.

dereferencing a double wouldn't have a VDEF (unless you store a double).
Comment 3 Joost VandeVondele 2013-10-14 09:51:11 UTC
just wondering if this would also catch a Fortran testcase, which compiles into malloc and memset:

MODULE M
  REAL*8, DIMENSION(:), ALLOCATABLE, SAVE :: data
  INTEGER, PARAMETER :: n=2**16
CONTAINS
  SUBROUTINE TEST
    ALLOCATE(data(n))
    data(:)=0
  END SUBROUTINE
END MODULE
Comment 4 Marc Glisse 2013-10-14 10:07:30 UTC
(In reply to Richard Biener from comment #2)
> (In reply to Marc Glisse from comment #1)
> > This is a very limited version of this optimization. It is in
> > simplify_builtin_call, so only triggers if malloc/calloc is
> > SSA_NAME_DEF_STMT(gimple_vuse(memset_stmt)). However, generalizing it means
> > we would need plenty of tests protecting against cases where the
> > transformation would be wrong. Note that this transforms:
> > p=malloc(n);
> > if(cond)memset(p,0,n);
> > into:
> > p=calloc(n,1);
> > cond;
> > which is good if cond is p!=0 but may not always be so great otherwise.
> 
> ;)  post-dominator tests (or simply tests whether both calls are in the
> same basic-block ...).

Same basic block is quite limited, and for the condition below we don't directly have post-domination, we would need post-domination between the bbs with gimple_cond and malloc, and the bb of memset with the landing block of the gimple_cond. But even finding the gimple_cond in: malloc; loop; cond; loop; memset; can be hard. I guess I'll have to limit my expectations a bit...

> Also you can transform
> 
> p = malloc (n);
> if (p)
>   memset (p, 0, n);
> 
> which might be a common-enough case to optimize for.

Yes, that's the goal.

> dereferencing a double wouldn't have a VDEF (unless you store a double).

I do want to be able to store in between, so I think I have to walk the vdef chain. But as soon as I do that, I need to make sure that the writes are to places that can't alias, which complicates things a lot (and it can get a bit expensive in a function with many memset). Consider this program:

#include <vector>
void f(void*p,int n){ new(p)std::vector<int>(n,0); }

With -O3, we end up with:

  _27 = operator new (_26);
  MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_start = _27;
  MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_finish = _27;
  _16 = _27 + _26;
  MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_end_of_storage = _16;
  __builtin_memset (_27, 0, _26);

which has memory stores between the allocation and memset. That's exactly the type of code where I'd want the optimization to apply. Joost's example has the same pattern: malloc, test for 0, several unrelated memory stores, memset.

(how to handle the fact that we have operator new and not malloc is a different issue, I am thinking of having a mode/flag where we promise not to replace operator new so it can be inlined, which will include an if(p!=0) test)

It would be great (in particular for application-specific plugins) to have an easy way to say things like: this is the next read/write use of this memory region (but other memory regions may be used in between), and it isn't post-dominated only because of this gimple_cond, etc. It's almost noon, too late to be dreaming ;-)
Comment 5 Richard Biener 2013-10-14 10:45:57 UTC
(In reply to Marc Glisse from comment #4)
> (In reply to Richard Biener from comment #2)
> > (In reply to Marc Glisse from comment #1)
> > > This is a very limited version of this optimization. It is in
> > > simplify_builtin_call, so only triggers if malloc/calloc is
> > > SSA_NAME_DEF_STMT(gimple_vuse(memset_stmt)). However, generalizing it means
> > > we would need plenty of tests protecting against cases where the
> > > transformation would be wrong. Note that this transforms:
> > > p=malloc(n);
> > > if(cond)memset(p,0,n);
> > > into:
> > > p=calloc(n,1);
> > > cond;
> > > which is good if cond is p!=0 but may not always be so great otherwise.
> > 
> > ;)  post-dominator tests (or simply tests whether both calls are in the
> > same basic-block ...).
> 
> Same basic block is quite limited, and for the condition below we don't
> directly have post-domination, we would need post-domination between the bbs
> with gimple_cond and malloc, and the bb of memset with the landing block of
> the gimple_cond. But even finding the gimple_cond in: malloc; loop; cond;
> loop; memset; can be hard. I guess I'll have to limit my expectations a
> bit...
> 
> > Also you can transform
> > 
> > p = malloc (n);
> > if (p)
> >   memset (p, 0, n);
> > 
> > which might be a common-enough case to optimize for.
> 
> Yes, that's the goal.
> 
> > dereferencing a double wouldn't have a VDEF (unless you store a double).
> 
> I do want to be able to store in between, so I think I have to walk the vdef
> chain. But as soon as I do that, I need to make sure that the writes are to
> places that can't alias, which complicates things a lot (and it can get a
> bit expensive in a function with many memset). Consider this program:
> 
> #include <vector>
> void f(void*p,int n){ new(p)std::vector<int>(n,0); }
> 
> With -O3, we end up with:
> 
>   _27 = operator new (_26);
>   MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_start = _27;
>   MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_finish = _27;
>   _16 = _27 + _26;
>   MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_end_of_storage = _16;
>   __builtin_memset (_27, 0, _26);
> 
> which has memory stores between the allocation and memset. That's exactly
> the type of code where I'd want the optimization to apply. Joost's example
> has the same pattern: malloc, test for 0, several unrelated memory stores,
> memset.

We have walk_aliased_vdefs for this.  Basically the first callback
you receive has to be the malloc, otherwise there is an aliasing
stmt inbetween.  Initialize the ao_ref with ao_ref_init_from_ptr_and_size.

> (how to handle the fact that we have operator new and not malloc is a
> different issue, I am thinking of having a mode/flag where we promise not to
> replace operator new so it can be inlined, which will include an if(p!=0)
> test)
> 
> It would be great (in particular for application-specific plugins) to have
> an easy way to say things like: this is the next read/write use of this
> memory region (but other memory regions may be used in between), and it
> isn't post-dominated only because of this gimple_cond, etc. It's almost
> noon, too late to be dreaming ;-)

See above ;)
Comment 6 Marc Glisse 2013-10-14 11:48:22 UTC
(In reply to Richard Biener from comment #5)
> We have walk_aliased_vdefs for this.  Basically the first callback
> you receive has to be the malloc, otherwise there is an aliasing
> stmt inbetween.

Cool! Last time I looked into a similar optimization, I needed to look also at the memory reads, not just the writes, so it was significantly more complicated. walk_aliased_vdefs looks perfect here, both for malloc+memset where there is nothing to read before the memset, and for calloc+memset where reading before or after the memset returns the same :-)
Comment 7 Marc Glisse 2013-10-14 20:51:20 UTC
(In reply to Richard Biener from comment #5)
> We have walk_aliased_vdefs for this.  Basically the first callback
> you receive has to be the malloc, otherwise there is an aliasing
> stmt inbetween.  Initialize the ao_ref with ao_ref_init_from_ptr_and_size.

Hmm, there is a problem with that: I don't get a callback for malloc. stmt_may_clobber_ref_p_1 only looks at the lhs of a call statement if it isn't an SSA_NAME, so it considers that p=malloc(n) does not clobber MEM_REF[p]. This kind of makes sense, it creates this memory, which is different from clobbering. I can look at the def_stmt of the first argument of memset to find the malloc, at least, but that doesn't help me with the memory checks.

Also, for this testcase:
void* f(int n,double*d){
  int* p=__builtin_malloc(n);
  ++*d;
  __builtin_memset(p,0,n);
  return p;
}
I actually get a callback for the store in *d, which gcc believes might alias :-(

For this example:
void g(int*);
void* f(int n){
  int* p=__builtin_malloc(n);
  for(int i=0;i<10000;++i){
    __builtin_memset(p,0,n);
    g(p);
    p[5]=10;
  }
  return p;
}
if I modify the aliasing machinery to make it believe that p=malloc does alias, malloc is the first callback. I haven't added the dominance checks, but I assume they will tell me that malloc dominates memset and memset postdominates malloc, although I still shouldn't do the transformation.

Pretty depressed at this point...
Comment 8 Marc Glisse 2013-10-14 20:53:21 UTC
Created attachment 31003 [details]
walk_aliased_vdefs experiment

Incomplete patch I used for my previous comment.
Comment 9 Richard Biener 2013-10-15 07:57:14 UTC
(In reply to Marc Glisse from comment #7)
> (In reply to Richard Biener from comment #5)
> > We have walk_aliased_vdefs for this.  Basically the first callback
> > you receive has to be the malloc, otherwise there is an aliasing
> > stmt inbetween.  Initialize the ao_ref with ao_ref_init_from_ptr_and_size.
> 
> Hmm, there is a problem with that: I don't get a callback for malloc.
> stmt_may_clobber_ref_p_1 only looks at the lhs of a call statement if it
> isn't an SSA_NAME, so it considers that p=malloc(n) does not clobber
> MEM_REF[p]. This kind of makes sense, it creates this memory, which is
> different from clobbering. I can look at the def_stmt of the first argument
> of memset to find the malloc, at least, but that doesn't help me with the
> memory checks.
> 
> Also, for this testcase:
> void* f(int n,double*d){
>   int* p=__builtin_malloc(n);
>   ++*d;
>   __builtin_memset(p,0,n);
>   return p;
> }
> I actually get a callback for the store in *d, which gcc believes might
> alias :-(

Yeah well, either because of pass placement or because of points-to
analysis being not context sensitive.

> For this example:
> void g(int*);
> void* f(int n){
>   int* p=__builtin_malloc(n);
>   for(int i=0;i<10000;++i){
>     __builtin_memset(p,0,n);
>     g(p);
>     p[5]=10;
>   }
>   return p;
> }
> if I modify the aliasing machinery to make it believe that p=malloc does
> alias, malloc is the first callback. I haven't added the dominance checks,
> but I assume they will tell me that malloc dominates memset and memset
> postdominates malloc, although I still shouldn't do the transformation.
> 
> Pretty depressed at this point...

Nobody said it was going to be trivial ;)

Exact pattern matching of the CFG involved might be the easiest, plus
manually implementing walk_aliased_vdefs by simply walking the use-def
chain of the virtual operands from the memset operation to the malloc
and checking stmt_may_clobber_ref_p_1 on the ao_ref_init_from_ptr_and_size
ref.
Comment 10 Marc Glisse 2013-10-15 14:11:29 UTC
(In reply to Richard Biener from comment #9)
> (In reply to Marc Glisse from comment #7)
> > Also, for this testcase:
> > void* f(int n,double*d){
> >   int* p=__builtin_malloc(n);
> >   ++*d;
> >   __builtin_memset(p,0,n);
> >   return p;
> > }
> > I actually get a callback for the store in *d, which gcc believes might
> > alias :-(
> 
> Yeah well, either because of pass placement or because of points-to
> analysis being not context sensitive.

forwprop is run 4 times, it would be really unlucky if all 4 were badly placed. I am surprised that in FRE/PRE stmt_may_clobber_ref_p_1 returns false for p and q in the following code (while called from forwprop in the above testcase it returns true). The main difference I can see is that the SSA_NAME_PTR_INFO of p has vars_contains_global=1 when I test my code and 0 when I test the following in FRE.

int g(int*a,int n,double*q){
  int*p=__builtin_malloc(n);
  p[2]=3;
  *q=5.;
  return p[2];
}


> Exact pattern matching of the CFG involved might be the easiest, plus
> manually implementing walk_aliased_vdefs by simply walking the use-def
> chain of the virtual operands from the memset operation to the malloc

That was also my conclusion, and it is a bit disappointing, since the transformation shouldn't mind if for instance there is an unrelated loop between malloc and memset. Better than nothing I guess :-/

> and checking stmt_may_clobber_ref_p_1 on the ao_ref_init_from_ptr_and_size
> ref.

Well, that won't help since stmt_may_clobber_ref_p_1 doesn't notice that unrelated stores are unrelated, I first need to work on that.
Comment 11 Marc Glisse 2013-10-15 16:38:25 UTC
(In reply to Richard Biener from comment #9)
> because of points-to analysis being not context sensitive.

Is "context sensitivity" the thing that is missing to perform the obvious optimization (i=3) on the following code without adding restrict to q? If so, I assume I don't need to file a PR about it... (there are too many PRs that mention alias analysis, I can't find anything)

int i;
int* g(int n,int*q){
  int*p=__builtin_malloc(n);
  p[2]=3;
  *q=5;
  i=p[2];
  return p;
}

(the "easy" cases are when we don't return p (no "global" var) or when p and q have incompatible types (but malloc+memset seems to lose any type I could try to force on the pointer))
Comment 12 Richard Biener 2013-10-16 14:11:41 UTC
(In reply to Marc Glisse from comment #11)
> (In reply to Richard Biener from comment #9)
> > because of points-to analysis being not context sensitive.
> 
> Is "context sensitivity" the thing that is missing to perform the obvious
> optimization (i=3) on the following code without adding restrict to q? If
> so, I assume I don't need to file a PR about it... (there are too many PRs
> that mention alias analysis, I can't find anything)

Yes, the fact that the return value p cannot be equal to q inside the
function is not exposable.

> int i;
> int* g(int n,int*q){
>   int*p=__builtin_malloc(n);
>   p[2]=3;
>   *q=5;
>   i=p[2];
>   return p;
> }
> 
> (the "easy" cases are when we don't return p (no "global" var) or when p and
> q have incompatible types (but malloc+memset seems to lose any type I could
> try to force on the pointer))
Comment 13 Marc Glisse 2014-02-22 15:44:45 UTC
(In reply to Richard Biener from comment #12)
> Yes, the fact that the return value p cannot be equal to q inside the
> function is not exposable.

Richard fixed this in PR 50262, yay!
So this PR is worth working on again.
Comment 14 Marc Glisse 2014-02-23 18:46:29 UTC
Created attachment 32204 [details]
New patch

This seems to work. It also handles the fortran example from comment #3. With a comment before the new function and a testcase, it will be good to go to gcc-patches.

Side note: at -O3, if I provide an inline version of operator new (see PR 59894), it handles std::vector<int>(n). However, I had to provide a simple one (call malloc, if null throw). The one in libsupc++ is way too complicated (2 calls to malloc), and if I refactor it slightly so "malloc" only appears once, I end up with the following. The edge probabilities are strange (malloc fails in 95% of cases?), but mostly we have a PHI node with a single argument which hides the fact that the variables are the same. It is far from the first time I notice this, is there a real reason to keep those unary PHIs, or should we optimize them more aggressively?

  p_24 = mallocD.1405 (sz_20);
  if (p_24 == 0B)
    goto <bb 7>;
  else
    goto <bb 11>;
;;    succ:       7 [95.5%]  (TRUE_VALUE,EXECUTABLE)
;;                11 [4.5%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 11, loop depth 0, count 0, freq 349, maybe hot
;;    prev block 10, next block 12, flags: (NEW, REACHABLE)
;;    pred:       10 [4.5%]  (FALSE_VALUE,EXECUTABLE)
  # PT = { D.16587 } (escaped heap)
  # ALIGN = 8, MISALIGN = 0
  # p_41 = PHI <p_24(10)>
  # .MEM_42 = VDEF <.MEM_34>
  MEM[(struct _Vector_baseD.14156 *)p_2(D)]._M_implD.15030._M_startD.15032 = p_41;
  # PT = { D.16587 } (escaped heap)
  # ALIGN = 4, MISALIGN = 0
  _19 = p_41 + sz_20;
  # .MEM_44 = VDEF <.MEM_42>
  MEM[(struct _Vector_baseD.14156 *)p_2(D)]._M_implD.15030._M_end_of_storageD.15034 = _19;
  # .MEM_8 = VDEF <.MEM_44>
  # USE = anything 
  # CLB = anything 
  memsetD.1000 (p_41, 0, sz_20);
Comment 15 Marc Glisse 2014-06-24 18:50:32 UTC
Author: glisse
Date: Tue Jun 24 18:50:00 2014
New Revision: 211956

URL: https://gcc.gnu.org/viewcvs?rev=211956&root=gcc&view=rev
Log:
2014-06-24  Marc Glisse  <marc.glisse@inria.fr>

	PR tree-optimization/57742
gcc/
	* tree-ssa-strlen.c (get_string_length): Ignore malloc.
	(handle_builtin_malloc, handle_builtin_memset): New functions.
	(strlen_optimize_stmt): Call them.
	* passes.def: Move strlen after loop+dom but before vrp.
gcc/testsuite/
	* g++.dg/tree-ssa/calloc.C: New testcase.
	* gcc.dg/tree-ssa/calloc-1.c: Likewise.
	* gcc.dg/tree-ssa/calloc-2.c: Likewise.
	* gcc.dg/strlenopt-9.c: Adapt.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/calloc.C
    trunk/gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/calloc-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/strlenopt-9.c
    trunk/gcc/tree-ssa-strlen.c
Comment 16 Marc Glisse 2014-06-24 19:04:36 UTC
Done. Joost, feel free to add your testcase from comment #3 if you want to (I can't write a "hello world" in fortran so I will avoid adding such testcases myself).
Comment 17 Joost VandeVondele 2014-06-25 06:46:45 UTC
(In reply to Marc Glisse from comment #16)
> Done. Joost, feel free to add your testcase from comment #3 if you want to
> (I can't write a "hello world" in fortran so I will avoid adding such
> testcases myself).

Thanks Marc, I don't have write access, but I can try to dg-ify the testcase from comment #3.. however, first test, it still seems to contain a call to builtin_malloc at -O2, seems to work at -O3... expected ?


Also, my nightly CP2K tester fails with :

0xa63a0f crash_signal
        ../../gcc/gcc/toplev.c:337
0x871f76 bb_seq_addr
        ../../gcc/gcc/gimple.h:1389
0x871f76 gsi_start_bb
        ../../gcc/gcc/gimple-iterator.h:118
0x871f76 gsi_for_stmt(gimple_statement_base*)
        ../../gcc/gcc/gimple-iterator.c:620
0xbfe1c1 handle_builtin_memset
        ../../gcc/gcc/tree-ssa-strlen.c:1653
0xbfe1c1 strlen_optimize_stmt
        ../../gcc/gcc/tree-ssa-strlen.c:1917
0xbfe1c1 strlen_dom_walker::before_dom_children(basic_block_def*)
        ../../gcc/gcc/tree-ssa-strlen.c:2096
0xfa483a dom_walker::walk(basic_block_def*)
        ../../gcc/gcc/domwalk.c:177
0xbf963d execute
        ../../gcc/gcc/tree-ssa-strlen.c:2170
Please submit a full bug report,

which I suppose is related to this patch... I'll see if I can get a testcase.
Comment 18 Joost VandeVondele 2014-06-25 07:41:51 UTC
The following now fails, so'll reopen this PR. It is at least related to zeroing pvec twice in a row, and doesn seem to happen if I manually inline the routine get_pseudo_param .

> cat bug.f90
MODULE atom_fit
  INTEGER, PARAMETER :: dp=8
CONTAINS
  SUBROUTINE atom_fit_pseudo ()
    REAL(KIND=dp), ALLOCATABLE, DIMENSION(:) :: x, xi
    LOGICAL :: lsdpot
    ALLOCATE(xi(200),STAT=ierr)
    CALL get_pseudo_param(xi,lsdpot)
    CALL foo(xi)
  END SUBROUTINE atom_fit_pseudo
  SUBROUTINE get_pseudo_param (pvec,lsdpot)
    REAL(KIND=dp), DIMENSION(:), INTENT(out) :: pvec
    LOGICAL :: lsdpot
    IF(lsdpot) THEN
      pvec = 0
      pvec = 0
    END IF
  END SUBROUTINE get_pseudo_param
END MODULE atom_fit

> gfortran -c -O3 bug.f90
bug.f90: In function ‘atom_fit_pseudo’:
bug.f90:4:0: internal compiler error: Segmentation fault
   SUBROUTINE atom_fit_pseudo ()
 ^
0xa63a0f crash_signal
	../../gcc/gcc/toplev.c:337
0x871f76 bb_seq_addr
	../../gcc/gcc/gimple.h:1389
0x871f76 gsi_start_bb
	../../gcc/gcc/gimple-iterator.h:118
0x871f76 gsi_for_stmt(gimple_statement_base*)
	../../gcc/gcc/gimple-iterator.c:620
0xbfe1c1 handle_builtin_memset
	../../gcc/gcc/tree-ssa-strlen.c:1653
0xbfe1c1 strlen_optimize_stmt
	../../gcc/gcc/tree-ssa-strlen.c:1917
0xbfe1c1 strlen_dom_walker::before_dom_children(basic_block_def*)
	../../gcc/gcc/tree-ssa-strlen.c:2096
0xfa483a dom_walker::walk(basic_block_def*)
	../../gcc/gcc/domwalk.c:177
0xbf963d execute
	../../gcc/gcc/tree-ssa-strlen.c:2170
Please submit a full bug report,
Comment 19 Marc Glisse 2014-06-25 07:43:59 UTC
(In reply to Joost VandeVondele from comment #17)
> Thanks Marc, I don't have write access, but I can try to dg-ify the testcase
> from comment #3.. however, first test, it still seems to contain a call to
> builtin_malloc at -O2, seems to work at -O3... expected ?

Yes, at -O2 you don't have a call to memset, so my patch does nothing. It is the same as my C++ testcase basically, so we don't really need the extra testcase.

> Also, my nightly CP2K tester fails with :
> 
> 0xa63a0f crash_signal
>         ../../gcc/gcc/toplev.c:337
> 0x871f76 bb_seq_addr
>         ../../gcc/gcc/gimple.h:1389
> 0x871f76 gsi_start_bb
>         ../../gcc/gcc/gimple-iterator.h:118
> 0x871f76 gsi_for_stmt(gimple_statement_base*)
>         ../../gcc/gcc/gimple-iterator.c:620
> 0xbfe1c1 handle_builtin_memset
>         ../../gcc/gcc/tree-ssa-strlen.c:1653
> 0xbfe1c1 strlen_optimize_stmt
>         ../../gcc/gcc/tree-ssa-strlen.c:1917
> 0xbfe1c1 strlen_dom_walker::before_dom_children(basic_block_def*)
>         ../../gcc/gcc/tree-ssa-strlen.c:2096
> 0xfa483a dom_walker::walk(basic_block_def*)
>         ../../gcc/gcc/domwalk.c:177
> 0xbf963d execute
>         ../../gcc/gcc/tree-ssa-strlen.c:2170
> Please submit a full bug report,
> 
> which I suppose is related to this patch... I'll see if I can get a testcase.

Yes, please open a new PR with the testcase and Cc: me, thanks.
Comment 20 Marc Glisse 2014-06-25 07:53:17 UTC
(In reply to Joost VandeVondele from comment #18)
> The following now fails, so'll reopen this PR. It is at least related to
> zeroing pvec twice in a row, and doesn seem to happen if I manually inline
> the routine get_pseudo_param .

Hum, right, I thought I had tested that, but it was in an earlier version of the patch and I forgot to add it to one of the testcases :-(

void*f(){
  char*p=malloc(42);
  memset(p,0,42);
  memset(p,0,42);
  return p;
};
Comment 21 Marc Glisse 2014-06-25 08:09:06 UTC
I am testing the following:

--- tree-ssa-strlen.c	(revision 211967)
+++ tree-ssa-strlen.c	(working copy)
@@ -1646,20 +1646,22 @@ handle_builtin_memset (gimple_stmt_itera
   enum built_in_function code1 = DECL_FUNCTION_CODE (callee1);
   tree size = gimple_call_arg (stmt2, 2);
   if (code1 == BUILT_IN_CALLOC)
     /* Not touching stmt1 */ ;
   else if (code1 == BUILT_IN_MALLOC
 	   && operand_equal_p (gimple_call_arg (stmt1, 0), size, 0))
     {
       gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt1);
       update_gimple_call (&gsi1, builtin_decl_implicit (BUILT_IN_CALLOC), 2,
 			  size, build_one_cst (size_type_node));
+      si1->length = build_int_cst (size_type_node, 0);
+      si1->stmt = gsi_stmt (gsi1);
     }
   else
     return true;
   tree lhs = gimple_call_lhs (stmt2);
   unlink_stmt_vdef (stmt2);
   if (lhs)
     {
       gimple assign = gimple_build_assign (lhs, ptr);
       gsi_replace (gsi, assign, false);
     }
Comment 22 Marc Glisse 2014-06-25 12:27:45 UTC
Author: glisse
Date: Wed Jun 25 12:27:13 2014
New Revision: 211977

URL: https://gcc.gnu.org/viewcvs?rev=211977&root=gcc&view=rev
Log:
2014-06-25  Marc Glisse  <marc.glisse@inria.fr>

	PR tree-optimization/57742
gcc/
	* tree-ssa-strlen.c (handle_builtin_memset): Update strinfo
	after replacing the statement.
gcc/testsuite/
	* gcc.dg/tree-ssa/calloc-3.c: New file.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/calloc-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-strlen.c
Comment 23 Marc Glisse 2014-06-25 12:29:12 UTC
Fixed.
Comment 24 Daniel Gutson 2015-09-17 19:32:45 UTC
This optimization breaks code, please see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67618