Bug 53081 - memcpy/memset loop recognition
Summary: memcpy/memset loop recognition
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: 4.8.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2012-04-23 04:51 UTC by davidxl
Modified: 2023-12-29 01:04 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-04-23 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description davidxl 2012-04-23 04:51:31 UTC
Both LLVM and icc recognize initialization and copy loop and synthesize calls to memcpy and memset.  memmove call can also be synthesized when src/target may overlap.

Option needs to provided to disable such optimization in signal handlers.

I consider this as optimization for benchmarking ;) For instance, the prime number finder program sieve.c is one of the benchmarks in LLVM. Both LLVM and icc beats gcc in this one because of the missing optimization.


#ifndef T
#define T int
#endif

T arr[1000];

void foo(int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      arr[i] = 0;
    }
}

void foo2(int n, T* p)
{
  int i;
  for (i = 0; i < n; i++)
    {
      *p++ = 0;
    }
}

#ifndef T
#define T int
#endif

T arr[1000];
T arr2[1000];

void foo(int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      arr[i] = arr2[i];
    }
}


// sieve.c

/* -*- mode: c -*-
 * $Id: sieve.c 36673 2007-05-03 16:55:46Z laurov $
 * http://www.bagley.org/~doug/shootout/
 */

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 17000
#else
#define LENGTH 170000
#endif
    int NUM = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    static char flags[8192 + 1];
    long i, k;
    int count = 0;

    while (NUM--) {
        count = 0; 
        for (i=2; i <= 8192; i++) {
            flags[i] = 1;
        }
        for (i=2; i <= 8192; i++) {
            if (flags[i]) {
                /* remove all multiples of prime: i */
                for (k=i+i; k <= 8192; k+=i) {
                    flags[k] = 0;
                }
                count++;
            }
        }
    }
    printf("Count: %d\n", count);
    return(0);
}
Comment 1 Andrew Pinski 2012-04-23 05:03:28 UTC
I thought there is already one part of graphite.
Comment 2 Andrew Pinski 2012-04-23 05:07:30 UTC
I should mention I made one patch before based on the vectorizer code which did detection of at least memset; it was while I was an intern at Apple.  I posted it and there was some review.  And a few years back there was a paper at the GCC summit about it and expanding it to memcpy.  I don't know what happened to that code though.
Comment 3 davidxl 2012-04-23 05:34:24 UTC
(In reply to comment #2)
> I should mention I made one patch before based on the vectorizer code which did
> detection of at least memset; it was while I was an intern at Apple.  I posted
> it and there was some review.  And a few years back there was a paper at the
> GCC summit about it and expanding it to memcpy.  I don't know what happened to
> that code though.

Some simple analysis using scev to identify loads and stores with linear address should be good enough. 

LLVM's version also tries to merge smaller memsets into a larger one.
Comment 4 davidxl 2012-04-23 05:34:55 UTC
(In reply to comment #2)
> I should mention I made one patch before based on the vectorizer code which did
> detection of at least memset; it was while I was an intern at Apple.  I posted
> it and there was some review.  And a few years back there was a paper at the
> GCC summit about it and expanding it to memcpy.  I don't know what happened to
> that code though.

Some simple analysis using scev to identify loads and stores with linear address should be good enough. 

LLVM's version also tries to merge smaller memsets into a larger one.
Comment 5 Alexander Monakov 2012-04-23 06:53:11 UTC
The memset part can be handled by -ftree-loop-distribution (not enabled at -O3)
Comment 6 Andrew Pinski 2012-04-23 06:54:43 UTC
(In reply to comment #4)
> LLVM's version also tries to merge smaller memsets into a larger one.

That is filed as PR 49872.


And the original issue is related to PR 30442.
Comment 7 Jakub Jelinek 2012-04-23 08:18:42 UTC
Note that -ftree-loop-distribution for some weird reason only handles clearing, but not initialization by arbitrary constants that are convertible to memset (i.e. all bytes have the same constant stored, such as 0x0a0a0a0a etc.).
Comment 8 Richard Biener 2012-04-23 10:26:45 UTC
clearing recognition is enabled by default at -O2+ with -ftree-loop-distribute-patterns.  I agree more general memset handling would be nice, as well as
memcpy and memmove recognition.
Comment 9 Richard Biener 2012-06-05 09:24:48 UTC
Author: rguenth
Date: Tue Jun  5 09:24:43 2012
New Revision: 188232

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188232
Log:
2012-06-05  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/53081
	* tree-loop-distribution.c (generate_memset_builtin): Handle all
	kinds of byte-sized stores.
	(classify_partition): Likewise.
	(tree_loop_distribution): Adjust seed statements used for
	!flag_tree_loop_distribution.

	* gcc.dg/tree-ssa/ldist-19.c: New testcase.
	* gcc.c-torture/execute/builtins/builtins.exp: Always pass
	-fno-tree-loop-distribute-patterns.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ldist-19.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp
    trunk/gcc/tree-loop-distribution.c
Comment 10 Richard Biener 2012-06-06 09:45:33 UTC
Author: rguenth
Date: Wed Jun  6 09:45:27 2012
New Revision: 188261

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188261
Log:
2012-06-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/53081
	* tree-data-ref.h (adjacent_store_dr_p): Rename to ...
	(adjacent_dr_p): ... this and make it work for reads, too.
	* tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMCPY.
	(struct partition_s): Change main_stmt to main_dr, add
	secondary_dr member.
	(build_size_arg_loc): Change to date data-reference and not
	gimplify here.
	(build_addr_arg_loc): New function split out from ...
	(generate_memset_builtin): ... here.  Use it and simplify.
	(generate_memcpy_builtin): New function.
	(generate_code_for_partition): Adjust.
	(classify_partition): Streamline pattern detection.  Detect
	memcpy.
	(ldist_gen): Adjust.
	(tree_loop_distribution): Adjust seed statements for memcpy
	recognition.

	* gcc.dg/tree-ssa/ldist-20.c: New testcase.
	* gcc.dg/tree-ssa/loop-19.c: Add -fno-tree-loop-distribute-patterns.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ldist-20.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
    trunk/gcc/tree-data-ref.h
    trunk/gcc/tree-loop-distribution.c
Comment 11 Richard Biener 2012-06-06 09:46:24 UTC
Fixed.
Comment 12 Joost VandeVondele 2012-06-06 11:32:08 UTC
It doesn't quite seem to work for this simple Fortran testcase yet

SUBROUTINE S(a,N)
  INTEGER :: N,a(N)
  a=1
END SUBROUTINE S

(works for memset to 0)
Comment 13 rguenther@suse.de 2012-06-06 11:39:25 UTC
On Wed, 6 Jun 2012, Joost.VandeVondele at mat dot ethz.ch wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081
> 
> --- Comment #12 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 2012-06-06 11:32:08 UTC ---
> It doesn't quite seem to work for this simple Fortran testcase yet
> 
> SUBROUTINE S(a,N)
>   INTEGER :: N,a(N)
>   a=1
> END SUBROUTINE S
> 
> (works for memset to 0)

Well, you can't transform this to a memset ;)
Comment 14 Joost VandeVondele 2012-06-06 11:54:22 UTC
(In reply to comment #13)
> Well, you can't transform this to a memset ;)

blush

things work as advertised for correct testcases... thanks!
Comment 15 Igor Zamyatin 2012-06-06 12:43:09 UTC
Fix causes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53588
Comment 16 Igor Zamyatin 2012-06-20 08:44:44 UTC
Also cause of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726