53081 – memcpy/memset loop recognition

Bug 53081 - memcpy/memset loop recognition

Summary: memcpy/memset loop recognition

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.8.0

Importance:	P3 normal
Target Milestone:	4.8.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2012-04-23 04:51 UTC by davidxl
Modified:	2023-12-29 01:04 UTC (History)
CC List:	5 users (show)

See Also:	56888
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2012-04-23 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description davidxl 2012-04-23 04:51:31 UTC

Both LLVM and icc recognize initialization and copy loop and synthesize calls to memcpy and memset.  memmove call can also be synthesized when src/target may overlap.

Option needs to provided to disable such optimization in signal handlers.

I consider this as optimization for benchmarking ;) For instance, the prime number finder program sieve.c is one of the benchmarks in LLVM. Both LLVM and icc beats gcc in this one because of the missing optimization.


#ifndef T
#define T int
#endif

T arr[1000];

void foo(int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      arr[i] = 0;
    }
}

void foo2(int n, T* p)
{
  int i;
  for (i = 0; i < n; i++)
    {
      *p++ = 0;
    }
}

#ifndef T
#define T int
#endif

T arr[1000];
T arr2[1000];

void foo(int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      arr[i] = arr2[i];
    }
}


// sieve.c

/* -*- mode: c -*-
 * $Id: sieve.c 36673 2007-05-03 16:55:46Z laurov $
 * http://www.bagley.org/~doug/shootout/
 */

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 17000
#else
#define LENGTH 170000
#endif
    int NUM = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    static char flags[8192 + 1];
    long i, k;
    int count = 0;

    while (NUM--) {
        count = 0; 
        for (i=2; i <= 8192; i++) {
            flags[i] = 1;
        }
        for (i=2; i <= 8192; i++) {
            if (flags[i]) {
                /* remove all multiples of prime: i */
                for (k=i+i; k <= 8192; k+=i) {
                    flags[k] = 0;
                }
                count++;
            }
        }
    }
    printf("Count: %d\n", count);
    return(0);
}

Comment 1 Andrew Pinski 2012-04-23 05:03:28 UTC

I thought there is already one part of graphite.

Comment 2 Andrew Pinski 2012-04-23 05:07:30 UTC

I should mention I made one patch before based on the vectorizer code which did detection of at least memset; it was while I was an intern at Apple.  I posted it and there was some review.  And a few years back there was a paper at the GCC summit about it and expanding it to memcpy.  I don't know what happened to that code though.

Comment 3 davidxl 2012-04-23 05:34:24 UTC

(In reply to comment #2)
> I should mention I made one patch before based on the vectorizer code which did
> detection of at least memset; it was while I was an intern at Apple.  I posted
> it and there was some review.  And a few years back there was a paper at the
> GCC summit about it and expanding it to memcpy.  I don't know what happened to
> that code though.

Some simple analysis using scev to identify loads and stores with linear address should be good enough. 

LLVM's version also tries to merge smaller memsets into a larger one.

Comment 4 davidxl 2012-04-23 05:34:55 UTC

(In reply to comment #2)
> I should mention I made one patch before based on the vectorizer code which did
> detection of at least memset; it was while I was an intern at Apple.  I posted
> it and there was some review.  And a few years back there was a paper at the
> GCC summit about it and expanding it to memcpy.  I don't know what happened to
> that code though.

Some simple analysis using scev to identify loads and stores with linear address should be good enough. 

LLVM's version also tries to merge smaller memsets into a larger one.

Comment 5 Alexander Monakov 2012-04-23 06:53:11 UTC

The memset part can be handled by -ftree-loop-distribution (not enabled at -O3)

Comment 6 Andrew Pinski 2012-04-23 06:54:43 UTC

(In reply to comment #4)
> LLVM's version also tries to merge smaller memsets into a larger one.

That is filed as PR 49872.


And the original issue is related to PR 30442.

Comment 7 Jakub Jelinek 2012-04-23 08:18:42 UTC

Note that -ftree-loop-distribution for some weird reason only handles clearing, but not initialization by arbitrary constants that are convertible to memset (i.e. all bytes have the same constant stored, such as 0x0a0a0a0a etc.).

Comment 8 Richard Biener 2012-04-23 10:26:45 UTC

clearing recognition is enabled by default at -O2+ with -ftree-loop-distribute-patterns.  I agree more general memset handling would be nice, as well as
memcpy and memmove recognition.

Comment 9 Richard Biener 2012-06-05 09:24:48 UTC

Author: rguenth
Date: Tue Jun  5 09:24:43 2012
New Revision: 188232

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188232
Log:
2012-06-05  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/53081
	* tree-loop-distribution.c (generate_memset_builtin): Handle all
	kinds of byte-sized stores.
	(classify_partition): Likewise.
	(tree_loop_distribution): Adjust seed statements used for
	!flag_tree_loop_distribution.

	* gcc.dg/tree-ssa/ldist-19.c: New testcase.
	* gcc.c-torture/execute/builtins/builtins.exp: Always pass
	-fno-tree-loop-distribute-patterns.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ldist-19.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp
    trunk/gcc/tree-loop-distribution.c

Comment 10 Richard Biener 2012-06-06 09:45:33 UTC

Author: rguenth
Date: Wed Jun  6 09:45:27 2012
New Revision: 188261

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188261
Log:
2012-06-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/53081
	* tree-data-ref.h (adjacent_store_dr_p): Rename to ...
	(adjacent_dr_p): ... this and make it work for reads, too.
	* tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMCPY.
	(struct partition_s): Change main_stmt to main_dr, add
	secondary_dr member.
	(build_size_arg_loc): Change to date data-reference and not
	gimplify here.
	(build_addr_arg_loc): New function split out from ...
	(generate_memset_builtin): ... here.  Use it and simplify.
	(generate_memcpy_builtin): New function.
	(generate_code_for_partition): Adjust.
	(classify_partition): Streamline pattern detection.  Detect
	memcpy.
	(ldist_gen): Adjust.
	(tree_loop_distribution): Adjust seed statements for memcpy
	recognition.

	* gcc.dg/tree-ssa/ldist-20.c: New testcase.
	* gcc.dg/tree-ssa/loop-19.c: Add -fno-tree-loop-distribute-patterns.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ldist-20.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
    trunk/gcc/tree-data-ref.h
    trunk/gcc/tree-loop-distribution.c

Comment 11 Richard Biener 2012-06-06 09:46:24 UTC

Fixed.

Comment 12 Joost VandeVondele 2012-06-06 11:32:08 UTC

It doesn't quite seem to work for this simple Fortran testcase yet

SUBROUTINE S(a,N)
  INTEGER :: N,a(N)
  a=1
END SUBROUTINE S

(works for memset to 0)

Comment 13 rguenther@suse.de 2012-06-06 11:39:25 UTC

On Wed, 6 Jun 2012, Joost.VandeVondele at mat dot ethz.ch wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081
> 
> --- Comment #12 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 2012-06-06 11:32:08 UTC ---
> It doesn't quite seem to work for this simple Fortran testcase yet
> 
> SUBROUTINE S(a,N)
>   INTEGER :: N,a(N)
>   a=1
> END SUBROUTINE S
> 
> (works for memset to 0)

Well, you can't transform this to a memset ;)

Comment 14 Joost VandeVondele 2012-06-06 11:54:22 UTC

(In reply to comment #13)
> Well, you can't transform this to a memset ;)

blush

things work as advertised for correct testcases... thanks!

Comment 15 Igor Zamyatin 2012-06-06 12:43:09 UTC

Fix causes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53588

Comment 16 Igor Zamyatin 2012-06-20 08:44:44 UTC

Also cause of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726