This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] More aggressive __builtin_memcpy optimizations (PR middle-end/29215)
On Thu, Nov 20, 2008 at 7:54 AM, Jakub Jelinek <email@example.com> wrote:
> On Wed, Nov 19, 2008 at 06:23:28PM +0100, Richard Guenther wrote:
>> The patch is ok.
> Before checking in, I've bootstrapped/regtested it on ia64-linux
> in addition to the usual x86_64-linux bootstrap/regtest.
> Unfortunately this revealed a few issues:
> 1) on ia64 allocatable_function_3.f90 ICEd at -O1 and higher, because
> memcpy from a TREE_STATIC readonly VAR_DECL with a CONSTRUCTOR
> initializer was optimized into a store, but TREE_PURPOSE in the
> ctor elements weren't filled up, which upsets e.g.
> gimplify_init_ctor_eval or categorize_ctor_elements_1.
> When this wasn't optimized, Fortran FE managed to survive,
> because none of these routines were called. Either we could teach
> these what to do if purpose isn't present, or, as done in the patch
> below, fill in TREE_PURPOSE like the C/C++ FEs do.
> 2) when looking why that test didn't fail on x86_64-linux as well,
> I found out the
> if (!var_decl_component_p (srcvar)
> /* Accept: memcpy (*char_var, "test", 1); that simplify
> to char_var='t'; */
> || is_gimple_min_invariant (srcvar)
> || readonly_data_expr (src))
> return NULL_TREE;
> lines (in current SVN, my previous patch just moved these around).
> On ia64 src was actually emitted into .sdata section, even when
> read-only and thus memcpy was optimized, on x86_64 where src
> lived in .rodata it wasn't. This logic seems to be very much
> broken, I mean if something is var_decl_component_p, then it
> won't be gimple invariant, and whether it is in readonly data
> doesn't matter too. return NULL_TREE here means "don't optimize".
> I've tried changing this into !var_decl_component_p () &&
> !is_gimple_min_invariant () && !readonly_data_expr (),
> but that created invalid GIMPLE, assigning STRING_CST into something.
> So the patch below just removes those, optimization of the
> memcpy (*char_var, "test", 1) can be looked at later, but
> would certainly need extra code.
> 3) the new testcase failed on ia64, because memcpy wasn't optimized there.
> memcpy is optimized only if the other pointer is known to be sufficiently
> aligned (in the testcase it wasn't) or if unaligned access is allowed
> and not slow (true on x86_64, false on ia64). I've adjusted
> the testcase so that gcc knows the buffer is aligned.
> 4) array_memcpy_3.f90 test failed to match on x86_64-linux after the
> 2) changes, memcpy is now optimized into a store, the pattern has
> been adjusted not to care if memcpy has been optimized or not.
> Ok for trunk (together with the earlier patch)?
> 2008-11-20 Jakub Jelinek <firstname.lastname@example.org>
> PR middle-end/29215
> * builtins.c (fold_builtin_memory_op): Remove
> is_gimple_min_invariant and readonly_data_expr src check.
> * trans-array.c (trans_array_constructor_value,
> gfc_build_constant_array_constructor): Fill in TREE_PURPOSE.
> * gfortran.dg/array_memcpy_3.f90: Adjust pattern to match even
> memcpy optimized into ref-all store.
> * gcc.dg/pr29215.c: Adjust so that all stores are known to be