This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RS6000] PR60737, expand_block_clear uses word stores
- From: David Edelsohn <dje dot gcc at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>, David Edelsohn <dje dot gcc at gmail dot com>, Anton Blanchard <anton at samba dot org>
- Date: Fri, 2 May 2014 11:28:27 -0400
- References: <20140502102023 dot GM16139 at bubble dot grove dot modra dot org>
On Fri, May 2, 2014 at 6:20 AM, Alan Modra <amodra@gmail.com> wrote:
> In cases where the compiler has no alignment info, powerpc64le-linux
> gcc generates byte at a time copies for -mstrict-align (which is on
> for little-endian power7). That's awful code, a problem shared by
> other strict-align targets, see pr50417. However, we also have a case
> when -mno-strict-align generates less than ideal code, which I believe
> stems from using alignment as a proxy for testing an address offset.
> See http://gcc.gnu.org/ml/gcc-patches/1999-09n/msg01072.html.
>
> So my first attempt at fixing this problem looked at address offsets
> directly. That worked fine too, but on thinking some more, I believe
> we no longer have the movdi restriction. Nowadays we'll reload the
> address if we have an offset that doesn't satisfy the "Y" constraint
> (i.e. a multiple of 4 offset). Which led to this simpler patch.
> Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux
> and powerpc-linux. OK to apply?
Hi, Alan
Thanks for finding and addressing this.
As you mention, recent server-class processors, at least POWER8, do
not have the performance degradation for common, mis-aligned loads and
stores of wider modes. But the patch should not impose this default on
the large installed base of processors, where mis-aligned loads can
incur a severe performance penalty. This heuristic has become
processor-dependent and should not be hard-coded in the block_move and
block_clear algorithms.
PROCESSOR_DEFAULT is POWER8 for ELFv2 (and should be updated as the
default for PowerLinux in general). Please update the patch to test
rs6000_cpu, probably via another boolean flag set in
rs6000_option_override_internal(). Because of the processor defaults,
the preferred instruction sequence will be the default without
encoding an assumption about the heuristics in the algorithm itself.
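
A minimal sketch of the shape such a change might take, not the actual
rs6000 code. The enum, the flag name efficient_unaligned_wide, and both
functions are hypothetical stand-ins; the real backend would test
rs6000_cpu and set its flag in rs6000_option_override_internal(). Which
processors get the flag is also an assumption here (POWER8 only).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend's processor enum. */
enum cpu_kind { CPU_POWER6, CPU_POWER7, CPU_POWER8 };

/* Hypothetical flag: true when misaligned wide stores are cheap. */
static bool efficient_unaligned_wide;

/* Set the flag once during option processing, keyed on the CPU.
   Assumption for illustration: only POWER8 qualifies. */
void override_options(enum cpu_kind cpu)
{
    efficient_unaligned_wide = (cpu == CPU_POWER8);
}

/* Choose the store width (in bytes) for the next step of a block
   clear, given the bytes remaining and the known alignment.  The
   algorithm consults the flag rather than hard-coding a policy. */
size_t clear_store_width(size_t bytes_left, size_t align_bytes)
{
    assert(align_bytes > 0);

    /* Without cheap misaligned access, never exceed the alignment. */
    size_t limit = efficient_unaligned_wide ? 8 : align_bytes;

    if (bytes_left >= 8 && limit >= 8)
        return 8;
    if (bytes_left >= 4 && limit >= 4)
        return 4;
    if (bytes_left >= 2 && limit >= 2)
        return 2;
    return 1;
}
```

With the flag derived from the processor, a POWER8 default picks
doubleword stores even at unknown alignment, while older processors
keep the alignment-limited sequence.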
Thanks, David