This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, ivopt] Try aligned offset when get_address_cost
- From: Zhenqiang Chen <zhenqiang dot chen at linaro dot org>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Zhenqiang Chen <zhenqiang dot chen at arm dot com>, "Bin.Cheng" <amker dot cheng at gmail dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 6 Aug 2014 14:34:38 +0800
- Subject: Re: [PATCH, ivopt] Try aligned offset when get_address_cost
- Authentication-results: sourceware.org; auth=none
- References: <000001cfafad$5de00840$19a018c0$ at arm dot com> <CAHFci2_t4E9+Heb0RU2LbzUmvUa1CqMtETyw1LQtELdrd+eTKA at mail dot gmail dot com> <000001cfafc3$d5372c00$7fa58400$ at arm dot com> <CAFiYyc1P16LAHOMAt-QEKsJ0gzMKBJ4oagSB=L5Mik7Uoenwuw at mail dot gmail dot com>
On 5 August 2014 21:59, Richard Biener <richard.guenther@gmail.com> wrote:
> On Mon, Aug 4, 2014 at 11:09 AM, Zhenqiang Chen <zhenqiang.chen@arm.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Bin.Cheng [mailto:amker.cheng@gmail.com]
>>> Sent: Monday, August 04, 2014 4:41 PM
>>> To: Zhenqiang Chen
>>> Cc: gcc-patches List
>>> Subject: Re: [PATCH, ivopt] Try aligned offset when get_address_cost
>>>
>>> On Mon, Aug 4, 2014 at 2:28 PM, Zhenqiang Chen
>>> <zhenqiang.chen@arm.com> wrote:
>>> > Hi,
>>> >
>>> > For some TARGET, like ARM THUMB1, the offset in load/store should be
>>> > nature aligned. But in function get_address_cost, when computing
>>> > max_offset, it only tries byte-aligned offsets:
>>> >
>>> > ((unsigned HOST_WIDE_INT) 1 << i) - 1
>>> >
>>> > which can not meet thumb_legitimate_offset_p check called from
>>> > thumb1_legitimate_address_p for HImode and SImode.
>>> >
>>> > The patch adds additional try for aligned offset:
>>> >
>>> > ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode).
>>> >
>>> > Bootstrap and no make check regression on X86-64.
>>> > No make check regression on qemu for Cortex-m0 and Cortex-m3.
>>> > For Cortex-m0, no performance changes with coremark and dhrystone.
>>> > Coremark code size is ~0.44 smaller. And eembcv2 code size is ~0.22
>>> > smaller. CSiBE code size is ~0.05% smaller.
>>> >
>>> > OK for trunk?
>>> >
>>> > Thanks!
>>> > -Zhenqiang
>>> >
>>> > ChangeLog
>>> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com>
>>> >
>>> > * tree-ssa-loop-ivopts.c (get_address_cost): Try aligned offset.
>>> >
>>> > testsuite/ChangeLog:
>>> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com>
>>> >
>>> > * gcc.target/arm/get_address_cost_aligned_max_offset.c: New
>> test.
>>> >
>>> > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>> > index 3b4a6cd..562122a 100644
>>> > --- a/gcc/tree-ssa-loop-ivopts.c
>>> > +++ b/gcc/tree-ssa-loop-ivopts.c
>>> > @@ -3308,6 +3308,18 @@ get_address_cost (bool symbol_present, bool
>>> > var_present,
>>> > XEXP (addr, 1) = gen_int_mode (off, address_mode);
>>> > if (memory_address_addr_space_p (mem_mode, addr, as))
>>> > break;
>>> > + /* For some TARGET, like ARM THUMB1, the offset should be
>> nature
>>> > + aligned. Try an aligned offset if address_mode is not
>> QImode.
>>> > */
>>> > + off = (address_mode == QImode)
>>> > + ? 0
>>> > + : ((unsigned HOST_WIDE_INT) 1 << i)
>>> > + - GET_MODE_SIZE (address_mode);
>>> > + if (off > 0)
>>> > + {
>>> > + XEXP (addr, 1) = gen_int_mode (off, address_mode);
>>> > + if (memory_address_addr_space_p (mem_mode, addr, as))
>>> > + break;
>>> > + }
>>> Hi, Why not just check "address_mode != QImode"? Set off to 0 then check
>> it
>>> seems unnecessary.
>>
>> Thanks for the comments.
>>
>> ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode) might be a
>> negative value except QImode. A negative value can not be max_offset. So we
>> do not need to check it.
>>
>> For QImode, "((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE
>> (address_mode)" == "((unsigned HOST_WIDE_INT) 1 << i) - 1". It is already
>> checked. So no need to check it again.
>>
>> I think the compiler can optimize the patch like
>>
>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>> index 3b4a6cd..213598a 100644
>> --- a/gcc/tree-ssa-loop-ivopts.c
>> +++ b/gcc/tree-ssa-loop-ivopts.c
>> @@ -3308,6 +3308,19 @@ get_address_cost (bool symbol_present, bool
>> var_present,
>> XEXP (addr, 1) = gen_int_mode (off, address_mode);
>> if (memory_address_addr_space_p (mem_mode, addr, as))
>> break;
>> + /* For some TARGET, like ARM THUMB1, the offset should be nature
>> + aligned. Try an aligned offset if address_mode is not QImode.
>> */
>> + if (address_mode != QImode)
>> + {
>> + off = ((unsigned HOST_WIDE_INT) 1 << i)
>> + - GET_MODE_SIZE (address_mode);
>> + if (off > 0)
>> + {
>> + XEXP (addr, 1) = gen_int_mode (off, address_mode);
>> + if (memory_address_addr_space_p (mem_mode, addr, as))
>> + break;
>> + }
>> + }
>> }
>> if (i == -1)
>> off = 0;
>
> But is off now guaranteed to be the max value? (1 << (i-1) ) - 1 for
> small i is larger than (1 << i) - GET_MODE_SIZE (address_mode).
>
> That is, I think you want to guard this with 1 << (i - 1) >
> GET_MODE_SIZE (address_mode)?
Yes. Without off > 0, it can not guarantee the off is the max value.
With off > 0, it can guarantee that
(1 << i) - GET_MODE_SIZE (address_mode) is greater than (1 << (i-1) ) - 1.
> You don't adjust the negative offset side - why?
-((unsigned HOST_WIDE_INT) 1 << i) is already the min aligned offset.
Thanks!
-Zhenqiang
> Richard.
>
>>
>>> Thanks,
>>> bin
>>> > }
>>> > if (i == -1)
>>> > off = 0;
>>> > diff --git
>>> > a/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c
>>> > b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c
>>> > new file mode 100644
>>> > index 0000000..cc3e2f7
>>> > --- /dev/null
>>> > +++
>>> b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset
>>> > +++ .c
>>> > @@ -0,0 +1,28 @@
>>> > +/* { dg-do compile } */
>>> > +/* { dg-options "-mthumb -O2" } */
>>> > +/* { dg-require-effective-target arm_thumb1_ok } */
>>> > +
>>> > +unsigned int
>>> > +test (const short p16[6 * 64])
>>> > +{
>>> > + unsigned int i = 6;
>>> > + unsigned int ret = 0;
>>> > +
>>> > + do
>>> > + {
>>> > + unsigned long long *p64 = (unsigned long long*) p16;
>>> > + unsigned int *p32 = (unsigned int*) p16;
>>> > + ret += ret;
>>> > + if (p16[1] || p32[1])
>>> > + ret++;
>>> > + else if (p64[1] | p64[2] | p64[3])
>>> > + ret++;
>>> > + p16 += 64;
>>> > + i--;
>>> > + } while (i != 0);
>>> > +
>>> > + return ret;
>>> > +}
>>> > +
>>> > +/* { dg-final { scan-assembler-not "#22" } } */
>>> > +/* { dg-final { scan-assembler-not "#14" } } */
>>> >
>>> >
>>> >
>>
>>
>>
>>