This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Eric Botcazou <ebotcazou at adacore dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Henderson <rth at redhat dot com>
- Date: Fri, 3 Jan 2014 17:04:39 +0100
- Subject: Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
- References: <20131112123633 dot GC34333 at msticlxl57 dot ims dot intel dot com> <201401022318 dot 15106 dot ebotcazou at adacore dot com> <CAFULd4bqaCZcJZmqZ9Cj=5vUJzofKJWg2hDxCm=-2g6yte66zQ at mail dot gmail dot com> <201401031220 dot 34808 dot ebotcazou at adacore dot com> <CAFULd4ZvCFhW=VhhQ89Zp6KYPVjjDET6f71cu-iEFCBDmTFBtQ at mail dot gmail dot com> <20140103115939 dot GF892 at tucnak dot redhat dot com> <CAFULd4bhLUho1Yj9m5=vvpEFvyk5XGEhY5SdTjrzgDxN6s2Oqw at mail dot gmail dot com> <CAFULd4a8g2GCLYkBpXoszsofmCbienNZzqNHxOqEB_n3rjCFpw at mail dot gmail dot com> <20140103134326 dot GH892 at tucnak dot redhat dot com> <CAFULd4bPOAaRjb-G8=oGntEMnpB9T=uQQ7uWT1UgAoVTjV-Cug at mail dot gmail dot com>
On Fri, Jan 3, 2014 at 3:02 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> Like in the patch below. Please note that the prefetch_block tune setting
>>> for nocona is wrong; -march=native on my trusty old P4 returns:
>>>
>>> --param "l1-cache-size=16" --param "l1-cache-line-size=64" --param
>>> "l2-cache-size=2048" "-mtune=nocona"
>>>
>>> which is consistent with the above quote from the manual.
>>>
>>> 2014-01-02 Uros Bizjak <ubizjak@gmail.com>
>>>
>>> * config/i386/i386.c (ix86_data_alignment): Calculate max_align
>>> from prefetch_block tune setting.
>>> (nocona_cost): Correct size of prefetch block to 64.
>>>
>>> The patch was bootstrapped on x86_64-pc-linux-gnu and is currently in
>>> regression testing. If there are no comments, I will commit it to
>>> mainline and release branches after a couple of days.
>>
>> That still has the effect of not aligning (for most tunings) 32- to 63-byte
>> aggregates to 32 bytes, while previously they were aligned. Forcing 64-byte
>> alignment on 32-byte aggregates would be overkill; 32-byte alignment is just
>> fine for those (and ensures they never cross a 64-byte boundary). For 33- to
>> 63-byte aggregates, 64-byte alignment perhaps wouldn't be that bad; it just
>> wouldn't match what we have done before.
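To make the concern concrete, here is a hypothetical sketch comparing the two rules in bytes: the old one, which per this thread capped the extra alignment at 32 bytes, and the new one, which uses a 64-byte prefetch block as most tunings do. The names old_rule_align and new_rule_align are illustrative only and do not exist in i386.c; the real code has further conditions that are ignored here.

```c
#include <assert.h>

/* Hypothetical model of the OLD rule (sizes and alignments in bytes):
   aggregates of at least 32 bytes are raised to 32-byte alignment. */
unsigned old_rule_align(unsigned size, unsigned natural_align)
{
  const unsigned max_align = 32;
  return (size >= max_align && natural_align < max_align) ? max_align
                                                           : natural_align;
}

/* Hypothetical model of the NEW rule: aggregates of at least one
   prefetch block are raised to prefetch-block alignment. */
unsigned new_rule_align(unsigned size, unsigned natural_align,
                        unsigned prefetch_block)
{
  assert(prefetch_block != 0);
  return (size >= prefetch_block && natural_align < prefetch_block)
           ? prefetch_block
           : natural_align;
}
```

For example, a 48-byte aggregate with 8-byte natural alignment gets 32-byte alignment from old_rule_align(48, 8) but stays at 8 bytes with new_rule_align(48, 8, 64), which is exactly the 32- to 63-byte gap described above.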
>
> Please note that the previous value was based on an earlier (pre-P4)
> recommendation and was appropriate for older chips with 32-byte
> cache lines. The value should have been updated long ago, when 64-byte
> cache lines were introduced, but was probably missed due to the use of a
> magic value without a comment.
>
> Ah, I see. My patch deals only with structures larger than a cache
> line. As recommended in the "As long as 16-byte boundaries (and cache
> lines) are never crossed, natural alignment is not strictly necessary
> (though it is an easy way to enforce this)." part of the manual, we
> should align smaller structures to 16 or 32 bytes.
>
> Yes, I agree. Can you please merge your patch together with the proposed patch?

On second thought, the crossing of 16-byte boundaries is mentioned
for the data *access* (the instruction itself) if it is not naturally
aligned (please see example 3-40 and fig 3-2), which is *NOT* our
case.
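The distinction drawn here is about whether a single access straddles a boundary. A minimal sketch of that check follows; access_crosses_boundary is a hypothetical helper, and the boundary is assumed to be a power of two (16 for the manual's rule, 64 for a cache line).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if an access of SIZE bytes starting
   at ADDR touches more than one BOUNDARY-aligned chunk. */
bool access_crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
  assert(size > 0);
  /* Assumed: boundary is a non-zero power of two (e.g. 16 or 64). */
  assert(boundary != 0 && (boundary & (boundary - 1)) == 0);

  uintptr_t first_chunk = addr / boundary;           /* chunk of first byte */
  uintptr_t last_chunk = (addr + size - 1) / boundary; /* chunk of last byte */
  return first_chunk != last_chunk;
}
```

For instance, an 8-byte load at 0x100C straddles a 16-byte boundary while one at 0x1008 does not, and a 32-byte object at a 32-byte-aligned address never crosses a 64-byte line, which is the property mentioned earlier in the thread.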

So, we don't have to align 32-byte structures in any way for newer
processors, since this optimization applies only to structures of 64+
bytes (greater than or equal to the cache line size). Older processors
are handled correctly, except for nocona, whose cache line size value
has to be corrected.
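A minimal sketch of the rule described in the ChangeLog above, written in plain C rather than GCC's internal tree API: an aggregate at least one prefetch block long is aligned to that block, and anything smaller keeps its existing alignment. The function model_data_alignment and its parameters are hypothetical; alignments are in bits, as GCC expresses them, and the sketch ignores the other conditions the real ix86_data_alignment checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched rule, NOT the real ix86_data_alignment.
   Sizes and alignments are in bits, as GCC expresses them.
   prefetch_block_bytes is the tuning table's prefetch block size in bytes
   (64 for most tunings, and for nocona once corrected). */
unsigned model_data_alignment(bool is_aggregate, unsigned size_bits,
                              unsigned align_bits,
                              unsigned prefetch_block_bytes)
{
  /* Assumed: the prefetch block is a non-zero power of two. */
  assert(prefetch_block_bytes != 0
         && (prefetch_block_bytes & (prefetch_block_bytes - 1)) == 0);

  /* max_align is derived from the prefetch block, not a fixed constant. */
  unsigned max_align = prefetch_block_bytes * 8;

  /* Only aggregates at least one prefetch block long are over-aligned;
     smaller objects keep the alignment they already have. */
  if (is_aggregate && size_bits >= max_align && align_bits < max_align)
    return max_align;
  return align_bits;
}
```

With a 64-byte prefetch block, a 128-byte aggregate is raised to 512-bit (64-byte) alignment, while a 48-byte one keeps its natural alignment.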

Given that, my original patch implements this optimization correctly.

Uros.