This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>
- Cc: Jakub Jelinek <jakub at redhat dot com>, Eric Botcazou <ebotcazou at adacore dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Henderson <rth at redhat dot com>
- Date: Tue, 14 Jan 2014 11:00:26 -0800
- Subject: Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
On Tue, Jan 14, 2014 at 10:37 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Jan 14, 2014 at 6:09 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>
>>> On second thought, the crossing of 16-byte boundaries is mentioned
>>> for the data *access* (the instruction itself) if it is not naturally
>>> aligned (please see example 3-40 and fig 3-2), which is *NOT* our
>>> case.
>>>
>>> So, we don't have to align 32 byte structures in any way for newer
>>> processors, since this optimization applies to 64+ byte (larger or
>>> equal to the cache line size) structures only. Older processors are
>>> handled correctly, except for nocona, whose cache line size value has
>>> to be corrected.
>>>
>>> Following that, my original patch implements this optimization in the
>>> correct way.
>>
>> Sorry for catching this late, but on the 4.8 and earlier branches
>> there is no opt argument and thus any ix86_data_alignment change is
>> unfortunately an ABI change. So I'd think we should revert
>> r206433 and r206436. And for the trunk we need to ensure even for
>
> OK, let's play safe. I'll revert these two changes (modulo size of
> nocona prefetch block).
>
>> opt we never return a smaller number from ix86_data_alignment than
>> we did in 4.8 and earlier, because otherwise if you have 4.8 compiled
>> code that assumes the alignment 4.8 would use for something that is defined
>> in a compilation unit built by gcc 4.9+, if we don't align it at least
>> as much as we did in the past, the linked mix of 4.8 user and 4.9 definition
>> could misbehave.
>
> From 4.9 onwards, we would like to align >= 64-byte structures on a
> 64-byte boundary. Should we add a compatibility rule to align >= 32-byte
> structures to 32 bytes?

That is why we issue a warning when the alignment changes
with AVX support:

[hjl@gnu-6 tmp]$ cat a1.i
typedef long long __m256i __attribute__ ((__vector_size__ (32), __may_alias__));
extern __m256i y;
void
f1(__m256i x)
{
  y = x;
}
[hjl@gnu-6 tmp]$ gcc -S a1.i
a1.i: In function ‘f1’:
a1.i:4:1: note: The ABI for passing parameters with 32-byte alignment has changed in GCC 4.6
 f1(__m256i x)
 ^
a1.i:4:1: warning: AVX vector argument without AVX enabled changes the ABI [enabled by default]
[hjl@gnu-6 tmp]$

> Please also note that in 4.7 and 4.8, we have
>
> int max_align = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
>
> so, in effect, -Os code will be incompatible with other optimization levels.
>
> I guess that for 4.7 and 4.8, we should revert to this anyway, but
> what to do with 4.9?
>
> Uros.
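
As an illustration of the compatibility rule discussed above, here is a
minimal sketch in C. It is not code from this thread or from GCC. It
assumes that max_align is applied as "raise the alignment to max_align
when the object is at least that large", that MAX_OFILE_ALIGNMENT is at
least 256, that BITS_PER_WORD is 64, and that the 4.9 rule uses a 64-byte
(512-bit) threshold. All function and macro names are hypothetical, and
all sizes and alignments are in bits.

```c
/* Simplified model of the alignment rules under discussion.
   Every name below is hypothetical; nothing here is GCC source. */

#define MODEL_BITS_PER_WORD 64u   /* assumption: x86-64 word size */
#define MODEL_OLD_MAX_ALIGN 256u  /* MIN (256, MAX_OFILE_ALIGNMENT), assuming
                                     MAX_OFILE_ALIGNMENT >= 256 */
#define MODEL_NEW_MAX_ALIGN 512u  /* assumption: 64-byte cache line */

/* Raise ALIGN to MAX_ALIGN when the object is at least MAX_ALIGN bits. */
static unsigned
raise_to (unsigned size, unsigned align, unsigned max_align)
{
  return (size >= max_align && align < max_align) ? max_align : align;
}

/* Model of the 4.7/4.8 rule, including the -Os special case. */
unsigned
old_alignment (unsigned size, unsigned align, int optimize_size)
{
  unsigned max_align = optimize_size ? MODEL_BITS_PER_WORD
                                     : MODEL_OLD_MAX_ALIGN;
  return raise_to (size, align, max_align);
}

/* Model of the proposed 4.9 rule: only >= 64-byte objects are raised. */
unsigned
new_alignment (unsigned size, unsigned align)
{
  return raise_to (size, align, MODEL_NEW_MAX_ALIGN);
}

/* Compatibility rule: never return less than the old (non -Os) rule did. */
unsigned
compat_alignment (unsigned size, unsigned align)
{
  unsigned a_old = old_alignment (size, align, 0);
  unsigned a_new = new_alignment (size, align);
  return a_old > a_new ? a_old : a_new;
}
```

Under these assumptions, a 32-byte object with 8-bit natural alignment
gets 256 bits from compat_alignment but only 8 bits from new_alignment
alone, which is the gap a 4.8-compiled user could trip over. The -Os
variant of the old rule gives 64 bits for the same object, matching the
incompatibility noted above.
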
--
H.J.