This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.


On Fri, Jan 3, 2014 at 1:27 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>> I am testing a patch that removes "max_align" part from ix86_data_alignment.
>>
>> That looks like unnecessary pessimization.  Note the hunk in question is
>> guarded with opt, which means it is an optimization rather than ABI issue,
>> it can increase alignment, but the compiler can only assume the increased
>> alignment if the symbol is not public or if it is public, but can't be
>> preempted by another TU's definition.  Even in that case it can be
>> worthwhile to increase the alignment, say if doing versioning for alignment,
>> or say just doing some AVX/AVX2/AVX512F version of memcpy/memset that can
>> handle it faster if it is sufficiently aligned by testing it at runtime.
>>
>> So, IMHO we shouldn't drop it, just improve it.
>> Perhaps:
>>
>>   int max_align = optimize_size ? BITS_PER_WORD
>>                                 : MIN (512, MAX_OFILE_ALIGNMENT);
>>
>>   if (opt
>>       && AGGREGATE_TYPE_P (type)
>>       && TYPE_SIZE (type)
>>       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
>>       && align < max_align)
>>     {
>>       int this_align = 256;
>>       for (this_align = 256; this_align <= max_align; this_align *= 2)
>>         if (TREE_INT_CST_LOW (TYPE_SIZE (type)) < (unsigned) this_align
>>             && !TREE_INT_CST_HIGH (TYPE_SIZE (type)))
>>           break;
>>         else if (align < this_align)
>>           align = this_align;
>>     }
>>
>> which will handle both the 256 bit alignment for >= 256 bit objects,
>> 512 bit alignment for >= 512 bit objects and will be prepared for the
>> future.
>>
>> 128 bit I think doesn't need to be handled, DATA_ALIGNMENT has been
>> using 256-bit test already since it's introduction in 1998:
>> http://gcc.gnu.org/ml/gcc-bugs/1998-03/msg00011.html
>
> Thanks for the pointer, there is indeed the recommendation in
> optimization manual [1], section 3.6.4, where it is said:
>
> --quote--
> Misaligned data access can incur significant performance penalties.
> This is particularly true for cache line
> splits. The size of a cache line is 64 bytes in the Pentium 4 and
> other recent Intel processors, including
> processors based on Intel Core microarchitecture.
> An access to data unaligned on 64-byte boundary leads to two memory
> accesses and requires several
> Îops to be executed (instead of one). Accesses that span 64-byte
> boundaries are likely to incur a large
> performance penalty, the cost of each stall generally are greater on
> machines with longer pipelines.
>
> ...
>
> A 64-byte or greater data structure or array should be aligned so that
> its base address is a multiple of 64.
> Sorting data in decreasing size order is one heuristic for assisting
> with natural alignment. As long as 16-
> byte boundaries (and cache lines) are never crossed, natural alignment
> is not strictly necessary (though
> it is an easy way to enforce this).
> --/quote--
>
> So, this part has nothing to do with AVX512, but with cache line
> width. And we do have a --param "l1-cache-line-size=64", detected with
> -march=native that could come handy here.
>
> This part should be rewritten (and commented) with the information
> above in mind.

Like in the patch below. Please note, that the block_tune setting for
the nocona is wrong, -march=native on my trusted old P4 returns:

--param "l1-cache-size=16" --param "l1-cache-line-size=64" --param
"l2-cache-size=2048" "-mtune=nocona"

which is consistent with the above quote from manual.

2014-01-02  Uros Bizjak  <ubizjak@gmail.com>

    * config/i386/i386.c (ix86_data_alignment): Calculate max_align
    from prefetch_block tune setting.
    (nocona_cost): Correct size of prefetch block to 64.

The patch was bootstrapped on x86_64-pc-linux-gnu and is currently in
regression testing. If there are no comments, I will commit it to
mainline and release branches after a couple of days.

Uros.

Attachment: p.diff.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]