Starting with gcc 10, the following code (based on https://github.com/ebiggers/libdeflate/blob/v1.5/lib/decompress_template.h#L353) isn't compiled as expected at -O3 on x86_64: #include <stddef.h> #include <stdio.h> #define WORDSIZE sizeof(size_t) struct word_unaligned { size_t x; } __attribute__((packed, may_alias)); static inline size_t load_word_unaligned(const char *p) { return ((struct word_unaligned *)p)->x; } static inline void store_word_unaligned(size_t x, char *p) { ((struct word_unaligned *)p)->x = x; } void __attribute__((noinline)) copy(char *dst, const char *src, size_t word_count) { do { store_word_unaligned(load_word_unaligned(src), dst); src += WORDSIZE; dst += WORDSIZE; } while (--word_count); } int main() { char buf[9 + 6 * WORDSIZE + 1] = "012345678"; copy(&buf[9], &buf[0], 6); puts(buf); } The code is supposed to copy 6 eight-byte words from &buf[0] to &buf[9], one word at a time, resulting in the output "012345678012345678012345678012345678012345678012345678012". But the actual output is "012345678012345678". Based on the diassembly, this seems to be caused by gcc assuming that the src and dst pointers are offset from each other by a multiple of 8 bytes. It uses this assumption to generate 16-byte-at-a-time SSE copy code. But that's invalid when the pointers are actually offset by 9 bytes as in the example. Is this working as intended? I don't think so, since the use of packed and may_alias should make gcc assume that the pointers can have any alignment, and each store can alias the next load. But perhaps there's some reason I'm missing why the code could nevertheless be considered incorrect, or perhaps there's some ambiguity in what 'packed' is supposed to do (it's a gcc extension after all). I'm going to change my code to use memcpy() anyway, since the bugs where gcc generated bad code for memcpy() in some cases have supposedly been fixed in recent gcc's. But I thought I'd point this out since I'm not sure it's working as intended, and probably other people will run into it too. Decompression code is most likely to be affected by this. gcc 9.3.0 works fine. I didn't test anything in between that and 10.1.
This changed with r10-4803-g8489e1f45b50600c01eb8ed8c5d0ca914ded281c
Confirmed.
GCC 10.2 is released, adjusting target milestone.
Seems to be caused by: /* Get the minimum alignment for all the scalar accesses that DR_INFO describes. */ static unsigned int vect_vfa_align (dr_vec_info *dr_info) { return TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_info->dr))); } providing duff alignment information. Changing it to: return dr_alignment (dr_info->dr); fixes the bug.
The master branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>: https://gcc.gnu.org/g:9fa5b473b5b8e289b6542adfd5cfaddfb3036048 commit r11-6380-g9fa5b473b5b8e289b6542adfd5cfaddfb3036048 Author: Richard Sandiford <richard.sandiford@arm.com> Date: Thu Dec 31 16:51:33 2020 +0000 vect: Fix bogus alignment assumption in alias checks [PR94994] This PR is about a case in which the vectoriser was feeding incorrect alignment information to tree-data-ref.c, leading to incorrect runtime alias checks. The alignment was taken from the TREE_TYPE of the DR_REF, which in this case was a COMPONENT_REF with a normally-aligned type. However, the underlying MEM_REF was only byte-aligned. This patch uses dr_alignment to calculate the (byte) alignment instead, just like we do when creating vector MEM_REFs. gcc/ PR tree-optimization/94994 * tree-vect-data-refs.c (vect_vfa_align): Use dr_alignment. gcc/testsuite/ PR tree-optimization/94994 * gcc.dg/vect/pr94994.c: New test.
Fixed on trunk so far, will backport in a week or so.
Fixed for GCC 10 by r10-9261.
*** Bug 100410 has been marked as a duplicate of this bug. ***