Bug 94994 - [10 Regression] possible miscompilation of word-at-a-time copy via packed structs
Summary: [10 Regression] possible miscompilation of word-at-a-time copy via packed str...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 10.1.0
: P2 normal
Target Milestone: 10.3
Assignee: Richard Sandiford
URL:
Keywords: wrong-code
: 100410 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-05-08 07:25 UTC by Eric Biggers
Modified: 2023-05-03 08:41 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-05-08 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Biggers 2020-05-08 07:25:25 UTC
Starting with gcc 10, the following code (based on https://github.com/ebiggers/libdeflate/blob/v1.5/lib/decompress_template.h#L353) isn't compiled as expected at -O3 on x86_64:


#include <stddef.h>
#include <stdio.h>

#define WORDSIZE	sizeof(size_t)

struct word_unaligned {
	size_t x;
} __attribute__((packed, may_alias));

static inline size_t load_word_unaligned(const char *p)
{
	return ((struct word_unaligned *)p)->x;
}

static inline void store_word_unaligned(size_t x, char *p)
{
	((struct word_unaligned *)p)->x = x;
}

void __attribute__((noinline))
copy(char *dst, const char *src, size_t word_count)
{
	do {
		store_word_unaligned(load_word_unaligned(src), dst);
		src += WORDSIZE;
		dst += WORDSIZE;
	} while (--word_count);
}

int main()
{
	char buf[9 + 6 * WORDSIZE + 1] = "012345678";

	copy(&buf[9], &buf[0], 6);

	puts(buf);
}


The code is supposed to copy 6 eight-byte words from &buf[0] to &buf[9], one word at a time, resulting in the output "012345678012345678012345678012345678012345678012345678012".  But the actual output is "012345678012345678".  Based on the diassembly, this seems to be caused by gcc assuming that the src and dst pointers are offset from each other by a multiple of 8 bytes.  It uses this assumption to generate 16-byte-at-a-time SSE copy code.  But that's invalid when the pointers are actually offset by 9 bytes as in the example.

Is this working as intended?  I don't think so, since the use of packed and may_alias should make gcc assume that the pointers can have any alignment, and each store can alias the next load.  But perhaps there's some reason I'm missing why the code could nevertheless be considered incorrect, or perhaps there's some ambiguity in what 'packed' is supposed to do (it's a gcc extension after all).

I'm going to change my code to use memcpy() anyway, since the bugs where gcc generated bad code for memcpy() in some cases have supposedly been fixed in recent gcc's.  But I thought I'd point this out since I'm not sure it's working as intended, and probably other people will run into it too.  Decompression code is most likely to be affected by this.

gcc 9.3.0 works fine.  I didn't test anything in between that and 10.1.
Comment 1 Jakub Jelinek 2020-05-08 07:58:51 UTC
This changed with r10-4803-g8489e1f45b50600c01eb8ed8c5d0ca914ded281c
Comment 2 Richard Biener 2020-05-08 11:49:09 UTC
Confirmed.
Comment 3 Richard Biener 2020-07-23 06:51:55 UTC
GCC 10.2 is released, adjusting target milestone.
Comment 4 Richard Sandiford 2020-12-30 11:24:16 UTC
Seems to be caused by:

/* Get the minimum alignment for all the scalar accesses that DR_INFO
   describes.  */

static unsigned int
vect_vfa_align (dr_vec_info *dr_info)
{
  return TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_info->dr)));
}

providing duff alignment information.  Changing it to:

  return dr_alignment (dr_info->dr);

fixes the bug.
Comment 5 GCC Commits 2020-12-31 16:51:55 UTC
The master branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:9fa5b473b5b8e289b6542adfd5cfaddfb3036048

commit r11-6380-g9fa5b473b5b8e289b6542adfd5cfaddfb3036048
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Thu Dec 31 16:51:33 2020 +0000

    vect: Fix bogus alignment assumption in alias checks [PR94994]
    
    This PR is about a case in which the vectoriser was feeding
    incorrect alignment information to tree-data-ref.c, leading
    to incorrect runtime alias checks.  The alignment was taken
    from the TREE_TYPE of the DR_REF, which in this case was a
    COMPONENT_REF with a normally-aligned type.  However, the
    underlying MEM_REF was only byte-aligned.
    
    This patch uses dr_alignment to calculate the (byte) alignment
    instead, just like we do when creating vector MEM_REFs.
    
    gcc/
            PR tree-optimization/94994
            * tree-vect-data-refs.c (vect_vfa_align): Use dr_alignment.
    
    gcc/testsuite/
            PR tree-optimization/94994
            * gcc.dg/vect/pr94994.c: New test.
Comment 6 Richard Sandiford 2020-12-31 16:54:56 UTC
Fixed on trunk so far, will backport in a week or so.
Comment 7 Richard Sandiford 2021-01-12 10:07:06 UTC
Fixed for GCC 10 by r10-9261.
Comment 8 Jakub Jelinek 2021-05-04 08:51:14 UTC
*** Bug 100410 has been marked as a duplicate of this bug. ***