On a heap allocated structure, direct access to flexible array members with optimization at -O2 can result in reads to memory beyond the heap object. I.E. gcc assumes alignment/padding is allocated when accessing flexible array members. The attached file is a summary of the code involved though does _not_ reproduce the issue. To reproduce one can: git clone --depth=1 git://git.sv.gnu.org/coreutils.git cd coreutils/ git checkout 53883af0 export LSAN_OPTIONS=exitcode=0 ./bootstrap && ./configure --quiet && \ make -j8 AM_CFLAGS='-fsanitize=address -fsanitize=undefined' src/chmod a+rx .. Also attached is the disassembly of the problematic code, and for comparison good code achieved by using a (char*) cast on the flexi array to force byte at a time access.
Created attachment 35849 [details] summary code (does not reproduce issue)
Created attachment 35850 [details] disassembly of problematic mem access
Created attachment 35851 [details] disassembly of forced good mem access
I should note that I worked around the issue by increasing the allocation for the structure on the heap up to a multiple of alignof(the_struct). See: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=49078a78
(In reply to Pádraig Brady from comment #4) > I should note that I worked around the issue by increasing the allocation > for the structure on the heap up to a multiple of alignof(the_struct). See: > http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=49078a78 That is not just a workaround but a correct patch. The size of the memory allocated needs to be a multiply of the alignment of the struct to be valid. The reason why it works for you with your reduced testcase is because the alignment of the struct is 1.
More to the point this might actually be a -fsanitize=address bug so reopening for that issue. By the way we know the alignment of the struct is 4 which means loading a 4 byte would be valid.
Created attachment 35852 [details] reproducer
Yes, consider struct X { int n; char x[1]; }; which has sizeof(X) == 8 unless you use __attribute__((packed)) (in which case alignment also gets dropped down to 1).
I'm not understanding completely TBH. Are flexible array members not special? Should the optimizations be restricted on access through the flexible array, because I presume most/all existing allocation code is not considering this alignment/padding issue. I certainly didn't notice any examples when looking into a workaround which I came up with independently. For my reference there are some notes RE GCC's divergence from C99 re padding and flexi arrays at: https://sites.google.com/site/embeddedmonologue/home/c-programming/data-alig
On Thu, 25 Jun 2015, P at draigBrady dot com wrote: > I'm not understanding completely TBH. Are flexible array members not special? > Should the optimizations be restricted on access through the flexible array, > because I presume most/all existing allocation code is not considering this > alignment/padding issue. I certainly didn't notice any examples when looking > into a workaround which I came up with independently. For my reference there > are some notes RE GCC's divergence from C99 re padding and flexi arrays at: > https://sites.google.com/site/embeddedmonologue/home/c-programming/data-alig That page doesn't exist - I assume you mean: https://sites.google.com/site/embeddedmonologue/home/c-programming/data-alignment-structure-padding-and-flexible-array-member That page is over ten years out of date - it's quoting the wording in C99 as it was before it was corrected by TC2 (published 2004-11-15). You should look at the post-TC2 wording rather than the old wording with various defects in it.
I think this is a GCC bug. Object sizes do not have to be a multiple of the object alignment. Consider this example: #include <stdalign.h> #include <stdio.h> #include <stdlib.h> _Alignas (128) unsigned v0 = 128; _Alignas (8) unsigned v1 = 8; _Alignas (8) unsigned v2 = 8; _Alignas (8) unsigned v3 = 8; _Alignas (4) unsigned v4 = 4; _Alignas (4) unsigned v5 = 4; _Alignas (8) unsigned v6 = 8; _Alignas (8) unsigned v7 = 8; _Alignas (8) unsigned v8 = 8; _Alignas (4) unsigned v9 = 4; _Alignas (4) unsigned v10 = 4; static int cmp(const void *a, const void *b) { unsigned *const *a1 = a; unsigned *const *b1 = b; if (*a1 < *b1) return -1; if (*a1 > *b1) return 1; return 0; } int main() { unsigned *p[] = {&v1, &v2, &v3, &v4, &v5, &v6, &v7, &v8, &v9, &v10}; qsort (p, 10, sizeof (p[0]), cmp); for (int i = 0; i < 9; ++i) { printf ("%d: alignment %u, size %zd\n", i, *p[i], (p[i + 1] - p[i]) * sizeof (*p[i])); } } With gcc-5.3.1-6.fc23.x86_64, it prints this for me: 0: alignment 4, size 4 1: alignment 4, size 4 2: alignment 8, size 8 3: alignment 8, size 8 4: alignment 8, size 4 5: alignment 4, size 4 6: alignment 4, size 8 7: alignment 8, size 8 8: alignment 8, size 8 So the assumption that objects are at least as large as their alignment appears to be refuted by code generated by GCC.
Here's a new test case. It prints for me (without Address Sanitizer): count 2, align 4, minimum size 10, struct size 12, actual size 12 count 5, align 4, minimum size 13, struct size 16, actual size 20 count 0, align 8, minimum size 8, struct size 8, actual size 8 count 3, align 4, minimum size 11, struct size 12, actual size 24 * count 0, align 16, minimum size 8, struct size 16, actual size 8 count 8, align 4, minimum size 16, struct size 16, actual size 16 count 4, align 4, minimum size 12, struct size 12, actual size 12 count 2, align 4, minimum size 10, struct size 12, actual size 12 * count 2, align 8, minimum size 10, struct size 16, actual size 12 count 6, align 4, minimum size 14, struct size 16, actual size 16 count 7, align 4, minimum size 15, struct size 16, actual size 16 count 3, align 4, minimum size 11, struct size 12, actual size 12 count 3, align 4, minimum size 11, struct size 12, actual size 12 count 6, align 4, minimum size 14, struct size 16, actual size 44 * count 5, align 32, minimum size 13, struct size 32, actual size 16 count 8, align 4, minimum size 16, struct size 16, actual size 16 count 9, align 4, minimum size 17, struct size 20, actual size 20 count 7, align 4, minimum size 15, struct size 16, actual size 16 count 1, align 4, minimum size 9, struct size 12, actual size 12 * count 1, align 8, minimum size 9, struct size 16, actual size 12 I believe this shows that GCC has some bug in this area. Whether it's the over-reads (but over-reads can be fine here because they cannot trap, and GCC knows that they won't introduce observable data races), the object allocation in the .data section (again, could be harmless), or the Address Sanitizer report, I'm not sure. #include <stdalign.h> #include <stdio.h> #include <stdlib.h> #include <stddef.h> struct flexible { int count; int align; char bytes[]; }; #define ARGS_0 #define ARGS_1 1 #define ARGS_2 1, 2 #define ARGS_3 1, 2, 3 #define ARGS_4 1, 2, 3, 4 #define ARGS_5 1, 2, 3, 4, 5 #define ARGS_6 1, 2, 3, 4, 5, 6 #define ARGS_7 1, 2, 3, 4, 5, 6, 7 #define ARGS_8 1, 2, 3, 4, 5, 6, 7, 8 #define ARGS_9 1, 2, 3, 4, 5, 6, 7, 8, 9 #define DECL(name, count, align) \ _Alignas (align) struct flexible name = {count, align,{ ARGS_##count }} DECL (v4, 4, 4); DECL (v1, 1, 8); DECL (v7, 1, 4); DECL (v17, 7, 4); DECL (v15, 9, 4); DECL (v16, 8, 4); DECL (v11, 5, 32); DECL (v12, 6, 4); DECL (v5, 3, 4); DECL (v9, 3, 4); DECL (v13, 7, 4); DECL (v18, 6, 4); DECL (v2, 2, 8); DECL (v8, 2, 4); DECL (v10, 4, 4); DECL (v14, 8, 4); DECL (v0, 0, 16); DECL (v3, 3, 4); DECL (v5a, 0, 8); DECL (v19, 5, 4); DECL (v6, 2, 4); enum { count = 21 }; static int cmp(const void *a, const void *b) { struct flexible *const *a1 = a; struct flexible *const *b1 = b; if (*a1 < *b1) return -1; if (*a1 > *b1) return 1; return 0; } int main() { struct flexible *p[count] = {&v0, &v1, &v2, &v3, &v4, &v5, &v5a, &v6, &v7, &v8, &v9, &v10, &v11, &v12, &v13, &v14, &v15, &v16, &v17, &v18, &v19}; qsort (p, count, sizeof (p[0]), cmp); for (int i = 0; i < count - 1; ++i) { size_t min_size = offsetof (struct flexible, bytes) + p[i]->count; size_t align_mask = p[i]->align - 1; /* Struct size is the size that a struct with the requested length of the flexible array member would have. */ size_t struct_size = (min_size + align_mask) & ~align_mask; /* The actual size is the offset between this and the next object in the data section. (This can be an over-estimate if other objects not listed above are placed between the listed objects.) */ size_t actual_size = (p[i + 1] - p[i]) * sizeof (*p[i]); /* The lines marked with * have an object whose struct size exceeds the object size. If GCC assumes that objects always have their struct size allocated, this leads to an out-of-bounds acccess. */ printf ("%c count %d, align %d, minimum size %zu, struct size %zu, actual size %zu\n", struct_size > actual_size ? '*' : ' ', p[i]->count, p[i]->align, min_size, struct_size, actual_size); } }
(In reply to joseph@codesourcery.com from comment #10) > On Thu, 25 Jun 2015, P at draigBrady dot com wrote: > > > I'm not understanding completely TBH. Are flexible array members not special? > > Should the optimizations be restricted on access through the flexible array, > > because I presume most/all existing allocation code is not considering this > > alignment/padding issue. I certainly didn't notice any examples when looking > > into a workaround which I came up with independently. For my reference there > > are some notes RE GCC's divergence from C99 re padding and flexi arrays at: > > https://sites.google.com/site/embeddedmonologue/home/c-programming/data-alig > > That page doesn't exist - I assume you mean: > https://sites.google.com/site/embeddedmonologue/home/c-programming/data- > alignment-structure-padding-and-flexible-array-member > > That page is over ten years out of date - it's quoting the wording in C99 > as it was before it was corrected by TC2 (published 2004-11-15). You > should look at the post-TC2 wording rather than the old wording with > various defects in it. Could you quote the post-TC2 wording here for us so we don't have to go looking?
That wording is long including several examples. You can see it in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf subclause 6.7.2.1 (C99 + TC1 + TC2 + TC3).