Re: [PATCH] Assorted store-merging improvements (PR middle-end/22141)
- From: Richard Biener <rguenther at suse dot de>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, gcc-patches at gcc dot gnu dot org
- Date: Mon, 30 Oct 2017 11:46:54 +0100 (CET)
- Subject: Re: [PATCH] Assorted store-merging improvements (PR middle-end/22141)
- References: <20171027183731.GJ14653@tucnak>
On Fri, 27 Oct 2017, Jakub Jelinek wrote:
> Hi!
>
> The following patch attempts to improve store merging, for the time being
> it still only optimizes constant stores to adjacent memory.
>
> The biggest improvement is the handling of bitfields: the patch uses
> the get_bit_range helper to find the bounds of what may be modified
> when modifying a bitfield, and instead of requiring all the stores to
> be adjacent it now only requires that their bitregion_* regions are
> adjacent.  If get_bit_range fails (e.g. for non-C/C++ languages), it
> still rounds the boundaries down and up to whole bytes, as any change
> within a byte affects the rest of it.  At the end, if there are any
> gaps in between the stored values, the old value is loaded from
> memory (TREE_NO_WARNING had to be set on it so that uninit doesn't
> complain), masked with mask, ORed with the constant masked with the
> negation of mask, and stored - pretty much what the expansion emits,
> as sketched below.  As an incremental improvement, in cases where all
> the stored bitfields in one load/store set are adjacent, we could
> perhaps emit a BIT_INSERT_EXPR instead of doing the and/or.
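>
> For illustration, for a group with gaps the emitted sequence looks
> roughly like this (pseudo-GIMPLE, names made up; mask has 1-bits in
> the bits the group doesn't write):
>
>   _1 = MEM[p];              /* old value, TREE_NO_WARNING set */
>   _2 = _1 & mask;           /* keep the bits we don't store */
>   _3 = _2 | (cst & ~mask);  /* insert the stored constant */
>   MEM[p] = _3;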
>
> Another improvement is in alignment handling.  Previously the code
> used get_object_alignment, which e.g. for the store_merging_11.c
> testcase has to return 8-bit alignment, as the whole struct is 64-bit
> aligned but the first store is 1 byte after that.  On targets that
> don't allow unaligned stores or where they are slow, the old code
> would then emit just byte stores (many of them).  The patch uses
> get_object_alignment_1, so that we get both the maximum known
> alignment and the misalignment, and computes from those the alignment
> for every bitpos we try, so that for stores starting 1 byte after a
> 64-bit boundary we get a 1 byte store, then a 2 byte one, then 4 byte
> and then 8 byte ones.
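>
> In other words, the usable alignment at a given position is computed
> along these lines (align_at is just an illustrative helper name, not
> a function in the patch; it mirrors the least_bit_hwi use in
> split_group):
>
>   unsigned HOST_WIDE_INT
>   align_at (unsigned HOST_WIDE_INT try_bitpos,
>             unsigned HOST_WIDE_INT group_align,
>             unsigned HOST_WIDE_INT align_base)
>   {
>     unsigned HOST_WIDE_INT align_bitpos
>       = (try_bitpos - align_base) & (group_align - 1);
>     /* When misaligned within GROUP_ALIGN, only the least
>        significant set bit of the misalignment is usable.  */
>     return align_bitpos ? least_bit_hwi (align_bitpos) : group_align;
>   }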
>
> Another improvement is for targets that allow unaligned stores: the
> new code performs a dry run of split_group, and if it determines that
> aligned stores are as many as or fewer than unaligned stores, it
> prefers the aligned ones.  E.g. for the case in store_merging_11.c,
> where ptr is 64-bit aligned and we store 15 bytes, unpatched gcc with
> unaligned stores would choose to do an 8 byte, then a 4 byte, then a
> 2 byte and then a 1 byte store.  Aligned stores (1, 2, 4 and 8 bytes,
> in that order) are also 4 stores, so it is better to do those.
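>
> The decision itself is just two dry runs (this is what the patch's
> output_merged_store does):
>
>   unsigned aligned_cnt = split_group (group, false, NULL);
>   unsigned unaligned_cnt = split_group (group, true, NULL);
>   if (aligned_cnt <= unaligned_cnt)
>     allow_unaligned = false;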
>
> The patch also attempts to reuse the original stores (well, just
> their lhs/rhs1) if we choose a split store that has a single original
> stmt in it.  That way we don't lose ARRAY_REFs/COMPONENT_REFs etc.
> unnecessarily.  Furthermore, if there is an original store larger
> than the maximum we try (word size), e.g. when there is originally an
> 8 byte long long store followed by a 1 byte store followed by another
> 1 byte store, on 32-bit targets we'd previously try to split it into
> a 4 byte store, a 4 byte store and a 2 byte store, figure out that is
> 3 stores just like before, and give up.  With the patch, if we see a
> single original larger store at the bitpos we want, we just reuse
> that store, so in that case we get an 8 byte original store
> (lhs/rhs1) followed by a 2 byte store, as in the sketch below.
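>
> A hypothetical example of the kind of code this helps on a 32-bit
> target (not one of the included testcases):
>
>   struct A { long long x; char y; char z; };
>
>   void
>   f (struct A *p)
>   {
>     /* 8 byte store, wider than the 32-bit word size; its lhs/rhs1
>        are reused as-is.  */
>     p->x = 1;
>     /* These two merge into a single 2 byte store.  */
>     p->y = 2;
>     p->z = 3;
>   }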
>
> find_constituent_stmts is also optimized not to needlessly walk
> group->stores entries that are already known to end before the bitpos
> we ask for.  In addition, the patch fixes its comparisons, which were
> off by one, so previously it often included more original stores in a
> split store than were really there.
>
> Another change is that output_merged_store used to emit the new
> stores into a sequence and, if it found out there were too many,
> released all the SSA names and failed.  That seems unnecessary to me:
> we know how many split stores there are before entering the loop, so
> we can just fail at that point and only start emitting statements
> once we have decided to do the replacement.
>
> I had to disable store merging in the g++.dg/pr71694.C testcase, but
> that is just because the testcase doesn't test what it should.  In my
> understanding, it wants to verify that the c.c store doesn't use a
> 32-bit read-modify-write (RMW) cycle, because that would create a
> data race for c.d.  But it stores both c.c and c.d next to each
> other, so even though c.c's bitregion is the first 3 bytes and c.d's
> bitregion is the following byte, we then touch bytes in both of the
> regions, and a RMW cycle on the whole 32-bit word is fine: as c.d is
> written anyway, the new value is stored and the old value of the c.d
> byte is ignored.  What would be wrong - but store merging doesn't
> emit it - is what we emitted before, i.e. a 32-bit RMW that stored
> just c.c, followed by the c.d store.  Another thread could have
> stored value1 into c.d; we read the 32-bit word and modify it while
> another thread stores value2 into c.d; then we write the 32-bit word
> back and thus reintroduce value1 into c.d; another thread then reads
> value1 instead of the expected value2; and finally we store value3
> into c.d.  So an alternative to -fno-store-merging in the testcase
> would probably be separate functions where one stores to c.c and the
> other to c.d; then we can make sure neither store uses movl.  Though
> it probably still should only look at movl stores or loads, other
> moves are fine.
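>
> Hypothetical shape of such a restructured testcase (illustrative
> only, with the layout described above):
>
>   struct C { int c : 24; unsigned char d; } c;
>
>   void set_c (int v)  { c.c = v; }  /* must not use a 32-bit RMW */
>   void set_d (unsigned char v) { c.d = v; }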
>
> The included store_merging_10.c improves on x86_64 from:
> movzbl (%rdi), %eax
> - andl $-19, %eax
> + andl $-32, %eax
> orl $13, %eax
> movb %al, (%rdi)
> in foo and
> - orb $1, (%rdi)
> movl (%rdi), %eax
> - andl $-131071, %eax
> + andl $2147352576, %eax
> + orl $1, %eax
> movl %eax, (%rdi)
> - shrl $24, %eax
> - andl $127, %eax
> - movb %al, 3(%rdi)
> in bar.  combine.c managed to optimize foo like this too, but it
> couldn't do the same for bar.
> In store_merging_11.c on x86_64, bar is the same and foo changed:
> - movabsq $578437695752115969, %rax
> - movl $0, 9(%rdi)
> - movb $0, 15(%rdi)
> - movq %rax, 1(%rdi)
> - xorl %eax, %eax
> - movw %ax, 13(%rdi)
> + movl $23, %eax
> + movb $1, 1(%rdi)
> + movl $117835012, 4(%rdi)
> + movw %ax, 2(%rdi)
> + movq $8, 8(%rdi)
> which is not only shorter, but all the stores are aligned.
> On ppc64le in store_merging_10.c the difference is:
> - lwz 9,0(3)
> + lbz 9,0(3)
> rlwinm 9,9,0,0,26
> ori 9,9,0xd
> - stw 9,0(3)
> + stb 9,0(3)
> in foo and
> lwz 9,0(3)
> + rlwinm 9,9,0,1,14
> ori 9,9,0x1
> - rlwinm 9,9,0,31,14
> - rlwinm 9,9,0,1,31
> stw 9,0(3)
> in bar, and store_merging_11.c the difference is:
> - lis 8,0x807
> - li 9,0
> - ori 8,8,0x605
> - li 10,0
> - sldi 8,8,32
> - stw 9,9(3)
> - sth 9,13(3)
> - oris 8,8,0x400
> - stb 10,15(3)
> - ori 8,8,0x1701
> - mtvsrd 0,8
> - stfd 0,1(3)
> + lis 9,0x706
> + li 7,1
> + li 8,23
> + ori 9,9,0x504
> + li 10,8
> + stb 7,1(3)
> + sth 8,2(3)
> + stw 9,4(3)
> + std 10,8(3)
> in foo and no changes in bar.
>
> What the patch doesn't implement yet, but would also be possible in
> the allow_unaligned case, is e.g. in store_merging_11.c, when storing
> 15 bytes, to store 8 bytes at offset 1 and 8 bytes at offset 8
> (i.e. create two overlapping stores, in this case one aligned and one
> unaligned).
>
> Bootstrapped/regtested on x86_64-linux, i686-linux and powerpc64le-linux,
> ok for trunk?
Ok.
Thanks,
Richard.
> 2017-10-27 Jakub Jelinek <jakub@redhat.com>
>
> PR middle-end/22141
> * gimple-ssa-store-merging.c: Include rtl.h and expr.h.
> (struct store_immediate_info): Add bitregion_start and bitregion_end
> fields.
> (store_immediate_info::store_immediate_info): Add brs and bre
> arguments and initialize bitregion_{start,end} from those.
> (struct merged_store_group): Add bitregion_start, bitregion_end,
> align_base and mask fields. Drop unnecessary struct keyword from
> struct store_immediate_info. Add do_merge method.
> (clear_bit_region_be): Use memset instead of loop storing zeros.
> (merged_store_group::do_merge): New method.
> (merged_store_group::merge_into): Use do_merge. Allow gaps in between
> stores as long as the surrounding bitregions have no gaps.
> (merged_store_group::merge_overlapping): Use do_merge.
> (merged_store_group::apply_stores): Test that bitregion_{start,end}
> is byte aligned, rather than requiring that start and width are
> byte aligned. Drop unnecessary struct keyword from
> struct store_immediate_info. Allocate and populate also mask array.
> Make start of the arrays relative to bitregion_start rather than
> start and size them according to bitregion_{end,start} difference.
> (struct imm_store_chain_info): Drop unnecessary struct keyword from
> struct store_immediate_info.
> (pass_store_merging::gate): Punt if BITS_PER_UNIT or CHAR_BIT is not 8.
> (pass_store_merging::terminate_all_aliasing_chains): Drop unnecessary
> struct keyword from struct store_immediate_info.
> (imm_store_chain_info::coalesce_immediate_stores): Allow gaps in
> between stores as long as the surrounding bitregions have no gaps.
> Formatting fixes.
> (struct split_store): Add orig non-static data member.
> (split_store::split_store): Initialize orig to false.
> (find_constituent_stmts): Return store_immediate_info *, non-NULL
> if there is exactly a single original stmt. Change stmts argument
> to pointer from reference, if NULL, don't push anything to it. Add
> first argument, use it to optimize skipping over orig stmts that
> are known to be before bitpos already. Simplify.
> (split_group): Return unsigned int count how many stores are or
> would be needed rather than a bool. Add allow_unaligned argument.
> Change split_stores argument from reference to pointer, if NULL,
> only do a dry run computing how many stores would be produced.
> Rewritten algorithm to use both alignment and misalign if
> !allow_unaligned and handle bitfield stores with gaps.
> (imm_store_chain_info::output_merged_store): Set start_byte_pos
> from bitregion_start instead of start. Compute allow_unaligned
> here, if true, do 2 split_group dry runs to compute which one
> produces fewer stores and prefer aligned if equal. Punt if
> new count is bigger or equal than original before emitting any
> statements, rather than during that. Remove no longer needed
> new_ssa_names tracking. Replace num_stmts with
> split_stores.length (). Use 32-bit stack allocated entries
> in split_stores auto_vec. Try to reuse original store lhs/rhs1
> if possible. Handle bitfields with gaps.
> (pass_store_merging::execute): Ignore bitsize == 0 stores.
> Compute bitregion_{start,end} for the stores and construct
> store_immediate_info with that. Formatting fixes.
>
> * gcc.dg/store_merging_10.c: New test.
> * gcc.dg/store_merging_11.c: New test.
> * gcc.dg/store_merging_12.c: New test.
> * g++.dg/pr71694.C: Add -fno-store-merging to dg-options.
>
> --- gcc/gimple-ssa-store-merging.c.jj 2017-10-27 14:16:26.074249585 +0200
> +++ gcc/gimple-ssa-store-merging.c 2017-10-27 18:10:18.212762773 +0200
> @@ -126,6 +126,8 @@
> #include "tree-eh.h"
> #include "target.h"
> #include "gimplify-me.h"
> +#include "rtl.h"
> +#include "expr.h" /* For get_bit_range. */
> #include "selftest.h"
>
> /* The maximum size (in bits) of the stores this pass should generate. */
> @@ -142,17 +144,24 @@ struct store_immediate_info
> {
> unsigned HOST_WIDE_INT bitsize;
> unsigned HOST_WIDE_INT bitpos;
> + unsigned HOST_WIDE_INT bitregion_start;
> + /* This is one past the last bit of the bit region. */
> + unsigned HOST_WIDE_INT bitregion_end;
> gimple *stmt;
> unsigned int order;
> store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> + unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> gimple *, unsigned int);
> };
>
> store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
> unsigned HOST_WIDE_INT bp,
> + unsigned HOST_WIDE_INT brs,
> + unsigned HOST_WIDE_INT bre,
> gimple *st,
> unsigned int ord)
> - : bitsize (bs), bitpos (bp), stmt (st), order (ord)
> + : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
> + stmt (st), order (ord)
> {
> }
>
> @@ -164,26 +173,32 @@ struct merged_store_group
> {
> unsigned HOST_WIDE_INT start;
> unsigned HOST_WIDE_INT width;
> - /* The size of the allocated memory for val. */
> + unsigned HOST_WIDE_INT bitregion_start;
> + unsigned HOST_WIDE_INT bitregion_end;
> + /* The size of the allocated memory for val and mask. */
> unsigned HOST_WIDE_INT buf_size;
> + unsigned HOST_WIDE_INT align_base;
>
> unsigned int align;
> unsigned int first_order;
> unsigned int last_order;
>
> - auto_vec<struct store_immediate_info *> stores;
> + auto_vec<store_immediate_info *> stores;
> /* We record the first and last original statements in the sequence because
> we'll need their vuse/vdef and replacement position. It's easier to keep
> track of them separately as 'stores' is reordered by apply_stores. */
> gimple *last_stmt;
> gimple *first_stmt;
> unsigned char *val;
> + unsigned char *mask;
>
> merged_store_group (store_immediate_info *);
> ~merged_store_group ();
> void merge_into (store_immediate_info *);
> void merge_overlapping (store_immediate_info *);
> bool apply_stores ();
> +private:
> + void do_merge (store_immediate_info *);
> };
>
> /* Debug helper. Dump LEN elements of byte array PTR to FD in hex. */
> @@ -287,8 +302,7 @@ clear_bit_region_be (unsigned char *ptr,
> && len > BITS_PER_UNIT)
> {
> unsigned int nbytes = len / BITS_PER_UNIT;
> - for (unsigned int i = 0; i < nbytes; i++)
> - ptr[i] = 0U;
> + memset (ptr, 0, nbytes);
> if (len % BITS_PER_UNIT != 0)
> clear_bit_region_be (ptr + nbytes, BITS_PER_UNIT - 1,
> len % BITS_PER_UNIT);
> @@ -549,10 +563,16 @@ merged_store_group::merged_store_group (
> {
> start = info->bitpos;
> width = info->bitsize;
> + bitregion_start = info->bitregion_start;
> + bitregion_end = info->bitregion_end;
> /* VAL has memory allocated for it in apply_stores once the group
> width has been finalized. */
> val = NULL;
> - align = get_object_alignment (gimple_assign_lhs (info->stmt));
> + mask = NULL;
> + unsigned HOST_WIDE_INT align_bitpos = 0;
> + get_object_alignment_1 (gimple_assign_lhs (info->stmt),
> + &align, &align_bitpos);
> + align_base = start - align_bitpos;
> stores.create (1);
> stores.safe_push (info);
> last_stmt = info->stmt;
> @@ -568,18 +588,24 @@ merged_store_group::~merged_store_group
> XDELETEVEC (val);
> }
>
> -/* Merge a store recorded by INFO into this merged store.
> - The store is not overlapping with the existing recorded
> - stores. */
> -
> +/* Helper method for merge_into and merge_overlapping to do
> + the common part. */
> void
> -merged_store_group::merge_into (store_immediate_info *info)
> +merged_store_group::do_merge (store_immediate_info *info)
> {
> - unsigned HOST_WIDE_INT wid = info->bitsize;
> - /* Make sure we're inserting in the position we think we're inserting. */
> - gcc_assert (info->bitpos == start + width);
> + bitregion_start = MIN (bitregion_start, info->bitregion_start);
> + bitregion_end = MAX (bitregion_end, info->bitregion_end);
> +
> + unsigned int this_align;
> + unsigned HOST_WIDE_INT align_bitpos = 0;
> + get_object_alignment_1 (gimple_assign_lhs (info->stmt),
> + &this_align, &align_bitpos);
> + if (this_align > align)
> + {
> + align = this_align;
> + align_base = info->bitpos - align_bitpos;
> + }
>
> - width += wid;
> gimple *stmt = info->stmt;
> stores.safe_push (info);
> if (info->order > last_order)
> @@ -594,6 +620,22 @@ merged_store_group::merge_into (store_im
> }
> }
>
> +/* Merge a store recorded by INFO into this merged store.
> + The store is not overlapping with the existing recorded
> + stores. */
> +
> +void
> +merged_store_group::merge_into (store_immediate_info *info)
> +{
> + unsigned HOST_WIDE_INT wid = info->bitsize;
> + /* Make sure we're inserting in the position we think we're inserting. */
> + gcc_assert (info->bitpos >= start + width
> + && info->bitregion_start <= bitregion_end);
> +
> + width += wid;
> + do_merge (info);
> +}
> +
> /* Merge a store described by INFO into this merged store.
> INFO overlaps in some way with the current store (i.e. it's not contiguous
> which is handled by merged_store_group::merge_into). */
> @@ -601,23 +643,11 @@ merged_store_group::merge_into (store_im
> void
> merged_store_group::merge_overlapping (store_immediate_info *info)
> {
> - gimple *stmt = info->stmt;
> - stores.safe_push (info);
> -
> /* If the store extends the size of the group, extend the width. */
> - if ((info->bitpos + info->bitsize) > (start + width))
> + if (info->bitpos + info->bitsize > start + width)
> width += info->bitpos + info->bitsize - (start + width);
>
> - if (info->order > last_order)
> - {
> - last_order = info->order;
> - last_stmt = stmt;
> - }
> - else if (info->order < first_order)
> - {
> - first_order = info->order;
> - first_stmt = stmt;
> - }
> + do_merge (info);
> }
>
> /* Go through all the recorded stores in this group in program order and
> @@ -627,27 +657,28 @@ merged_store_group::merge_overlapping (s
> bool
> merged_store_group::apply_stores ()
> {
> - /* The total width of the stores must add up to a whole number of bytes
> - and start at a byte boundary. We don't support emitting bitfield
> - references for now. Also, make sure we have more than one store
> - in the group, otherwise we cannot merge anything. */
> - if (width % BITS_PER_UNIT != 0
> - || start % BITS_PER_UNIT != 0
> + /* Make sure we have more than one store in the group, otherwise we cannot
> + merge anything. */
> + if (bitregion_start % BITS_PER_UNIT != 0
> + || bitregion_end % BITS_PER_UNIT != 0
> || stores.length () == 1)
> return false;
>
> stores.qsort (sort_by_order);
> - struct store_immediate_info *info;
> + store_immediate_info *info;
> unsigned int i;
> /* Create a buffer of a size that is 2 times the number of bytes we're
> storing. That way native_encode_expr can write power-of-2-sized
> chunks without overrunning. */
> - buf_size = 2 * (ROUND_UP (width, BITS_PER_UNIT) / BITS_PER_UNIT);
> - val = XCNEWVEC (unsigned char, buf_size);
> + buf_size = 2 * ((bitregion_end - bitregion_start) / BITS_PER_UNIT);
> + val = XNEWVEC (unsigned char, 2 * buf_size);
> + mask = val + buf_size;
> + memset (val, 0, buf_size);
> + memset (mask, ~0U, buf_size);
>
> FOR_EACH_VEC_ELT (stores, i, info)
> {
> - unsigned int pos_in_buffer = info->bitpos - start;
> + unsigned int pos_in_buffer = info->bitpos - bitregion_start;
> bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
> val, info->bitsize,
> pos_in_buffer, buf_size);
> @@ -668,6 +699,11 @@ merged_store_group::apply_stores ()
> }
> if (!ret)
> return false;
> + unsigned char *m = mask + (pos_in_buffer / BITS_PER_UNIT);
> + if (BYTES_BIG_ENDIAN)
> + clear_bit_region_be (m, pos_in_buffer % BITS_PER_UNIT, info->bitsize);
> + else
> + clear_bit_region (m, pos_in_buffer % BITS_PER_UNIT, info->bitsize);
> }
> return true;
> }
> @@ -682,7 +718,7 @@ struct imm_store_chain_info
> See pass_store_merging::m_stores_head for more rationale. */
> imm_store_chain_info *next, **pnxp;
> tree base_addr;
> - auto_vec<struct store_immediate_info *> m_store_info;
> + auto_vec<store_immediate_info *> m_store_info;
> auto_vec<merged_store_group *> m_merged_store_groups;
>
> imm_store_chain_info (imm_store_chain_info *&inspt, tree b_a)
> @@ -730,11 +766,16 @@ public:
> {
> }
>
> - /* Pass not supported for PDP-endianness. */
> + /* Pass not supported for PDP-endianness, nor for insane hosts
> + or target character sizes where native_{encode,interpret}_expr
> + doesn't work properly. */
> virtual bool
> gate (function *)
> {
> - return flag_store_merging && (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
> + return flag_store_merging
> + && WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN
> + && CHAR_BIT == 8
> + && BITS_PER_UNIT == 8;
> }
>
> virtual unsigned int execute (function *);
> @@ -811,7 +852,7 @@ pass_store_merging::terminate_all_aliasi
> aliases with any of them. */
> else
> {
> - struct store_immediate_info *info;
> + store_immediate_info *info;
> unsigned int i;
> FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
> {
> @@ -926,8 +967,9 @@ imm_store_chain_info::coalesce_immediate
> }
>
> /* |---store 1---| <gap> |---store 2---|.
> - Gap between stores. Start a new group. */
> - if (start != merged_store->start + merged_store->width)
> + Gap between stores. Start a new group if there are any gaps
> + between bitregions. */
> + if (info->bitregion_start > merged_store->bitregion_end)
> {
> /* Try to apply all the stores recorded for the group to determine
> the bitpattern they write and discard it if that fails.
> @@ -948,11 +990,11 @@ imm_store_chain_info::coalesce_immediate
> merged_store->merge_into (info);
> }
>
> - /* Record or discard the last store group. */
> - if (!merged_store->apply_stores ())
> - delete merged_store;
> - else
> - m_merged_store_groups.safe_push (merged_store);
> + /* Record or discard the last store group. */
> + if (!merged_store->apply_stores ())
> + delete merged_store;
> + else
> + m_merged_store_groups.safe_push (merged_store);
>
> gcc_assert (m_merged_store_groups.length () <= m_store_info.length ());
> bool success
> @@ -961,8 +1003,8 @@ imm_store_chain_info::coalesce_immediate
>
> if (success && dump_file)
> fprintf (dump_file, "Coalescing successful!\n"
> - "Merged into %u stores\n",
> - m_merged_store_groups.length ());
> + "Merged into %u stores\n",
> + m_merged_store_groups.length ());
>
> return success;
> }
> @@ -1016,6 +1058,8 @@ struct split_store
> unsigned HOST_WIDE_INT size;
> unsigned HOST_WIDE_INT align;
> auto_vec<gimple *> orig_stmts;
> + /* True if there is a single orig stmt covering the whole split store. */
> + bool orig;
> split_store (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> unsigned HOST_WIDE_INT);
> };
> @@ -1025,100 +1069,198 @@ struct split_store
> split_store::split_store (unsigned HOST_WIDE_INT bp,
> unsigned HOST_WIDE_INT sz,
> unsigned HOST_WIDE_INT al)
> - : bytepos (bp), size (sz), align (al)
> + : bytepos (bp), size (sz), align (al), orig (false)
> {
> orig_stmts.create (0);
> }
>
> /* Record all statements corresponding to stores in GROUP that write to
> the region starting at BITPOS and is of size BITSIZE. Record such
> - statements in STMTS. The stores in GROUP must be sorted by
> - bitposition. */
> + statements in STMTS if non-NULL. The stores in GROUP must be sorted by
> + bitposition. Return INFO if there is exactly one original store
> + in the range. */
>
> -static void
> +static store_immediate_info *
> find_constituent_stmts (struct merged_store_group *group,
> - auto_vec<gimple *> &stmts,
> - unsigned HOST_WIDE_INT bitpos,
> - unsigned HOST_WIDE_INT bitsize)
> + vec<gimple *> *stmts,
> + unsigned int *first,
> + unsigned HOST_WIDE_INT bitpos,
> + unsigned HOST_WIDE_INT bitsize)
> {
> - struct store_immediate_info *info;
> + store_immediate_info *info, *ret = NULL;
> unsigned int i;
> + bool second = false;
> + bool update_first = true;
> unsigned HOST_WIDE_INT end = bitpos + bitsize;
> - FOR_EACH_VEC_ELT (group->stores, i, info)
> + for (i = *first; group->stores.iterate (i, &info); ++i)
> {
> unsigned HOST_WIDE_INT stmt_start = info->bitpos;
> unsigned HOST_WIDE_INT stmt_end = stmt_start + info->bitsize;
> - if (stmt_end < bitpos)
> - continue;
> + if (stmt_end <= bitpos)
> + {
> + /* BITPOS passed to this function never decreases from within the
> + same split_group call, so optimize and don't scan info records
> + which are known to end before or at BITPOS next time.
> + Only do it if all stores before this one also pass this. */
> + if (update_first)
> + *first = i + 1;
> + continue;
> + }
> + else
> + update_first = false;
> +
> /* The stores in GROUP are ordered by bitposition so if we're past
> - the region for this group return early. */
> - if (stmt_start > end)
> - return;
> -
> - if (IN_RANGE (stmt_start, bitpos, bitpos + bitsize)
> - || IN_RANGE (stmt_end, bitpos, end)
> - /* The statement writes a region that completely encloses the region
> - that this group writes. Unlikely to occur but let's
> - handle it. */
> - || IN_RANGE (bitpos, stmt_start, stmt_end))
> - stmts.safe_push (info->stmt);
> + the region for this group return early. */
> + if (stmt_start >= end)
> + return ret;
> +
> + if (stmts)
> + {
> + stmts->safe_push (info->stmt);
> + if (ret)
> + {
> + ret = NULL;
> + second = true;
> + }
> + }
> + else if (ret)
> + return NULL;
> + if (!second)
> + ret = info;
> }
> + return ret;
> }
>
> /* Split a merged store described by GROUP by populating the SPLIT_STORES
> - vector with split_store structs describing the byte offset (from the base),
> - the bit size and alignment of each store as well as the original statements
> - involved in each such split group.
> + vector (if non-NULL) with split_store structs describing the byte offset
> + (from the base), the bit size and alignment of each store as well as the
> + original statements involved in each such split group.
> This is to separate the splitting strategy from the statement
> building/emission/linking done in output_merged_store.
> - At the moment just start with the widest possible size and keep emitting
> - the widest we can until we have emitted all the bytes, halving the size
> - when appropriate. */
> -
> -static bool
> -split_group (merged_store_group *group,
> - auto_vec<struct split_store *> &split_stores)
> + Return number of new stores.
> + If SPLIT_STORES is NULL, it is just a dry run to count number of
> + new stores. */
> +
> +static unsigned int
> +split_group (merged_store_group *group, bool allow_unaligned,
> + vec<struct split_store *> *split_stores)
> {
> - unsigned HOST_WIDE_INT pos = group->start;
> - unsigned HOST_WIDE_INT size = group->width;
> + unsigned HOST_WIDE_INT pos = group->bitregion_start;
> + unsigned HOST_WIDE_INT size = group->bitregion_end - pos;
> unsigned HOST_WIDE_INT bytepos = pos / BITS_PER_UNIT;
> - unsigned HOST_WIDE_INT align = group->align;
> + unsigned HOST_WIDE_INT group_align = group->align;
> + unsigned HOST_WIDE_INT align_base = group->align_base;
>
> - /* We don't handle partial bitfields for now. We shouldn't have
> - reached this far. */
> gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
>
> - bool allow_unaligned
> - = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
> -
> - unsigned int try_size = MAX_STORE_BITSIZE;
> - while (try_size > size
> - || (!allow_unaligned
> - && try_size > align))
> - {
> - try_size /= 2;
> - if (try_size < BITS_PER_UNIT)
> - return false;
> - }
> -
> + unsigned int ret = 0, first = 0;
> unsigned HOST_WIDE_INT try_pos = bytepos;
> group->stores.qsort (sort_by_bitpos);
>
> while (size > 0)
> {
> - struct split_store *store = new split_store (try_pos, try_size, align);
> + if ((allow_unaligned || group_align <= BITS_PER_UNIT)
> + && group->mask[try_pos - bytepos] == (unsigned char) ~0U)
> + {
> + /* Skip padding bytes. */
> + ++try_pos;
> + size -= BITS_PER_UNIT;
> + continue;
> + }
> +
> unsigned HOST_WIDE_INT try_bitpos = try_pos * BITS_PER_UNIT;
> - find_constituent_stmts (group, store->orig_stmts, try_bitpos, try_size);
> - split_stores.safe_push (store);
> + unsigned int try_size = MAX_STORE_BITSIZE, nonmasked;
> + unsigned HOST_WIDE_INT align_bitpos
> + = (try_bitpos - align_base) & (group_align - 1);
> + unsigned HOST_WIDE_INT align = group_align;
> + if (align_bitpos)
> + align = least_bit_hwi (align_bitpos);
> + if (!allow_unaligned)
> + try_size = MIN (try_size, align);
> + store_immediate_info *info
> + = find_constituent_stmts (group, NULL, &first, try_bitpos, try_size);
> + if (info)
> + {
> + /* If there is just one original statement for the range, see if
> + we can just reuse the original store which could be even larger
> + than try_size. */
> + unsigned HOST_WIDE_INT stmt_end
> + = ROUND_UP (info->bitpos + info->bitsize, BITS_PER_UNIT);
> + info = find_constituent_stmts (group, NULL, &first, try_bitpos,
> + stmt_end - try_bitpos);
> + if (info && info->bitpos >= try_bitpos)
> + {
> + try_size = stmt_end - try_bitpos;
> + goto found;
> + }
> + }
>
> - try_pos += try_size / BITS_PER_UNIT;
> + /* Approximate store bitsize for the case when there are no padding
> + bits. */
> + while (try_size > size)
> + try_size /= 2;
> + /* Now look for whole padding bytes at the end of that bitsize. */
> + for (nonmasked = try_size / BITS_PER_UNIT; nonmasked > 0; --nonmasked)
> + if (group->mask[try_pos - bytepos + nonmasked - 1]
> + != (unsigned char) ~0U)
> + break;
> + if (nonmasked == 0)
> + {
> + /* If entire try_size range is padding, skip it. */
> + try_pos += try_size / BITS_PER_UNIT;
> + size -= try_size;
> + continue;
> + }
> + /* Otherwise try to decrease try_size if second half, last 3 quarters
> + etc. are padding. */
> + nonmasked *= BITS_PER_UNIT;
> + while (nonmasked <= try_size / 2)
> + try_size /= 2;
> + if (!allow_unaligned && group_align > BITS_PER_UNIT)
> + {
> + /* Now look for whole padding bytes at the start of that bitsize. */
> + unsigned int try_bytesize = try_size / BITS_PER_UNIT, masked;
> + for (masked = 0; masked < try_bytesize; ++masked)
> + if (group->mask[try_pos - bytepos + masked] != (unsigned char) ~0U)
> + break;
> + masked *= BITS_PER_UNIT;
> + gcc_assert (masked < try_size);
> + if (masked >= try_size / 2)
> + {
> + while (masked >= try_size / 2)
> + {
> + try_size /= 2;
> + try_pos += try_size / BITS_PER_UNIT;
> + size -= try_size;
> + masked -= try_size;
> + }
> + /* Need to recompute the alignment, so just retry at the new
> + position. */
> + continue;
> + }
> + }
>
> + found:
> + ++ret;
> +
> + if (split_stores)
> + {
> + struct split_store *store
> + = new split_store (try_pos, try_size, align);
> + info = find_constituent_stmts (group, &store->orig_stmts,
> + &first, try_bitpos, try_size);
> + if (info
> + && info->bitpos >= try_bitpos
> + && info->bitpos + info->bitsize <= try_bitpos + try_size)
> + store->orig = true;
> + split_stores->safe_push (store);
> + }
> +
> + try_pos += try_size / BITS_PER_UNIT;
> size -= try_size;
> - align = try_size;
> - while (size < try_size)
> - try_size /= 2;
> }
> - return true;
> +
> + return ret;
> }
>
> /* Given a merged store group GROUP output the widened version of it.
> @@ -1132,31 +1274,50 @@ split_group (merged_store_group *group,
> bool
> imm_store_chain_info::output_merged_store (merged_store_group *group)
> {
> - unsigned HOST_WIDE_INT start_byte_pos = group->start / BITS_PER_UNIT;
> + unsigned HOST_WIDE_INT start_byte_pos
> + = group->bitregion_start / BITS_PER_UNIT;
>
> unsigned int orig_num_stmts = group->stores.length ();
> if (orig_num_stmts < 2)
> return false;
>
> - auto_vec<struct split_store *> split_stores;
> + auto_vec<struct split_store *, 32> split_stores;
> split_stores.create (0);
> - if (!split_group (group, split_stores))
> - return false;
> + bool allow_unaligned
> + = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
> + if (allow_unaligned)
> + {
> + /* If unaligned stores are allowed, see how many stores we'd emit
> + for unaligned and how many stores we'd emit for aligned stores.
> + Only use unaligned stores if it allows fewer stores than aligned. */
> + unsigned aligned_cnt = split_group (group, false, NULL);
> + unsigned unaligned_cnt = split_group (group, true, NULL);
> + if (aligned_cnt <= unaligned_cnt)
> + allow_unaligned = false;
> + }
> + split_group (group, allow_unaligned, &split_stores);
> +
> + if (split_stores.length () >= orig_num_stmts)
> + {
> + /* We didn't manage to reduce the number of statements. Bail out. */
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Exceeded original number of stmts (%u)."
> + " Not profitable to emit new sequence.\n",
> + orig_num_stmts);
> + }
> + return false;
> + }
>
> gimple_stmt_iterator last_gsi = gsi_for_stmt (group->last_stmt);
> gimple_seq seq = NULL;
> - unsigned int num_stmts = 0;
> tree last_vdef, new_vuse;
> last_vdef = gimple_vdef (group->last_stmt);
> new_vuse = gimple_vuse (group->last_stmt);
>
> gimple *stmt = NULL;
> - /* The new SSA names created. Keep track of them so that we can free them
> - if we decide to not use the new sequence. */
> - auto_vec<tree> new_ssa_names;
> split_store *split_store;
> unsigned int i;
> - bool fail = false;
>
> tree addr = force_gimple_operand_1 (unshare_expr (base_addr), &seq,
> is_gimple_mem_ref_addr, NULL_TREE);
> @@ -1165,48 +1326,76 @@ imm_store_chain_info::output_merged_stor
> unsigned HOST_WIDE_INT try_size = split_store->size;
> unsigned HOST_WIDE_INT try_pos = split_store->bytepos;
> unsigned HOST_WIDE_INT align = split_store->align;
> - tree offset_type = get_alias_type_for_stmts (split_store->orig_stmts);
> - location_t loc = get_location_for_stmts (split_store->orig_stmts);
> -
> - tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
> - int_type = build_aligned_type (int_type, align);
> - tree dest = fold_build2 (MEM_REF, int_type, addr,
> - build_int_cst (offset_type, try_pos));
> -
> - tree src = native_interpret_expr (int_type,
> - group->val + try_pos - start_byte_pos,
> - group->buf_size);
> + tree dest, src;
> + location_t loc;
> + if (split_store->orig)
> + {
> + /* If there is just a single constituent store which covers
> + the whole area, just reuse the lhs and rhs. */
> + dest = gimple_assign_lhs (split_store->orig_stmts[0]);
> + src = gimple_assign_rhs1 (split_store->orig_stmts[0]);
> + loc = gimple_location (split_store->orig_stmts[0]);
> + }
> + else
> + {
> + tree offset_type
> + = get_alias_type_for_stmts (split_store->orig_stmts);
> + loc = get_location_for_stmts (split_store->orig_stmts);
> +
> + tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
> + int_type = build_aligned_type (int_type, align);
> + dest = fold_build2 (MEM_REF, int_type, addr,
> + build_int_cst (offset_type, try_pos));
> + src = native_interpret_expr (int_type,
> + group->val + try_pos - start_byte_pos,
> + group->buf_size);
> + tree mask
> + = native_interpret_expr (int_type,
> + group->mask + try_pos - start_byte_pos,
> + group->buf_size);
> + if (!integer_zerop (mask))
> + {
> + tree tem = make_ssa_name (int_type);
> + tree load_src = unshare_expr (dest);
> + /* The load might load some or all bits uninitialized,
> + avoid -W*uninitialized warnings in that case.
> + As optimization, it would be nice if all the bits are
> + provably uninitialized (no stores at all yet or previous
> + store a CLOBBER) we'd optimize away the load and replace
> + it e.g. with 0. */
> + TREE_NO_WARNING (load_src) = 1;
> + stmt = gimple_build_assign (tem, load_src);
> + gimple_set_location (stmt, loc);
> + gimple_set_vuse (stmt, new_vuse);
> + gimple_seq_add_stmt_without_update (&seq, stmt);
> +
> + /* FIXME: If there is a single chunk of zero bits in mask,
> + perhaps use BIT_INSERT_EXPR instead? */
> + stmt = gimple_build_assign (make_ssa_name (int_type),
> + BIT_AND_EXPR, tem, mask);
> + gimple_set_location (stmt, loc);
> + gimple_seq_add_stmt_without_update (&seq, stmt);
> + tem = gimple_assign_lhs (stmt);
> +
> + src = wide_int_to_tree (int_type,
> + wi::bit_and_not (wi::to_wide (src),
> + wi::to_wide (mask)));
> + stmt = gimple_build_assign (make_ssa_name (int_type),
> + BIT_IOR_EXPR, tem, src);
> + gimple_set_location (stmt, loc);
> + gimple_seq_add_stmt_without_update (&seq, stmt);
> + src = gimple_assign_lhs (stmt);
> + }
> + }
>
> stmt = gimple_build_assign (dest, src);
> gimple_set_location (stmt, loc);
> gimple_set_vuse (stmt, new_vuse);
> gimple_seq_add_stmt_without_update (&seq, stmt);
>
> - /* We didn't manage to reduce the number of statements. Bail out. */
> - if (++num_stmts == orig_num_stmts)
> - {
> - if (dump_file && (dump_flags & TDF_DETAILS))
> - {
> - fprintf (dump_file, "Exceeded original number of stmts (%u)."
> - " Not profitable to emit new sequence.\n",
> - orig_num_stmts);
> - }
> - unsigned int ssa_count;
> - tree ssa_name;
> - /* Don't forget to cleanup the temporary SSA names. */
> - FOR_EACH_VEC_ELT (new_ssa_names, ssa_count, ssa_name)
> - release_ssa_name (ssa_name);
> -
> - fail = true;
> - break;
> - }
> -
> tree new_vdef;
> if (i < split_stores.length () - 1)
> - {
> - new_vdef = make_ssa_name (gimple_vop (cfun), stmt);
> - new_ssa_names.safe_push (new_vdef);
> - }
> + new_vdef = make_ssa_name (gimple_vop (cfun), stmt);
> else
> new_vdef = last_vdef;
>
> @@ -1218,15 +1407,12 @@ imm_store_chain_info::output_merged_stor
> FOR_EACH_VEC_ELT (split_stores, i, split_store)
> delete split_store;
>
> - if (fail)
> - return false;
> -
> gcc_assert (seq);
> if (dump_file)
> {
> fprintf (dump_file,
> "New sequence of %u stmts to replace old one of %u stmts\n",
> - num_stmts, orig_num_stmts);
> + split_stores.length (), orig_num_stmts);
> if (dump_flags & TDF_DETAILS)
> print_gimple_seq (dump_file, seq, 0, TDF_VOPS | TDF_MEMSYMS);
> }
> @@ -1387,12 +1573,25 @@ pass_store_merging::execute (function *f
> tree rhs = gimple_assign_rhs1 (stmt);
>
> HOST_WIDE_INT bitsize, bitpos;
> + unsigned HOST_WIDE_INT bitregion_start = 0;
> + unsigned HOST_WIDE_INT bitregion_end = 0;
> machine_mode mode;
> int unsignedp = 0, reversep = 0, volatilep = 0;
> tree offset, base_addr;
> base_addr
> = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
> &unsignedp, &reversep, &volatilep);
> + if (TREE_CODE (lhs) == COMPONENT_REF
> + && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1)))
> + {
> + get_bit_range (&bitregion_start, &bitregion_end, lhs,
> + &bitpos, &offset);
> + if (bitregion_end)
> + ++bitregion_end;
> + }
> + if (bitsize == 0)
> + continue;
> +
> /* As a future enhancement we could handle stores with the same
> base and offset. */
> bool invalid = reversep
> @@ -1414,7 +1613,26 @@ pass_store_merging::execute (function *f
> bit_off = byte_off << LOG2_BITS_PER_UNIT;
> bit_off += bitpos;
> if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
> - bitpos = bit_off.to_shwi ();
> + {
> + bitpos = bit_off.to_shwi ();
> + if (bitregion_end)
> + {
> + bit_off = byte_off << LOG2_BITS_PER_UNIT;
> + bit_off += bitregion_start;
> + if (wi::fits_uhwi_p (bit_off))
> + {
> + bitregion_start = bit_off.to_uhwi ();
> + bit_off = byte_off << LOG2_BITS_PER_UNIT;
> + bit_off += bitregion_end;
> + if (wi::fits_uhwi_p (bit_off))
> + bitregion_end = bit_off.to_uhwi ();
> + else
> + bitregion_end = 0;
> + }
> + else
> + bitregion_end = 0;
> + }
> + }
> else
> invalid = true;
> base_addr = TREE_OPERAND (base_addr, 0);
> @@ -1428,6 +1646,12 @@ pass_store_merging::execute (function *f
> base_addr = build_fold_addr_expr (base_addr);
> }
>
> + if (!bitregion_end)
> + {
> + bitregion_start = ROUND_DOWN (bitpos, BITS_PER_UNIT);
> + bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT);
> + }
> +
> if (! invalid
> && offset != NULL_TREE)
> {
> @@ -1457,9 +1681,11 @@ pass_store_merging::execute (function *f
> store_immediate_info *info;
> if (chain_info)
> {
> - info = new store_immediate_info (
> - bitsize, bitpos, stmt,
> - (*chain_info)->m_store_info.length ());
> + unsigned int ord = (*chain_info)->m_store_info.length ();
> + info = new store_immediate_info (bitsize, bitpos,
> + bitregion_start,
> + bitregion_end,
> + stmt, ord);
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> fprintf (dump_file,
> @@ -1488,6 +1714,8 @@ pass_store_merging::execute (function *f
> struct imm_store_chain_info *new_chain
> = new imm_store_chain_info (m_stores_head, base_addr);
> info = new store_immediate_info (bitsize, bitpos,
> + bitregion_start,
> + bitregion_end,
> stmt, 0);
> new_chain->m_store_info.safe_push (info);
> m_stores.put (base_addr, new_chain);
> --- gcc/testsuite/gcc.dg/store_merging_10.c.jj 2017-10-27 14:52:29.724755656 +0200
> +++ gcc/testsuite/gcc.dg/store_merging_10.c 2017-10-27 14:52:29.724755656 +0200
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target store_merge } */
> +/* { dg-options "-O2 -fdump-tree-store-merging" } */
> +
> +struct S {
> + unsigned int b1:1;
> + unsigned int b2:1;
> + unsigned int b3:1;
> + unsigned int b4:1;
> + unsigned int b5:1;
> + unsigned int b6:27;
> +};
> +
> +struct T {
> + unsigned int b1:1;
> + unsigned int b2:16;
> + unsigned int b3:14;
> + unsigned int b4:1;
> +};
> +
> +__attribute__((noipa)) void
> +foo (struct S *x)
> +{
> + x->b1 = 1;
> + x->b2 = 0;
> + x->b3 = 1;
> + x->b4 = 1;
> + x->b5 = 0;
> +}
> +
> +__attribute__((noipa)) void
> +bar (struct T *x)
> +{
> + x->b1 = 1;
> + x->b2 = 0;
> + x->b4 = 0;
> +}
> +
> +struct S s = { 0, 1, 0, 0, 1, 0x3a5f05a };
> +struct T t = { 0, 0xf5af, 0x3a5a, 1 };
> +
> +int
> +main ()
> +{
> + asm volatile ("" : : : "memory");
> + foo (&s);
> + bar (&t);
> + asm volatile ("" : : : "memory");
> + if (s.b1 != 1 || s.b2 != 0 || s.b3 != 1 || s.b4 != 1 || s.b5 != 0 || s.b6 != 0x3a5f05a)
> + __builtin_abort ();
> + if (t.b1 != 1 || t.b2 != 0 || t.b3 != 0x3a5a || t.b4 != 0)
> + __builtin_abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
> --- gcc/testsuite/gcc.dg/store_merging_11.c.jj 2017-10-27 14:52:29.725755644 +0200
> +++ gcc/testsuite/gcc.dg/store_merging_11.c 2017-10-27 14:52:29.725755644 +0200
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target store_merge } */
> +/* { dg-options "-O2 -fdump-tree-store-merging" } */
> +
> +struct S { unsigned char b[2]; unsigned short c; unsigned char d[4]; unsigned long e; };
> +
> +__attribute__((noipa)) void
> +foo (struct S *p)
> +{
> + p->b[1] = 1;
> + p->c = 23;
> + p->d[0] = 4;
> + p->d[1] = 5;
> + p->d[2] = 6;
> + p->d[3] = 7;
> + p->e = 8;
> +}
> +
> +__attribute__((noipa)) void
> +bar (struct S *p)
> +{
> + p->b[1] = 9;
> + p->c = 112;
> + p->d[0] = 10;
> + p->d[1] = 11;
> +}
> +
> +struct S s = { { 30, 31 }, 32, { 33, 34, 35, 36 }, 37 };
> +
> +int
> +main ()
> +{
> + asm volatile ("" : : : "memory");
> + foo (&s);
> + asm volatile ("" : : : "memory");
> + if (s.b[0] != 30 || s.b[1] != 1 || s.c != 23 || s.d[0] != 4 || s.d[1] != 5
> + || s.d[2] != 6 || s.d[3] != 7 || s.e != 8)
> + __builtin_abort ();
> + bar (&s);
> + asm volatile ("" : : : "memory");
> + if (s.b[0] != 30 || s.b[1] != 9 || s.c != 112 || s.d[0] != 10 || s.d[1] != 11
> + || s.d[2] != 6 || s.d[3] != 7 || s.e != 8)
> + __builtin_abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
> --- gcc/testsuite/gcc.dg/store_merging_12.c.jj 2017-10-27 15:00:20.046976487 +0200
> +++ gcc/testsuite/gcc.dg/store_merging_12.c 2017-10-27 14:59:56.000000000 +0200
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wall" } */
> +
> +struct S { unsigned int b1:1, b2:1, b3:1, b4:1, b5:1, b6:27; };
> +void bar (struct S *);
> +void foo (int x)
> +{
> + struct S s;
> + s.b2 = 1; s.b3 = 0; s.b4 = 1; s.b5 = 0; s.b1 = x; s.b6 = x; /* { dg-bogus "is used uninitialized in this function" } */
> + bar (&s);
> +}
> --- gcc/testsuite/g++.dg/pr71694.C.jj 2016-12-16 11:24:32.000000000 +0100
> +++ gcc/testsuite/g++.dg/pr71694.C 2017-10-27 16:53:09.278596219 +0200
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fno-store-merging" } */
>
> struct B {
> B() {}
>
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)