Re: [PATCH] Use bswap framework in store-merging (PR tree-optimization/78821)
- From: Richard Biener <rguenther at suse dot de>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, Thomas Preudhomme <thomas dot preudhomme at foss dot arm dot com>, gcc-patches at gcc dot gnu dot org
- Date: Mon, 20 Nov 2017 10:34:25 +0100 (CET)
- Subject: Re: [PATCH] Use bswap framework in store-merging (PR tree-optimization/78821)
- References: <20171116170606.GW14653@tucnak>
On Thu, 16 Nov 2017, Jakub Jelinek wrote:
> Hi!
>
> This patch uses the bswap pass framework inside of the store merging
> pass to handle adjacent stores which together produce a 16/32/64 bit
> store of a bswapped value (loaded or from an SSA_NAME) or an identity
> (usually only from an SSA_NAME; for an identity load the code prefers
> the existing store merging code, because that can e.g. handle
> arbitrary sizes, not just 16/32/64 bits).
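>
> As a concrete illustration, a minimal example of such a group (see also
> the new testcase below); on little-endian targets the four byte stores
> can be merged into a single 32-bit store of __builtin_bswap32 (x),
> while on big-endian targets they become a single identity store:
>
>   void
>   store_be32 (unsigned char *p, unsigned int x)
>   {
>     p[0] = x >> 24;  /* most significant byte first */
>     p[1] = x >> 16;
>     p[2] = x >> 8;
>     p[3] = x;        /* least significant byte last */
>   }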
>
> There are small tweaks to the bswap code to make it usable inside of
> the store merging pass.  Then, when processing the stores, we record
> what find_bswap_or_nop_1 returns and do a small sanity check on it,
> and when doing coalesce_immediate_stores (i.e. the splitting into
> groups), we try 64-bit, 32-bit and 16-bit sizes to see if we can
> extend/shift (according to endianness) and perform_symbolic_merge
> them together.  If that is possible, we turn those 2+ adjacent stores
> that together make up {64,32,16} bits into a separate group and
> process it specially later (we need to treat it as a single store
> rather than multiple ones, so split_group is only very lightweight
> for that case).
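>
> Roughly, the byte-marker bookkeeping for the store_be32 example above
> (my sketch of what the bswap framework computes, on a little-endian
> target; marker N means "source byte N", counted from the least
> significant byte):
>
>   p[0] = x >> 24;  /* marker 4 at byte position 0 -> 0x00000004 */
>   p[1] = x >> 16;  /* marker 3 shifted left by 8  -> 0x00000300 */
>   p[2] = x >> 8;   /* marker 2 shifted left by 16 -> 0x00020000 */
>   p[3] = x;        /* marker 1 shifted left by 24 -> 0x01000000 */
>
> perform_symbolic_merge ORs these into 0x01020304, which equals the
> CMPXCHG pattern for a 4 byte value, so the group is a bswap;
> 0x04030201 (CMPNOP) would have meant an identity store.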
>
> Bootstrapped/regtested on {x86_64,i686,powerpc64le,powerpc64}-linux, ok
> for trunk?
Ok.  As said, if you can do the formatting fixes when moving the code,
that would be nice.
Thanks,
Richard.
> The cases this patch can handle are less common than rhs_code INTEGER_CST
> (stores of constants to adjacent memory) or MEM_REF (adjacent memory
> copying), but are more common than the bitwise ops; during combined
> x86_64+i686 bootstraps/regtests it triggered:
> lrotate_expr 974 2528
> nop_expr 720 1711
> (lrotate_expr stands for bswap, nop_expr for identity; the first column
> is the actual count of such new stores, the second the number of
> original stores that have been optimized this way).
>
> 2017-11-16 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/78821
> * gimple-ssa-store-merging.c (find_bswap_or_nop_load): Give up
> if base is TARGET_MEM_REF. If base is not MEM_REF, set base_addr
> to the address of the base rather than the base itself.
> (find_bswap_or_nop_1): Just use pointer comparison for vuse check.
> (find_bswap_or_nop_finalize): New function.
> (find_bswap_or_nop): Use it.
> (bswap_replace): Return a tree rather than bool, change first
> argument from gimple * to gimple_stmt_iterator, allow inserting
> into an empty sequence, allow ins_stmt to be NULL - then emit
> all stmts into gsi. Fix up MEM_REF address gimplification.
> (pass_optimize_bswap::execute): Adjust bswap_replace caller.
> Formatting fix.
> (struct store_immediate_info): Add N and INS_STMT non-static
> data members.
> (store_immediate_info::store_immediate_info): Initialize them
> from newly added ctor args.
> (merged_store_group::apply_stores): Formatting fixes. Sort by
> bitpos at the end.
> (stmts_may_clobber_ref_p): For stores call also
> refs_anti_dependent_p.
> (gather_bswap_load_refs): New function.
> (imm_store_chain_info::try_coalesce_bswap): New method.
> (imm_store_chain_info::coalesce_immediate_stores): Use it.
> (split_group): Handle LROTATE_EXPR and NOP_EXPR rhs_code specially.
> (imm_store_chain_info::output_merged_store): Fail if the estimated
> number of new stmts is greater than or equal to the old one.  Handle
> LROTATE_EXPR and NOP_EXPR rhs_code.
> (pass_store_merging::process_store): Compute n and ins_stmt, if
> ins_stmt is non-NULL and the store rhs is otherwise invalid, use
> LROTATE_EXPR rhs_code. Pass n and ins_stmt to store_immediate_info
> ctor.
> (pass_store_merging::execute): Calculate dominators.
>
> * gcc.dg/store_merging_16.c: New test.
>
> --- gcc/gimple-ssa-store-merging.c.jj 2017-11-16 10:45:09.239185205 +0100
> +++ gcc/gimple-ssa-store-merging.c 2017-11-16 15:34:08.560080214 +0100
> @@ -369,7 +369,10 @@ find_bswap_or_nop_load (gimple *stmt, tr
> base_addr = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
> &unsignedp, &reversep, &volatilep);
>
> - if (TREE_CODE (base_addr) == MEM_REF)
> + if (TREE_CODE (base_addr) == TARGET_MEM_REF)
> + /* Do not rewrite TARGET_MEM_REF. */
> + return false;
> + else if (TREE_CODE (base_addr) == MEM_REF)
> {
> offset_int bit_offset = 0;
> tree off = TREE_OPERAND (base_addr, 1);
> @@ -401,6 +404,8 @@ find_bswap_or_nop_load (gimple *stmt, tr
>
> bitpos += bit_offset.to_shwi ();
> }
> + else
> + base_addr = build_fold_addr_expr (base_addr);
>
> if (bitpos % BITS_PER_UNIT)
> return false;
> @@ -743,8 +748,7 @@ find_bswap_or_nop_1 (gimple *stmt, struc
> if (TYPE_PRECISION (n1.type) != TYPE_PRECISION (n2.type))
> return NULL;
>
> - if (!n1.vuse != !n2.vuse
> - || (n1.vuse && !operand_equal_p (n1.vuse, n2.vuse, 0)))
> + if (n1.vuse != n2.vuse)
> return NULL;
>
> source_stmt
> @@ -765,39 +769,21 @@ find_bswap_or_nop_1 (gimple *stmt, struc
> return NULL;
> }
>
> -/* Check if STMT completes a bswap implementation or a read in a given
> - endianness consisting of ORs, SHIFTs and ANDs and sets *BSWAP
> - accordingly. It also sets N to represent the kind of operations
> - performed: size of the resulting expression and whether it works on
> - a memory source, and if so alias-set and vuse. At last, the
> - function returns a stmt whose rhs's first tree is the source
> - expression. */
> +/* Helper for find_bswap_or_nop and try_coalesce_bswap to compute
> + *CMPXCHG, *CMPNOP and adjust *N. */
>
> -gimple *
> -find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
> +void
> +find_bswap_or_nop_finalize (struct symbolic_number *n, uint64_t *cmpxchg,
> + uint64_t *cmpnop)
> {
> unsigned rsize;
> uint64_t tmpn, mask;
> -/* The number which the find_bswap_or_nop_1 result should match in order
> - to have a full byte swap. The number is shifted to the right
> - according to the size of the symbolic number before using it. */
> - uint64_t cmpxchg = CMPXCHG;
> - uint64_t cmpnop = CMPNOP;
> -
> - gimple *ins_stmt;
> - int limit;
>
> - /* The last parameter determines the depth search limit. It usually
> - correlates directly to the number n of bytes to be touched. We
> - increase that number by log2(n) + 1 here in order to also
> - cover signed -> unsigned conversions of the src operand as can be seen
> - in libgcc, and for initial shift/and operation of the src operand. */
> - limit = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (gimple_expr_type (stmt)));
> - limit += 1 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
> - ins_stmt = find_bswap_or_nop_1 (stmt, n, limit);
> -
> - if (!ins_stmt)
> - return NULL;
> + /* The number which the find_bswap_or_nop_1 result should match in order
> + to have a full byte swap. The number is shifted to the right
> + according to the size of the symbolic number before using it. */
> + *cmpxchg = CMPXCHG;
> + *cmpnop = CMPNOP;
>
> /* Find real size of result (highest non-zero byte). */
> if (n->base_addr)
> @@ -810,8 +796,8 @@ find_bswap_or_nop (gimple *stmt, struct
> if (n->range < (int) sizeof (int64_t))
> {
> mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
> - cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
> - cmpnop &= mask;
> + *cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
> + *cmpnop &= mask;
> }
>
> /* Zero out the bits corresponding to unused bytes in the result of the
> @@ -821,18 +807,47 @@ find_bswap_or_nop (gimple *stmt, struct
> if (BYTES_BIG_ENDIAN)
> {
> mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> - cmpxchg &= mask;
> - cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> + *cmpxchg &= mask;
> + *cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> }
> else
> {
> mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> - cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
> - cmpnop &= mask;
> + *cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
> + *cmpnop &= mask;
> }
> n->range = rsize;
> }
>
> + n->range *= BITS_PER_UNIT;
> +}
> +
> +/* Check if STMT completes a bswap implementation or a read in a given
> + endianness consisting of ORs, SHIFTs and ANDs and sets *BSWAP
> + accordingly. It also sets N to represent the kind of operations
> + performed: size of the resulting expression and whether it works on
> + a memory source, and if so alias-set and vuse. At last, the
> + function returns a stmt whose rhs's first tree is the source
> + expression. */
> +
> +gimple *
> +find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
> +{
> + /* The last parameter determines the depth search limit. It usually
> + correlates directly to the number n of bytes to be touched. We
> + increase that number by log2(n) + 1 here in order to also
> + cover signed -> unsigned conversions of the src operand as can be seen
> + in libgcc, and for initial shift/and operation of the src operand. */
> + int limit = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (gimple_expr_type (stmt)));
> + limit += 1 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
> + gimple *ins_stmt = find_bswap_or_nop_1 (stmt, n, limit);
> +
> + if (!ins_stmt)
> + return NULL;
> +
> + uint64_t cmpxchg, cmpnop;
> + find_bswap_or_nop_finalize (n, &cmpxchg, &cmpnop);
> +
> /* A complete byte swap should make the symbolic number to start with
> the largest digit in the highest order byte. Unchanged symbolic
> number indicates a read with same endianness as target architecture. */
> @@ -847,7 +862,6 @@ find_bswap_or_nop (gimple *stmt, struct
> if (!n->base_addr && n->n == cmpnop && n->n_ops == 1)
> return NULL;
>
> - n->range *= BITS_PER_UNIT;
> return ins_stmt;
> }
>
> @@ -882,68 +896,89 @@ public:
> }; // class pass_optimize_bswap
>
> /* Perform the bswap optimization: replace the expression computed in the rhs
> - of CUR_STMT by an equivalent bswap, load or load + bswap expression.
> + of gsi_stmt (GSI) (or if NULL add instead of replace) by an equivalent
> + bswap, load or load + bswap expression.
> Which of these alternatives replace the rhs is given by N->base_addr (non
> null if a load is needed) and BSWAP. The type, VUSE and set-alias of the
> load to perform are also given in N while the builtin bswap invoke is given
> - in FNDEL. Finally, if a load is involved, SRC_STMT refers to one of the
> - load statements involved to construct the rhs in CUR_STMT and N->range gives
> - the size of the rhs expression for maintaining some statistics.
> -
> - Note that if the replacement involve a load, CUR_STMT is moved just after
> - SRC_STMT to do the load with the same VUSE which can lead to CUR_STMT
> - changing of basic block. */
> + in FNDEL. Finally, if a load is involved, INS_STMT refers to one of the
> + load statements involved to construct the rhs in gsi_stmt (GSI) and
> + N->range gives the size of the rhs expression for maintaining some
> + statistics.
> +
> + Note that if the replacement involves a load and if gsi_stmt (GSI) is
> + non-NULL, that stmt is moved just after INS_STMT to do the load with the
> + same VUSE, which can lead to gsi_stmt (GSI) changing basic block. */
>
> -bool
> -bswap_replace (gimple *cur_stmt, gimple *ins_stmt, tree fndecl,
> +tree
> +bswap_replace (gimple_stmt_iterator gsi, gimple *ins_stmt, tree fndecl,
> tree bswap_type, tree load_type, struct symbolic_number *n,
> bool bswap)
> {
> - gimple_stmt_iterator gsi;
> - tree src, tmp, tgt;
> + tree src, tmp, tgt = NULL_TREE;
> gimple *bswap_stmt;
>
> - gsi = gsi_for_stmt (cur_stmt);
> + gimple *cur_stmt = gsi_stmt (gsi);
> src = n->src;
> - tgt = gimple_assign_lhs (cur_stmt);
> + if (cur_stmt)
> + tgt = gimple_assign_lhs (cur_stmt);
>
> /* Need to load the value from memory first. */
> if (n->base_addr)
> {
> - gimple_stmt_iterator gsi_ins = gsi_for_stmt (ins_stmt);
> + gimple_stmt_iterator gsi_ins = gsi;
> + if (ins_stmt)
> + gsi_ins = gsi_for_stmt (ins_stmt);
> tree addr_expr, addr_tmp, val_expr, val_tmp;
> tree load_offset_ptr, aligned_load_type;
> - gimple *addr_stmt, *load_stmt;
> - unsigned align;
> + gimple *load_stmt;
> + unsigned align = get_object_alignment (src);
> HOST_WIDE_INT load_offset = 0;
> - basic_block ins_bb, cur_bb;
>
> - ins_bb = gimple_bb (ins_stmt);
> - cur_bb = gimple_bb (cur_stmt);
> - if (!dominated_by_p (CDI_DOMINATORS, cur_bb, ins_bb))
> - return false;
> -
> - align = get_object_alignment (src);
> -
> - /* Move cur_stmt just before one of the load of the original
> - to ensure it has the same VUSE. See PR61517 for what could
> - go wrong. */
> - if (gimple_bb (cur_stmt) != gimple_bb (ins_stmt))
> - reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
> - gsi_move_before (&gsi, &gsi_ins);
> - gsi = gsi_for_stmt (cur_stmt);
> + if (cur_stmt)
> + {
> + basic_block ins_bb = gimple_bb (ins_stmt);
> + basic_block cur_bb = gimple_bb (cur_stmt);
> + if (!dominated_by_p (CDI_DOMINATORS, cur_bb, ins_bb))
> + return NULL_TREE;
> +
> + /* Move cur_stmt just before one of the load of the original
> + to ensure it has the same VUSE. See PR61517 for what could
> + go wrong. */
> + if (gimple_bb (cur_stmt) != gimple_bb (ins_stmt))
> + reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
> + gsi_move_before (&gsi, &gsi_ins);
> + gsi = gsi_for_stmt (cur_stmt);
> + }
> + else
> + gsi = gsi_ins;
>
> /* Compute address to load from and cast according to the size
> of the load. */
> - addr_expr = build_fold_addr_expr (unshare_expr (src));
> + addr_expr = build_fold_addr_expr (src);
> if (is_gimple_mem_ref_addr (addr_expr))
> - addr_tmp = addr_expr;
> + addr_tmp = unshare_expr (addr_expr);
> else
> {
> - addr_tmp = make_temp_ssa_name (TREE_TYPE (addr_expr), NULL,
> - "load_src");
> - addr_stmt = gimple_build_assign (addr_tmp, addr_expr);
> - gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT);
> + addr_tmp = unshare_expr (n->base_addr);
> + if (!is_gimple_mem_ref_addr (addr_tmp))
> + addr_tmp = force_gimple_operand_gsi_1 (&gsi, addr_tmp,
> + is_gimple_mem_ref_addr,
> + NULL_TREE, true,
> + GSI_SAME_STMT);
> + load_offset = n->bytepos;
> + if (n->offset)
> + {
> + tree off
> + = force_gimple_operand_gsi (&gsi, unshare_expr (n->offset),
> + true, NULL_TREE, true,
> + GSI_SAME_STMT);
> + gimple *stmt
> + = gimple_build_assign (make_ssa_name (TREE_TYPE (addr_tmp)),
> + POINTER_PLUS_EXPR, addr_tmp, off);
> + gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> + addr_tmp = gimple_assign_lhs (stmt);
> + }
> }
>
> /* Perform the load. */
> @@ -967,7 +1002,7 @@ bswap_replace (gimple *cur_stmt, gimple
> }
>
> /* Convert the result of load if necessary. */
> - if (!useless_type_conversion_p (TREE_TYPE (tgt), load_type))
> + if (tgt && !useless_type_conversion_p (TREE_TYPE (tgt), load_type))
> {
> val_tmp = make_temp_ssa_name (aligned_load_type, NULL,
> "load_dst");
> @@ -975,13 +1010,21 @@ bswap_replace (gimple *cur_stmt, gimple
> gimple_set_vuse (load_stmt, n->vuse);
> gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT);
> gimple_assign_set_rhs_with_ops (&gsi, NOP_EXPR, val_tmp);
> + update_stmt (cur_stmt);
> }
> - else
> + else if (cur_stmt)
> {
> gimple_assign_set_rhs_with_ops (&gsi, MEM_REF, val_expr);
> gimple_set_vuse (cur_stmt, n->vuse);
> + update_stmt (cur_stmt);
> + }
> + else
> + {
> + tgt = make_ssa_name (load_type);
> + cur_stmt = gimple_build_assign (tgt, MEM_REF, val_expr);
> + gimple_set_vuse (cur_stmt, n->vuse);
> + gsi_insert_before (&gsi, cur_stmt, GSI_SAME_STMT);
> }
> - update_stmt (cur_stmt);
>
> if (dump_file)
> {
> @@ -990,7 +1033,7 @@ bswap_replace (gimple *cur_stmt, gimple
> (int) n->range);
> print_gimple_stmt (dump_file, cur_stmt, 0);
> }
> - return true;
> + return tgt;
> }
> else
> {
> @@ -1003,15 +1046,17 @@ bswap_replace (gimple *cur_stmt, gimple
> }
> else if (!bswap)
> {
> - gimple *g;
> - if (!useless_type_conversion_p (TREE_TYPE (tgt), TREE_TYPE (src)))
> + gimple *g = NULL;
> + if (tgt && !useless_type_conversion_p (TREE_TYPE (tgt), TREE_TYPE (src)))
> {
> if (!is_gimple_val (src))
> - return false;
> + return NULL_TREE;
> g = gimple_build_assign (tgt, NOP_EXPR, src);
> }
> - else
> + else if (cur_stmt)
> g = gimple_build_assign (tgt, src);
> + else
> + tgt = src;
> if (n->range == 16)
> nop_stats.found_16bit++;
> else if (n->range == 32)
> @@ -1026,10 +1071,17 @@ bswap_replace (gimple *cur_stmt, gimple
> fprintf (dump_file,
> "%d bit reshuffle in target endianness found at: ",
> (int) n->range);
> - print_gimple_stmt (dump_file, cur_stmt, 0);
> + if (cur_stmt)
> + print_gimple_stmt (dump_file, cur_stmt, 0);
> + else
> + {
> + print_generic_expr (dump_file, tgt, 0);
> + fprintf (dump_file, "\n");
> + }
> }
> - gsi_replace (&gsi, g, true);
> - return true;
> + if (cur_stmt)
> + gsi_replace (&gsi, g, true);
> + return tgt;
> }
> else if (TREE_CODE (src) == BIT_FIELD_REF)
> src = TREE_OPERAND (src, 0);
> @@ -1069,6 +1121,8 @@ bswap_replace (gimple *cur_stmt, gimple
> else
> bswap_stmt = gimple_build_call (fndecl, 1, tmp);
>
> + if (tgt == NULL_TREE)
> + tgt = make_ssa_name (bswap_type);
> tmp = tgt;
>
> /* Convert the result if necessary. */
> @@ -1087,12 +1141,23 @@ bswap_replace (gimple *cur_stmt, gimple
> {
> fprintf (dump_file, "%d bit bswap implementation found at: ",
> (int) n->range);
> - print_gimple_stmt (dump_file, cur_stmt, 0);
> + if (cur_stmt)
> + print_gimple_stmt (dump_file, cur_stmt, 0);
> + else
> + {
> + print_generic_expr (dump_file, tgt, 0);
> + fprintf (dump_file, "\n");
> + }
> }
>
> - gsi_insert_after (&gsi, bswap_stmt, GSI_SAME_STMT);
> - gsi_remove (&gsi, true);
> - return true;
> + if (cur_stmt)
> + {
> + gsi_insert_after (&gsi, bswap_stmt, GSI_SAME_STMT);
> + gsi_remove (&gsi, true);
> + }
> + else
> + gsi_insert_before (&gsi, bswap_stmt, GSI_SAME_STMT);
> + return tgt;
> }
>
> /* Find manual byte swap implementations as well as load in a given
> @@ -1141,7 +1206,7 @@ pass_optimize_bswap::execute (function *
> inserted smaller bswap replacements as sub-patterns, the wider
> variant wouldn't be detected. */
> for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
> - {
> + {
> gimple *ins_stmt, *cur_stmt = gsi_stmt (gsi);
> tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
> enum tree_code code;
> @@ -1211,8 +1276,8 @@ pass_optimize_bswap::execute (function *
> if (bswap && !fndecl && n.range != 16)
> continue;
>
> - if (bswap_replace (cur_stmt, ins_stmt, fndecl, bswap_type, load_type,
> - &n, bswap))
> + if (bswap_replace (gsi_for_stmt (cur_stmt), ins_stmt, fndecl,
> + bswap_type, load_type, &n, bswap))
> changed = true;
> }
> }
> @@ -1281,8 +1346,15 @@ struct store_immediate_info
> gimple *stmt;
> unsigned int order;
> /* INTEGER_CST for constant stores, MEM_REF for memory copy or
> - BIT_*_EXPR for logical bitwise operation. */
> + BIT_*_EXPR for logical bitwise operation.
> + LROTATE_EXPR if it can be only bswap optimized and
> + ops are not really meaningful.
> + NOP_EXPR if bswap optimization detected identity, ops
> + are not meaningful. */
> enum tree_code rhs_code;
> + /* Two fields for bswap optimization purposes. */
> + struct symbolic_number n;
> + gimple *ins_stmt;
> /* True if BIT_{AND,IOR,XOR}_EXPR result is inverted before storing. */
> bool bit_not_p;
> /* True if ops have been swapped and thus ops[1] represents
> @@ -1293,7 +1365,8 @@ struct store_immediate_info
> store_operand_info ops[2];
> store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> - gimple *, unsigned int, enum tree_code, bool,
> + gimple *, unsigned int, enum tree_code,
> + struct symbolic_number &, gimple *, bool,
> const store_operand_info &,
> const store_operand_info &);
> };
> @@ -1305,12 +1378,14 @@ store_immediate_info::store_immediate_in
> gimple *st,
> unsigned int ord,
> enum tree_code rhscode,
> + struct symbolic_number &nr,
> + gimple *ins_stmtp,
> bool bitnotp,
> const store_operand_info &op0r,
> const store_operand_info &op1r)
> : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
> - stmt (st), order (ord), rhs_code (rhscode), bit_not_p (bitnotp),
> - ops_swapped_p (false)
> + stmt (st), order (ord), rhs_code (rhscode), n (nr),
> + ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false)
> #if __cplusplus >= 201103L
> , ops { op0r, op1r }
> {
> @@ -1884,13 +1959,13 @@ merged_store_group::apply_stores ()
> fprintf (dump_file, "After writing ");
> print_generic_expr (dump_file, cst, 0);
> fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
> - " at position %d the merged region contains:\n",
> - info->bitsize, pos_in_buffer);
> + " at position %d the merged region contains:\n",
> + info->bitsize, pos_in_buffer);
> dump_char_array (dump_file, val, buf_size);
> }
> else
> fprintf (dump_file, "Failed to merge stores\n");
> - }
> + }
> if (!ret)
> return false;
> unsigned char *m = mask + (pos_in_buffer / BITS_PER_UNIT);
> @@ -1901,6 +1976,7 @@ merged_store_group::apply_stores ()
> else
> clear_bit_region (m, pos_in_buffer % BITS_PER_UNIT, info->bitsize);
> }
> + stores.qsort (sort_by_bitpos);
> return true;
> }
>
> @@ -1937,6 +2013,7 @@ struct imm_store_chain_info
> }
> }
> bool terminate_and_process_chain ();
> + bool try_coalesce_bswap (merged_store_group *, unsigned int, unsigned int);
> bool coalesce_immediate_stores ();
> bool output_merged_store (merged_store_group *);
> bool output_merged_stores ();
> @@ -2075,7 +2152,8 @@ pass_store_merging::terminate_and_releas
>
> /* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> may clobber REF. FIRST and LAST must be in the same basic block and
> - have non-NULL vdef. */
> + have non-NULL vdef. We want to be able to sink a load of REF across
> + stores between FIRST and LAST, up to right before LAST. */
>
> bool
> stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> @@ -2092,6 +2170,9 @@ stmts_may_clobber_ref_p (gimple *first,
> stmt = SSA_NAME_DEF_STMT (vop);
> if (stmt_may_clobber_ref_p_1 (stmt, &r))
> return true;
> + if (gimple_store_p (stmt)
> + && refs_anti_dependent_p (ref, gimple_get_lhs (stmt)))
> + return true;
> /* Avoid quadratic compile time by bounding the number of checks
> we perform. */
> if (++count > MAX_STORE_ALIAS_CHECKS)
> @@ -2192,6 +2273,252 @@ compatible_load_p (merged_store_group *m
> return true;
> }
>
> +/* Add all refs loaded to compute VAL to REFS vector. */
> +
> +void
> +gather_bswap_load_refs (vec<tree> *refs, tree val)
> +{
> + if (TREE_CODE (val) != SSA_NAME)
> + return;
> +
> + gimple *stmt = SSA_NAME_DEF_STMT (val);
> + if (!is_gimple_assign (stmt))
> + return;
> +
> + if (gimple_assign_load_p (stmt))
> + {
> + refs->safe_push (gimple_assign_rhs1 (stmt));
> + return;
> + }
> +
> + switch (gimple_assign_rhs_class (stmt))
> + {
> + case GIMPLE_BINARY_RHS:
> + gather_bswap_load_refs (refs, gimple_assign_rhs2 (stmt));
> + /* FALLTHRU */
> + case GIMPLE_UNARY_RHS:
> + gather_bswap_load_refs (refs, gimple_assign_rhs1 (stmt));
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +}
> +
> +/* Return true if m_store_info[first] and at least one following store
> + form a group which stores a try_size bit value that is byte swapped
> + from a memory load or from some value, or is the identity of some value.
> + This uses the bswap pass APIs. */
> +
> +bool
> +imm_store_chain_info::try_coalesce_bswap (merged_store_group *merged_store,
> + unsigned int first,
> + unsigned int try_size)
> +{
> + unsigned int len = m_store_info.length (), last = first;
> + unsigned HOST_WIDE_INT width = m_store_info[first]->bitsize;
> + if (width >= try_size)
> + return false;
> + for (unsigned int i = first + 1; i < len; ++i)
> + {
> + if (m_store_info[i]->bitpos != m_store_info[first]->bitpos + width
> + || m_store_info[i]->ins_stmt == NULL)
> + return false;
> + width += m_store_info[i]->bitsize;
> + if (width >= try_size)
> + {
> + last = i;
> + break;
> + }
> + }
> + if (width != try_size)
> + return false;
> +
> + bool allow_unaligned
> + = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
> + /* Punt if the combined store would not be aligned and we need alignment. */
> + if (!allow_unaligned)
> + {
> + unsigned int align = merged_store->align;
> + unsigned HOST_WIDE_INT align_base = merged_store->align_base;
> + for (unsigned int i = first + 1; i <= last; ++i)
> + {
> + unsigned int this_align;
> + unsigned HOST_WIDE_INT align_bitpos = 0;
> + get_object_alignment_1 (gimple_assign_lhs (m_store_info[i]->stmt),
> + &this_align, &align_bitpos);
> + if (this_align > align)
> + {
> + align = this_align;
> + align_base = m_store_info[i]->bitpos - align_bitpos;
> + }
> + }
> + unsigned HOST_WIDE_INT align_bitpos
> + = (m_store_info[first]->bitpos - align_base) & (align - 1);
> + if (align_bitpos)
> + align = least_bit_hwi (align_bitpos);
> + if (align < try_size)
> + return false;
> + }
> +
> + tree type;
> + switch (try_size)
> + {
> + case 16: type = uint16_type_node; break;
> + case 32: type = uint32_type_node; break;
> + case 64: type = uint64_type_node; break;
> + default: gcc_unreachable ();
> + }
> + struct symbolic_number n;
> + gimple *ins_stmt = NULL;
> + int vuse_store = -1;
> + unsigned int first_order = merged_store->first_order;
> + unsigned int last_order = merged_store->last_order;
> + gimple *first_stmt = merged_store->first_stmt;
> + gimple *last_stmt = merged_store->last_stmt;
> + store_immediate_info *infof = m_store_info[first];
> +
> + for (unsigned int i = first; i <= last; ++i)
> + {
> + store_immediate_info *info = m_store_info[i];
> + struct symbolic_number this_n = info->n;
> + this_n.type = type;
> + if (!this_n.base_addr)
> + this_n.range = try_size / BITS_PER_UNIT;
> + unsigned int bitpos = info->bitpos - infof->bitpos;
> + if (!do_shift_rotate (LSHIFT_EXPR, &this_n,
> + BYTES_BIG_ENDIAN
> + ? try_size - info->bitsize - bitpos
> + : bitpos))
> + return false;
> + if (n.base_addr && vuse_store)
> + {
> + unsigned int j;
> + for (j = first; j <= last; ++j)
> + if (this_n.vuse == gimple_vuse (m_store_info[j]->stmt))
> + break;
> + if (j > last)
> + {
> + if (vuse_store == 1)
> + return false;
> + vuse_store = 0;
> + }
> + }
> + if (i == first)
> + {
> + n = this_n;
> + ins_stmt = info->ins_stmt;
> + }
> + else
> + {
> + if (n.base_addr)
> + {
> + if (n.vuse != this_n.vuse)
> + {
> + if (vuse_store == 0)
> + return false;
> + vuse_store = 1;
> + }
> + if (info->order > last_order)
> + {
> + last_order = info->order;
> + last_stmt = info->stmt;
> + }
> + else if (info->order < first_order)
> + {
> + first_order = info->order;
> + first_stmt = info->stmt;
> + }
> + }
> +
> + ins_stmt = perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
> + &this_n, &n);
> + if (ins_stmt == NULL)
> + return false;
> + }
> + }
> +
> + uint64_t cmpxchg, cmpnop;
> + find_bswap_or_nop_finalize (&n, &cmpxchg, &cmpnop);
> +
> + /* A complete byte swap should make the symbolic number to start with
> + the largest digit in the highest order byte. Unchanged symbolic
> + number indicates a read with same endianness as target architecture. */
> + if (n.n != cmpnop && n.n != cmpxchg)
> + return false;
> +
> + if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
> + return false;
> +
> + /* Don't handle memory copy this way if normal non-bswap processing
> + would handle it too. */
> + if (n.n == cmpnop && (unsigned) n.n_ops == last - first + 1)
> + {
> + unsigned int i;
> + for (i = first; i <= last; ++i)
> + if (m_store_info[i]->rhs_code != MEM_REF)
> + break;
> + if (i == last + 1)
> + return false;
> + }
> +
> + if (n.n == cmpxchg)
> + switch (try_size)
> + {
> + case 16:
> + /* Will emit LROTATE_EXPR. */
> + break;
> + case 32:
> + if (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
> + && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing)
> + break;
> + return false;
> + case 64:
> + if (builtin_decl_explicit_p (BUILT_IN_BSWAP64)
> + && optab_handler (bswap_optab, DImode) != CODE_FOR_nothing)
> + break;
> + return false;
> + default:
> + gcc_unreachable ();
> + }
> +
> + if (!allow_unaligned && n.base_addr)
> + {
> + unsigned int align = get_object_alignment (n.src);
> + if (align < try_size)
> + return false;
> + }
> +
> + /* If each load has the vuse of the corresponding store, we need to
> + verify that the loads can be sunk right before the last store. */
> + if (vuse_store == 1)
> + {
> + auto_vec<tree, 64> refs;
> + for (unsigned int i = first; i <= last; ++i)
> + gather_bswap_load_refs (&refs,
> + gimple_assign_rhs1 (m_store_info[i]->stmt));
> +
> + unsigned int i;
> + tree ref;
> + FOR_EACH_VEC_ELT (refs, i, ref)
> + if (stmts_may_clobber_ref_p (first_stmt, last_stmt, ref))
> + return false;
> + n.vuse = NULL_TREE;
> + }
> +
> + infof->n = n;
> + infof->ins_stmt = ins_stmt;
> + for (unsigned int i = first; i <= last; ++i)
> + {
> + m_store_info[i]->rhs_code = n.n == cmpxchg ? LROTATE_EXPR : NOP_EXPR;
> + m_store_info[i]->ops[0].base_addr = NULL_TREE;
> + m_store_info[i]->ops[1].base_addr = NULL_TREE;
> + if (i != first)
> + merged_store->merge_into (m_store_info[i]);
> + }
> +
> + return true;
> +}
> +
> /* Go through the candidate stores recorded in m_store_info and merge them
> into merged_store_group objects recorded into m_merged_store_groups
> representing the widened stores. Return true if coalescing was successful
> @@ -2210,7 +2537,7 @@ imm_store_chain_info::coalesce_immediate
> m_store_info.length ());
>
> store_immediate_info *info;
> - unsigned int i;
> + unsigned int i, ignore = 0;
>
> /* Order the stores by the bitposition they write to. */
> m_store_info.qsort (sort_by_bitpos);
> @@ -2229,14 +2556,41 @@ imm_store_chain_info::coalesce_immediate
> fprintf (dump_file, "\n------------\n");
> }
>
> - if (i == 0)
> + if (i <= ignore)
> continue;
>
> + /* First try to handle a group of stores like:
> + p[0] = data >> 24;
> + p[1] = data >> 16;
> + p[2] = data >> 8;
> + p[3] = data;
> + using the bswap framework. */
> + if (info->bitpos == merged_store->start + merged_store->width
> + && merged_store->stores.length () == 1
> + && merged_store->stores[0]->ins_stmt != NULL
> + && info->ins_stmt != NULL)
> + {
> + unsigned int try_size;
> + for (try_size = 64; try_size >= 16; try_size >>= 1)
> + if (try_coalesce_bswap (merged_store, i - 1, try_size))
> + break;
> +
> + if (try_size >= 16)
> + {
> + ignore = i + merged_store->stores.length () - 1;
> + m_merged_store_groups.safe_push (merged_store);
> + if (ignore < m_store_info.length ())
> + merged_store = new merged_store_group (m_store_info[ignore]);
> + else
> + merged_store = NULL;
> + continue;
> + }
> + }
> +
> /* |---store 1---|
> |---store 2---|
> - Overlapping stores. */
> - unsigned HOST_WIDE_INT start = info->bitpos;
> - if (IN_RANGE (start, merged_store->start,
> + Overlapping stores. */
> + if (IN_RANGE (info->bitpos, merged_store->start,
> merged_store->start + merged_store->width - 1))
> {
> /* Only allow overlapping stores of constants. */
> @@ -2251,7 +2605,8 @@ imm_store_chain_info::coalesce_immediate
> This store is consecutive to the previous one.
> Merge it into the current store group. There can be gaps in between
> the stores, but there can't be gaps in between bitregions. */
> - else if (info->bitregion_start <= merged_store->bitregion_end
> + else if (info->rhs_code != LROTATE_EXPR
> + && info->bitregion_start <= merged_store->bitregion_end
> && info->rhs_code == merged_store->stores[0]->rhs_code)
> {
> store_immediate_info *infof = merged_store->stores[0];
> @@ -2297,10 +2652,13 @@ imm_store_chain_info::coalesce_immediate
> }
>
> /* Record or discard the last store group. */
> - if (!merged_store->apply_stores ())
> - delete merged_store;
> - else
> - m_merged_store_groups.safe_push (merged_store);
> + if (merged_store)
> + {
> + if (!merged_store->apply_stores ())
> + delete merged_store;
> + else
> + m_merged_store_groups.safe_push (merged_store);
> + }
>
> gcc_assert (m_merged_store_groups.length () <= m_store_info.length ());
> bool success
> @@ -2560,9 +2918,41 @@ split_group (merged_store_group *group,
>
> gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
>
> + if (group->stores[0]->rhs_code == LROTATE_EXPR
> + || group->stores[0]->rhs_code == NOP_EXPR)
> + {
> + /* For the bswap framework using sets of stores, all the checking
> + has been done earlier in try_coalesce_bswap and the group needs
> + to be emitted as a single store. */
> + if (total_orig)
> + {
> + /* Avoid the old/new stmt count heuristics. It should
> + always be beneficial. */
> + total_new[0] = 1;
> + total_orig[0] = 2;
> + }
> +
> + if (split_stores)
> + {
> + unsigned HOST_WIDE_INT align_bitpos
> + = (group->start - align_base) & (group_align - 1);
> + unsigned HOST_WIDE_INT align = group_align;
> + if (align_bitpos)
> + align = least_bit_hwi (align_bitpos);
> + bytepos = group->start / BITS_PER_UNIT;
> + struct split_store *store
> + = new split_store (bytepos, group->width, align);
> + unsigned int first = 0;
> + find_constituent_stores (group, &store->orig_stores,
> + &first, group->start, group->width);
> + split_stores->safe_push (store);
> + }
> +
> + return 1;
> + }
> +
> unsigned int ret = 0, first = 0;
> unsigned HOST_WIDE_INT try_pos = bytepos;
> - group->stores.qsort (sort_by_bitpos);
>
> if (total_orig)
> {
> @@ -2904,6 +3294,7 @@ imm_store_chain_info::output_merged_stor
> " not larger than estimated number of new"
> " stmts (%u).\n",
> total_orig, total_new);
> + return false;
> }
>
> gimple_stmt_iterator last_gsi = gsi_for_stmt (group->last_stmt);
> @@ -2911,13 +3302,63 @@ imm_store_chain_info::output_merged_stor
> tree last_vdef, new_vuse;
> last_vdef = gimple_vdef (group->last_stmt);
> new_vuse = gimple_vuse (group->last_stmt);
> + tree bswap_res = NULL_TREE;
> +
> + if (group->stores[0]->rhs_code == LROTATE_EXPR
> + || group->stores[0]->rhs_code == NOP_EXPR)
> + {
> + tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
> + gimple *ins_stmt = group->stores[0]->ins_stmt;
> + struct symbolic_number *n = &group->stores[0]->n;
> + bool bswap = group->stores[0]->rhs_code == LROTATE_EXPR;
> +
> + switch (n->range)
> + {
> + case 16:
> + load_type = bswap_type = uint16_type_node;
> + break;
> + case 32:
> + load_type = uint32_type_node;
> + if (bswap)
> + {
> + fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32);
> + bswap_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)));
> + }
> + break;
> + case 64:
> + load_type = uint64_type_node;
> + if (bswap)
> + {
> + fndecl = builtin_decl_explicit (BUILT_IN_BSWAP64);
> + bswap_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)));
> + }
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + /* If the loads each have the vuse of the corresponding store,
> + we've checked the aliasing already in try_coalesce_bswap and
> + we want to sink the load into seq. So we need to use new_vuse
> + on the load. */
> + if (n->base_addr && n->vuse == NULL)
> + {
> + n->vuse = new_vuse;
> + ins_stmt = NULL;
> + }
> + bswap_res = bswap_replace (gsi_start (seq), ins_stmt, fndecl,
> + bswap_type, load_type, n, bswap);
> + gcc_assert (bswap_res);
> + }
>
> gimple *stmt = NULL;
> split_store *split_store;
> unsigned int i;
> auto_vec<gimple *, 32> orig_stmts;
> - tree addr = force_gimple_operand_1 (unshare_expr (base_addr), &seq,
> + gimple_seq this_seq;
> + tree addr = force_gimple_operand_1 (unshare_expr (base_addr), &this_seq,
> is_gimple_mem_ref_addr, NULL_TREE);
> + gimple_seq_add_seq_without_update (&seq, this_seq);
>
> tree load_addr[2] = { NULL_TREE, NULL_TREE };
> gimple_seq load_seq[2] = { NULL, NULL };
> @@ -2941,7 +3382,6 @@ imm_store_chain_info::output_merged_stor
> load_addr[j] = addr;
> else
> {
> - gimple_seq this_seq;
> load_addr[j]
> = force_gimple_operand_1 (unshare_expr (op.base_addr),
> &this_seq, is_gimple_mem_ref_addr,
> @@ -2988,10 +3428,12 @@ imm_store_chain_info::output_merged_stor
> MR_DEPENDENCE_BASE (dest) = base;
> }
>
> - tree mask
> - = native_interpret_expr (int_type,
> - group->mask + try_pos - start_byte_pos,
> - group->buf_size);
> + tree mask = integer_zero_node;
> + if (!bswap_res)
> + mask = native_interpret_expr (int_type,
> + group->mask + try_pos
> + - start_byte_pos,
> + group->buf_size);
>
> tree ops[2];
> for (int j = 0;
> @@ -2999,7 +3441,9 @@ imm_store_chain_info::output_merged_stor
> ++j)
> {
> store_operand_info &op = split_store->orig_stores[0]->ops[j];
> - if (op.base_addr)
> + if (bswap_res)
> + ops[j] = bswap_res;
> + else if (op.base_addr)
> {
> FOR_EACH_VEC_ELT (split_store->orig_stores, k, info)
> orig_stmts.safe_push (info->ops[j].stmt);
> @@ -3124,6 +3568,24 @@ imm_store_chain_info::output_merged_stor
> src = gimple_assign_lhs (stmt);
> }
> break;
> + case LROTATE_EXPR:
> + case NOP_EXPR:
> + src = ops[0];
> + if (!is_gimple_val (src))
> + {
> + stmt = gimple_build_assign (make_ssa_name (TREE_TYPE (src)),
> + src);
> + gimple_seq_add_stmt_without_update (&seq, stmt);
> + src = gimple_assign_lhs (stmt);
> + }
> + if (!useless_type_conversion_p (int_type, TREE_TYPE (src)))
> + {
> + stmt = gimple_build_assign (make_ssa_name (int_type),
> + NOP_EXPR, src);
> + gimple_seq_add_stmt_without_update (&seq, stmt);
> + src = gimple_assign_lhs (stmt);
> + }
> + break;
> default:
> src = ops[0];
> break;
> @@ -3494,6 +3956,8 @@ pass_store_merging::process_store (gimpl
> && (TREE_CODE (rhs) != INTEGER_CST)));
> enum tree_code rhs_code = ERROR_MARK;
> bool bit_not_p = false;
> + struct symbolic_number n;
> + gimple *ins_stmt = NULL;
> store_operand_info ops[2];
> if (invalid)
> ;
> @@ -3558,6 +4022,35 @@ pass_store_merging::process_store (gimpl
> invalid = true;
> break;
> }
> + if ((bitsize % BITS_PER_UNIT) == 0
> + && (bitpos % BITS_PER_UNIT) == 0
> + && bitsize <= 64
> + && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN)
> + {
> + ins_stmt = find_bswap_or_nop_1 (def_stmt, &n, 12);
> + if (ins_stmt)
> + {
> + uint64_t nn = n.n;
> + for (unsigned HOST_WIDE_INT i = 0;
> + i < bitsize; i += BITS_PER_UNIT, nn >>= BITS_PER_MARKER)
> + if ((nn & MARKER_MASK) == 0
> + || (nn & MARKER_MASK) == MARKER_BYTE_UNKNOWN)
> + {
> + ins_stmt = NULL;
> + break;
> + }
> + if (ins_stmt)
> + {
> + if (invalid)
> + {
> + rhs_code = LROTATE_EXPR;
> + ops[0].base_addr = NULL_TREE;
> + ops[1].base_addr = NULL_TREE;
> + }
> + invalid = false;
> + }
> + }
> + }
> }
>
> if (invalid)
> @@ -3566,6 +4059,9 @@ pass_store_merging::process_store (gimpl
> return;
> }
>
> + if (!ins_stmt)
> + memset (&n, 0, sizeof (n));
> +
> struct imm_store_chain_info **chain_info = NULL;
> if (base_addr)
> chain_info = m_stores.get (base_addr);
> @@ -3576,6 +4072,7 @@ pass_store_merging::process_store (gimpl
> unsigned int ord = (*chain_info)->m_store_info.length ();
> info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> bitregion_end, stmt, ord, rhs_code,
> + n, ins_stmt,
> bit_not_p, ops[0], ops[1]);
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> @@ -3604,6 +4101,7 @@ pass_store_merging::process_store (gimpl
> = new imm_store_chain_info (m_stores_head, base_addr);
> info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> bitregion_end, stmt, 0, rhs_code,
> + n, ins_stmt,
> bit_not_p, ops[0], ops[1]);
> new_chain->m_store_info.safe_push (info);
> m_stores.put (base_addr, new_chain);
> @@ -3628,6 +4126,8 @@ pass_store_merging::execute (function *f
> basic_block bb;
> hash_set<gimple *> orig_stmts;
>
> + calculate_dominance_info (CDI_DOMINATORS);
> +
> FOR_EACH_BB_FN (bb, fun)
> {
> gimple_stmt_iterator gsi;
> --- gcc/testsuite/gcc.dg/store_merging_16.c.jj 2017-11-16 13:36:06.550536836 +0100
> +++ gcc/testsuite/gcc.dg/store_merging_16.c 2017-11-16 15:12:55.231898459 +0100
> @@ -0,0 +1,157 @@
> +/* Only test on some 64-bit targets which do have bswap{si,di}2 patterns and
> + are either big or little endian (not pdp endian). */
> +/* { dg-do compile { target { lp64 && { i?86-*-* x86_64-*-* powerpc*-*-* aarch64*-*-* } } } } */
> +/* { dg-require-effective-target store_merge } */
> +/* { dg-options "-O2 -fdump-tree-store-merging" } */
> +
> +__attribute__((noipa)) void
> +f1 (unsigned char *p, unsigned long long q)
> +{
> + p[0] = q;
> + p[1] = q >> 8;
> + p[2] = q >> 16;
> + p[3] = q >> 24;
> + p[4] = q >> 32;
> + p[5] = q >> 40;
> + p[6] = q >> 48;
> + p[7] = q >> 56;
> +}
> +
> +__attribute__((noipa)) void
> +f2 (unsigned char *p, unsigned long long q)
> +{
> + p[0] = q >> 56;
> + p[1] = q >> 48;
> + p[2] = q >> 40;
> + p[3] = q >> 32;
> + p[4] = q >> 24;
> + p[5] = q >> 16;
> + p[6] = q >> 8;
> + p[7] = q;
> +}
> +
> +__attribute__((noipa)) void
> +f3 (unsigned char *__restrict p, unsigned char *__restrict q)
> +{
> + unsigned char q3 = q[3];
> + unsigned char q2 = q[2];
> + unsigned char q1 = q[1];
> + unsigned char q0 = q[0];
> + p[0] = q3;
> + p[1] = q2;
> + p[2] = q1;
> + p[3] = q0;
> +}
> +
> +__attribute__((noipa)) void
> +f4 (unsigned char *__restrict p, unsigned char *__restrict q)
> +{
> + p[0] = q[3];
> + p[1] = q[2];
> + p[2] = q[1];
> + p[3] = q[0];
> +}
> +
> +struct S { unsigned char a, b; unsigned short c; };
> +
> +__attribute__((noipa)) void
> +f5 (struct S *__restrict p, struct S *__restrict q)
> +{
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> + unsigned char pa = q->c >> 8;
> + unsigned char pb = q->c;
> + unsigned short pc = (q->a << 8) | q->b;
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> + unsigned char pa = q->c;
> + unsigned char pb = q->c >> 8;
> + unsigned short pc = q->a | (q->b << 8);
> +#endif
> + p->a = pa;
> + p->b = pb;
> + p->c = pc;
> +}
> +
> +__attribute__((noipa)) void
> +f6 (struct S *__restrict p, struct S *__restrict q)
> +{
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> + p->a = q->c >> 8;
> + p->b = q->c;
> + p->c = (q->a << 8) | q->b;
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> + p->a = q->c;
> + p->b = q->c >> 8;
> + p->c = q->a | (q->b << 8);
> +#endif
> +}
> +
> +struct T { unsigned long long a : 8, b : 8, c : 8, d : 8, e : 8, f : 8, g : 8, h : 8; };
> +
> +__attribute__((noipa)) void
> +f7 (struct T *__restrict p, struct T *__restrict q)
> +{
> + p->a = q->h;
> + p->b = q->g;
> + p->c = q->f;
> + p->d = q->e;
> + p->e = q->d;
> + p->f = q->c;
> + p->g = q->b;
> + p->h = q->a;
> +}
> +
> +struct S b = { 0x11, 0x12,
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> + 0x1413
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> + 0x1314
> +#endif
> + };
> +struct T e = { 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28 };
> +
> +int
> +main ()
> +{
> + unsigned char a[8];
> + int i;
> + struct S b, c, d;
> + f1 (a, 0x0102030405060708ULL);
> + for (i = 0; i < 8; ++i)
> + if (a[i] != 8 - i)
> + __builtin_abort ();
> + f2 (a, 0x0102030405060708ULL);
> + for (i = 0; i < 8; ++i)
> + if (a[i] != 1 + i)
> + __builtin_abort ();
> + f3 (a, a + 4);
> + for (i = 0; i < 8; ++i)
> + if (a[i] != (i < 4 ? 8 - i : 1 + i))
> + __builtin_abort ();
> + f2 (a, 0x090a0b0c0d0e0f10ULL);
> + f4 (a + 4, a);
> + for (i = 0; i < 8; ++i)
> + if (a[i] != (i < 4 ? 9 + i : 16 - i))
> + __builtin_abort ();
> + f5 (&c, &b);
> + if (c.a != 0x14 || c.b != 0x13
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> + || c.c != 0x1112
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> + || c.c != 0x1211
> +#endif
> + )
> + __builtin_abort ();
> + f6 (&d, &c);
> + if (d.a != 0x11 || d.b != 0x12 || d.c != b.c)
> + __builtin_abort ();
> + struct T f;
> + f7 (&f, &e);
> + if (f.a != 0x28 || f.b != 0x27 || f.c != 0x26 || f.d != 0x25
> + || f.e != 0x24 || f.f != 0x23 || f.g != 0x22 || f.h != 0x21)
> + __builtin_abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Merging successful" 7 "store-merging" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_bswap64" 2 "store-merging" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_bswap32" 4 "store-merging" } } */
>
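> For reference, the new test can be run on its own with something like
>
>   make check-gcc RUNTESTFLAGS="dg.exp=store_merging_16.c"
>
> and the dg-final directives above verify the expected number of merged
> groups and bswap builtin calls in the store-merging dump.
>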
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)