This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store
- From: "Thomas Preud'homme" <thomas dot preudhomme at arm dot com>
- To: "GCC Patches" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 17 Apr 2014 13:19:42 +0800
- Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store
- Authentication-results: sourceware.org; auth=none
- References: <000b01cf4e0e$54044b70$fc0ce250$ at arm dot com> <alpine dot DEB dot 2 dot 10 dot 1404020800200 dot 3584 at laptop-mg dot saclay dot inria dot fr> <001301cf4e41$d2a4f320$77eed960$ at arm dot com> <CAFiYyc2WTsb8CPzQK7m_aqcsy5Rjbw_2HWVJO8CVxFha_+Mkfg at mail dot gmail dot com> <003001cf4eff$74b0e210$5e12a630$ at arm dot com> <003c01cf4fc9$a0b37510$e21a5f30$ at arm dot com> <CAFiYyc2624v+tN6JskOvgjWf7B8vtKZTyvB6s3VHc_-vo5=t7w at mail dot gmail dot com>
> From: Richard Biener [mailto:firstname.lastname@example.org]
> With handling only the outermost handled-component and then only a
> selected subset you'll catch many but not all cases. Why not simply
> use get_inner_reference () here (plus stripping the constant offset
> from an innermost MEM_REF) and get the best of both worlds (not
> duplicate parts of its logic and handle more cases)? Eventually
> using tree-affine.c and get_inner_reference_aff is even more appropriate
> so you can compute the address differences without decomposing
> them yourselves.
Why does the constant offset from an innermost MEM_REF need to be
stripped? Shouldn't that be part of the offset in the symbolic number?
> + /* Compute address to load from and cast according to the size
> + of the load. */
> + load_ptr_type = build_pointer_type (load_type);
> + addr_expr = build1 (ADDR_EXPR, load_ptr_type, bswap_src);
> + addr_tmp = make_temp_ssa_name (load_ptr_type, NULL, "load_src");
> + addr_stmt = gimple_build_assign_with_ops
> + (NOP_EXPR, addr_tmp, addr_expr, NULL);
> + gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT);
> + /* Perform the load. */
> + load_offset_ptr = build_int_cst (load_ptr_type, 0);
> + val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst");
> + val_expr = build2 (MEM_REF, load_type, addr_tmp, load_offset_ptr);
> + load_stmt = gimple_build_assign_with_ops
> + (MEM_REF, val_tmp, val_expr, NULL);
> this is unnecessarily complex and has TBAA issues. You don't need to
> create a "correct" pointer type, so doing
> addr_expr = fold_build_addr_expr (bswap_src);
> is enough. Now, to fix the TBAA issues you either need to remember
> and "combine" the reference_alias_ptr_type of each original load and
> use that for the load_offset_ptr value or decide that isn't worth it and
> use alias-set zero (use ptr_type_node).
Sorry, this is only my second patch to GCC so it's not all clear to me. The TBAA
issue you mention comes from val_expr referring to a memory area that
overlaps with the smaller memory areas used in the bitwise OR operation, am I
right? Now, I have no idea how to combine the values returned by
reference_alias_ptr_type () for each individual small memory area. Can you
advise me on this? And what are the effects of not doing it and using
ptr_type_node for the alias set?
(The first one was a fix to the existing implementation of the bswap pass.)
> Can you also expand the comment about "size vs. range"? Is it
> that range can be bigger than size if you have (short)a |
> ((short)a << 1)
> where size == 2 but range == 3? Thus range can also be smaller than size
> for example for (short)a | ((short)a << 1) where range would be 1 and
> size == 2? I suppose adding two examples like this to the comment, together
> with the expected value of 'n' would help here.
You understood correctly. I will add the suggested example.
> Otherwise the patch looks good. Now we're only missing the addition
> of trying to match to a VEC_PERM_EXPR with a constant mask
> using can_vec_perm_p ;)
Is that the vector shuffle engine you were mentioning in PR54733? If I
understand correctly, it is a generalization of the check against CMPNOP and
CMPXCHG in find_bswap in this new patchset. I will look at whether ARM could
benefit from this and if yes I might take a look (yep, two conditions).
Thanks a lot for such quick and detailed comments after my ping.