[PATCH] __builtin_assume_aligned
Richard Guenther
richard.guenther@gmail.com
Mon Jun 27 10:59:00 GMT 2011
On Fri, Jun 24, 2011 at 4:22 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch introduces a new extension, to hint the compiler
> that a pointer is guaranteed to be somehow aligned (or misaligned).
> It is designed as a pass-thru builtin which just returns its first
> argument, so that it is more obvious where we can assume how it is aligned.
> Otherwise it is similar to ICC's __assume_aligned, so for lvalue first
> argument ICC's __assume_aligned can be emulated using
> #define __assume_aligned(lvalueptr, align) lvalueptr = __builtin_assume_aligned (lvalueptr, align)
> ICC doesn't allow side-effects in the arguments of this, GCC does,
> so one can e.g. write:
> void
> foo (std::vector<double> &vec)
> {
> double *__restrict data = (double *) __builtin_assume_aligned (vec.data (), 16);
> ...
> }
> to hint gcc that it can assume the vector has its data () 16 byte aligned
> (which is true e.g. on x86_64-linux if using standard malloc based
> allocator, which guarantees 2 * sizeof (void*) alignment). E.g. vectorizer
> can use that hint to generate aligned stores/loads instead of unaligned
> ones.
>
> Maybe we should have also __builtin_likely_aligned, which would be similar,
> just wouldn't guarantee such an alignment, just say it is very likely. If
> vectorizer decided to version a loop, for the fast alternative it could
> check the alignment in the versioning condition and assume the likely
> aligned alignment in the fast vectorized version and let the unlikely
> non-aligned case use slower scalar loop. But that can be done separately.
>
> The builtin can have either two or three arguments, the second is
> alignment and third is misalignment (i.e. that
> (uintptr_t) ((char *) firstarg - misalign) & (align - 1) == 0
> ). I've been contemplating to make the builtin overloaded, have return
> type be always the type of the first argument if it is pointer/reference
> type, like template <typename T> T __builtin_assume_aligned (T, size_t, ...);
> both in C and C++, but I think it would be too difficult to make it work
> that way, so the builtin is instead
> void *__builtin_assume_aligned (const void *, size_t, ...);
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok if you remove the builtins.c folding and instead verify arguments
from check_builtin_function_arguments.
Thanks,
Richard.
> 2011-06-24 Jakub Jelinek <jakub@redhat.com>
>
> * builtin-types.def (BT_FN_PTR_CONST_PTR_SIZE_VAR): New.
> * builtins.def (BUILT_IN_ASSUME_ALIGNED): New builtin.
> * tree-ssa-structalias.c (find_func_aliases_for_builtin_call,
> find_func_clobbers): Handle BUILT_IN_ASSUME_ALIGNED.
> * tree-ssa-ccp.c (bit_value_assume_aligned): New function.
> (evaluate_stmt, execute_fold_all_builtins): Handle
> BUILT_IN_ASSUME_ALIGNED.
> * tree-ssa-dce.c (propagate_necessity): Likewise.
> * tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
> call_may_clobber_ref_p_1): Likewise.
> * builtins.c (is_simple_builtin, fold_builtin_varargs,
> expand_builtin): Likewise.
> (expand_builtin_assume_aligned, fold_builtin_assume_aligned):
> New functions.
> * doc/extend.texi (__builtin_assume_aligned): Document.
>
> * gcc.dg/builtin-assume-aligned-1.c: New test.
> * gcc.dg/builtin-assume-aligned-2.c: New test.
> * gcc.target/i386/builtin-assume-aligned-1.c: New test.
>
> --- gcc/builtin-types.def.jj 2011-06-21 16:45:42.000000000 +0200
> +++ gcc/builtin-types.def 2011-06-23 11:25:03.000000000 +0200
> @@ -454,6 +454,8 @@ DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_CONST
> BT_INT, BT_CONST_STRING, BT_CONST_STRING)
> DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_INT_CONST_STRING_VAR,
> BT_INT, BT_INT, BT_CONST_STRING)
> +DEF_FUNCTION_TYPE_VAR_2 (BT_FN_PTR_CONST_PTR_SIZE_VAR, BT_PTR,
> + BT_CONST_PTR, BT_SIZE)
>
> DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_STRING_SIZE_CONST_STRING_VAR,
> BT_INT, BT_STRING, BT_SIZE, BT_CONST_STRING)
> --- gcc/builtins.def.jj 2011-06-21 16:46:01.000000000 +0200
> +++ gcc/builtins.def 2011-06-23 11:25:03.000000000 +0200
> @@ -1,7 +1,7 @@
> /* This file contains the definitions and documentation for the
> builtins used in the GNU compiler.
> Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
> - 2010 Free Software Foundation, Inc.
> + 2010, 2011 Free Software Foundation, Inc.
>
> This file is part of GCC.
>
> @@ -638,6 +638,7 @@ DEF_EXT_LIB_BUILTIN (BUILT_IN_EXE
> DEF_EXT_LIB_BUILTIN (BUILT_IN_EXECVE, "execve", BT_FN_INT_CONST_STRING_PTR_CONST_STRING_PTR_CONST_STRING, ATTR_NOTHROW_LIST)
> DEF_LIB_BUILTIN (BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LIST)
> DEF_GCC_BUILTIN (BUILT_IN_EXPECT, "expect", BT_FN_LONG_LONG_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_GCC_BUILTIN (BUILT_IN_ASSUME_ALIGNED, "assume_aligned", BT_FN_PTR_CONST_PTR_SIZE_VAR, ATTR_CONST_NOTHROW_LEAF_LIST)
> DEF_GCC_BUILTIN (BUILT_IN_EXTEND_POINTER, "extend_pointer", BT_FN_UNWINDWORD_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
> DEF_GCC_BUILTIN (BUILT_IN_EXTRACT_RETURN_ADDR, "extract_return_addr", BT_FN_PTR_PTR, ATTR_LEAF_LIST)
> DEF_EXT_LIB_BUILTIN (BUILT_IN_FFS, "ffs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
> --- gcc/tree-ssa-structalias.c.jj 2011-06-23 10:13:58.000000000 +0200
> +++ gcc/tree-ssa-structalias.c 2011-06-23 11:25:04.000000000 +0200
> @@ -4002,6 +4002,7 @@ find_func_aliases_for_builtin_call (gimp
> case BUILT_IN_STPCPY_CHK:
> case BUILT_IN_STRCAT_CHK:
> case BUILT_IN_STRNCAT_CHK:
> + case BUILT_IN_ASSUME_ALIGNED:
> {
> tree res = gimple_call_lhs (t);
> tree dest = gimple_call_arg (t, (DECL_FUNCTION_CODE (fndecl)
> @@ -4726,6 +4727,7 @@ find_func_clobbers (gimple origt)
> return;
> }
> /* The following functions neither read nor clobber memory. */
> + case BUILT_IN_ASSUME_ALIGNED:
> case BUILT_IN_FREE:
> return;
> /* Trampolines are of no interest to us. */
> --- gcc/tree-ssa-ccp.c.jj 2011-06-23 10:13:58.000000000 +0200
> +++ gcc/tree-ssa-ccp.c 2011-06-23 15:17:16.000000000 +0200
> @@ -1476,6 +1476,64 @@ bit_value_binop (enum tree_code code, tr
> return val;
> }
>
> +/* Return the propagation value when applying __builtin_assume_aligned to
> + its arguments. */
> +
> +static prop_value_t
> +bit_value_assume_aligned (gimple stmt)
> +{
> + tree ptr = gimple_call_arg (stmt, 0), align, misalign = NULL_TREE;
> + tree type = TREE_TYPE (ptr);
> + unsigned HOST_WIDE_INT aligni, misaligni = 0;
> + prop_value_t ptrval = get_value_for_expr (ptr, true);
> + prop_value_t alignval;
> + double_int value, mask;
> + prop_value_t val;
> + if (ptrval.lattice_val == UNDEFINED)
> + return ptrval;
> + gcc_assert ((ptrval.lattice_val == CONSTANT
> + && TREE_CODE (ptrval.value) == INTEGER_CST)
> + || double_int_minus_one_p (ptrval.mask));
> + align = gimple_call_arg (stmt, 1);
> + if (!host_integerp (align, 1))
> + return ptrval;
> + aligni = tree_low_cst (align, 1);
> + if (aligni <= 1
> + || (aligni & (aligni - 1)) != 0)
> + return ptrval;
> + if (gimple_call_num_args (stmt) > 2)
> + {
> + misalign = gimple_call_arg (stmt, 2);
> + if (!host_integerp (misalign, 1))
> + return ptrval;
> + misaligni = tree_low_cst (misalign, 1);
> + if (misaligni >= aligni)
> + return ptrval;
> + }
> + align = build_int_cst_type (type, -aligni);
> + alignval = get_value_for_expr (align, true);
> + bit_value_binop_1 (BIT_AND_EXPR, type, &value, &mask,
> + type, value_to_double_int (ptrval), ptrval.mask,
> + type, value_to_double_int (alignval), alignval.mask);
> + if (!double_int_minus_one_p (mask))
> + {
> + val.lattice_val = CONSTANT;
> + val.mask = mask;
> + gcc_assert ((mask.low & (aligni - 1)) == 0);
> + gcc_assert ((value.low & (aligni - 1)) == 0);
> + value.low |= misaligni;
> + /* ??? Delay building trees here. */
> + val.value = double_int_to_tree (type, value);
> + }
> + else
> + {
> + val.lattice_val = VARYING;
> + val.value = NULL_TREE;
> + val.mask = double_int_minus_one;
> + }
> + return val;
> +}
> +
> /* Evaluate statement STMT.
> Valid only for assignments, calls, conditionals, and switches. */
>
> @@ -1647,6 +1705,10 @@ evaluate_stmt (gimple stmt)
> val = get_value_for_expr (gimple_call_arg (stmt, 0), true);
> break;
>
> + case BUILT_IN_ASSUME_ALIGNED:
> + val = bit_value_assume_aligned (stmt);
> + break;
> +
> default:;
> }
> }
> @@ -2186,6 +2248,11 @@ execute_fold_all_builtins (void)
> result = integer_zero_node;
> break;
>
> + case BUILT_IN_ASSUME_ALIGNED:
> + /* Remove __builtin_assume_aligned. */
> + result = gimple_call_arg (stmt, 0);
> + break;
> +
> case BUILT_IN_STACK_RESTORE:
> result = optimize_stack_restore (i);
> if (result)
> --- gcc/tree-ssa-dce.c.jj 2011-06-23 10:13:58.000000000 +0200
> +++ gcc/tree-ssa-dce.c 2011-06-23 11:25:05.000000000 +0200
> @@ -837,7 +837,8 @@ propagate_necessity (struct edge_list *e
> || DECL_FUNCTION_CODE (callee) == BUILT_IN_FREE
> || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
> || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
> - || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE))
> + || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
> + || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
> continue;
>
> /* Calls implicitly load from memory, their arguments
> --- gcc/tree-ssa-alias.c.jj 2011-06-23 10:13:58.000000000 +0200
> +++ gcc/tree-ssa-alias.c 2011-06-23 11:25:05.000000000 +0200
> @@ -1253,6 +1253,7 @@ ref_maybe_used_by_call_p_1 (gimple call,
> case BUILT_IN_SINCOS:
> case BUILT_IN_SINCOSF:
> case BUILT_IN_SINCOSL:
> + case BUILT_IN_ASSUME_ALIGNED:
> return false;
> /* __sync_* builtins and some OpenMP builtins act as threading
> barriers. */
> @@ -1511,6 +1512,7 @@ call_may_clobber_ref_p_1 (gimple call, a
> return false;
> case BUILT_IN_STACK_SAVE:
> case BUILT_IN_ALLOCA:
> + case BUILT_IN_ASSUME_ALIGNED:
> return false;
> /* Freeing memory kills the pointed-to memory. More importantly
> the call has to serve as a barrier for moving loads and stores
> --- gcc/builtins.c.jj 2011-06-22 10:16:56.000000000 +0200
> +++ gcc/builtins.c 2011-06-23 11:25:05.000000000 +0200
> @@ -4604,6 +4604,23 @@ expand_builtin_expect (tree exp, rtx tar
> return target;
> }
>
> +/* Expand a call to __builtin_assume_aligned. We just return our first
> + argument as the builtin_assume_aligned semantic should've been already
> + executed by CCP. */
> +
> +static rtx
> +expand_builtin_assume_aligned (tree exp, rtx target)
> +{
> + if (call_expr_nargs (exp) < 2)
> + return const0_rtx;
> + target = expand_expr (CALL_EXPR_ARG (exp, 0), target, VOIDmode,
> + EXPAND_NORMAL);
> + gcc_assert (!TREE_SIDE_EFFECTS (CALL_EXPR_ARG (exp, 1))
> + && (call_expr_nargs (exp) < 3
> + || !TREE_SIDE_EFFECTS (CALL_EXPR_ARG (exp, 2))));
> + return target;
> +}
> +
> void
> expand_builtin_trap (void)
> {
> @@ -5823,6 +5840,8 @@ expand_builtin (tree exp, rtx target, rt
> return expand_builtin_va_copy (exp);
> case BUILT_IN_EXPECT:
> return expand_builtin_expect (exp, target);
> + case BUILT_IN_ASSUME_ALIGNED:
> + return expand_builtin_assume_aligned (exp, target);
> case BUILT_IN_PREFETCH:
> expand_builtin_prefetch (exp);
> return const0_rtx;
> @@ -9352,6 +9371,31 @@ fold_builtin_fpclassify (location_t loc,
> return res;
> }
>
> +/* Diagnose invalid uses of __builtin_assume_aligned. */
> +
> +static tree
> +fold_builtin_assume_aligned (location_t loc, tree fndecl, tree exp)
> +{
> + int nargs = call_expr_nargs (exp);
> +
> + if (nargs < 2)
> + return NULL_TREE;
> + if (nargs > 3)
> + {
> + error_at (loc, "%<__builtin_assume_aligned%> must have 2 or 3 arguments");
> + return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
> + CALL_EXPR_ARG (exp, 0));
> + }
> + if (nargs == 3 && !validate_arg (CALL_EXPR_ARG (exp, 2), INTEGER_TYPE))
> + {
> + error_at (loc,
> + "%<__builtin_assume_aligned%> last operand must have integer type");
> + return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
> + CALL_EXPR_ARG (exp, 0));
> + }
> + return NULL_TREE;
> +}
> +
> /* Fold a call to an unordered comparison function such as
> __builtin_isgreater(). FNDECL is the FUNCTION_DECL for the function
> being called and ARG0 and ARG1 are the arguments for the call.
> @@ -10266,6 +10310,9 @@ fold_builtin_varargs (location_t loc, tr
> ret = fold_builtin_fpclassify (loc, exp);
> break;
>
> + case BUILT_IN_ASSUME_ALIGNED:
> + return fold_builtin_assume_aligned (loc, fndecl, exp);
> +
> default:
> break;
> }
> @@ -13461,6 +13508,7 @@ is_simple_builtin (tree decl)
> case BUILT_IN_OBJECT_SIZE:
> case BUILT_IN_UNREACHABLE:
> /* Simple register moves or loads from stack. */
> + case BUILT_IN_ASSUME_ALIGNED:
> case BUILT_IN_RETURN_ADDRESS:
> case BUILT_IN_EXTRACT_RETURN_ADDR:
> case BUILT_IN_FROB_RETURN_ADDR:
> --- gcc/doc/extend.texi.jj 2011-06-21 16:45:44.000000000 +0200
> +++ gcc/doc/extend.texi 2011-06-24 12:36:34.000000000 +0200
> @@ -7646,6 +7646,28 @@ int g (int c)
>
> @end deftypefn
>
> +@deftypefn {Built-in Function} void *__builtin_assume_aligned (const void *@var{exp}, size_t @var{align}, ...)
> +This function returns its first argument, and allows the compiler
> +to assume that the returned pointer is at least @var{align} bytes
> +aligned. This built-in can have either two or three arguments,
> +if it has three, the third argument should have integer type, and
> +if it is non-zero means misalignment offset. For example:
> +
> +@smallexample
> +void *x = __builtin_assume_aligned (arg, 16);
> +@end smallexample
> +
> +means that the compiler can assume x, set to arg, is at least
> +16 byte aligned, while:
> +
> +@smallexample
> +void *x = __builtin_assume_aligned (arg, 32, 8);
> +@end smallexample
> +
> +means that the compiler can assume for x, set to arg, that
> +(char *) x - 8 is 32 byte aligned.
> +@end deftypefn
> +
> @deftypefn {Built-in Function} void __builtin___clear_cache (char *@var{begin}, char *@var{end})
> This function is used to flush the processor's instruction cache for
> the region of memory between @var{begin} inclusive and @var{end}
> --- gcc/testsuite/gcc.dg/builtin-assume-aligned-1.c.jj 2011-06-24 12:56:21.000000000 +0200
> +++ gcc/testsuite/gcc.dg/builtin-assume-aligned-1.c 2011-06-24 13:05:45.000000000 +0200
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +void
> +test1 (double *out1, double *out2, double *out3, double *in1,
> + double *in2, int len)
> +{
> + int i;
> + double *__restrict o1 = __builtin_assume_aligned (out1, 16);
> + double *__restrict o2 = __builtin_assume_aligned (out2, 16);
> + double *__restrict o3 = __builtin_assume_aligned (out3, 16);
> + double *__restrict i1 = __builtin_assume_aligned (in1, 16);
> + double *__restrict i2 = __builtin_assume_aligned (in2, 16);
> + for (i = 0; i < len; ++i)
> + {
> + o1[i] = i1[i] * i2[i];
> + o2[i] = i1[i] + i2[i];
> + o3[i] = i1[i] - i2[i];
> + }
> +}
> +
> +void
> +test2 (double *out1, double *out2, double *out3, double *in1,
> + double *in2, int len)
> +{
> + int i, align = 32, misalign = 16;
> + out1 = __builtin_assume_aligned (out1, align, misalign);
> + out2 = __builtin_assume_aligned (out2, align, 16);
> + out3 = __builtin_assume_aligned (out3, 32, misalign);
> + in1 = __builtin_assume_aligned (in1, 32, 16);
> + in2 = __builtin_assume_aligned (in2, 32, 0);
> + for (i = 0; i < len; ++i)
> + {
> + out1[i] = in1[i] * in2[i];
> + out2[i] = in1[i] + in2[i];
> + out3[i] = in1[i] - in2[i];
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "__builtin_assume_aligned" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> --- gcc/testsuite/gcc.dg/builtin-assume-aligned-2.c.jj 2011-06-24 13:00:45.000000000 +0200
> +++ gcc/testsuite/gcc.dg/builtin-assume-aligned-2.c 2011-06-24 13:01:36.000000000 +0200
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +
> +double *bar (void);
> +
> +void
> +foo (double *ptr, int i)
> +{
> + double *a = __builtin_assume_aligned (ptr, 16, 8, 7); /* { dg-error "must have 2 or 3 arguments" } */
> + double *b = __builtin_assume_aligned (bar (), 16);
> + double *c = __builtin_assume_aligned (bar (), 16, 8);
> + double *d = __builtin_assume_aligned (ptr, i, ptr); /* { dg-error "last operand must have integer type" } */
> + *a = 0.0;
> + *b = 0.0;
> + *c = 0.0;
> + *d = 0.0;
> +}
> --- gcc/testsuite/gcc.target/i386/builtin-assume-aligned-1.c.jj 2011-06-24 13:02:57.000000000 +0200
> +++ gcc/testsuite/gcc.target/i386/builtin-assume-aligned-1.c 2011-06-24 13:05:28.000000000 +0200
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -msse2 -mno-avx" } */
> +
> +void
> +test1 (double *out1, double *out2, double *out3, double *in1,
> + double *in2, int len)
> +{
> + int i;
> + double *__restrict o1 = __builtin_assume_aligned (out1, 16);
> + double *__restrict o2 = __builtin_assume_aligned (out2, 16);
> + double *__restrict o3 = __builtin_assume_aligned (out3, 16);
> + double *__restrict i1 = __builtin_assume_aligned (in1, 16);
> + double *__restrict i2 = __builtin_assume_aligned (in2, 16);
> + for (i = 0; i < len; ++i)
> + {
> + o1[i] = i1[i] * i2[i];
> + o2[i] = i1[i] + i2[i];
> + o3[i] = i1[i] - i2[i];
> + }
> +}
> +
> +void
> +test2 (double *out1, double *out2, double *out3, double *in1,
> + double *in2, int len)
> +{
> + int i, align = 32, misalign = 16;
> + out1 = __builtin_assume_aligned (out1, align, misalign);
> + out2 = __builtin_assume_aligned (out2, align, 16);
> + out3 = __builtin_assume_aligned (out3, 32, misalign);
> + in1 = __builtin_assume_aligned (in1, 32, 16);
> + in2 = __builtin_assume_aligned (in2, 32, 0);
> + for (i = 0; i < len; ++i)
> + {
> + out1[i] = in1[i] * in2[i];
> + out2[i] = in1[i] + in2[i];
> + out3[i] = in1[i] - in2[i];
> + }
> +}
> +
> +/* { dg-final { scan-assembler-not "movhpd" } } */
> +/* { dg-final { scan-assembler-not "movlpd" } } */
>
> Jakub
>
More information about the Gcc-patches
mailing list