This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

__attribute__((early_branch))


Hello GCC-team,

I use GCC for all my C and C++ programs. I know how to use GCC, but I am
not a contributor to GCC (yet). I often discover some problems C and C++
code have in general. There is often the choice between fast or readable
code. Some code I have seen performs well but looks ugly (goto, etc.);
other code is simple, but has some performance problems. What if we could
make the simple code perform well?

There is a common problem with conditional branches inside loops. This can
decrease the performance of a program. To fix this issue, the conditional
branch should be moved outside of the loop. Sometimes this optimization is
done by the compiler, but guessing on optimizations done by the compiler is
really bad. Often it is not easy to transform the source code to have the
conditional branching outside the loop. Instead I propose a new attribute,
which forces the compiler to do a conditional branch (based on the
annotated parameter) at the beginning of a function. It branches to the
corresponding code of the function compiled with the value being constant.

Here is a code example, which contains this issue.

    enum reduce_op
    {
        REDUCE_ADD,
        REDUCE_MULT,
        REDUCE_MIN,
        REDUCE_MAX
    };

    /* performance critical function */
    void reduce_data(enum reduce_op reduce,
                     unsigned const* data,
                     unsigned data_size)
    {
        unsigned i, result, item;

        result = reduce == REDUCE_MULT ?  1u
               : reduce == REDUCE_MIN  ? ~0u // ~0u is UINT_MAX
               :                          0u;

        for(i = 0; i < data_size; ++i)
        {
            item = data[i];

            switch(reduce)
            {
                case REDUCE_ADD:
                    result += item;
                break;

                case REDUCE_MULT:
                    result *= item;
                break;

                case REDUCE_MIN:
                    if(item < result) result = item;
                    // RIP: result <?= item;
                break;

                case REDUCE_MAX:
                    if(item > result) result = item;
                    // RIP: result >?= item;
                break;
            }
        }

        return result;
    }

The value of  reduce  does not change inside the function. For this
example, the optimization is trivial. But consider more complex examples.
The function should be optimized to:

    void reduce_data(enum reduce_op reduce,
                     unsigned const* data,
                     unsigned data_size)
    {
        unsigned i, result, item;

        switch(reduce)
        {
            case REDUCE_ADD:
                result = 0;

                for(i = 0; i < data_size; ++i)
                    result += data[i];

                return result;

            case REDUCE_MULT:
                result = 1;

                for(i = 0; i < data_size; ++i)
                    result *= data[i];

                return result;

            case REDUCE_MIN:
                result = ~0u;

                for(i = 0; i < data_size; ++i)
                {
                    item = data[i];
                    if(item < result) result = item;
                }

                return result;

            case REDUCE_MAX:
                result = 0;

                for(i = 0; i < data_size; ++i)
                {
                    item = data[i];
                    if(item > result) result = item;
                }

                return result;

            default: return 0u;
        }
    }

As I have mentioned, the value is treated as constant in the code. The
compiler simply compiles the body of the function for each enum-value as if
it was constant and puts it in a conditional branch. This means that
__builtin_constant_p evaluates to true/1, when the parameter has the
attribute. But duplicate code should also be avoided, the initialization of
the loop counter  i  could be moved before the branch.

You might ask, what about values which are not declared in the enum? The
compiler simply compiles a default case as in the example code above. The
default case equals to the normal code (as if the variable was not
annotated with the attribute), except that the compiler knows, that the
enum value is never a value declared in the enum. In this case the return
value is always 0. In the default case __builtin_constant_p with the
annotated variable evaluates to false. (But as  reduce == REDUCE_ADD  is
always false  __builtin_constant_p(reduce == REDUCE_ADD)  should be always
true.)

The attribute can only be applied in the implementation of a function. I
call the attribute early_branch. The usage of the attribute could be

    void reduce_data(enum reduce_op reduce __attribute__((early_branch)),
                     unsigned const* data,
                     unsigned data_size)
    {
        ...
    }


The way to remove the default case could be to just test against non-enum
values and insert __builtin_unreachable()

    if(reduce != REDUCE_ADD  &&
       reduce != REDUCE_MULT &&
       reduce != REDUCE_MIN  &&
       reduce != REDUCE_MAX) __builtin_unreachable();

Or test if it is not constant

    if(!__builtin_constant_p(reduce)) __builtin_unreachable();

For this case, there could be a problem when the proposed attribute is not
available. There could be either a builtin __builtin_default_branch_p or a
macro definition.

    #if __has_attribute(early_branch)
    #define is_default_branch(v) (!__builtin_constant_p(v))
    #else
    #define is_default_branch(v) 0
    #endif


The attribute can take additional parameters, which can specify the values
to compile the code seperately for, this can include non-enum values

    void do_stuff(unsigned element_size __attribute__((early_branch(1, 2,
4, 8)))) { ... }

The bool/_Bool type is interpreted as an enum with two values.


What if the variable changes?

When the variable changes there should be a jump to the corresponding
parallel code of the compiled code with new value.

Unoptimized

    /* removes comments starting with # and ending in a newline */
    void remove_comment(char* dst,
                        char const* src)
    {
        // initialization nessecary
        int state __attribute__((early_branch(0, 1))) = 0;

        char c;

        while(*src)
        {
            c = *src++;

            switch(state)
            {
                case 0:
                    if(c == '#')
                        state = 1;
                    else
                        *dst++ = c;

                    break;

                case 1:
                    if(c == '\n')
                    {
                        *dst++ = '\n';
                        state = 0;
                    }

                    break;
            }
        }
        *dst = '\0';
    }

changed to

    void remove_comment(char* dst,
                        char const* src)
    {
        char c;

        switch(0)
        {
            case 0:
                while(*src)
                {
                    c = *src++;
                    if(c == '#')
                        goto branch1;
                    else
                        *dst++ = c;
                    branch0:
                }
                break;

            case 1:
                while(*src)
                {
                    if(*src++ == '\n')
                    {
                        *dst++ = '\n';
                        goto branch0;
                    }
                    branch1:
                }
                break;
        }
        *dst = '\0';
    }

You see that the state is not tested in each iteration of the loop(s). For
complex cases (like parsers), this can improve performance. This attribute
is needed to avoid such a goto usage. I have seen such code, which is
unmaintainable but performs better than the first solution.

I hope that some of this is implementable. I'd like to discuss about this
and want to know what you think.

cmdLP


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]