This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] PR18002: Undo fold_single_bit_test in do_jump
- From: Roger Sayle <roger at eyesopen dot com>
- To: Paul Schlie <schlie at comcast dot net>
- Cc: Ian Lance Taylor <ian at wasabisystems dot com>, <gcc-patches at gcc dot gnu dot org>, <gcc at gcc dot gnu dot org>
- Date: Thu, 9 Dec 2004 09:05:33 -0700 (MST)
- Subject: Re: [PATCH] PR18002: Undo fold_single_bit_test in do_jump
On Thu, 9 Dec 2004, Paul Schlie wrote:
> Thanks for the clarification of context, but the problem introduced
> by the initial transform of expressions of the form:
> if (x & <pow2-const>) ... else ... ; // which is a typical bit test
> if ((x >> <log2-const>) & 1) ... else ... ;
> Wreak havoc on targets with relatively inefficient shifts, and seems
> generally counter productive in circumstances where conditional jumps
> are inherently unavoidable, therefore would be nice, if not imperative,
> to have remedied. (which I had hoped the refinement was attempting to
I think you have a fundamental misunderstanding about how and why
code transformations are applied in modern optimizing compilers.
Many of the transformations that a compiler makes to a program do
not concern code generation at all and are often not optimizations.
The easiest to understand canonicalization transformation is for
commutative binary operators, where a compiler will transform the
tree "1 + x" into the equivalent "x + 1". Typically this will not
improve performance or even affect the code we generate. Instead
the motivation is that it greatly simplifies the task of recognizing
a tree that represents an addition by a constant. This improves
compile-time performance and reduces the amount of code needed to
write a compiler.
The next level up, might be the canonicalization of "x + x" into
the equivalent "x * 2". This transformation is critical for things
such as recognizing that "x + x + x + x" is the same as "x * 4".
However, notice that on most modern processors, a multiplication
is typically much slower than an addition. This canonicalization
would appear to hurt performance and be a pessimization. The important
thing to realize is that GCC's tree representation is an abstract
intermediate form that represents the semantics of a program, and
unlike RTL, it is not a faithful representation of the code that will
be executed. The important bit to notice is that although at the tree
level we convert "x + x" to "x * 2", which is easier to analyse and
manipulate, at RTL expansion time, we convert "x * 2" back into a
single addition before the compiler's output is generated.
Internally, "if (x & C)" has the semantics "if ((x & C) != 0)" when
interpreted by language front-ends. Which is a tree with two operators
a NE_EXPR and a BIT_AND_EXPR, which has identical semantics to
equivalent expression "(x >> C') & 1", which again has two operators
BIT_AND_EXPR and a RSHIFT_EXPR. Both are equivalent, so we can
CSE/GCSE the value of one into the use of the other, so that it makes
sense at the conceptual level that both are internally represented
The important thing to remember is that this is a semantic issue
and not a performance one. Which representation gets used for
trees is irrelevant provided that the final code we generate is
"optimal" for its context. In this particular case, it was historically
preferable to canonicalize to the shift-form, as this simplified the
task of when to undo the transformation on expansion into RTL form.
It is vital to get beyond the belief that tree-ssa form is related
to the efficiency of the final machine code. fold-const.c appears
to convert expressions into more expensive forms, gimplification
would appear to introduce large numbers of unnecessary variables and
trivial assignments between them, spliting loop headers and critical
blocks would appear to introduce large numbers of empty basic blocks
and unconditional jumps.
The truth is than none of the above should adversely affect a program's
performance, and indeed the goal of applying these transformations is
to improve the program globally, and not just tree-node by tree-node.
Hopefully, this explains why its not important that you're aware of
a tree transformation that appears counter-intuitive, only that the
code which is generated for the PRs is at least as efficient as it
was for 3.3.1, but also demonstrably more correct.