I looked at the generated code without TARGET_FUSE_CMP_AND_BRANCH.
In most cases, gcc doesn't put any instructions between TEST/CMP
and JCC and we get macro-fusion optimization automatically even
if TARGET_FUSE_CMP_AND_BRANCH is off.
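To illustrate the kind of code I mean, here is a minimal sketch. It is
a made-up example, not the code I actually looked at, and the name
find_first is hypothetical. In a loop like this one would expect each
compare to be emitted right before its conditional jump, though the
exact output depends on the GCC version and options:

```c
#include <stddef.h>

/* Hypothetical example: return the index of the first element equal
   to key, or -1 if there is none.  */
int find_first(const int *v, int n, int key)
{
    for (int i = 0; i < n; i++) {   /* loop test: candidate CMP + Jcc pair */
        if (v[i] == key)            /* early exit: candidate CMP + Jcc pair */
            return i;
    }
    return -1;
}
```

Checking the output of "gcc -O2 -S" on such a function is one way to
see whether anything was placed between the compare and the jump.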
Since TARGET_FUSE_CMP_AND_BRANCH generates patterns with an
incorrect instruction length, it prevents blocks from being merged
and copied, which hurts performance.
We have 2 choices:
1. Correct insn length for *jcc_fused_X patterns, which is what Joey's
patch does.
2. Remove *jcc_fused_X patterns and optimize macro-fusion in Core 2
scheduling.
Given that *jcc_fused_X patterns don't buy us much, I think
we should remove them and fix the missed macro-fusion
optimization in Core 2 scheduling in the future. OK for trunk?