User account creation filtered due to spam.
The following C code: unsigned char lfsr (unsigned long number) { unsigned char b = 0; if (number & (1L << 29)) b++; if (number & (1L << 13)) b++; return b; } compiles to a right shift 29 bits of number which is very expensive because AVR has no barrel shifter. Instead, a bit-testing instruction could be used which takes just a few cycles and not more than 100 like with the right shift. 4.6.2 uses a bit testing instruction. == Command line ==
== Command Line == $ avr-gcc -S foo.c -Os -dp with avr-gcc GCC: (GNU) 4.8.0 20120915 (experimental) Set component to "other", dunno if avr backend is the culprit or the middle-end disregarding costs.
Created attachment 28597 [details] foo.c Test case
Created attachment 28598 [details] foo.s (from 4.8) Assembler output that shows the expensive shift loop in insn 44.
Created attachment 28599 [details] foo.s (4.6.2, good) Assembler output from 4.6.2 that generates a bit test instruction.
At trunk r196410, the .164t.optimized dump looks like this: lfsr (long unsigned int number) { unsigned char b; long unsigned int _4; long unsigned int _5; _Bool _8; <bb 2>: _4 = number_3(D) & 536870912; _8 = _4 != 0; b_10 = (unsigned char) _8; _5 = number_3(D) & 8192; if (_5 != 0) goto <bb 3>; else goto <bb 4>; <bb 3>: b_6 = b_10 + 1; <bb 4>: # b_2 = PHI <b_10(2), b_6(3)> return b_2; } and exact_log2 could be used to identify the and-instruction as a bit test instruction: (gdb) p exact_log2(536870912) $1 = 29 (gdb) AFAIK there are no named patterns for bit tests, perhaps there should be. (http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names) So at least on x86_64 the test gets expanded as a shift: 6: {r65:DI=r64:DI 0>>0x1d;clobber flags:CC;} 7: {r59:QI=r65:DI#0&0x1;clobber flags:CC;} 8: {r66:DI=r64:DI&0x2000;clobber flags:CC;} 9: flags:CCZ=cmp(r66:DI,0) i.e. also far from optimal.
Note that a bit-test instruction can only be used if the and only feeds a comparison against zero. Not sure how canonical bit-test patterns should look like, but combine should be able to work this out, no? Unless we mess things up earlier during generic expansion (which seems to be the case)
GCC 4.7.3 is being released, adjusting target milestone.
(In reply to Richard Biener from comment #6) > Note that a bit-test instruction can only be used if the and only feeds a > comparison against zero. Not sure how canonical bit-test patterns should > look like, but combine should be able to work this out, no? Unless we > mess things up earlier during generic expansion (which seems to be the > case) A single bit test pattern such as (taken from SH) (define_insn "tstsi_t" [(set (reg:SI T_REG) (eq:SI (and:SI (match_operand:SI 0 "logical_operand" "%z,r") (match_operand:SI 1 "logical_operand" "K08,r")) (const_int 0)))] is not enough. If the second operand is a constant combine will try out various variations (canonicalized bit extractions) such as zero_extend or [(set (reg:SI T_REG) (and:SI (not:SI (match_operand:SI 0 "logical_operand" "z")) (const_int 1)))]
The first "if (...) b++;" is transformed to a bit extraction (right shift + and), because the result is either b = 0 or b = 1. The second "if (...) b++" uses an and + zero-compare + branch around add. The and + zero-compare are then combined to a bit test insn. The first bit extraction could be turned into a bit test followed by a test result store by implementing the according zero_extract combine pattern: (set (reg:SI 169) (zero_extract:SI (reg/v:SI 165 [ number ]) (const_int 1 [0x1]) (const_int 29 [0x1d]))) Although doing so resulted in problems when matching bit test insns, if I remember correctly. The second bit test + branch + add is a bit more difficult, as it is always expanded as a branch + add. On SH there are multiple minimal sequences, depending on the context / surrounding code. The following branchless variant could used if the tested reg dies after the tests and the whole thing is not inside a (inner) loop: mov r4,r0 // r0 = number shlr8 r0 // r0 = r0 >> 8 (logical shift) tst #(1 << (13-8)),r0 // T = (r0 & (1 << (13-8))) == 0 shlr8 r0 // r0 = r0 >> 8 (logical shift) movrt r1 // r1 = !T tst #(1 << (29-16)),r0 // T = (r0 & (1 << (26-16))) == 0 movrt r0 // r0 = !T rts add r1,r0 // r0 = r0 + r1 If the code is in a loop, it's more efficient to load constants (which might require a constant pool on SH): mov.l (1 << 13),r1 mov.l (1 << 29),r2 mov #0,r0 ... loop: ... // r4 = number from somewhere tst r1,r4 // T = (r4 & (1 << 13)) == 0 movrt r3 // r3 = !T tst r2,r4 // T = (r4 & (1 << 29)) == 0 add r3,r0 // r0 = r0 + r3 movrt r3 // r3 = !T add r3,r0 ... <cbranch loop> Using shift + and is usually worse on SH for these kind of sequences. I've tried adding the standard name pattern extzv<mode> but it doesn't seem to be used neither for the first if (...) nor for the second during RTL expansion. Maybe if-convert could be taught to transform the second if (...) to a zero_extract as well. But probably it's better to catch this earlier.
(In reply to Oleg Endo from comment #9) > > Maybe if-convert could be taught to transform the second if (...) to a > zero_extract as well. But probably it's better to catch this earlier. ... which would make the original example unsigned char lfsr (unsigned long number) { unsigned char b = 0; if (number & (1L << 29)) b++; if (number & (1L << 13)) b++; return b; } produce the same code as unsigned char lfsr (unsigned long number) { return ((number & (1L << 29)) != 0) + ((number & (1L << 13)) != 0); }
The 4.7 branch is being closed, moving target milestone to 4.8.4.
GCC 4.8.4 has been released.
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
GCC 4.9.3 has been released.
GCC 4.9 branch is being closed
Author: gjl Date: Thu Aug 4 07:50:53 2016 New Revision: 239116 URL: https://gcc.gnu.org/viewcvs?rev=239116&root=gcc&view=rev Log: PR 55181 * config/avr/avr.md: New pattern to work around do_store_flag generating shift instructions for bit extractions. Modified: trunk/gcc/ChangeLog trunk/gcc/config/avr/avr.md
For now and for avr, hacked around the issue by supplying a pattern for the extract...