This is GCC Bugzilla
Command:
  avr-gcc -O1 -S div32_7.c
  or
  avr-gcc -O1 -fno-split-wide-types -S div32_7.c

Code size 4.1.2: 0x28
Code size 4.3.0: 0x68
Code size 4.3.0: 0x28 with -fno-split-wide-types

//----------------
unsigned long udivr32_7( unsigned long a, unsigned char b, unsigned char *r )
{
  unsigned char i, t;

  for( t = 0, i = 32; i ; i-- ){
    t += t;
    if( a & 0x80000000UL )
      t++;
    a += a;
    if( t >= b ){
      t -= b;
      a |= 1;
    }
  }
  *r = t;
  return a;
}
//----------------
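The testcase is a bit-serial restoring division: after 32 iterations `a` holds the quotient a / b and `*r` the remainder a % b. The host-side sketch below (the name `udivr32_7_host` is hypothetical) checks that reading. It uses `uint32_t` instead of `unsigned long`, because `unsigned long` is 32 bits on AVR but often 64 bits on a host, and `a += a` must drop the top bit as it does on the target. It also assumes 1 <= b <= 128. The remainder `t` is always below `b` before it is doubled, so under that assumption `2*t + 1` still fits in an 8-bit `unsigned char`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side transcription of udivr32_7 from the report.
 * uint32_t replaces unsigned long so the arithmetic wraps at 32 bits,
 * as it does on AVR.  Assumes 1 <= b <= 128 so t cannot overflow. */
static uint32_t udivr32_7_host(uint32_t a, unsigned char b, unsigned char *r)
{
    unsigned char i, t;

    assert(b != 0 && b <= 128);          /* assumption, not in the report */
    for (t = 0, i = 32; i; i--) {
        t += t;                          /* shift the remainder left...     */
        if (a & 0x80000000UL)
            t++;                         /* ...bringing in a's top bit      */
        a += a;                          /* shift dividend/quotient left    */
        if (t >= b) {                    /* trial subtraction succeeds:     */
            t -= b;
            a |= 1;                      /* record a 1 quotient bit         */
        }
    }
    *r = t;
    return a;                            /* quotient */
}
```

For example, `udivr32_7_host(7, 2, &rem)` returns 3 and sets `rem` to 1.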
I think this is already fixed on the trunk; fwprop was not propagating as much as it should have.
I'll see about testing with Andy Hutchinson's fwprop patch at bug #35542.
Subject: Re: code bloat caused by -fsplit-wide-types

Try the fwprop patch; it might well help. I can't tell from the report where the opportunities are missed. But anything split at combine/split time won't get any benefit, as the fwprop passes only run before that (much to my dismay). Register allocation has a more limited forward-propagation ability (for one, it does not simplify) and simplistically removes only one level of redundant moves.

If we try to split before combine (in the expanded RTL), then combine does not work so well and it's a net loss. Combine on split types does not work well, as it is not possible to split all instructions (like compare and add). We can't split those due to the use of CC0. We use CC0 because I can't figure out how to prevent reload from destroying the status flags. Dang it!

-----Original Message-----
From: eric dot weddington at atmel dot com <gcc-bugzilla@gcc.gnu.org>
To: hutchinsonandy@aim.com
Sent: Wed, 9 Apr 2008 3:04 pm
Subject: [Bug target/35860] code bloat caused by -fsplit-wide-types

------- Comment #2 from eric dot weddington at atmel dot com 2008-04-09 19:04 -------
eric dot weddington at atmel dot com changed:
What             | Removed                 | Added
CC               |                         | hutchinsonandy at aim dot com, eric dot weddington at atmel dot com
GCC host triplet | winavr 20080402 release | mingw

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35860
Confirmed. Andy's fwprop patch from bug #35542 did not solve this.
This bug has to do with reload and the additional register conflicts introduced by register lowering.

In the smaller case, the register for 'a' is a call-used register (often r22..r25). The AVR backend optimizes a long OR into a single-byte OR. The worse code occurs because 'a' gets assigned to a call-saved register. This 'long' register (r10..r13) has to be pushed/popped by the function. This register also cannot be used for an immediate OR, so the code grows to load another register with the long constant. The backend does not have any optimization for this.

The difference in register allocation occurs as a side effect of -fsplit-wide-types. With -fno-split-wide-types, a single SI (long) RTL move is used to place the result 'a' (pseudo register p48) into r22..r25. With -fsplit-wide-types, this is split into 4 individual QI (byte) moves of the subregs p48[0]..p48[3] into r22, r23, r24 and r25.

When the global register allocator is trying to figure out which CPU register it should use for 'a', it looks at the preferred register and at conflicts. With -fno-split-wide-types it can use the preferred r22..r25 with no conflict, so it does, and you get small code. With -fsplit-wide-types it wants to use r22..r25, but it sees that the lifetime of 'a' OVERLAPS the use of r22, r23 and r24 across 3 instructions. Next it tries r25, which is not big enough or valid. The next available register of that size is r10.

The conflict (overlapped access) is technically incorrect. Reload is looking at p48 as a single entity rather than at its subregs, and is unable to spot that on a subreg basis there is no conflict, i.e. r22 does not conflict with p48[0], r23 does not conflict with p48[1], etc.

OK, that's what happens; how do we fix it? I have no idea (yet) how to deal with it directly in reload or subreg lowering. These would be the best places, as the problem is not confined to this testcase. ALL SUGGESTIONS WELCOME!

While looking at this problem, I noted several issues with the AVR target that do not help.
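The conflict argument can be made concrete with a toy model in C. It is hypothetical and is not GCC's global allocator or reload code. It models the four split moves `r(22+k) = p48[k]` and assumes that byte p48[k] dies at the move that copies it, and that each return register stays live from its move until the function returns. If p48 is treated as one entity, it conflicts with r22, r23 and r24, so placing 'a' at r22..r25 is rejected. If conflicts are tracked per byte, the same placement is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical; not GCC's allocator code) of the split moves
 *     insn k:  r(22+k) = p48[k]      for k = 0..3
 * Assumptions: byte p48[k] dies at insn k, and r(22+k) is live from
 * insn k until the function returns.  Bit n of a mask stands for rn. */
enum { NBYTES = 4, FIRST_RET_REG = 22 };

/* Hard regs conflicting with byte k when conflicts are tracked per subreg:
 * r(22+j) is set at insn j while p48[k] is still live iff j < k. */
static uint32_t byte_conflicts(int k)
{
    uint32_t mask = 0;
    for (int j = 0; j < k; j++)
        mask |= UINT32_C(1) << (FIRST_RET_REG + j);
    return mask;
}

/* Hard regs conflicting with p48 treated as one entity: it is live while
 * any byte is live, so it picks up the union of the per-byte conflicts. */
static uint32_t whole_conflicts(void)
{
    uint32_t mask = 0;
    for (int k = 0; k < NBYTES; k++)
        mask |= byte_conflicts(k);
    return mask;
}

/* Can p48 occupy r(start)..r(start+3) under the chosen conflict model? */
static bool placement_ok(int start, bool per_subreg)
{
    assert(start >= 0 && start + NBYTES <= 32);
    for (int k = 0; k < NBYTES; k++) {
        uint32_t reg = UINT32_C(1) << (start + k);
        uint32_t conflicts = per_subreg ? byte_conflicts(k) : whole_conflicts();
        if (conflicts & reg)
            return false;
    }
    return true;
}
```

In this model `placement_ok(22, false)` is false, matching the allocator's choice, while `placement_ok(22, true)` is true.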
1) The above example has enough free registers; the problem is that none of them are contiguous enough to hold the long value of 'a'. This is due to the fragmentation of the register set that occurs with the current allocation order. Changing the order can alleviate this.
2) Splitting logical operations would definitely remove the long OR with 1. I am not sure it would free any registers to remove the conflict.
3) Alternatively, an optimisation turning the SI-mode OR into a single-byte OR could be done. The current *iorsi3_clobber is intended to do this but is ineffective: it will not be matched by combine or used as a peephole, so it needs fixing. Again, this may not help with the conflict.
4) The local register allocation was favoring LD_REGS for 't' when any GENERAL_REG could be used. This is because the *movqi pattern does not have an 'L' constraint to allow a GENERAL_REG ('r') to be loaded with zero. The same problem exists for movhi, but movsi is correct! (Alas, it was not enough to free a register.)
Solving 1..3 would help, but would not cure this issue.
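Item 1 (enough free registers, but none contiguous) can be illustrated with a small C sketch. The helper names and the free-register set are hypothetical, and any alignment rules the target imposes on multi-byte values are ignored. A bitmask with seven free registers (r18..r20, r26, r27, r30, r31) has no run of four consecutive free registers, so a 4-byte value such as 'a' cannot be placed in them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit n of free_mask set means hard register rn is
 * free.  Returns the lowest n such that rn..r(n+len-1) are all free, or -1
 * if no such run exists (alignment constraints are ignored). */
static int find_contiguous_free(uint32_t free_mask, int len)
{
    assert(len > 0 && len <= 32);
    for (int start = 0; start + len <= 32; start++) {
        int ok = 1;
        for (int k = 0; k < len; k++) {
            if (!(free_mask & (UINT32_C(1) << (start + k)))) {
                ok = 0;
                break;
            }
        }
        if (ok)
            return start;
    }
    return -1;
}

/* Number of free registers, regardless of where they are. */
static int count_free(uint32_t free_mask)
{
    int n = 0;
    for (; free_mask; free_mask &= free_mask - 1)  /* clear lowest set bit */
        n++;
    return n;
}
```

With that mask, `count_free` reports 7 but `find_contiguous_free(mask, 4)` returns -1. If r21 is also free, the run r18..r21 appears and the call returns 18.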
The point of -fsplit-wide-types was to kill patterns like iorsi3 in the AVR backend.
I agree with Paolo in comment #6. One purpose of the lower-subreg path was to allow backends to *not* define insns for operations the target doesn't have. The expanders will generate inline code for such patterns at expand time, with sets to subregs. Before GCC had lower-subreg, this would lead to awful code, but now that we split the subregs out to pseudos it ought to work just fine. Sadly, even i386 still hasn't been modified to benefit from this work...
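As a rough C analogy, the effect of expanding a 32-bit OR with a constant into per-byte ORs can be shown directly. The helper is hypothetical; this is not RTL and not how GCC's expanders are written. Each byte of the result is computed independently, and bytes where the constant is zero need no operation at all, since x | 0 == x.

```c
#include <assert.h>
#include <stdint.h>

/* C analogy (hypothetical helper, not GCC code) of expanding a 32-bit IOR
 * with a constant into four 8-bit IORs on the bytes ("subregs") of the
 * value.  *byte_ops counts the byte ORs that are not no-ops. */
static uint32_t ior32_by_bytes(uint32_t a, uint32_t c, int *byte_ops)
{
    uint32_t result = 0;
    *byte_ops = 0;
    for (int k = 0; k < 4; k++) {
        uint8_t ab = (uint8_t)(a >> (8 * k));   /* byte k of a */
        uint8_t cb = (uint8_t)(c >> (8 * k));   /* byte k of the constant */
        if (cb != 0) {                          /* x | 0 == x: skip it */
            ab = (uint8_t)(ab | cb);
            (*byte_ops)++;
        }
        result |= (uint32_t)ab << (8 * k);
    }
    return result;
}
```

For the testcase's `a |= 1`, only byte 0 needs an OR (`*byte_ops` is 1). This is the saving that item 2 of the AVR issues list expects from splitting logical operations.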
Subject: Re: [4.3 Regression] [avr] code bloat caused by -fsplit-wide-types

Yes, indeed. I have patches in progress for AVR that split operations to take more advantage of lowering, but the "bug" is still an issue even then. For example, if the testcase used PLUS instead of OR, I would not be able to split the instruction (anything that carries "status" is problematic with reload and, as yet, cannot be split). The problem merely propagates backwards until it is blocked by an unsplit wide-mode operation (PLUS, COMPARE, SUB, MULT and probably calls). Simply put, it will occur wherever a wide-mode value meets a set of subregs: there the allocator will determine that there is a conflict, even if there is not one.

-----Original Message-----
From: steven at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org>
To: hutchinsonandy@aim.com
Sent: Wed, 16 Apr 2008 4:59 am
Subject: [Bug target/35860] [4.3 Regression] [avr] code bloat caused by -fsplit-wide-types
------- Comment #7 from steven at gcc dot gnu dot org 2008-04-16 08:59 -------
4.3.2 is released, changing milestones to 4.3.3.
GCC 4.3.3 is being released, adjusting target milestone.
GCC 4.3.4 is being released, adjusting target milestone.