Bug#: 35860
Status: NEW
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Reporter: Andreas Kaiser <a.kaiser@gmx.net>
Summary: [4.3 Regression] [avr] code bloat caused by -fsplit-wide-types

Description:   Last confirmed: 2008-04-09 22:09 Opened: 2008-04-07 21:35
Command:
  avr-gcc -O1 -S div32_7.c
or
  avr-gcc -O1 -fno-split-wide-types -S div32_7.c

Code size 4.1.2: 0x28
Code size 4.3.0: 0x68
Code size 4.3.0: 0x28 with -fno-split-wide-types

//----------------
unsigned long udivr32_7( unsigned long a, unsigned char b, unsigned char *r )
{
  unsigned char i, t;

  for(  t = 0, i = 32; i ; i-- ){
    t += t;
    if( a & 0x80000000UL )
      t++;
    a += a;
    if( t >= b ){
      t -= b;
      a |= 1;
    }
  }
  *r = t;
  return a;
}
//----------------
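
For anyone wanting to check candidate fixes against the testcase's behaviour rather than only its size, here is a minimal host-side sketch. It assumes uint32_t as a stand-in for AVR's 32-bit unsigned long, and it assumes divisors in the range 1..128, since the 8-bit 't' would overflow for larger ones. The names udivr32_7_host and agrees_with_builtin are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Same loop as the testcase; uint32_t stands in for AVR's 32-bit
   unsigned long (assumption made so this runs on a host compiler). */
static uint32_t udivr32_7_host(uint32_t a, unsigned char b, unsigned char *r)
{
  unsigned char i, t;

  /* Assumed valid range: with b <= 128 the remainder t stays <= 127,
     so 2*t + 1 still fits in 8 bits. */
  assert(b >= 1 && b <= 128);

  for (t = 0, i = 32; i; i--) {
    t += t;                 /* shift remainder left */
    if (a & 0x80000000UL)   /* bring in the next dividend bit */
      t++;
    a += a;                 /* shift dividend/quotient left */
    if (t >= b) {
      t -= b;
      a |= 1;               /* set quotient bit */
    }
  }
  *r = t;
  return a;
}

/* Returns 1 if the routine agrees with the built-in / and % operators. */
static int agrees_with_builtin(uint32_t a, unsigned char b)
{
  unsigned char r;
  uint32_t q = udivr32_7_host(a, b, &r);
  return q == a / b && r == a % b;
}
```

Calling agrees_with_builtin(100u, 7) should return 1 if the loop behaves as a plain restoring division.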

------- Comment #1 From Andrew Pinski 2008-04-07 21:38 -------
I think this is already fixed on the trunk; fwprop was not propagating as much
as it should have.

------- Comment #2 From Eric Weddington 2008-04-09 19:04 -------
I'll see about testing with Andy Hutchinson's fwprop patch at bug #35542.

------- Comment #3 From Andy Hutchinson 2008-04-09 19:24 -------
Subject: Re:  code bloat caused by -fsplit-wide-types

Try the fwprop patch; it might well help.

I can't tell from the report where the opportunities are missed.

But anything split at combine/split won't get any benefit, as the fwprop
passes only occur before that (much to my dismay).

Register allocation has a more limited forward propagation ability (it
does not simplify, for one) and, simplistically, will remove one level of
redundant moves.

If we try to split before combine (expanded RTL), then combine does not work
so well and it's a net loss.

Combine on split types does not work well, as it is not possible to split all
instructions (like compare, add).

We can't split due to the use of CC0. We use CC0 because I can't figure out
how to prevent reloads destroying status.

Dang it!



eric dot weddington at atmel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hutchinsonandy at aim dot
                   |                            |com, eric dot weddington at
                   |                            |atmel dot com
   GCC host triplet|winavr 20080402 release     |mingw


------- Comment #4 From Eric Weddington 2008-04-09 22:09 -------
Confirmed.
Andy's fwprop patch from bug #35542 did not solve this.

------- Comment #5 From Andy Hutchinson 2008-04-13 00:33 -------
This bug has to do with reload and additional register conflicts introduced by
register lowering.

In the smaller case, the register for 'a' is a call-used register (often
r22..r25). The avr backend code performs optimization of long OR, to give a
single byte OR.

The worse code occurs because 'a' gets assigned to a call-saved register. This
'long' register (r10..r13) has to be pushed/popped by the function. This register
also cannot be used for immediate OR. So the code grows to load another
register with the long constant. The backend does not have any optimization for
this.
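
To illustrate why a long OR with 1 only needs a single byte operation, here is a rough C sketch (not backend code; or_bytewise is a made-up name). It ORs a 32-bit value one byte at a time and skips bytes where the constant is zero, counting how many byte operations remain. On AVR the immediate form (ORI) only accepts r16..r31, which is why a value living in r10..r13 cannot use it.

```c
#include <stdint.h>

/* OR a 32-bit value with a constant, one byte at a time.
   Bytes of the constant that are zero need no instruction at all,
   so they are skipped. *byte_ops receives the number of byte ORs done. */
static uint32_t or_bytewise(uint32_t a, uint32_t k, int *byte_ops)
{
  uint32_t result = a;
  int n = 0;

  for (int i = 0; i < 4; i++) {
    uint8_t kb = (uint8_t)(k >> (8 * i));   /* byte i of the constant */
    if (kb != 0) {
      result |= (uint32_t)kb << (8 * i);
      n++;
    }
  }
  *byte_ops = n;
  return result;
}
```

For the constant 1 used in the testcase, only the lowest byte is touched, so one byte OR is enough.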

The difference in register allocation occurs as a side effect of wide-types.

With '-fno-split-wide-types' a SI (long) RTL move is used to place result 'a'
(pseudo register p48) into r22..r25.

With '-fsplit-wide-types' this is split into 4 individual QI (byte) moves of
subregs p48[0]..p48[3] into r22, r23, r24 and r25.

When the global register allocator is trying to figure out which cpu register it
should use for 'a', it looks for preferred type and conflicts. With
-fno-split-wide-types it can use the preferred r22..r25 with no conflict, so it
does and you get small code.

With split wide types, it wants to use r22..r25 but it can see that the use of
'a' OVERLAPS the use of r22, r23, r24 across 3 instructions. Next it tries r25,
which is not big enough or valid. The next available register of that size is
r10.

The conflict, or overlapped access, is technically incorrect. Reload is looking
at p48 as a single entity rather than its subregs and is unable to spot that on
a subreg basis there is no conflict, i.e. r22 does not conflict with p48[0], r23
does not conflict with p48[1], etc.
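
The false conflict can be shown with a toy model. This is only a sketch and not how GCC represents liveness: it assumes four copy insns numbered 0..3, where insn k is "r(22+k) = p48[k]", and treats a hard register as conflicting when it is already live strictly before the last use of the value it is compared against. All names are hypothetical.

```c
#define NBYTES 4

/* Toy insn stream: insn k copies p48[k] into hard register 22+k.
   hard_def[k]     = insn at which hard register 22+k becomes live.
   sub_last_use[k] = insn at which byte k of p48 is last read. */
static const int hard_def[NBYTES]     = { 0, 1, 2, 3 };
static const int sub_last_use[NBYTES] = { 0, 1, 2, 3 };

/* A copy whose source dies where its destination is born is not a
   conflict, hence the strict comparison. */
static int overlaps(int def_point, int last_use)
{
  return def_point < last_use;
}

/* p48 viewed as one entity: it stays live until its last byte is read. */
static int conflicts_whole_pseudo(void)
{
  int last = 0, n = 0;

  for (int k = 0; k < NBYTES; k++)
    if (sub_last_use[k] > last)
      last = sub_last_use[k];
  for (int k = 0; k < NBYTES; k++)
    n += overlaps(hard_def[k], last);
  return n;
}

/* Each byte of p48 compared only against its own destination register. */
static int conflicts_per_subreg(void)
{
  int n = 0;

  for (int k = 0; k < NBYTES; k++)
    n += overlaps(hard_def[k], sub_last_use[k]);
  return n;
}
```

In this model the whole-pseudo view reports conflicts for r22, r23 and r24 but not r25, while the per-subreg view reports none.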

OK, that's what happens. How do we fix it?

I have no idea (yet) how to deal with it directly in reload or subreg lowering.
These would be the best places, as the problem is not confined to this testcase.
ALL SUGGESTIONS WELCOME!


While looking at this problem, I noted several issues with the AVR target that
do not help.

1) The above example has enough free registers - the problem is that none of
them are contiguous enough to hold the long value of 'a'. This is due to the
fragmentation of the register set that occurs with the current allocation
order. Changing the order can alleviate this.

2) Splitting logical operations would definitely remove the long OR with 1. I
am not sure it would free any registers to remove the conflict.

3) Alternatively, optimisation of single byte OR on SI pattern could be done.
The current *iorsi3_clobber is intended to do this but is ineffective: it will
not be matched by combine, or used as a peephole. It needs fixing. Again, this
may not help with the conflict.

4) The local register allocation was favoring LD_REGS for 't' when any
GENERAL_REG could be used. This is because the *movqi pattern does not have
constraint 'L' to allow GENERAL_REG 'r' to be loaded with zero. Same problem
for movhi - but movsi is correct! (Alas, it was not enough to free a register.)

Solving 1..3 would help but not cure this issue.
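
The fragmentation in point 1 can be sketched as a search for a contiguous run of free registers in a bitmask. This is a simplified illustration only: find_run is a made-up helper, and it ignores register classes, alignment and allocation order, all of which the real allocator has to consider.

```c
#include <stdint.h>

/* Bit n of free_mask set means register n is free.
   Returns the lowest register number that starts a run of 'len'
   consecutive free registers, or -1 if there is no such run. */
static int find_run(uint32_t free_mask, int len)
{
  for (int start = 0; start + len <= 32; start++) {
    int ok = 1;

    for (int i = 0; i < len; i++) {
      if (!((free_mask >> (start + i)) & 1u)) {
        ok = 0;
        break;
      }
    }
    if (ok)
      return start;
  }
  return -1;
}
```

With registers 2, 3, 5, 6, 8 and 9 free (mask 0x36C) there are six free registers but no run of four, so a long value cannot be placed even though a byte or word value could.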




------- Comment #6 From Paolo Bonzini 2008-04-15 12:26 -------
The point of -fsplit-wide-types was to kill patterns like iorsi3 in AVR
backend.

------- Comment #7 From Steven Bosscher 2008-04-16 08:59 -------
I agree with Paolo in comment #6.  One purpose of the lower-subreg pass was to
allow backends to *not* define insns that they don't have.  The expanders will
generate inline code for such patterns at expand time, with sets to subregs.
Before GCC had lower-subreg, this would lead to awful code, but now that we
split the subregs out to pseudos it ought to work just fine.

Sadly, even i386 still hasn't been modified to benefit from this work...

------- Comment #8 From Andy Hutchinson 2008-04-16 13:10 -------
Subject: Re:  [4.3 Regression] [avr] code bloat caused by
 -fsplit-wide-types

Yes, indeed, I have patches in progress for AVR that do split
operations to take more advantage of lowering, but the "bug" is still an
issue then.

For example, if the testcase was using PLUS instead of OR, I will not
be able to split the instruction. (Anything with carried "status" is
problematic with reload and, as yet, cannot be split.)

The problem will merely propagate backwards until it gets blocked by an
unsplit wide mode operation (PLUS, COMPARE, SUB, MULT and probably
calls). Simply put, it will occur wherever a wide mode value meets a
set of subregs. Here it will determine there is a conflict, even if
there is not one.
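
To show what "carried status" means for PLUS, here is a small C sketch of a 32-bit add done one byte at a time (add_bytewise is a made-up name, not anything in the backend). Unlike OR, each byte step depends on the carry out of the previous one, so the four steps cannot be treated as independent operations.

```c
#include <stdint.h>

/* Add two 32-bit values one byte at a time, low byte first.
   'carry' plays the role of the carry flag that an add / add-with-carry
   sequence passes from one byte to the next. */
static uint32_t add_bytewise(uint32_t a, uint32_t b)
{
  uint32_t result = 0;
  unsigned carry = 0;

  for (int i = 0; i < 4; i++) {
    unsigned sum = ((a >> (8 * i)) & 0xFFu)
                 + ((b >> (8 * i)) & 0xFFu)
                 + carry;

    result |= (uint32_t)(sum & 0xFFu) << (8 * i);
    carry = sum >> 8;   /* must survive until the next byte step */
  }
  return result;
}
```

If anything clobbers the carry between two byte steps (a reload, for instance), the upper bytes come out wrong, which is the hazard described above.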






------- Comment #9 From Joseph S. Myers 2008-08-27 22:03 -------
4.3.2 is released, changing milestones to 4.3.3.

------- Comment #10 From Richard Guenther 2009-01-24 10:20 -------
GCC 4.3.3 is being released, adjusting target milestone.

------- Comment #11 From Richard Guenther 2009-08-04 12:29 -------
GCC 4.3.4 is being released, adjusting target milestone.
