This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc 3.1 is still very slow, compared to 2.95.3


> Thinking about it there may be yet another issue at play here for 
> small(er) machines: the number of TLB walks required.  If most memory 
> references are grouped, then cache line fetching may be a major effect, 
> however, if they are truly scattered over memory, then cache line fetches 
> could equally well require a TLB walk as well before the line can be 
> filled.
> 
> Although TLB entries cover much larger areas of memory (normally a page), 
> there are only a finite number of them... and fetching cache lines from 
> ten separate pages will require ten TLB entries...
> 
> With the obstack code the body of an insn would be laid out in a single 
> chunk of memory, typically something like
> 
> (insn (set (reg:SI 55) (const_int 65512)))
> 
> would be in a single block of memory as
> 
> +-------+-----------------+---------+----------+
> |reg 55 | const_int 65512 | set ... | insn ... |
> +-------+-----------------+---------+----------+
> 
> so walking the RTL of this insn might involve a couple of cache fills and 
> one TLB walk.

This is an interesting remark, definitely.
I seem to remember Richard mentioning that his bootstrap finished faster with
GGC than without at the time he was integrating it, but I may be mistaken.
Definitely, adding some locality to the instruction patterns is a desirable
thing; however, I am not at all sure how this can be achieved.  (A sketch of
the layout in question follows below.)
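
To make the locality point above concrete, here is a minimal, self-contained
sketch using the GNU obstack interface (obstack_init/obstack_alloc from
obstack.h).  The node structure is invented for illustration and is not GCC's
real rtx layout; the point is only that the four nodes of the example insn
come out of one contiguous chunk, as in the diagram above, whereas four
independent allocations could land on four different pages.

#include <stdio.h>
#include <stdlib.h>
#include <obstack.h>

#define obstack_chunk_alloc malloc
#define obstack_chunk_free  free

/* Minimal stand-in for an rtx node; the real layout differs.  */
struct node
{
  const char *code;           /* "reg", "const_int", "set", "insn" */
  long value;                 /* register number or constant value */
  struct node *op0, *op1;     /* operands, if any */
};

static struct node *
make_node (struct obstack *ob, const char *code, long value,
           struct node *op0, struct node *op1)
{
  struct node *n = (struct node *) obstack_alloc (ob, sizeof *n);
  n->code = code;
  n->value = value;
  n->op0 = op0;
  n->op1 = op1;
  return n;
}

int
main (void)
{
  struct obstack ob;
  obstack_init (&ob);

  /* (insn (set (reg:SI 55) (const_int 65512))), built innermost first,
     giving the left-to-right block layout shown in the diagram.  */
  struct node *reg  = make_node (&ob, "reg", 55, NULL, NULL);
  struct node *cst  = make_node (&ob, "const_int", 65512, NULL, NULL);
  struct node *set  = make_node (&ob, "set", 0, reg, cst);
  struct node *insn = make_node (&ob, "insn", 0, set, NULL);

  /* All four addresses sit next to each other in one obstack chunk,
     so walking the insn touches a couple of cache lines and one page.  */
  printf ("reg=%p const_int=%p set=%p insn=%p\n",
          (void *) reg, (void *) cst, (void *) set, (void *) insn);

  obstack_free (&ob, NULL);
  return 0;
}
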
> 
> Now each entry might well be in a separate page (or have padding bytes 
> after it), requiring additional cache fetches and additional TLB walks.
> 
> So, with ggc we either waste more memory on padding, or we waste pages on 
> separation of types; the more types we have the more pages we need and the 
> more TLB misses we are likely to have.
> 
> Do we have machines where we can profile things like cache and TLB 
> activity in user code?

I believe just about any modern i386 system with oprofile can measure data
cache misses.
Another problem is code cache misses - just look at the size of insn-attrtab :)
In the past I did some work on caching get_attr results during scheduling,
and it appeared to work well (a rough sketch of the idea follows below).  I
think a good solution would be to rewrite insn-attrtab to use a bytecode
interpreter for the recognizers.
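
A rough, self-contained sketch of that caching idea, for illustration only.
All names here (get_attr_type_uncached, NUM_INSN_CODES, the cache array) are
hypothetical and do not correspond to the actual patch; GCC's generated
get_attr_* functions can depend on an insn's operands, not just its code, so
a real cache would have to be keyed on the insn itself or invalidated more
carefully.

#include <stdio.h>
#include <string.h>

enum attr_type { TYPE_ALU, TYPE_LOAD, TYPE_STORE, TYPE_BRANCH };

#define NUM_INSN_CODES 4096     /* hypothetical size of the insn table */

/* Stand-in for the expensive, recognizer-driven lookup in insn-attrtab.  */
static enum attr_type
get_attr_type_uncached (int insn_code)
{
  return (enum attr_type) (insn_code & 3);
}

/* One slot per insn code; -1 means "not computed yet".  */
static signed char attr_type_cache[NUM_INSN_CODES];

static void
init_attr_type_cache (void)
{
  memset (attr_type_cache, -1, sizeof attr_type_cache);
}

static enum attr_type
get_attr_type_cached (int insn_code)
{
  if (attr_type_cache[insn_code] < 0)
    attr_type_cache[insn_code]
      = (signed char) get_attr_type_uncached (insn_code);
  return (enum attr_type) attr_type_cache[insn_code];
}

int
main (void)
{
  init_attr_type_cache ();
  /* The second lookup for the same insn code hits the cache.  */
  printf ("%d %d\n", get_attr_type_cached (42), get_attr_type_cached (42));
  return 0;
}
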

Honza
> 
> R.
> 

