This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: gcc 3.1 is still very slow, compared to 2.95.3
> Thinking about it there may be yet another issue at play here for
> small(er) machines: the number of TLB walks required. If most memory
> references are grouped, then cache line fetching may be a major effect,
> however, if they are truly scattered over memory, then cache line fetches
> could equally well require a TLB walk as well before the line can be
> filled.
>
> Although TLB entries cover much larger areas of memory (normally a page),
> there are only a finite number of them... and fetching cache lines from
> ten separate pages will require ten TLB entries...
>
> With the obstack code the body of an insn would be laid out in a single
> chunk of memory, typically something like
>
> (insn (set (reg:SI 55) (const_int 65512)))
>
> would be in a single block of memory as
>
> +-------+-----------------+---------+----------+
> |reg 55 | const_int 65512 | set ... | insn ... |
> +-------+-----------------+---------+----------+
>
> so walking the RTL of this insn might involve a couple of cache fills and
> one TLB walk.
This is an interesting remark, definitely.
I seem to remember Richard mentioning that his bootstrap finished faster with
the GGC than without at the time he was integrating it, but I may be mistaken.
Definitely, adding some locality to the instruction patterns is a desirable
thing; however, I am not at all sure how this can be achieved.
>
> Now each entry might well be in a separate page (or have padding bytes
> after it), requiring additional cache fetches and additional TLB walks.
>
> So, with ggc we either waste more memory on padding, or we waste pages on
> separation of types; the more types we have the more pages we need and the
> more TLB misses we are likely to have.
>
> Do we have machines where we can profile things like cache and TLB
> activity in user code?
I believe almost any modern i386 system with oprofile can measure data cache
misses.
Another problem is code cache misses - just look at the size of insn-attrtab :)
In the past I did some work on caching get_attr results during scheduling
and it appeared to work well. I think a good solution would be to rewrite
insn-attrtab to use a bytecode interpreter for the recognizers.
Honza
>
> R.
>