This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch][RFC] 160-bits bitmap_element
- From: Richard Guenther <richard dot guenther at gmail dot com>
- To: Steven Bosscher <stevenb dot gcc at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>
- Date: Fri, 17 Aug 2012 13:54:43 +0200
- Subject: Re: [patch][RFC] 160-bits bitmap_element
- References: <CABu31nMK54b1=+iVxt3npXJf3642xouc0CEwu66wP9ia1QJArQ@mail.gmail.com>
On Fri, Aug 17, 2012 at 1:46 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
>
> On a 64-bits host we leave 32 bits of each bitmap_element unused. And
> actually, on a 32-bits host we do that too for GGC-allocated bitmaps
> (due to alignment).
>
> With this patch, we use those 32 bits by making BITMAP_WORD an
> unsigned int (instead of unsigned long) and having 5 elts per
> bitmap_element (instead of 2 or 4).
>
> The effect is that computing the bit position gets more expensive.
> Currently this only needs a div/mod by a power-of-two, but with this
> patch it's a div/mod 5. But because most bitmaps are sparse in blocks,
> bitmaps with 160 bits bitmap_elements have fewer searches. And I think
> that in general the call overhead to the bitmap functions
> (bitmap_bit_p/bitmap_set_bit are not inline functions, they're
> relatively large) and the pointer chasing is more expensive than
> requiring div/mod 5. Compile time for a set of cc1-i files doesn't
> change, but compile time for the small test case of PR54146 goes down
> by ~10%, and peak memory goes down a bit too (harder to measure so I
> don't know exactly by how much).
>
> Looking for your thoughts and comments,
Well, another effect of reducing the size of BITMAP_WORD is that
operations are not performed in a mode optimally using CPU regs
(did you check code generation differences on a 64bit host?).
I think another idea was to even actually increase BITMAP_WORD size
to be a vector type where supported to benefit from that fact, eventually
getting rid of all inner loops (though we unroll them completely anyway I hope).
I suppose using vector registers could be achieved in by specializing as well,
in your proposed case we'd have one xmm op and a 32bit reg op.
I guess that's one vote in favor of not wasting those 32 bits which is a shame.
Looking for other comments.
Thanks,
Richard.
> Ciao!
> Steven