This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/31640] cache block alignment is too aggressive on sh-elf
- From: "oleg dot endo at t-online dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 31 Dec 2011 17:24:47 +0000
- Subject: [Bug target/31640] cache block alignment is too aggressive on sh-elf
- Auto-submitted: auto-generated
- References: <bug-31640-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
--- Comment #3 from Oleg Endo <oleg.endo@t-online.de> 2011-12-31 17:24:47 UTC ---
Created attachment 26208
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26208
Proposed patch
(In reply to comment #0)
> The sh4 port aligns blocks that have no fallthrus and that are either
> frequently executed (JUMP_ALIGN) or preceded by a barrier
> (LABEL_ALIGN_AFTER_BARRIER) on a cache line.
>
> While in theory this helps to avoid cache misses if the block splits over 2 cache
> lines, in practice this reduces cache locality and lengthens the distance between
> blocks.
> The number of issued instructions is also impacted. For example, the relative
> indirect address in jump tables needs a byte zero extend instruction if the
> distance occupies 8 bits instead of 7 bits.
>
> I ran some experiments and benchmarked (EEMBC) with 2 strategies:
> 1) -falign-jumps=1
> 2) Align the block if the size is bigger than a given threshold (empirically
> set to 16 bytes, half of the cache line size). See the attached illustrative patch.
>
> My conclusion is that at -O3 the performance never degrades (option 2 is a
> little bit better, even improving dhrystone by 3%) when removing this padding,
> and the text size improves by ~15%.
Because of this I would like to propose the following alignment strategies
(unless they are changed by the user with -falign-??? options).
-Os:
Align everything to 2 bytes to get compact code.
-O2,-O3:
Align functions to 4 bytes.
Align labels and jumps to 2 bytes (to avoid potential code bloat).
Align loops to 4 bytes.
The attached patch should do that, although it has not been fully tested yet.
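
To make the proposed defaults concrete, here is a minimal C sketch. It is not
taken from the attached patch: the names opt_level, sh_align_defaults,
proposed_defaults and padding_bytes are made up for illustration only. The
first function just encodes the table above, the second shows how many padding
bytes a given alignment costs at a given address.

```c
#include <assert.h>

/* Hypothetical names, for illustration only; not from the attached patch. */
enum opt_level { OPT_SIZE, OPT_O2, OPT_O3 };

/* Alignments in bytes. */
struct sh_align_defaults {
    unsigned functions;
    unsigned labels;
    unsigned jumps;
    unsigned loops;
};

/* Defaults proposed in this comment. They are meant to apply only
   when the user has not overridden them with -falign-??? options. */
static struct sh_align_defaults proposed_defaults(enum opt_level level)
{
    struct sh_align_defaults d;

    if (level == OPT_SIZE) {
        /* -Os: align everything to 2 bytes to get compact code. */
        d.functions = d.labels = d.jumps = d.loops = 2;
    } else {
        /* -O2, -O3: 4 bytes for functions and loops, 2 bytes for
           labels and jumps to avoid potential code bloat. */
        d.functions = 4;
        d.labels = 2;
        d.jumps = 2;
        d.loops = 4;
    }
    return d;
}

/* Padding bytes needed to bring 'addr' up to 'align', which must be
   a power of two. */
static unsigned padding_bytes(unsigned addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}
```

For example, a label at address 0x1006 needs padding_bytes(0x1006, 32) = 26
bytes of padding when aligned to a 32 byte cache line (the size implied by
comment #0), but no padding at all with the proposed 2 byte alignment.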