Bug 48328 - GCC failed to generate 16bit relative jump table
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.7.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2011-03-29 08:28 UTC by Carrot
Modified: 2011-08-12 16:58 UTC

See Also:
Host: linux
Target: arm-eabi
Build: linux
Known to work:
Known to fail:
Last reconfirmed: 2011-04-04 21:59:55


Attachments
testcase (9.79 KB, application/octet-stream)
2011-03-29 08:28 UTC, Carrot
Description Carrot 2011-03-29 08:28:09 UTC
Created attachment 23796
testcase

As mentioned in PR 47373, GCC sometimes emits absolute addresses in a jump table, which doubles the size of the table. I have now extracted a test case. Compiling it with trunk GCC and the options -march=armv7-a -mthumb -Os produces:

        ...
	ldr	r3, [fp, #0]
	subs	r3, r3, #11
.L14:
	cmp	r3, #18
	bhi	.L14
	adr	r0, .L21
	ldr	pc, [r0, r3, lsl #2]
	.align	2
.L21:
	.word	.L15+1
	.word	.L14+1
	.word	.L16+1
	.word	.L14+1
	.word	.L14+1
	.word	.L17+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L18+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L19+1
	.word	.L76+1
.L15:
         ...

This is the first problem: the relative addresses have become absolute addresses, so the entries are necessarily 32 bits wide.

The corresponding insn in infback.c.220r.nothrow is actually an addr_diff_vec, and I could not find where the absolute addresses are output.

(jump_insn:TI 85 83 86 7 (parallel [
            (set (pc)
                (if_then_else (leu (reg:SI 3 r3 [551])
                        (const_int 18 [0x12]))
                    (mem:SI (plus:SI (mult:SI (reg:SI 3 r3 [551])
                                (const_int 4 [0x4]))
                            (label_ref 86)) [0 S4 A32])
                    (label_ref:SI 82)))
            (clobber (reg:CC 24 cc))
            (clobber (reg:SI 0 r0))
            (use (label_ref 86))
        ]) src/zlib/infback.c:281 717 {thumb2_casesi_internal}
     (expr_list:REG_UNUSED (reg:CC 24 cc)
        (expr_list:REG_UNUSED (reg:SI 0 r0)
            (insn_list:REG_LABEL_TARGET 82 (nil))))
 -> 86)

(code_label 86 85 87 21 "" [2 uses])

(jump_insn 87 86 88 (addr_diff_vec:SI (label_ref:SI 86)
         [
            (label_ref:SI 89)
            (label_ref:SI 82)
            (label_ref:SI 180)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 232)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 484)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 700)
            (label_ref:SI 762)
        ]
        (label_ref:SI 82)
        (label_ref:SI 762)) src/zlib/infback.c:281 -1
     (nil))


When I add -fpic to the command line, GCC generates the following:


        subs	r3, r3, #11
.L14:
	cmp	r3, #18
	bhi	.L14
	adr	r0, .L21
	ldr	r1, [r0, r3, lsl #2]
	add	r0, r0, r1
	bx	r0
	.align	2
.L21:
	.word	.L15+1-.L21
	.word	.L14+1-.L21
	.word	.L16+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L17+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L18+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L19+1-.L21
	.word	.L76+1-.L21
.L15:

Now we get a table of relative addresses, but each entry is 4 bytes instead of the optimal 2 bytes. This is the second problem.
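
To put a number on the second problem, here is a minimal sketch of the table-size arithmetic for the 19-entry table in the listings above. The function name is hypothetical, and any padding from .align is ignored.

```c
#include <stddef.h>

/* Hypothetical helper: bytes occupied by a jump table, ignoring any
   alignment padding inserted before or after it.  */
static size_t
jump_table_bytes (size_t entries, size_t entry_size)
{
  return entries * entry_size;
}
```

For the 19 entries shown, 4-byte entries occupy 76 bytes, while 2-byte entries would occupy 38 bytes.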

The relevant source should be in arm.h:

#define CASE_VECTOR_SHORTEN_MODE(min, max, body)			\
  (TARGET_THUMB1							\
   ? (min >= 0 && max < 512						\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 1, QImode)	\
      : min >= -256 && max < 256					\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 0, QImode)	\
      : min >= 0 && max < 8192						\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 1, HImode)	\
      : min >= -4096 && max < 4096					\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 0, HImode)	\
      : SImode)								\
   : ((min < 0 || max >= 0x2000 || !TARGET_THUMB2) ? SImode		\
      : (max >= 0x200) ? HImode						\
      : QImode))

Problems:

a) Is (max >= 0x2000) correct? Why not (max >= 0x20000)? The maximum unsigned short is 0xFFFF.
b) Although tbb/tbh require forward jumps (min >= 0), tbb/tbh do not have to be used. When min < 0, we can use separate instructions to load the offset and add it to pc. That is still a win over wider table entries in nearly all cases.
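
The reasoning behind question (a) can be sketched in C. This is only an illustration, not the arm.h code: the enum and function names are hypothetical, and the HImode limit is made a parameter so that the two candidate values can be compared. It assumes the Thumb-2 semantics of tbb/tbh, where the table entry is doubled before being added to pc.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QImode, HImode and SImode; the values are
   the entry sizes in bytes.  */
enum entry_mode { MODE_QI = 1, MODE_HI = 2, MODE_SI = 4 };

/* Mirrors the non-Thumb-1 arm of CASE_VECTOR_SHORTEN_MODE, with the
   HImode limit passed in as hi_limit.  min and max are the smallest and
   largest byte offsets from the table base to any case label.  */
static enum entry_mode
shorten_mode (long min, long max, bool thumb2, long hi_limit)
{
  if (min < 0 || max >= hi_limit || !thumb2)
    return MODE_SI;
  /* tbb doubles an 8-bit entry, so it reaches 2 * 0xFF = 0x1FE,
     which is why the QImode limit is 0x200.  */
  return max >= 0x200 ? MODE_HI : MODE_QI;
}

/* tbh doubles a 16-bit entry, so by the same argument it reaches
   2 * 0xFFFF = 0x1FFFE, which suggests a limit of 0x20000.  */
static long
tbh_max_reach (void)
{
  return 2L * 0xFFFF;
}
```

Under these assumptions, a table whose largest offset is 0x3000 gets SImode entries with a limit of 0x2000, but would get HImode entries with a limit of 0x20000.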
Comment 1 Carrot 2011-03-30 07:25:31 UTC
Another possible enhancement is to use HImode jump table entries in ARM mode as well. As in the min < 0 case, although tbh is not available in ARM mode, we can use separate instructions to load the offset and adjust the PC.
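
A rough C model of the "load the offset, then add it" sequence is given below. It is only a sketch under stated assumptions: code addresses are modelled as plain integers, each table entry is a signed 16-bit byte offset from the table base, and the function name is hypothetical. The Thumb bit and any scaling of the entries are left out.

```c
#include <stdint.h>

/* Hypothetical model of dispatch through a table of signed 16-bit
   offsets, without tbh.  */
static uint32_t
dispatch_target (uint32_t table_base, const int16_t *table, uint32_t index)
{
  /* Corresponds to a sign-extending halfword load of the entry.  */
  int32_t offset = table[index];
  /* Corresponds to adding the offset to the table base; unsigned
     arithmetic wraps, so negative offsets yield backward targets.  */
  return table_base + (uint32_t) offset;
}
```

Because the entries are signed, this form also covers backward targets (min < 0), which tbb/tbh cannot express.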
Comment 2 Ramana Radhakrishnan 2011-08-12 16:58:13 UTC
Author: ramana
Date: Fri Aug 12 16:58:09 2011
New Revision: 177705

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177705
Log:

Fix PR target/48328 part 1

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.h