48328 – GCC failed to generate 16bit relative jump table

Summary: GCC failed to generate 16bit relative jump table

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.7.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2011-03-29 08:28 UTC by Carrot
Modified:	2011-08-12 16:58 UTC
CC List:	1 user

See Also:
Host:	linux
Target:	arm-eabi
Build:	linux
Known to work:
Known to fail:
Last reconfirmed:	2011-04-04 21:59:55

Attachments
testcase (9.79 KB, application/octet-stream) 2011-03-29 08:28 UTC, Carrot

Description Carrot 2011-03-29 08:28:09 UTC

Created attachment 23796
testcase

As mentioned in PR47373, GCC sometimes generates absolute addresses in a jump table, which doubles the size of the table. I have now extracted a test case. Compiling it with trunk GCC and the options -march=armv7-a -mthumb -Os, I get:

        ...
	ldr	r3, [fp, #0]
	subs	r3, r3, #11
.L14:
	cmp	r3, #18
	bhi	.L14
	adr	r0, .L21
	ldr	pc, [r0, r3, lsl #2]
	.align	2
.L21:
	.word	.L15+1
	.word	.L14+1
	.word	.L16+1
	.word	.L14+1
	.word	.L14+1
	.word	.L17+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L18+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L14+1
	.word	.L19+1
	.word	.L76+1
.L15:
         ...

This is the first problem: the relative addresses have become absolute addresses, which of course need 32-bit entries.

The corresponding insns in infback.c.220r.nothrow actually use an addr_diff_vec, and I couldn't find the code that outputs them as absolute addresses.

(jump_insn:TI 85 83 86 7 (parallel [
            (set (pc)
                (if_then_else (leu (reg:SI 3 r3 [551])
                        (const_int 18 [0x12]))
                    (mem:SI (plus:SI (mult:SI (reg:SI 3 r3 [551])
                                (const_int 4 [0x4]))
                            (label_ref 86)) [0 S4 A32])
                    (label_ref:SI 82)))
            (clobber (reg:CC 24 cc))
            (clobber (reg:SI 0 r0))
            (use (label_ref 86))
        ]) src/zlib/infback.c:281 717 {thumb2_casesi_internal}
     (expr_list:REG_UNUSED (reg:CC 24 cc)
        (expr_list:REG_UNUSED (reg:SI 0 r0)
            (insn_list:REG_LABEL_TARGET 82 (nil))))
 -> 86)

(code_label 86 85 87 21 "" [2 uses])

(jump_insn 87 86 88 (addr_diff_vec:SI (label_ref:SI 86)
         [
            (label_ref:SI 89)
            (label_ref:SI 82)
            (label_ref:SI 180)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 232)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 484)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 82)
            (label_ref:SI 700)
            (label_ref:SI 762)
        ]
        (label_ref:SI 82)
        (label_ref:SI 762)) src/zlib/infback.c:281 -1
     (nil))


When I add -fpic to the command line, GCC generates the following:


        subs	r3, r3, #11
.L14:
	cmp	r3, #18
	bhi	.L14
	adr	r0, .L21
	ldr	r1, [r0, r3, lsl #2]
	add	r0, r0, r1
	bx	r0
	.align	2
.L21:
	.word	.L15+1-.L21
	.word	.L14+1-.L21
	.word	.L16+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L17+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L18+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L14+1-.L21
	.word	.L19+1-.L21
	.word	.L76+1-.L21
.L15:

Now we get a relative address table, but its entries are 4 bytes wide rather than the optimal 2-byte form. This is the second problem.

The relevant source should be in arm.h:

#define CASE_VECTOR_SHORTEN_MODE(min, max, body)			\
  (TARGET_THUMB1							\
   ? (min >= 0 && max < 512						\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 1, QImode)	\
      : min >= -256 && max < 256					\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 0, QImode)	\
      : min >= 0 && max < 8192						\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 1, HImode)	\
      : min >= -4096 && max < 4096					\
      ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 0, HImode)	\
      : SImode)								\
   : ((min < 0 || max >= 0x2000 || !TARGET_THUMB2) ? SImode		\
      : (max >= 0x200) ? HImode						\
      : QImode))

Problems:

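a) Is (max >= 0x2000) correct? Why not (max >= 0x20000)? The maximum unsigned short is 0xFFFF, and tbh branches forward by twice the loaded halfword, so a halfword table can reach up to 0x1FFFE bytes.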
b) Although tbb/tbh require forward jumps (min >= 0), we don't have to use tbb/tbh. In this case (min < 0), we can use separate instructions to load the offset and add it to pc. That is still a win over wider table entries in nearly all cases.

Comment 1 Carrot 2011-03-30 07:25:31 UTC

Another possible enhancement is to use HImode jump table entries in ARM mode as well. As in the min < 0 case, although tbh is not available in ARM mode, we can use separate instructions to load the offset and adjust the PC.

Comment 2 Ramana Radhakrishnan 2011-08-12 16:58:13 UTC

Author: ramana
Date: Fri Aug 12 16:58:09 2011
New Revision: 177705

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177705
Log:

Fix PR target/48328 part 1

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.h