Bug 109429 - ivopts: Compute complexity for unsupported addressing modes
Summary: ivopts: Compute complexity for unsupported addressing modes
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 13.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 109430 (view as bug list)
Depends on:
Blocks:
 
Reported: 2023-04-06 07:33 UTC by Jovan Dmitrović
Modified: 2024-06-26 12:36 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jovan Dmitrović 2023-04-06 07:33:41 UTC
Greetings everyone,
I have pinged the mailing list about this issue about a month ago (I am not the original poster), and heard no response since.

For some addressing modes, it seems that the complexity is calculated in sub-optimal manner. More information and a patch are contained in the link below.

https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608895.html
Comment 1 Jovan Dmitrović 2023-04-06 09:01:11 UTC
It seems that commit f9f69dd changed the way complexity is calculated, so now the complexity equals zero across the board, for each candidate.

Here is one testcase:

void daxpy(float *vector1, float *vector2, int n, float fp_const)
{
	for (int i = 0; i < n; ++i)
		vector1[i] += fp_const * vector2[i];
}

void dgefa(float *vector, int m, int n, int l)
{
	for (int i = 0; i < n - 1; ++i)
	{
		for (int j = i + 1; j < n; ++j)
		{
			float t = vector[m * j + l];
			daxpy(&vector[m * i + i + 1], 
                              &vector[m * j + i + 1], n - (i + 1), t);
		}
	}
}

Following tables compare IV candidates.


===== Before f9f69dd =====
Group 1:
  cand	cost	compl.	inv.expr.    inv.vars
  1	13	1	3;	     NIL;
  2	13	2	4;	     NIL;
  4	9	1	5;	     NIL;
  5	1	0	NIL;	     NIL;
  7	9	1	3;	     NIL;
===== Before f9f69dd =====
===== After f9f69dd =====
Group 1:
  cand	cost	compl.	inv.expr.    inv.vars
  1	10	0	4;	     NIL;
  2	10	0	5;	     NIL;
  4	6	0	6;	     NIL;
  5	1	0	NIL;	     NIL;
  7	6	0	4;	     NIL;
===== After f9f69dd =====


This leads to different candidate being selected in our test case.


===== Before f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
Candidate 4:
  Var befor: ivtmp.17_52
  Var after: ivtmp.17_103
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _10)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
Candidate 5:
  Var befor: ivtmp.18_99
  Var after: ivtmp.18_98
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _14)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
===== Before f9f69dd =====
===== After f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 1 IVs:
Candidate 4:
  Var befor: ivtmp.17_52
  Var after: ivtmp.17_103
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _10)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
===== After f9f69dd =====


Finally, here is the difference in assembly:


===== Before f9f69dd =====
.L83:
	lwc1	$f5,0($3)
	lwc1	$f8,0($2)
	lwc1	$f7,4($2)
	lwc1	$f6,8($2)
	lwc1	$f9,12($2)
	lwc1	$f10,16($2)
	maddf.s	$f8,$f0,$f5
	lwc1	$f11,20($2)
	lwc1	$f12,24($2)
	lwc1	$f13,28($2)
	ld	$12,72($sp)
	swc1	$f8,0($2)
	lwc1	$f14,4($3)
	maddf.s	$f7,$f0,$f14
	swc1	$f7,4($2)
	lwc1	$f15,8($3)
	maddf.s	$f6,$f0,$f15
	swc1	$f6,8($2)
	lwc1	$f16,12($3)
	maddf.s	$f9,$f0,$f16
	swc1	$f9,12($2)
	lwc1	$f17,16($3)
	maddf.s	$f10,$f0,$f17
	swc1	$f10,16($2)
	lwc1	$f18,20($3)
	maddf.s	$f11,$f0,$f18
	swc1	$f11,20($2)
	lwc1	$f19,24($3)
	maddf.s	$f12,$f0,$f19
	swc1	$f12,24($2)
	lwc1	$f20,28($3)
	maddf.s	$f13,$f0,$f20
	swc1	$f13,28($2)
	daddiu	$2,$2,32
	bne	$2,$12,.L83
	daddiu	$3,$3,32
        ...
===== Before f9f69dd =====
===== After f9f69dd =====
.L93:
	dsubu	$18,$2,$4
	lwc1	$f13,0($2)
	daddu	$19,$18,$5
	daddiu	$16,$2,4
	lwc1	$f14,0($19)
	dsubu	$17,$16,$4
	daddu	$25,$17,$5
	lwc1	$f15,4($2)
	daddiu	$19,$2,12
	daddiu	$20,$2,8
	maddf.s	$f13,$f1,$f14
	dsubu	$16,$19,$4
	daddiu	$17,$2,16
	dsubu	$18,$20,$4
	daddu	$19,$16,$5
	daddiu	$16,$2,20
	lwc1	$f10,8($2)
	daddu	$20,$18,$5
	lwc1	$f16,12($2)
	dsubu	$18,$17,$4
	lwc1	$f17,16($2)
	dsubu	$17,$16,$4
	lwc1	$f18,20($2)
	daddiu	$16,$2,24
	lwc1	$f20,24($2)
	daddu	$18,$18,$5
	swc1	$f13,0($2)
	daddu	$17,$17,$5
	lwc1	$f19,0($25)
	daddiu	$25,$2,28
	lwc1	$f11,28($2)
	daddiu	$2,$2,32
	dsubu	$16,$16,$4
	dsubu	$25,$25,$4
	maddf.s	$f15,$f1,$f19
	daddu	$16,$16,$5
	daddu	$25,$25,$5
	swc1	$f15,-28($2)
	lwc1	$f21,0($20)
	ld	$20,48($sp)
	maddf.s	$f10,$f1,$f21
	swc1	$f10,-24($2)
	lwc1	$f22,0($19)
	maddf.s	$f16,$f1,$f22
	swc1	$f16,-20($2)
	lwc1	$f23,0($18)
	maddf.s	$f17,$f1,$f23
	swc1	$f17,-16($2)
	lwc1	$f0,0($17)
	maddf.s	$f18,$f1,$f0
	swc1	$f18,-12($2)
	lwc1	$f7,0($16)
	maddf.s	$f20,$f1,$f7
	swc1	$f20,-8($2)
	lwc1	$f12,0($25)
	maddf.s	$f11,$f1,$f12
	bne	$2,$20,.L93
	swc1	$f11,-4($2)
        ...
===== After f9f69dd =====
Comment 2 Jovan Dmitrović 2023-04-06 09:37:00 UTC
*** Bug 109430 has been marked as a duplicate of this bug. ***
Comment 3 Jovan Dmitrović 2023-04-06 09:39:24 UTC
Another thing that has come to attention is the register pressure costs being calculated improperly. More information and a patch are submitted at the link below.

https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608896.html
Comment 4 Jovan Dmitrović 2024-06-26 12:36:51 UTC
There is a patch regarding this issue submitted to the GCC mailing list: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647966.html.

Unfortunately, there hasn't been a response to it for quite some time now, even after a few pings from the original poster.