97064 – BB vectorization behaves sub-optimal

Summary: BB vectorization behaves sub-optimal

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	11.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	vectorizer

Reported:	2020-09-16 06:29 UTC by Richard Biener
Modified:	2022-01-11 12:15 UTC

See Also:
Host:
Target:	x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:	2020-09-16 00:00:00

Description Richard Biener 2020-09-16 06:29:39 UTC

The testcase g++.dg/vect/slp-pr87105.cc ends in

  _64 = MIN_EXPR <_32, _87>;
  bBox_6(D)->x0 = _64;
  _67 = MIN_EXPR <_33, _86>;
  bBox_6(D)->y0 = _67;
  _70 = MAX_EXPR <_36, _87>;
  bBox_6(D)->x1 = _70;
  _73 = MAX_EXPR <_39, _86>;
  bBox_6(D)->y1 = _73;

thus feeding a 4-element store with a non-uniform SLP opportunity
starting with { MIN, MIN, MAX, MAX }.  With 2-element vector type
vectorization this eventually gets vectorized by splitting the group,
which is prioritized over just building the { MIN..., MAX } vector
from scalars.  With 4-element vector type vectorization, however, no
splitting is considered and we end up successfully vectorizing just the
store, without ever considering the smaller vector size.

So at the moment the testcase PASSes with SSE but fails with AVX.
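
For illustration, here is a minimal sketch of source code that could produce a store group of this shape. It is not the actual content of g++.dg/vect/slp-pr87105.cc; the struct name `BBox`, the function `extend_bbox`, and the choice of `double` fields are assumptions made for this example. Whether GCC folds the `std::min`/`std::max` calls to MIN_EXPR/MAX_EXPR also depends on the type and options used (for floating point this may require relaxed flags such as -ffast-math).

```cpp
#include <algorithm>

// Hypothetical bounding box with four adjacent fields, so the four
// stores below form one contiguous 4-element store group.
struct BBox
{
  double x0, y0, x1, y1;
};

// Hypothetical helper: grow the box so that it contains the point (x, y).
// The operations feeding the store group are { MIN, MIN, MAX, MAX }:
// the first two lanes and the last two lanes are each uniform, but the
// full 4-lane group is not.
void
extend_bbox (BBox *bBox, double x, double y)
{
  bBox->x0 = std::min (bBox->x0, x);  // lane 0: MIN
  bBox->y0 = std::min (bBox->y0, y);  // lane 1: MIN
  bBox->x1 = std::max (bBox->x1, x);  // lane 2: MAX
  bBox->y1 = std::max (bBox->y1, y);  // lane 3: MAX
}
```

Compiling such a function once with -O3 -msse2 and once with -O3 -mavx, and comparing the output of -fopt-info-vec, is one way to observe the difference described above: with two doubles per vector the group can be split into a { MIN, MIN } half and a { MAX, MAX } half, while with four doubles per vector the whole group is non-uniform.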

Comment 1 Richard Biener 2022-01-11 12:15:10 UTC

This is also partly because we do not compare the costs of multiple vector sizes on x86.