98748 – Increased precision for points-to analysis

Bug 98748 - Increased precision for points-to analysis

Summary: Increased precision for points-to analysis

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	ipa (show other bugs)
Version:	11.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	alias, missed-optimization

Depends on:
Blocks:

Reported:	2021-01-19 13:09 UTC by bugzilla
Modified:	2021-12-23 20:57 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-01-19 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description bugzilla 2021-01-19 13:09:53 UTC

Hi,

this ticket is to discuss the possibility of extending the points-to analysis in GCC. The points-to analysis in GCC is currently field-sensitive, context-insensitive and flow-insensitive. However the current GCC implementation suffers from some limitations:

1) field-sensitivity is unavailable for heap variables (i.e. those that are allocated via malloc)
2) field-sensitivity is similarly unavailable for structs allocated in dynamic arrays (again via malloc).

I understand that these two limitations stem from the fact that no type information is available at the allocation site. I also believe that the current IPA-PTA may not take advantage of type information (however, I am not 100% sure of this at the moment).

I've been reading this paper [0] which has a Datalog implementation for a points-to analysis that computes the points-to solution with field-sensitivity for heap variables and field-sensitivity for dynamic arrays. The implementation is also available on GitHub [1]. At the moment it analyzes LLVM-IR.

This added sensitivity should be enough to unlock data-layout transformations at the granularity of alias-sets (i.e., it will change the layout of all variables pointing to the same object) as opposed to a coarser type granularity (i.e., to change the layout of all variables with a given type). For example: dropping unused fields from structs [2] and transforming a "struct of arrays to array of structs & vice versa".

If one went and performed the most direct translation of the available analysis in GCC, one would need to iterate over all gimple code and generate Datalog facts that would then be fed to the solver. I believe that XSB is an implementation of Datalog that could be called from a GCC-plugin to compute the points-to analysis. However, it would be preferable to have the points-to analysis be performed directly in GCC.

Please let me know if you have thoughts about this. I am happy to have a dialogue so that we can all come to a maintainable and reliable solution.

Thanks!

[0] Balatsouras, George, and Yannis Smaragdakis. "Structure-sensitive points-to analysis for C and C++." International Static Analysis Symposium. Springer, Berlin, Heidelberg, 2016. (available at the author's website)

[1] https://github.com/plast-lab/cclyzer

[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92801

Comment 1 ptomsich 2021-01-19 13:20:21 UTC

The link (through the author's website) for [0] is https://zenodo.org/record/61898/files/cclyzer.pdf

Comment 2 Richard Biener 2021-01-19 14:57:43 UTC

PTA maintains a 'shadow layout' of variables it represents.  Whenever that does not match the actual access layout inefficiencies arise so the current non-handling of certain cases is just conservative.  A possible 'shadow layout' would be void *[] which should catch any layout with aligned pointers with the overhead
of tracking non-pointers in extra precision.  Note any field is really a
separate variable as far as the solver and its memory and compile time use is concerned.