Bug 98748 - Increased precision for points-to analysis
Summary: Increased precision for points-to analysis
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 11.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks:
 
Reported: 2021-01-19 13:09 UTC by bugzilla
Modified: 2021-12-23 20:57 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-01-19 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description bugzilla 2021-01-19 13:09:53 UTC
Hi,

this ticket is to discuss the possibility of extending the points-to analysis in GCC. The points-to analysis in GCC is currently field-sensitive, context-insensitive and flow-insensitive. However the current GCC implementation suffers from some limitations:

1) field-sensitivity is unavailable for heap variables (i.e. those that are allocated via malloc)
2) field-sensitivity is similarly unavailable for structs allocated in dynamic arrays (again via malloc).

I understand that these two limitations stem from the fact that no type information is available at the allocation site. I also believe that the current IPA-PTA may not take advantage of type information (however, I am not 100% sure of this at the moment).

I've been reading this paper [0] which has a Datalog implementation for a points-to analysis that computes the points-to solution with field-sensitivity for heap variables and field-sensitivity for dynamic arrays. The implementation is also available on GitHub [1]. At the moment it analyzes LLVM-IR. 

This added sensitivity should be enough to unlock data-layout transformations at the granularity of alias-sets (i.e., it will change the layout of all variables pointing to the same object) as opposed to a coarser type granularity (i.e., to change the layout of all variables with a given type). For example: dropping unused fields from structs [2] and transforming a "struct of arrays to array of structs & vice versa".

If one went and performed the most direct translation of the available analysis in GCC, one would need to iterate over all gimple code and generate Datalog facts that would then be fed to the solver. I believe that XSB is an implementation of Datalog that could be called from a GCC-plugin to compute the points-to analysis. However, it would be preferable to have the points-to analysis be performed directly in GCC.

Please let me know if you have thoughts about this. I am happy to have a dialogue so that we can all come to a maintainable and reliable solution.

Thanks!


[0] Balatsouras, George, and Yannis Smaragdakis. "Structure-sensitive points-to analysis for C and C++." International Static Analysis Symposium. Springer, Berlin, Heidelberg, 2016. (available at the author's website)

[1] https://github.com/plast-lab/cclyzer

[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92801
Comment 1 ptomsich 2021-01-19 13:20:21 UTC
The link (through the author's website) for [0] is https://zenodo.org/record/61898/files/cclyzer.pdf
Comment 2 Richard Biener 2021-01-19 14:57:43 UTC
PTA maintains a 'shadow layout' of variables it represents.  Whenever that does not match the actual access layout inefficiencies arise so the current non-handling of certain cases is just conservative.  A possible 'shadow layout' would be void *[] which should catch any layout with aligned pointers with the overhead
of tracking non-pointers in extra precision.  Note any field is really a
separate variable as far as the solver and its memory and compile time use is concerned.