Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature branch] JS: Migrate to shared dataflow library #14412

Open
wants to merge 226 commits into
base: js/shared-dataflow-branch
Choose a base branch
from

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Oct 9, 2023

Targeting a feature branch

This PR is targeting a branch named js/shared-dataflow-branch, not main as we normally would.

  • Merging into main will happen at a later point, and will require that
    • we have a good-looking DCA report (currently we have too many performance regressions), and
    • possibly more evaluations on different suites and/or a QA trial run, and
    • TODO comments have been resolved
  • The criteria for merging to js/shared-dataflow-branch should be that the code has been reviewed, and is considered good enough to go into main once the above criteria are met.
  • This means things like readability should be reviewed as if we were targeting main. We don't want to have to go back a re-review the code for readability later.

Review order

Commits have been organised into a reading order. The large number of commits is there to make it easier to the review.

Note that not every commit can compile in isolation. I split some commits into smaller chunks for easy of readability, but you may sometimes see odd artifacts like see a trailing and/or, an empty predicate body, or the occasional mention of a class that doesn't exist yet.

Otherwise the overarching order of commits are:

  1. The first commit is from this PR, which should be reviewed in its own PR.
  2. Core data flow changes: instantiating shared libraries and related work to enable that
  3. Library modeling changes: mainly rewriting collection models as flow summaries
  4. Query changes: porting to ConfigSig-style and updating tests
  5. Library tests: port test configurations to ConfigSig style, generic test output changes
  6. Add TODOs and further attempts at fixing some of the performance and precision issues.

Steps across function boundaries

The most difficult change for JS is that steps across function boundaries are no longer permitted, except for jump steps which can result in FPs and performance issues.

This means the following type of steps need to be rewritten:

  • Steps into or out of callbacks. This applies equally to value-preversing steps, taint steps, read steps, and store steps.
  • Store steps that target getALocalSource(), because that can cross function boundaries.
  • Steps that use the combined loadStoreStep feature, which is no longer supported.

The current status of this migration is:

  • Ported the core models for Array, Map, Set, Promise, and generators.
  • The above took care of all our existing uses of loadStoreStep.
  • String#replace has been converted, as it is the only string-related step that needed to be ported.
  • Value steps and taint steps from other models are implicitly converted to jump steps when they cross a function boundary.
  • Read/store steps from other models are just retained as they are for now, which in some case breaks the invariant by crossing a function boundary, as there is no combined jump-read or jump-store step we can map to.

Related reading:

Evaluation

The latest evaluation can be found here. As mentioned there are too many performance regressions at the moment, and alerts need to be triaged.

asgerf added 11 commits June 27, 2024 09:06
These initially got messed up by a merge conflict where I couldn't rerun the tests due to breaking
changes in the data flow library. I wanted the breaking-change updates to live in their own commits,
not just eaten by a merge resolution commit, so the test output became broken for a while.

The '#select' result set is unchanged in all of these, so they should be safe to accept.
@asgerf
Copy link
Contributor Author

asgerf commented Jun 28, 2024

@erik-krogh after keeping up the CI changes and breaking changes in the data flow library, this PR is finally green again (apart from the QLDoc check which complains about shared libraries).

I'd like to merge it to the feature branch as it is now, so we can start working with smaller PRs. The first unreviewed commit so far is 102ca77.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ready for a feature branch 👍

Comment on lines +200 to +204
int totalorder() {
result = TotalOrdering::astNodeId(this.asSourceCallable()).bitShiftLeft(1)
or
result = TotalOrdering::libraryCallableId(this.asLibraryCallable()).bitShiftLeft(1) + 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the left-shift?

@@ -1,11 +1,11 @@
legacyDataFlowDifference
| arrays.js:2:16:2:23 | "source" | arrays.js:39:8:39:24 | arr4_spread.pop() | only flow with OLD data flow library |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to add this step to the new library?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 participants