Windows: aggregated graphs only show data for modified/added files #210

Closed
devdanzin opened this issue Aug 17, 2023 · 0 comments
devdanzin commented Aug 17, 2023

When building, wily only calculates metrics for modified or added files. This works well for individual files, but it gives erroneous results for aggregated metrics on directories, because each data point comes only from the files that changed in that revision. For LOC, for example, values fluctuate a lot because what is being counted is the LOC of the changed files: commits touching many files give high values, and small commits give low values.
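To make the effect concrete, here is a toy sketch in plain Python. It does not use wily's code; the file names, LOC numbers, and the two helper functions are made up for illustration. It compares summing LOC over only the files touched by a revision with summing over every tracked file.

```python
# Toy history (hypothetical data): each revision maps every tracked file
# to its LOC, plus the set of files that revision modified or added.
revisions = [
    {"loc": {"a.py": 100, "b.py": 200, "c.py": 300}, "changed": {"a.py", "b.py", "c.py"}},
    {"loc": {"a.py": 105, "b.py": 200, "c.py": 300}, "changed": {"a.py"}},
    {"loc": {"a.py": 105, "b.py": 210, "c.py": 310}, "changed": {"b.py", "c.py"}},
]


def aggregate_changed_only(rev):
    """Sum LOC over the files touched in this revision only."""
    return sum(rev["loc"][path] for path in rev["changed"])


def aggregate_all_tracked(rev):
    """Sum LOC over every tracked file in this revision."""
    return sum(rev["loc"].values())


changed_only = [aggregate_changed_only(r) for r in revisions]
all_tracked = [aggregate_all_tracked(r) for r in revisions]
print(changed_only)  # [600, 105, 520]
print(all_tracked)   # [600, 605, 625]
```

The first series jumps around with the size of each commit, while the second follows the actual size of the directory.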

To Reproduce

  1. Check out wily's repo
  2. wily build -n 200
  3. wily graph src/ --aggregate loc --all
  4. See a graph similar to the one below, where the metric varies wildly:
    [image: aggregated_unpatched]

Expected behavior
An aggregated metric should probably account for all files instead of only changed/added ones. That way, the values would reflect the history of the metric for the repository, instead of for the files changed in each commit.

By patching wily to calculate metrics for all tracked files, we get a much more consistent result (with the drawbacks pointed out below):
[image: aggregated_patched]

There are issues that make the simplest patch not ready to present. First, it makes building much slower (250s -> 450s to build 200 revisions of wily's repo), though hopefully some of that can be mitigated by caching (edit: indeed it can, but see below). Second, there are some artifacts I can't explain yet:

[image: aggregated_patched_artifact]
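On the caching idea mentioned above: a minimal sketch, assuming the expensive step is a pure function of a file's source text. The `cached_by_content` decorator and the `count_loc` function are hypothetical and only stand in for an expensive analysis call; this is not the patch that was measured.

```python
import hashlib


def cached_by_content(metric_func):
    """Wrap metric_func(source) so identical source text is analyzed only once."""
    cache = {}

    def wrapper(source):
        # Key on a hash of the content, so unchanged files hit the cache
        # no matter which revision they are read from.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = metric_func(source)
        return cache[key]

    return wrapper


calls = []  # records every real (uncached) analysis, for demonstration


@cached_by_content
def count_loc(source):
    """Hypothetical stand-in for an expensive per-file analysis."""
    calls.append(source)
    return len(source.splitlines())


first = count_loc("import os\nprint(os.name)\n")
second = count_loc("import os\nprint(os.name)\n")  # same content: served from the cache
print(first, second, len(calls))  # 2 2 1
```

Since most files are unchanged between consecutive revisions, most lookups would be cache hits, at the cost of still reading and hashing every tracked file.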

Desktop

  • OS: Windows
  • Browser: Firefox
  • Version: 1.24.2

Edited to add: caching brings the time down to almost the same as only processing modified files (250s -> 280s), but it has to be applied to radon functions. We could either recreate part of radon's machinery with added caching (which may be a lot of code) or try to monkey-patch radon with caching versions of analyze(), h_visit(), mi_parameters(), mi_visit() and cc_visit(). However, I now think it would be easier and cleaner to keep a running tally of metrics for all files, adding, removing and updating files as needed from the changes in each commit.
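A rough sketch of the running-tally idea, assuming a summed metric such as LOC. The `RunningTally` class and its `apply()` method are hypothetical names, not part of wily, and the numbers are made up.

```python
class RunningTally:
    """Keep per-file metric values and a running aggregate across revisions."""

    def __init__(self):
        self.per_file = {}  # path -> metric value as of the current revision
        self.total = 0

    def apply(self, added_or_modified, deleted=()):
        """Update the tally from one commit and return the new aggregate.

        added_or_modified: dict mapping path -> freshly computed metric value
        deleted: iterable of paths removed by the commit
        """
        for path in deleted:
            # Drop the file's last known contribution, if it was tracked.
            self.total -= self.per_file.pop(path, 0)
        for path, value in added_or_modified.items():
            # Replace the old contribution (0 for a new file) with the new one.
            self.total += value - self.per_file.get(path, 0)
            self.per_file[path] = value
        return self.total


tally = RunningTally()
history = [
    tally.apply({"a.py": 100, "b.py": 200, "c.py": 300}),  # initial commit
    tally.apply({"a.py": 105}),                            # small change
    tally.apply({"b.py": 210}, deleted=["c.py"]),          # change plus removal
]
print(history)  # [600, 605, 315]
```

Only the files touched by each commit are analyzed, so the cost stays close to the current build. Averaged metrics would also need a file count, and renames would have to be treated as a delete plus an add.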

@devdanzin devdanzin changed the title Aggregated graphs only show data for modified/added files Aug 18, 2023
@devdanzin devdanzin self-assigned this Aug 18, 2023