When building, wily only calculates metrics for modified or added files. This works well for individual files, but gives erroneous results for metrics aggregated over directories, because each data point comes only from the files that changed in that revision. For LOC, for example, values fluctuate a lot because what's being counted is the LOC of the changed files: commits touching many files give high values, and small commits give low values.
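To make the effect concrete, here is a toy sketch with made-up numbers (the file names and LOC values are hypothetical, and this is not wily's actual code). It compares summing LOC over only the files touched in each revision with summing over every file known so far:

```python
# Toy history: each revision maps a changed file to its LOC after that commit.
# All names and numbers are invented for illustration.
revisions = [
    {"a.py": 100, "b.py": 200, "c.py": 300},  # initial commit adds everything
    {"a.py": 110},                            # small commit touching one file
    {"b.py": 190, "c.py": 310},               # medium commit touching two files
]


def changed_only_totals(revisions):
    """Aggregate LOC over just the files changed in each revision."""
    return [sum(rev.values()) for rev in revisions]


def all_files_totals(revisions):
    """Aggregate LOC over every file seen so far, carrying old values forward."""
    current = {}
    totals = []
    for rev in revisions:
        current.update(rev)  # changed files overwrite their previous LOC
        totals.append(sum(current.values()))
    return totals


print(changed_only_totals(revisions))  # [600, 110, 500]
print(all_files_totals(revisions))     # [600, 610, 610]
```

The first series swings with the size of each commit, while the second tracks the size of the code base, which is what an aggregated graph is presumably expected to show.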
To Reproduce
Check out wily's repo
wily build -n 200
wily graph src/ --aggregate loc --all
See a graph similar to this one below, where wild variations of a metric occur:
Expected behavior
An aggregated metric should probably account for all files instead of only changed/added ones. That way, the values would reflect the history of the metric for the repository, instead of for the files changed in each commit.
By patching wily to calculate metrics for all tracked files, we get a much more consistent result (with the drawbacks pointed out below):
Two issues keep the simplest patch from being ready to present. First, it makes building much slower (250s -> 450s to build 200 revisions of wily's repo); hopefully some of that can be mitigated by caching (edit: indeed it can, but see below). Second, there are some artifacts I can't explain yet:
Desktop
OS: Windows
Browser: Firefox
Version: 1.24.2
Edited to add: caching brings build time down to almost the same as when only processing modified files (250s -> 280s), but it has to be applied to radon functions. We could either recreate part of radon's machinery with added caching (which may be a lot of code) or try to monkey-patch radon with caching versions of analyze(), h_visit(), mi_parameters(), mi_visit() and cc_visit(). However, I now think it would be easier and cleaner to keep a running tally of metrics for all files, adding, removing and updating entries as commits change them.
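A minimal sketch of the running-tally idea, assuming a single additive metric such as LOC. The class name `RunningTally` and its `apply()` method are hypothetical and not part of wily or radon; non-additive metrics (e.g. averages) would need extra bookkeeping:

```python
from dataclasses import dataclass, field


@dataclass
class RunningTally:
    """Hypothetical helper: tracks a per-file metric and its running sum."""

    per_file: dict = field(default_factory=dict)
    total: int = 0

    def apply(self, added_or_modified, deleted=()):
        """Apply one commit's changes and return the new aggregated value.

        added_or_modified maps path -> metric value after the commit;
        deleted lists paths removed by the commit.
        """
        for path in deleted:
            # Drop the file's last known contribution, if we had one.
            self.total -= self.per_file.pop(path, 0)
        for path, value in added_or_modified.items():
            # Replace the old contribution (0 for new files) with the new one.
            self.total += value - self.per_file.get(path, 0)
            self.per_file[path] = value
        return self.total


tally = RunningTally()
print(tally.apply({"a.py": 100, "b.py": 200}))  # 300
print(tally.apply({"a.py": 120}))               # 320
print(tally.apply({}, deleted=["b.py"]))        # 120
```

With this shape, each revision only requires analyzing the files the commit touched, while the aggregate still covers every tracked file.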
devdanzin changed the title: Aggregated graphs only show data for modified/added files (Aug 18, 2023)