Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert to the GA4 data API #5973

Merged
merged 2 commits into from
May 6, 2024

Conversation

escattone
Copy link
Contributor

@escattone escattone commented Apr 23, 2024

mozilla/sumo#1705

This PR will also resolve the following bugs:
mozilla/sumo#1140
mozilla/sumo#1169

This PR converts our current GA3 API queries to GA4 data API queries.

Please merge https://github.com/mozilla-it/webservices-infra/pull/1949 immediately prior to merging this PR.

Prior to releasing the code in this PR, make sure https://github.com/mozilla-it/webservices-infra/pull/1950 is merged.

Notes

  • The work to grant our existing stage and prod GA-specific service accounts access to our new stage and prod GA4 properties and the GA4 data API has already been completed. ✅
  • I've already verified the correctness of the RunReportRequest instances for each case by running each of the functions locally using the stage GA-specific service account.
  • In general, the code in this PR consumes less memory and database resources than the current implementation. I avoid creating any intermediate data structures where possible and most of the GA4 functions, like pageviews_by_document, are now generator functions.
  • The new pageviews_by_document function is much more efficient that its predecessor, for the following reasons:
    • The old version would return many more rows, often hundreds of thousands of rows, mainly because the response could not be limited to just KB article page views -- it would also include KB article edits and discussion views and so on -- but also because if viewed URL's differed in their query parameters, each variation would be a separate row. The new version only returns one row per viewed KB article, so the upper limit is roughly 45K rows which would only happen if every article in every locale was viewed within the time period requested.
    • The old version would make at least 1 database query for each row that was a KB article view -- to get the article's ID -- and so would often make thousands of database queries. The new version makes no database queries at all.
  • The old pageview_by_document function, because it used the Document.from_url class method, would sometimes associate the page views of a localized KB article with a different KB article, for example with its parent, if the localized article no longer existed. This is no longer the case. The page views for a KB article of a given locale and slug are now only associated with that exact KB article or nothing at all if the article no longer exists.
@escattone escattone force-pushed the convert-to-ga4-api-1705 branch 7 times, most recently from f33abf7 to 969ac79 Compare April 26, 2024 02:26
@escattone escattone marked this pull request as ready for review April 26, 2024 02:40
@mozilla mozilla deleted a comment from Tui2519 Apr 29, 2024
@escattone escattone requested review from smithellis and removed request for akatsoulas May 2, 2024 21:16
Copy link
Contributor

@smithellis smithellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so much cleaner overall!. Just a few changes (maybe).

Thanks!

kitsune/dashboards/models.py Outdated Show resolved Hide resolved
kitsune/kpi/tests/test_cron.py Show resolved Hide resolved
kitsune/questions/models.py Outdated Show resolved Hide resolved
@escattone escattone force-pushed the convert-to-ga4-api-1705 branch 2 times, most recently from 7a7a4f5 to 7911885 Compare May 3, 2024 23:42
@escattone escattone requested a review from smithellis May 3, 2024 23:54
@escattone
Copy link
Contributor Author

escattone commented May 6, 2024

I just tested the new WikiDocumentVisits.reload_period_from_analytics and QuestionVisits.reload_from_analytics commands from this PR on stage and they are smokin' fast! The GA4 data is still in its infancy on stage, so for example the pageviews_by_document function only returns about 1500 rows for the 90-day period of data, and the pageviews_by_question function returns about 1600 rows -- both will return tens of thousands of rows when the data is mature -- but creating new stats for both took a fraction of a second. That's a dramatic change from our current code! We run the WikiDocumentVisits.reload_period_from_analytics function weekly for all periods (7 days, 30 days, 90 days, and one year), and the current code sometimes takes so much time and memory that it fails, and when I tested it months ago, found that it could take hours to complete. That same weekly run with the code in this PR -- assuming fully populated data in GA4, up to a years worth -- will take at most a few seconds. 🎉

@escattone escattone merged commit 7c72d5b into mozilla:main May 6, 2024
2 checks passed
@escattone escattone deleted the convert-to-ga4-api-1705 branch May 6, 2024 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants