Lighthouse misattributing Main-Thread Blocking Time in Run Microtasks #14843

Open
nicjansma opened this issue Mar 1, 2023 · 5 comments

@nicjansma

URL

https://nicj.net

What happened?

(submitted URL is a placeholder URL, not the site I've reproduced this issue on)

As a third-party RUM provider, my team provides a JavaScript RUM library called boomerang.js that is included on our customers' websites. boomerang.js is loaded on every page load and beacons performance analytics information.

Our customers, who are paying for RUM or CDN services, are usually very motivated and proactive in monitoring their site's performance, as you would expect. They will utilize tools like Lighthouse and want to ensure that boomerang.js is not causing any performance side-effects on their sites. We sometimes get Lighthouse reports that highlight boomerang.js in some of the audits. Being a RUM provider, our development team is very interested in ensuring boomerang.js is as performant and non-invasive as possible.

We were recently sent a Lighthouse report by a customer that highlighted boomerang.js as one of the top hits under the Reduce the impact of third-party code audit. I was surprised to see boomerang.js highlighted there, as we hadn't seen blocking time show up in our own self-audits of the boomerang.js runtime. Their report showed 989ms of Main-Thread Blocking Time (though my screenshots below show a smaller amount, 286ms, from my local repro).

Here's an example screenshot from my development machine:

[Screenshot: Lighthouse "Reduce the impact of third-party code" audit listing boomerang.js with 286ms of Main-Thread Blocking Time]

(the rest of the third parties in this audit have been cropped out for brevity and for our customer's privacy)

I started to dig into what was causing the 286ms of Main-Thread Blocking Time in this trace. In other examples, when running Lighthouse locally, I would sometimes see it even higher: 500-1000ms or more.

I had a hard time correlating those 286ms to anything within the trace itself. Once loaded in the Performance tab, I could see our script executing infrequently, but most tasks were 0.1ms or smaller; added up (even at microsecond granularity), the total FunctionCall time was only around ~23ms. Even taking the 4x mobile .cpuMultiplier into account, that would be just 92ms.
We have a few 100ms-interval timers and a rAF callback, most of which complete in under 0.1ms (see the sketch after the screenshot below). The below list is filtered to just our URL:

[Screenshot: Performance panel event list filtered to our URL, showing FunctionCall entries mostly under 0.1ms]
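For context, the pattern described above looks roughly like the following. This is a hypothetical sketch of many tiny, frequent tasks, not actual boomerang.js source; the sampling logic is an illustrative stand-in:

// Hypothetical sketch: frequent, tiny tasks from 100ms-interval timers and rAF.
const samples = [];

// 100ms-interval timer; each callback completes in well under 0.1ms.
setInterval(() => {
  samples.push(performance.now());
}, 100);

// Self-re-arming rAF callback, also sub-0.1ms per invocation.
function onFrame(ts) {
  samples.push(ts);
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);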

I additionally took several JavaScript Profiler traces and generally found the same results: at most 25-50ms spent in our JavaScript library according to the sampled profiles.

I eventually dug into the Lighthouse source to understand how the Main-Thread Blocking Time was calculated, and came across third-party-summary.js:

for (const task of mainThreadTasks) {
  const attributableURL = getAttributableURLForTask(task, jsURLs);
  const urlSummary = byURL.get(attributableURL) || {...defaultSummary};

  const taskDuration = task.selfTime * cpuMultiplier;
  // The amount of time spent on main thread is the sum of all durations.
  urlSummary.mainThreadTime += taskDuration;
  // The amount of time spent *blocking* on main thread is the sum of all time longer than 50ms.
  // Note that this is not totally equivalent to the TBT definition since it fails to account for FCP,
  // but a majority of third-party work occurs after FCP and should yield largely similar numbers.
  urlSummary.blockingTime += Math.max(taskDuration - 50, 0);
  byURL.set(attributableURL, urlSummary);
}

So for each task, Lighthouse finds its attributable URL and adds any task duration beyond 50ms to that URL's .blockingTime.

This function gets the source URL from getAttributableURLForTask:

function getAttributableURLForTask(task, jsURLs) {
  const jsURL = task.attributableURLs.find(url => jsURLs.has(url));
  const fallbackURL = task.attributableURLs[0];
  let attributableURL = jsURL || fallbackURL;
  // If we can't find what URL was responsible for this execution, attribute it to the root page
  // or Chrome depending on the type of work.
  if (!attributableURL || attributableURL === 'about:blank') {
    if (BROWSER_TASK_NAMES_SET.has(task.event.name)) attributableURL = 'Browser';
    else if (BROWSER_GC_TASK_NAMES_SET.has(task.event.name)) attributableURL = 'Browser GC';
    else attributableURL = 'Unattributable';
  }
  return attributableURL;
}

The above function looks for any matching 3P script URLs, and falls back to the "first" URL of the task if it can't determine a cause.
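To illustrate (with URLs borrowed from the trace shown later, and a hypothetical jsURLs set standing in for the one Lighthouse builds from script network records): because our script happens to be the first entry in attributableURLs that is also a known JS URL, the find() attributes the entire task to us:

const jsURLs = new Set([
  'https://s2.go-mpulse.net/boomerang/KX3GJ...',
  'https://customer.com/assets/index.js',
  // ...the rest of the page's scripts
]);

const task = {
  attributableURLs: [
    'https://s2.go-mpulse.net/boomerang/KX3GJ...', // boomerang.js happens to be first
    'https://customer.com/xyz.html',
    'https://customer.com/assets/index.js',
  ],
};

// The whole task's selfTime gets attributed to boomerang.js:
const attributableURL = task.attributableURLs.find(url => jsURLs.has(url));
console.log(attributableURL); // 'https://s2.go-mpulse.net/boomerang/KX3GJ...'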

When I looked back at the trace, I think I found the actual cause of the "blocking" task: a Run Microtasks chunk with ~80 items. This looks like it may be the DCL event and all of its related tasks executing:

[Screenshot: Performance panel showing a ~104ms Run Microtasks block containing ~80 microtasks around the DCL event]

The only time our boomerang.js appears in that Run Microtasks of 80 sub-tasks is the very last FunctionCall, at 0.1ms:

[Screenshot: the single 0.1ms boomerang.js FunctionCall within that Run Microtasks block]

In the trace JSON, we see that RunMicrotasks with 80 subtasks:

{
    "args": {
        "microtask_count": 80
    },
    "cat": "v8.execute",
    "dur": 104382,
    "name": "RunMicrotasks",
    "ph": "X",
    "pid": 41000,
    "tdur": 102588,
    "tid": 13620,
    "ts": 1112473559258,
    "tts": 1451432
},

And the previous FunctionCall happened to be our script appearing once:

{
    "args": {
        "data": {
            "columnNumber": 10854,
            "frame": "9D1A3DB0E9E508834136FE2E28B89B67",
            "functionName": "l",
            "lineNumber": 10,
            "scriptId": "41",
            "url": "https://s2.go-mpulse.net/boomerang/KX3GJ..."
        }
    },
    "cat": "devtools.timeline",
    "dur": 347,
    "name": "FunctionCall",
    "ph": "X",
    "pid": 41000,
    "tdur": 339,
    "tid": 13620,
    "ts": 1112473558883,
    "tts": 1451065
},

Debugging this in Node also confirmed that it's just a big bunch of tasks, and our boomerang.js just happens to be the first in the attributableURLs list:

{
  event: {
    args: {},
    cat: 'v8',
    dur: 104379,
    name: 'V8.RunMicrotasks',
    ph: 'X',
    pid: 41000,
    tdur: 102586,
    tid: 13620,
    ts: 1112473559259,
    tts: 1451433
  },
  endEvent: undefined,
  startTime: 1326.394,
  endTime: 1430.773,
  duration: 104.379,
  unbounded: false,
...
  attributableURLs: [
    'https://s2.go-mpulse.net/boomerang/KX3GJ...',
    'https://customer.com/xyz.html',
    'https://customer.com/assets/index.js',
    'https://customer.com/assets/PDP.js',
    'https://customer.com/assets/ImageGallery.js',
    'https://customer.com/assets/vendor.js',
    'https://customer.com/gtm.js?id=GTM-XXX',
    'https://customer.com/gtag/js?id=G-XXX'
  ],
  group: {
    id: 'scriptEvaluation',
    label: 'Script Evaluation',
    traceEventNames: [
      'EventDispatch',
      'EvaluateScript',
      'v8.evaluateModule',
      'FunctionCall',
      'TimerFire',
      'FireIdleCallback',
      'FireAnimationFrame',
      'RunMicrotasks',
      'V8.Execute'
    ]
  },
  selfTime: 83.955
}

If we take that 83.955ms selfTime, multiply by 4 (the cpuMultiplier), and subtract 50ms, we get the 286ms of Main-Thread Blocking Time reported in the original audit.
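A quick worked check of that math, using the values from the trace above:

const selfTime = 83.955;     // RunMicrotasks selfTime from the trace, in ms
const cpuMultiplier = 4;     // Lighthouse's simulated mobile CPU throttling

const taskDuration = selfTime * cpuMultiplier;        // 335.82ms
const blockingTime = Math.max(taskDuration - 50, 0);  // 285.82ms -> reported as ~286ms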

So unfortunately, because our boomerang.js script has one task executing at DCL that took ~0.1ms, we're unluckily being blamed for an 83.955ms microtask chunk that gets shown as 286ms of Main-Thread Blocking Time.

d'oh!

I've run Lighthouse a bunch of times on the same URL; sometimes the Main-Thread Blocking Time is 0ms, and other times it's that much higher value. It's the luck of the draw. I suspect that when it's 0ms, our script isn't the [0]th entry in the attributableURLs or 3P URLs list.

What did you expect?

Ultimately I think the question is how we should deal with a microtask-execution chunk when there are many microtasks from various sources involved. Task attribution is hard, and I'm not sure how to "assign blame" to a Run Microtasks that has a lot of sub-calls to various other scripts/handlers/callbacks/timeouts.

In the above examples where boomerang.js was unlucky, its ~0.1ms task(s) happened to get blamed for the whole chunk of microtask time.

Since this audit's purpose is to highlight third-party scripts that are behaving badly, I'd like to suggest we make a change to the calculation to ensure it's more accurate and repeatable.

I'm not a Chrome or Lighthouse expert so I may be missing some context on how this all works, but with my understanding here are some possible ideas for how we could deal with microtask chunks:

  1. Do nothing! Sometimes 3P scripts will be involved in a chonky microtask run and may be blamed for the whole thing.
  2. Don't include microtask execution chunks in any consideration for .blockingTime calculation for 3P URLs (this may hide real problems).
  3. Dive a level deeper into the microtask execution's children, and treat each child FunctionCall independently for .blockingTime consideration.
  4. Dive a level deeper into the microtask execution's children, and accumulate time in child FunctionCalls grouped by URL. Consider the sum() for each URL for .blockingTime calculation.
  5. Dive a level deeper into the microtask execution's children, and only accumulate time in child FunctionCalls etc. that are related to that specific URL. Combine all related ones for that URL in that Run Microtasks, and if it's > 50(?)% of the total Run Microtasks execution time, declare that URL to be blamed for the whole Run Microtasks.

Personally, I think (3) sounds the most logical -- essentially, group each execution of microtasks by the 3P URLs and weigh their runtime independently, to see if they cross the .blockingTime threshold (50ms). A rough sketch follows.
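Here's what that could look like in third-party-summary.js -- glossing over how Lighthouse actually flattens tasks and computes selfTime, and assuming the task objects expose their children via task.children as in the main-thread-tasks artifact; this is a sketch of the idea, not a tested patch:

for (const task of mainThreadTasks) {
  const isMicrotaskChunk =
    task.event.name === 'RunMicrotasks' || task.event.name === 'V8.RunMicrotasks';
  if (isMicrotaskChunk && task.children.length) {
    // Instead of blaming attributableURLs[0] for the whole chunk, accumulate
    // each child's selfTime against the URL that actually executed it.
    // (The parent's own selfTime would still need a home; omitted here.)
    const perURLTime = new Map();
    for (const child of task.children) {
      const childURL = getAttributableURLForTask(child, jsURLs);
      perURLTime.set(childURL, (perURLTime.get(childURL) || 0) + child.selfTime);
    }

    for (const [url, time] of perURLTime) {
      const urlSummary = byURL.get(url) || {...defaultSummary};
      const childDuration = time * cpuMultiplier;
      urlSummary.mainThreadTime += childDuration;
      // Each URL's share is judged against the 50ms threshold independently.
      urlSummary.blockingTime += Math.max(childDuration - 50, 0);
      byURL.set(url, urlSummary);
    }
    continue;
  }

  // ...existing attribution logic for all other tasks...
}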

I'm happy to take a stab at the code changes for this, if there's agreement on the approach.

What have you tried?

No response

How were you running Lighthouse?

CLI, Chrome DevTools

Lighthouse Version

10.0.1

Chrome Version

113.0.0.0

Node Version

16.19.1

OS

Windows

Relevant log output

No response

@yoavweiss

Thank you so much for the great bug report!!

I think there are 3 questions here:

  1. Why is the browser running all these unrelated functions as a single long task? @noamr mentioned today that this could be what the spec says in case of deferred scripts, but maybe we can change that somehow?
  2. Are these tasks really microtasks? I don't think that e.g. the DCL event handler should be a microtask. Maybe they are being mislabeled?
  3. Why is Lighthouse identifying any particular script as the guilty party here, where it's clearly a group effort? :) @paulirish @patrickhulce may have insights there
@patrickhulce
Collaborator

patrickhulce commented Mar 7, 2023

Heyo 👋 excellent bug report 👍 thank you!

I can help provide some explanation as to how we arrived here. This deck is helpful for explaining how Lighthouse attribution is supposed to work.

To answer @yoavweiss's formulation of the question "Why is Lighthouse identifying any particular script as the guilty party here, where it's clearly a group effort? :)" -- the short answer is that, in our view, long tasks are almost always a group effort, and we attempt to find a reasonable proximate cause.

We've taken the approach that proximate cause from Lighthouse's perspective means who is responsible for scheduling this work rather than performing it.

To many of your questions @nicjansma: we don't really care what actually executed inside the RunMicrotasks at all, and we only use the first executor when we lack instrumentation from Chrome about who scheduled the task (as a best guess, since it's rather likely that whoever scheduled the work will call their own code first before delegating to other scripts/libraries). I would've expected to be able to tie RunMicrotasks back to its scheduling script without this fallback case being triggered, though 🤔, so perhaps that's where a bug has been introduced at some point in the last few years since this was written?
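A hypothetical illustration of that "scheduler vs. performer" distinction (script-a.js/script-b.js and the work function are made up for this example):

// Hypothetical: script-b.js defines the expensive work...
function expensiveLibraryWork() {
  for (let i = 0; i < 1e7; i++); // stand-in for the bulk of the execution time
}

// ...but script-a.js schedules it, so under "scheduling" attribution the
// resulting TimerFire task is blamed on script-a.js, not script-b.js.
setTimeout(() => {
  expensiveLibraryWork();
}, 0);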

Why is the browser running all these unrelated functions as a single long task? @noamr mentioned today that this could be what the spec says in case of deferred scripts, but maybe we can change that somehow?
Are these tasks really microtasks? I don't think that e.g. the DCL event handler should be a microtask. Maybe they are being mislabeled?

+1 to these questions. Combining with the above, it'd be really helpful to understand a human expert's manual assessment of the proximate cause description found in the LH deck to be able to pinpoint where exactly Chrome tracing/Lighthouse fell off the wagon.

@paulirish
Member

paulirish commented Mar 22, 2023

@nicjansma I just want to say Thank You for a phenomenal bug report. I'm sorry we haven't responded yet... and this isn't a proper response either. We've got a few things going on.

But... the short of it is: your bug is legit, and it'll take us a little bit to draft a response. Attribution is a hard problem with indeterminate solutions. But we want attribution/blame to make sense to developers, point the way to actionability, be fair, and treat things consistently across tools. I'm eager to figure out the right solution.

Edit: previously: #7005 #7059 slides #8526

@noamr

noamr commented Mar 23, 2023

  1. Why is the browser running all these unrelated functions as a single long task? @noamr mentioned today that this could be what the spec says in case of deferred scripts, but maybe we can change that somehow?

There could be several reasons why unrelated functions would run in the same task:

  • They are all handlers (then/catch) of the same promise, e.g. a fetch() (see the sketch after this list)
  • They are all event listeners of events that are fired synchronously
  • It's the DCL task, which runs all the available deferred & module scripts plus several event handlers
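A hypothetical illustration of the first case: two unrelated scripts attach handlers to the same promise, and both handlers run back-to-back in a single microtask checkpoint, showing up as one long RunMicrotasks block in the trace ('/api/data' is a placeholder endpoint):

const shared = fetch('/api/data');

// script-a.js
shared.then(() => {
  for (let i = 0; i < 1e7; i++); // stand-in for expensive work
});

// script-b.js
shared.then(() => {
  for (let i = 0; i < 1e7; i++); // runs in the same microtask checkpoint as
});                              // script-a.js's handler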

In this case, it seems like the long task is caused by several event handlers that happen at DCL time, some for DCL itself and some (like the first one) for readystatechange. If you had deferred scripts they would probably make that task even longer.

One of the actual causes of long tasks here btw is DOZENS of style thrashes (coming from calls to getComputedStyle). Tracing this page with LoAF did show this.
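For reference, the kind of pattern that produces those style thrashes looks something like this (hypothetical code, not taken from the page in question):

// Alternating DOM writes and getComputedStyle reads forces a synchronous
// style recalculation on every iteration instead of one batched recalc.
for (const el of document.querySelectorAll('.widget')) {
  el.style.width = '100px';                     // write
  const h = getComputedStyle(el).height;        // read -> forced style recalc
  el.style.height = `${parseFloat(h) + 10}px`;  // write again, repeating the cycle
}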

@nicjansma
Author

Hi team!

Just a note that we're seeing additional cases of customers reporting this issue to us.

In this Lighthouse trace, boomerang.js was being blamed for 3,256ms of Main-Thread Blocking Time.

[Screenshot: Lighthouse third-party audit attributing 3,256ms of Main-Thread Blocking Time to boomerang.js]

Upon looking at the trace, we found an 809.8ms (before the 4x multiplier) Run Microtasks in which boomerang.js executed once for 9.6ms:

[Screenshot: the ~810ms Run Microtasks block containing a single 9.6ms boomerang.js FunctionCall]

Total boomerang.js CPU usage was around 93ms for the whole trace.
