Leaks in mochitests with MV3 extensions content scripts
Categories
(WebExtensions :: General, defect, P2)
Tracking
(firefox96 fixed)
Tracking | Status | |
---|---|---|
firefox96 | --- | fixed |
People
(Reporter: willdurand, Assigned: rpl)
References
Details
(Whiteboard: [addons-jira])
Attachments
(1 file)
:robwu and I looked into some test failures caused by leaks in mochitests for Bug 1740601. I was able to reproduce with debug build artifacts on MacOS.
The root cause seems to be an issue with manifest version 3 extensions under certain conditions. In order to reproduce, one can take an existing mochitest (not part of Bug 1740601) like https://searchfox.org/mozilla-central/rev/df18dd52da04ee2bad434b0ba2d9fcb196d4d15e/toolkit/components/extensions/test/mochitest/test_ext_contentscript_permission.html and apply this patch, which turns a MV2 test extension into a MV3 test extension:
diff --git a/toolkit/components/extensions/test/mochitest/test_ext_contentscript_permission.html b/toolkit/components/extensions/test/mochitest/test_ext_contentscript_permission.html
index 8ab4b1fb28812..21f366ae73d91 100644
--- a/toolkit/components/extensions/test/mochitest/test_ext_contentscript_permission.html
+++ b/toolkit/components/extensions/test/mochitest/test_ext_contentscript_permission.html
@@ -31,11 +31,16 @@ add_task(async function test_contentscript() {
let extensionData = {
manifest: {
- permissions: ["<all_urls>"],
+ manifest_version: 3,
+ host_permissions: ["<all_urls>"],
},
background,
};
+ await SpecialPowers.pushPrefEnv({
+ set: [["extensions.manifestV3.enabled", true]],
+ });
+
let extension = ExtensionTestUtils.loadExtension(extensionData);
await extension.startup();
Running ./mach test
will report leaks (all the tests pass, though):
[...]
1:13.61 INFO == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 78383
1:13.61 INFO
1:13.61 INFO |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
1:13.61 INFO | | Per-Inst Leaked| Total Rem|
1:13.61 INFO 0 |TOTAL | 45 4952| 445848 56|
1:13.62 INFO 63 |CondVar | 80 80| 51 1|
1:13.62 INFO 220 |Mutex | 96 192| 490 2|
1:13.62 INFO 440 |SubstitutingURL | 336 1344| 106 4|
1:13.62 INFO 450 |ThreadEventTarget | 64 64| 11 1|
1:13.62 INFO 454 |ThreadTargetSink | 16 16| 11 1|
1:13.62 INFO 485 |WeakReference | 32 128| 95 4|
1:13.62 INFO 557 |nsAuthURLParser | 24 48| 2 2|
1:13.62 INFO 573 |nsCSPContext | 128 256| 5 2|
1:13.62 INFO 688 |nsJSPrincipals | 24 144| 157 6|
1:13.62 INFO 741 |nsStandardURL | 336 2016| 3685 6|
1:13.62 INFO 746 |nsStringBuffer | 8 208| 16970 26|
1:13.62 INFO 781 |nsThread | 456 456| 19 1|
1:13.62 INFO
1:13.62 INFO nsTraceRefcnt::DumpStatistics: 820 entries
1:13.62 INFO leakcheck: tab leaked 1 CondVar
1:13.62 INFO leakcheck: tab leaked 2 Mutex
1:13.62 INFO leakcheck: tab leaked 4 SubstitutingURL
1:13.62 INFO leakcheck: tab leaked 1 ThreadEventTarget
1:13.62 INFO leakcheck: tab leaked 1 ThreadTargetSink
1:13.62 INFO leakcheck: tab leaked 4 WeakReference
1:13.62 INFO leakcheck: tab leaked 2 nsAuthURLParser
1:13.62 INFO leakcheck: tab leaked 2 nsCSPContext
1:13.62 INFO leakcheck: tab leaked 6 nsJSPrincipals
1:13.62 INFO leakcheck: tab leaked 6 nsStandardURL
1:13.62 INFO leakcheck: tab leaked 26 nsStringBuffer
1:13.62 INFO leakcheck: tab leaked 1 nsThread
1:13.62 UNEXPECTED-FAIL leakcheck: tab 4952 bytes leaked
[...]
The problematic code seems to be located after this block: https://searchfox.org/mozilla-central/rev/df18dd52da04ee2bad434b0ba2d9fcb196d4d15e/js/xpconnect/src/Sandbox.cpp#1193-1195 because some of the code after this if
block is mentioned in the leakcheck report above.
The expectation is to have no leaks.
Updated•3 years ago
|
Reporter | ||
Comment 1•3 years ago
|
||
Please re-enable toolkit/components/extensions/test/mochitest/test_ext_scripting_executeScript.html
in debug mode when this bug gets fixed.
Reporter | ||
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 2•3 years ago
|
||
Hey Will,
Today I was able to focus a bit on this and take a proper look as I promised, and I suspect that the leak may be triggered by the ExpandedPrincipal
and the nsCSPContext
keeping each other alive forever, because each of them is keeping a nsCOMPtr
to the other one:
- The
nsCSPContext
instance is getting ansCOMPtr
to theExpandedPrincipal
(set on its mLoadingPrincipal member data as part of the call tonsCSPContext::SetRequestContextWithPrincipal
) - Then the
ExpandedPrincipal
instance is keeping ansCOMPtr
to thensCSPContext
(set on itsmCSP
member data as part of the call toExpandedPrincipal::SetCsp
)
To confirm this theory I did try locally the following couple of temporary changes:
- first I tried to just comment out just the call to
expanded->SetCsp
and then tried to runtest_ext_scripting_executeScript.js
locally to confirm that no shutdown leak was detected - then I tried another approach: to extract the extension principal from the
ExpandedPrincipal
and then use that one in the call tocsp->SetRequestContextWithPrincipal
(and uncommentexpanded->SetCsp
) and then tried to runtest_ext_scripting_executeScript.js
locally to confirm that no shutdown leak was detected [^1]
In both cases no shutdown leak was detected, the second one did even pass the MV3 CSP tests landed along with Bug 1594234 (but even if it did pass those tests, I haven't really double-checked if using the extension principal would have other side-effect that would not match what we want that CSP to enforce in the long run, I consider it just a test I did to confirm my theory, more than an actual proposed fix).
I see that we do have a bug already filed to move the CSP out of the ExpandedPrincipal: Bug 1548468 - Move CSP off ExpandedPrincipal
That actually sounds like a better and proper fix for this leak.
It looks that Niklas took that bug not long ago (~2 months ago), and so it seems worth needinfo-ing Niklas to get an additional perspective about what I described above and touch base about which plans we do have for Bug 1548468 (so that we can confirm if the direction we are
aiming for is going to take care of this issue too).
[^1]: this one actually surprised me a bit, because technically ExpandedPrincipal
does have an array of nsCOMPtr<nsIPrincipal>
of the principals part of it, but I also see that ExpandedPrincipal
does have custom implementation for AddRef
and Release
which are calling nsJSPrincipals::AddRef
/Release
, and nsJSPrincipals
was part of the detected shutdown leaks output, and so that may be related to why not using the expanded principal for the SetRequestContextWithPrincipal
was also enough to prevent the leak.
Assignee | ||
Comment 3•3 years ago
|
||
Hi Niklas,
as I described in comment 2, I suspect that this leak may be due to ExpandedPrincipal and nsCSPContext keeping each other alive (which, if that's the case as it seems, would be trigger by this lines at the end of ApplyAddonContentScriptCSP), and I did notice we do have Bug 1548468 for moving CSP out of ExpandedPrincipal.
I see you assigned Bug 1548468 to yourself not long ago, may I ask you if you have already a plan for the changes we are going to introduce for Bug 1548468 and your perspective about the leak?
Thank you in advance for your help!
Assignee | ||
Comment 4•3 years ago
|
||
This patch is meant to be a proposed short run fix to prevent ApplyAddonContentScriptCSP
from leaking the ExpandedPrincipal and nsCSPContext instance because they keep a reference
to each other.
This patch prevent that leak by creating a clone of the ExpandedPrincipal and then use
that cloned instance in the call to nsCSPContext::SetRequestContextWithPrincipal.
Once Bug 1548468 will move the CSP off the ExpandedPrincipal class, cloning the expanded
principal to prevent that leak should not be necessary anymore.
Assignee | ||
Comment 5•3 years ago
|
||
Push to try for the patch in comment 4 (plus a temporary patch that is only meant to force test_ext_scripting_executeScript.html to be executed as part of the TV job):
Hi Luca,
I assigned myself to Bug 1548468 and then i got sidetracked with shipping SameSite=Lax by default
, once we are done with that i will pick up the work again.
We are still having architectural discussions around where the CSP should move to, there is no concrete plan yet but Rob suggested moving it to the sandbox. For now i think your approach in https://phabricator.services.mozilla.com/D132144 is good to avoid the leak and we will keep this in mind for Bug 1548468, because as you mentioned moving the CSP somewhere else should also take care of the leak.
Updated•3 years ago
|
Comment 7•3 years ago
|
||
I did a deeper dive into the code (and attached implementation guidance to bug 1548468 while I was at it).
This bug is caused by nsCSPContext
and ExpandedPrincipal
having strong references to each other.
This used to not always be the case: when the dependencies were added at first, nsCSPContext
held a raw pointer to the principal, but the principal's destructor explicitly cleared the pointer: https://hg.mozilla.org/mozilla-central/rev/acc983ca0dec#l6.12
When ExpandedPrincipal
was introduced (extracted from the base principal) (in bug 1341250), the clearLoadingPrincipal
call was not included: https://hg.mozilla.org/mozilla-central/rev/31ca9ebd033a#l2.77 (note: diff is relative to original file, removed means not included in new ExpandedPrincipal, not removed from the source file).
When mCSP
was moved off BasePrincipal
(to ExpandedPrincipal
, the last trace of clearLoadingPrincipal
was removed (https://hg.mozilla.org/mozilla-central/rev/ddf4012a7652e36d144fc43bc99335276deeafbe#l3.30), because the raw principal pointer in CSPContext changed to a RefPtr
(in https://hg.mozilla.org/mozilla-central/rev/ddf4012a7652e36d144fc43bc99335276deeafbe#l53.48).
Consequently ExpandedPrincipal and nsCSPContext started to potentially have strong references to each other. This condition was ultimately triggered by the changes in bug 1581611 (note: preffed-off by-default, and the implementation changed later, still preffed-off by default).
TL;DR of analysis:
The original design of the "expanded principal" implementation accounted for the cyclic dependency by having one RefPtr and one raw pointer, together with an explicit call to clearLoadingPrincipal
.
The current implementation still has the clearLoadingPrincipal
method but with no callers at all, and RefPtr<nsIContentSecurityPolicy>
(i.e. nsCSPContext
) + nsCOMPtr<nsIPrincipal>
(e.g. ExpandedPrincipal
) that may keep each other alive.
The patch above solves the issue by cloning the ExpandedPrincipal
before assigning it to the nsCSPContext
. An alternative would be to call clearLoadingPrincipal
when the sandbox is destructed.
The ideal solution is to move mCSPContext
off ExpandedPrincipal
to the "sandbox" (as explained in https://bugzilla.mozilla.org/show_bug.cgi?id=1548468#c2). Since that will be worked on, I suggest to also remove the last trace of clearLoadingPrincipal
: https://searchfox.org/mozilla-central/rev/7fe9421af35256a95acc4620e9e0b76df7867287/dom/security/nsCSPContext.h#127-131
Assignee | ||
Updated•3 years ago
|
Pushed by luca.greco@alcacoop.it: https://hg.mozilla.org/integration/autoland/rev/7907a0b90419 Fix ExpandedPrincipal and nsCSPContext leaks triggered by ApplyAddonContentScriptCSP. r=ckerschb,robwu
Assignee | ||
Updated•3 years ago
|
Comment 9•3 years ago
|
||
bugherder |
Description
•