Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate cache when using import #49442

Open
jonathantneal opened this issue Apr 5, 2019 · 141 comments
Open

Invalidate cache when using import #49442

jonathantneal opened this issue Apr 5, 2019 · 141 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js.

Comments

@jonathantneal
Copy link

How do I invalidate the cache of import?

I have a function that installs missing modules when an import fails, but the import statement seems to preserve the failure while the script is still running.

import('some-module').catch(
  // this catch will only be reached the first time the script is run because resolveMissingModule will successfully install the module
  () => resolveMissingModule('some-module').then(
    // again, this will only be reached once, but it will fail, because the import seems to have cached the previous failure
    () => import('some-module')
  )
)

The only information I found in regards to import caching was this documentation, which does not tell me where the “separate cache” used by import can be found.

No require.cache

require.cache is not used by import. It has a separate cache.
https://nodejs.org/api/esm.html#esm_no_require_cache

@devsnek
Copy link
Member

devsnek commented Apr 5, 2019

the import cache is purposely unexposed. adding a query has been the generally accepted ecosystem practice to re-import something.

however, a failure to import something will not fill the cache.

this trivial program works fine for me (assuming nope.mjs does not exist):

import fs from 'fs';

import('./nope.mjs')
  .catch(() => fs.writeFileSync('./nope.mjs'))
  .then(() => import('./nope.mjs'))
  .then(console.log);
@jonathantneal
Copy link
Author

@devsnek, hmm, might this be limited to imports that use node_modules? This similarly trivial program fails for me the first time, but not the second.

import child_process from 'child_process';

import('color-names')
  .catch(() => child_process.execSync('npm install --no-save color-names'))
  .then(() => import('color-names'))
  .then(console.log);
@bmeck
Copy link
Member

bmeck commented Apr 5, 2019 via email

@devsnek
Copy link
Member

devsnek commented Apr 5, 2019

if its just happening with node_modules it could be #26926

@MylesBorins
Copy link
Contributor

can this be closd?

@jkrems
Copy link
Contributor

jkrems commented Aug 29, 2019

I think a use case like this would hopefully be implemented as a loader. Do we already track this as a use case in that context?

@bmeck
Copy link
Member

bmeck commented Aug 30, 2019

@jkrems we have old documents with that as a feature, but no success criteria examples.

@giltayar
Copy link
Contributor

FYI, I'm implementing ESM support in Mocha (mochajs/mocha#4038), and cannot currently implement "watch mode", whereby Mocha watches the test files, and reruns them when they change. So "watch mode" in Mocha, in the first iteration, will probably not support ESM, which is a bummer.

While we could use cache busting query parameters, that would mean that we are always increasing memory usage, and old and never-to-be-used versions of the file will continue staying in memory due to the cache holding on to them.

And I'm not sure a loader would help here, as the loader also has no access to the cache.

@guybedford
Copy link
Contributor

guybedford commented Nov 26, 2019 via email

@devsnek
Copy link
Member

devsnek commented Nov 26, 2019

I'm really not a fan of the idea of our module cache being anything except insert-only. CJS cache modification is bad already, and CJS modules don't even form graphs.

Additionally, other runtimes (like browsers) will never expose this functionality, so some alternative system will have to be used for them regardless of what node does, in which case it seems like that system could just be used for node.

@bmeck
Copy link
Member

bmeck commented Nov 26, 2019

@giltayar have you looked into using Workers or other solutions to have a module cache that you can destroy (such as by killing the Worker)?

@giltayar
Copy link
Contributor

@bmeck - interesting. That would mean that the tests themselves run in Workers. While I am theoretically familiar with workers, I haven't yet had any experience with them: is any code that runs in the main process compatible with worker inside a worker? In other words, compatibility-wise, would all test code that works today in the "main process" work inside workers?

I wouldn't want Mocha to have a version (even a semver-major breaking one) where developers will need to tweak their code because now it's running inside a worker. I'm guessing that there's a vast amount of that code running inside Mocha, and any incompatibility would be a deal breaker.

@devsnek
Copy link
Member

devsnek commented Nov 28, 2019

there are differences between workers and the main thread, mostly surrounding the functions on process, like process.exit() in a worker doesn't end the process, just the thread. There's a good list here: https://nodejs.org/api/worker_threads.html#worker_threads_class_worker

@giltayar
Copy link
Contributor

Looking at the list, I can see process.chdir() is not available, which is probably a deal breaker in many tests (unit tests probably don't use process.chdir(), but Mocha is used for all sorts of tests), as is breaking some native add-ons (although I'm not sure how big of a problem this is in the real world).

I would hesitate to say this, as my only contribution to Mocha currently is this pull request, but I would guess that the owners would veto this. Or maybe allow this only if we add a --run-in-workers option. In any case, without looking too much at the code, this is probably a significant investment to implement for supporting ES Modules, as this is not a simple refactor, but rather an architectural change in how Mocha works.

@giltayar
Copy link
Contributor

If it wasn't apparent from the above, I believe I would still prefer a "module unloading" API, unless the working group is adamant and official about not having one, of course. Which would probably mean going the "subprocess"/"worker" route.

@devsnek
Copy link
Member

devsnek commented Nov 28, 2019

i admittedly don't know much about mocha... is using a separate process not doable either?

@giltayar
Copy link
Contributor

I'll go back to the Mocha contributors team with this.

@boneskull
Copy link
Contributor

Hi, I work on Mocha!! I am trying to see how we can move @giltayar's PR forward.

There are actually two situations in which "module unloading" is needed in Mocha:

  1. In "watch" mode with CJS scripts, when Mocha detects a file must be reloaded, it is deleted from require.cache and re-required, then tests are re-run. Mocha is not the only tool that does this sort of cache-busting.
  2. When developers are writing tests with Mocha (and many other test frameworks), they may want to use module-level mocking--they essentially replace one module with another phony one (I'm going to tag @theKashey here because he knows more about this ). Or even pretend like a module does not exist at all. It is then very important to Mocha that users can consume these sort of mocking frameworks to write their test code.

In the first case, it's possible, though probably at a performance cost, Mocha could leverage workers to handle ESM. I don't know enough about workers to say whether this will provide a sufficient environment for the test cases, but it feels like a misuse of the workers feature. At minimum it seems like a lot of added complexity.

In the second case, I can't see how using workers would be feasible. Test authors need to be able to mock modules on-the-fly and reference them directly from test cases, using mocking frameworks.


I don't know why this sort of behavior was omitted from the official specification. If the reasons involve "browser security", well, it further reinforces that browsers are a hostile environment for testing. I do know that this behavior is a very real need for many, from library and tooling authors down to developers working on production code.

We do need an "unload module" API; until such a thing lands, tools will be limited, implementations will be difficult (if possible), and end users will be frustrated when their tests written in ESM don't work. I will also be frustrated, because those frustrated users will complain in Mocha's issue tracker!

I'm happy to talk in further detail about use-cases, but I'm eager to put an eventual API description in the more-capable hands of people like @guybedford.

@devsnek Given that enabling it also enables tooling, I'm curious why you feel locking this sort of thing down is a better direction?

cc @nodejs/tooling

P.S. I will be at the collab summit, and the tooling group will be hosting a collaboration session; maybe this can be a topic of discussion, or vice-versa if there's a modules group meeting...?

@theKashey
Copy link

Do you need unloading API for the watch mode? Yes, you need it to update the changed module code.

However, it is enough to handle watch mode? No, as long as the idea is to use changed module, as you have to find parents between you(a test) and changed module, and wipe them to perform a proper reinitialization.

So - an ability to invalidate a cache line is not enough, for the mocking task we also have to know the cache graph, we could traverse and understand which work should be done.

An API for unloading modules certainly makes sense.

In my opinion - this feature is something missing for a proper code splitting. There are already 100Mb bundles, separated into hundreds of pieces, you will never load simultaneously. But you if will - there is no way to unload them. Eventually, the Page or an Application would just crash.

@giltayar
Copy link
Contributor

giltayar commented Dec 4, 2019

@boneskull - the second case you mentioned, I believe can and should be handled by module "loaders", which are a formal way to do "require hooks" for ESM. These will enable testing frameworks (like sinon and others) to manipulate how ES modules are loaded, and, for example, exchange other modules for theirs.

The spec and implementation for that are actively being discussed and worked on by the modules working group (see nodejs/modules#351).

@jonerer
Copy link

jonerer commented Feb 19, 2020

I also need this. I'm making a template rendering engine. When generating the compiled template, I read from a custom format and output to a .js file (a standard ES Module). In order to use the file, I just import it. Upon file changes, I would like to re-write the file, clear the import cache and then re-import it.

@devsnek
Copy link
Member

devsnek commented Feb 19, 2020

These all sound like use cases for V8's LiveEdit debug api (https://chromedevtools.github.io/devtools-protocol/v8/Debugger#method-setScriptSource). You can call into it using https://nodejs.org/api/inspector.html. cc @giltayar @boneskull

@georges-gomes
Copy link

+1 for unloading ES Modules.
It's hard to make Hot Module Reload otherwise. Not for production but for development tools.
And using a ?query=x doesn't seem to work on file node 13.11.0 at least.
Thanks

@georges-gomes
Copy link

@devsnek Can you provide a little example or pseudo-code on usage of setScriptSource. I have been researching for an 1hour without progress. Thanks

@georges-gomes
Copy link

@devsnek ok I progressed, I will post my findings back

@devsnek
Copy link
Member

devsnek commented Mar 20, 2020

@georges-gomes you can subscribe to the Debugger.scriptParsed event to track the script id, and then when you need to modify the script you can call Debugger.setScriptSource.

@jonerer
Copy link

jonerer commented Mar 20, 2020

@georges-gomes If you are successful, I would be very grateful if you could post a short description on how you could use setScriptSource to solve this problem. On a blog post or something.

@laverdet
Copy link
Contributor

There’s another cache within V8 for all the ES modules that have ever been loaded and evaluated, and as far as I know there’s still no API for removing or updating those.

This is a common misconception but it is not true and I don't believe it has ever been true. I've refuted it using isolated-vm here #49442 (comment) and here nodejs/loaders#157 (comment). Module records in v8 are no more special than a plain object. Since the fix in nodejs v20.8.0 you can also prove this using SourceTextModule of node:vm.

Or were you suggesting the experimental status just because it’s strongly discouraged?

Yes, I would discourage its use. require.register and require.cache contributed to the total anarchy we've seen under CommonJS that the ecosystem is still feeling to this day. The evict function explicitly breaks the invariants promised under the specification. I'm clearly ok with reality-altering features (see fibers) but I do want to make sure I communicate the nature of what I've proposed.

@GeoffreyBooth
Copy link
Member

This is a common misconception but it is not true and I don’t believe it has ever been true.

Then are you using such an API, and if not, why not? Wouldn’t replacing modules loaded into V8 solve memory leaks?

@laverdet
Copy link
Contributor

Then are you using such an API, and if not, why not? Wouldn’t replacing modules loaded into V8 solve memory leaks?

I'm not sure I understand. There is no v8 API for this. Simply, once there are no more handles to a module it is garbage collected, in the same way a JSON object or any other value would be garbage collected. Since the module is in node's loadCache forever it can never be collected. There's no replacing a module, in the same way you can't modify the source code of a function after it's been created.

@mcollina
Copy link
Member

mcollina commented Nov 1, 2023

We have an implementation of the "hack" (with the leak) in https://github.com/platformatic/platformatic/blob/main/packages/runtime/lib/loader.mjs. Works great.

Note that we don't do full HMR. Basically we reload the app, and we have special code to not close the HTTP server and live restart a Fastify application. This is mostly side effect free.

Note that quite a lot of people are working towards a world where this is possible. My NodeContext PR stalled because of memory leaks in instantiating Node.js core objects, but the team is busy fixing those.

@GeoffreyBooth
Copy link
Member

If the leak is only caused by loadCache / “module map”, then I think it should be fine to have an API like import { clearCache } from 'node:module' that allows you to delete a particular entry (clearCache(moduleAbsoluteURL) or even all entries; that’s just like clearing the cache in a browser. I think the only consequence would be that a subsequent import might produce a different result rather than getting the previously cached result, but that’s also similar to a browser cache having been cleared. I know the spec says that subsequent imports must return the same result, but I assume that that means “in general usage,” not if the user has specifically instructed the runtime to not return a cached result.

@laverdet
Copy link
Contributor

laverdet commented Nov 3, 2023

that’s just like clearing the cache in a browser

It's not a good comparison, since the browser cache doesn't affect correctness. I mentioned that here, that I think "cache" is a poor choice of name for this since it is a matter of correctness and not performance.

I think the only consequence would be that a subsequent import might produce a different result rather than getting the previously cached result, but that’s also similar to a browser cache having been cleared.

Clearing the browser cache of a running web page will not affect the result of import(). It has no observable effect, except on timing.

I know the spec says that subsequent imports must return the same result, but I assume that that means “in general usage,” not if the user has specifically instructed the runtime to not return a cached result.

The language in the specification is clear, and our proposed function is a violation of the invariants expected of the host. As a matter of personal style I am totally ok violating any rule for any reason, but I do want to make sure we're on the same page that this is a dangerous and powerful violation of the specification.

@GeoffreyBooth
Copy link
Member

I do want to make sure we're on the same page that this is a dangerous and powerful violation of the specification.

I mean, the while point of the module customization hooks is to allow the user to violate spec however they please (that Node can achieve). It's not like importing CoffeeScript is spec compliant. If we're worried about dependencies using this API to mischievous ends we could gate it behind a flag or create a permission for it; but I'm not sure what the risk is.

@laverdet
Copy link
Contributor

laverdet commented Nov 4, 2023

It's not like importing CoffeeScript is spec compliant.

It is, though. CoffeeScript would be a Cyclic Module Record (a module which participates in the specified cyclic resolution algorithm). Cyclic Module Record is described in the specification as "abstract".

Source Text Module Record, which is the ES Modules that we all know, is a concrete implementation of that abstract interface. So a CoffeeScript module would be a separate but valid implementation of a Cyclic Module Record.

If you go up one level there is a plain Module Record, from which Cyclic Module Record is implemented. This provides the means for modules which don't participate in the cyclic resolution algorithm (wasm, json, or the whole node: scheme). Neither CoffeeScript, WASM, or TypeScript are specified in ES262 but they are still spec compliant.

Anyway, you are right that the loaders API provides the means to break specification, since the result of resolve is neither pure nor is it memoized. This breaks the invariants of HostLoadImportedModule: "If this operation is called multiple times with the same (referrer, specifier) pair [then it must resolve] with the same result each time". I suppose the difference here is that we are proposing a globally-importable utility which could be used outside of a loader.

but I'm not sure what the risk is

The risks are impossible to enumerate since we're reneging on an presumed invariant. Like what happens if a user attempts to evict node:fs? Idk, will it segfault?

Anyway the risks are probably mostly hypothetical in nature. I'll try and open a PR soon to continue the discussion.

@laverdet
Copy link
Contributor

I'm abandoning the PR at #50618. After more reflection I think this is going to harm the ecosystem more than it will do any good. From what I can tell there are 3-4 different use-cases mentioned in this issue which can be solved in other ways:

Q: "How do I retry a module whose file content was created after a failed import"
A: Just reimport it with a cache bust: await import("failed-module?retry=1")

Q: "How do I reload a module and all its dependencies"
A: Use a loader which carries forward the cache bust: #50618 (comment)

Q: "How do I add module support to unit test frameworks"
A: Use vm.SourceTextModule

Q: "How do I hot reload modules during development"
A: Use dynohot

@pygy
Copy link

pygy commented Jan 18, 2024

@laverdet Does this look fine to you (this is a reworked version of #50618 (comment))?

https://github.com/pygy/esm-reload

@laverdet
Copy link
Contributor

@pygy The loader code looks correct and does what it says it does.

import * as mDev from './my-module.js?dev'
process.env.NODE_ENV='production'
import * as mProd from './my-module.js?prod'

This counter-example from the documentation doesn't make sense though. Imports are not executed in an imperative manner, so the body of ./my-module.js?prod will actually run before process.env.NODE_ENV='production' is evaluated. In fact you can't even guarantee that ./my-module.js?dev will run before ./my-module.js?prod without also looking at the rest of the module graph.

@pygy
Copy link

pygy commented Jan 18, 2024

Good catch, thanks this should have been dynamic imports.

Fixed, and I added an example with dependencies for extra clarity:


With dependencies

Suppose these files:

// foo.js
export {x} from "./bar.js"

// bar.js
export const x = {}

We can then do

import "esm-reload"

const foo1 = await import("./foo.js?instance=1")
const bar1 = await import("./bar.js?instance=1")

const foo2 = await import("./foo.js?instance=2")
const bar2 = await import("./bar.js?instance=2")

assert.equal(foo1.x, bar1.x)
assert.equal(foo2.x, bar1.x)

assert.notEqual(bar1.x, bar2.x)

Edit again: https://www.npmjs.com/package/esm-reload

Copy link
Contributor

There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the never-stale Mark issue so that it is never considered stale label or close this issue if it should be closed. If not, the issue will be automatically closed 6 months after the last non-automated comment.
For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Jul 17, 2024
@simlu
Copy link

simlu commented Jul 17, 2024

Should probably close this as "wont do"?

@github-actions github-actions bot removed the stale label Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js.