Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce code size by optionally refactoring errors into a function #1098

Closed
fabiospampinato opened this issue Oct 6, 2019 · 7 comments
Closed
Milestone

Comments

@fabiospampinato
Copy link

fabiospampinato commented Oct 6, 2019

I'm using ajv in a library for managing settings, and while optimizing it I think I discovered a few potentially good ideas for optimizing ajv's initialization time.

ajv is plenty fast when validating, but in order to get there one has to first instantiate it and compile the schema, which are not particularly fast operations.

Lazy loading dependencies

It takes about ~30ms to require ajv on my machine, that's not too bad but I think it could be brought down further by lazy loading dependencies once they are actually about to be used.

Disabling schema caching and serialization

There's no standalone option for disabling schema caching and serialization, currently one has to set the following options, which are a bit tedious to write (and perhaps kind of difficult to come up with):

{
  serialize: false,
  cache: {
    put: () => {},
    get: () => {},
    del: () => {},
    clear: () => {}
  }
}

I think there are some very common use cases where schema caching is just not useful, for example if I only have a single schema, or if I never re-compile the same schemas, then this is just:

  1. wasting time serializing objects.
  2. wasting time importing fast-json-stable-stringify without using it (under some scenarios at least).
    • I'm also not sure fast-json-stable-stringify is needed, JSON.stringify is faster, and although the order of object properties is not guaranteed in JS in practice V8 keeps them ordered, I think the other major engines probably do the same, and they aren't going to change this for backwards compatibility.
  3. wasting memory storing the string representation of our schemas, plus potentially older compiled schemas no longer in use.

I think a standalone option for disabling schema caching entirely, and a mention in the readme that a slight performance and memory gain can be achieved by disabling it, would be a good idea.

Shorter validation functions

This is perhaps the biggest optimization opportunity I found.

The generated validation function is too long and grows too fast. I've attached a sample source code generated by compiling a very simple schema with just about 100 properties in it, it weighs 40kb!

As soon as the schema becomes more complicated, and especially if more properties are added, the generated source code may very well weigh 1mb or more. All this source code wastes memory and it needs to be parsed and that's not free, especially on mobile.

Most of the source code is actually about generating error objects and updating the related counters for each validation block. I think this issue should be addressed like so:

  • when it.level === 0 an helper function for generating the error object for that specific validation function should be outputted, then it should be called in the rest of the file when generating the error objects.
    • Most of the validations pass and validation is usually stopped as soon as the first error is found, so the extra function call here won't have any even remotely meaningful cost performance-wise, perhaps the engine might even inline it itself.

I think the generated source code under my example scenario could weigh about 15kb rather than 40kb just by implementing this, a reduction of about 60%.

source.js


I hope these suggestions are useful, they might not sound like much under some scenarios, but I think they can make the difference under others.

@epoberezkin
Copy link
Member

@fabiospampinato thanks for the suggestions

Disabling schema caching and serialization

The simplest change without adding options would be to add cache: false (and not call any cache methods in this case), and add the suggested note to readme.

Shorter validation functions

I thought about it and tried extracting error generation into a helper function long time ago, but it is not as simple as it may seem - it requires substantial re-factoring... See this branch https://github.com/epoberezkin/ajv/commits/error-function

@fabiospampinato
Copy link
Author

fabiospampinato commented Nov 26, 2019

The simplest change without adding options would be to add cache: false (and not call any cache methods in this case), and add the suggested note to readme.

Sounds like the best solution 👍.

I thought about it and tried extracting error generation into a helper function long time ago, but it is not as simple as it may seem - it requires substantial re-factoring... See this branch error-function (commits)

Does it "just" require a lot of manual refactoring or are there any technically challenging problems of this approach? Because I'm using it here, and other than for writing/reading kind-of minified JS (which could still be generated via an external tool to keep the code readable) it was relatively simple to do.

@epoberezkin
Copy link
Member

epoberezkin commented Nov 26, 2019

Does it "just" require a lot of manual refactoring or are there any technically challenging problems of this approach? Because I'm using it here, and other than for writing/reading kind-of minified JS (which could still be generated via an external tool to keep the code readable) it was relatively simple to do.

As I wrote it was long time ago :) I honestly don't remember what blocked me when I tried. It was likely some challenge on the template level where there are lots of template level conditionals in string constants...

@epoberezkin
Copy link
Member

It should be relatively easy to address by optionally making error generation into a separate function stored in the shared scope (from v7-alpha). Given that it is a trade-off (smaller function and faster parsing, but also slower execution on failing cases) it should be done with option (name TBC).

Documentation on code generation would help (TODO).

@epoberezkin epoberezkin added this to the 7.0.0 milestone Sep 15, 2020
@epoberezkin
Copy link
Member

the branch error-function is removed as no longer relevant

@willfarrell willfarrell mentioned this issue Jan 2, 2021
63 tasks
@epoberezkin epoberezkin modified the milestones: 7.0.0, 8.0.0 Feb 10, 2021
@epoberezkin epoberezkin changed the title Optimize initialization time Feb 10, 2021
@epoberezkin
Copy link
Member

epoberezkin commented Feb 10, 2021

Schemas are no longer serialised, but it did cause some confusion - #1413

Renamed issue to show what is left.

@epoberezkin
Copy link
Member

The solution is to use messages: false and if messages are needed they can be generated with ajv-i18n (they are effectively refactored there). Other parameters in error objects are needed anyway (they would be passed as error function parameters if it were refactored, so no real benefit).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants