
Change from v6 to v7: schema object itself used as a key for compiled schema function, not serialization #1413

Open
supersolid-AitorDB opened this issue Jan 27, 2021 · 17 comments

Comments

@supersolid-AitorDB

supersolid-AitorDB commented Jan 27, 2021

Hi! I have noticed that after updating from v6 to v7, passing the same schema under a different object reference increases memory usage each time, leading to a potential memory leak. I understand this is not the recommended way to use the library, but I could not find an explicit mention of this change, and other people may not have noticed it either. Adding a warning to the changelog, or reviewing the change if it was not intentional, would be helpful.

Here is an example where the difference in memory between 6.12.6 and 7.0.2 can be observed:

const Ajv = require('ajv').default
const ajvInstance = new Ajv()

const sleep = (time) => new Promise((resolve) => setTimeout(() => resolve(), time))

async function start () {
	while (true) {
		const schema = {
			type: 'object',
			properties: {
				test: {type: 'number'}
			}	
		}
		
		const data = {
			test: Math.random()
		}

		ajvInstance.validate(schema, data)
		await sleep(10)
	}
}

start()
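The behaviour difference can be illustrated without Ajv: v7 keys its cache of compiled functions by the schema object itself (Map semantics), so a fresh object literal on every iteration is a cache miss, while v6 keyed the cache by serialisation. A minimal sketch of the two keying strategies:

```javascript
// v7-style cache: keyed by the schema object itself.
// Two structurally identical literals are distinct keys.
const objectKeyed = new Map()

const makeSchema = () => ({type: 'object', properties: {test: {type: 'number'}}})

objectKeyed.set(makeSchema(), 'compiled-fn')
console.log(objectKeyed.has(makeSchema())) // false: new object, new key

// v6-style cache: keyed by the serialised schema.
// Structurally identical schemas hit the cache.
const serialisationKeyed = new Map()

serialisationKeyed.set(JSON.stringify(makeSchema()), 'compiled-fn')
console.log(serialisationKeyed.has(JSON.stringify(makeSchema()))) // true
```

Under v7 the fix for the loop above is to allocate `schema` once at module level, so every `validate` call passes the same reference.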
@epoberezkin
Member

That is not a memory leak; it is an intentional change in v7.

v6 used the serialised schema as the key for compiled schema functions; v7 uses the actual schema object as the key to the map. In real-life use cases this actually reduces memory utilisation by the size of the schema.

The change is covered in the release notes, but it may be worth mentioning in the docs as well.

@epoberezkin epoberezkin changed the title from "Potential memory leak when updating from v6 to v7" on Jan 29, 2021
@villasv
Contributor

villasv commented Feb 3, 2021

v7 uses the actual schema object as a key to the map - it actually reduces memory utilisation in real life use cases by the size of the schema

Just like OP, I was bitten by this change because the JSON schema was constant but with different references:

export function getValidatorForMapping(
  mappings: ObjectMapping
): ValidateFunction<unknown> {
  return ajv.compile(translateObjectMapping(mappings));
}

It was totally my mistake for missing that part of the release notes, but I'd like to point out that you have at least two people presenting real life use cases that the new behavior didn't favor. I'll update our code to cache a reference for the schema ourselves and I understand that in most scenarios the new behavior will save memory and be beneficial.
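The workaround described here can be sketched as a WeakMap keyed by the input object, so compilation runs once per mapping and cache entries are garbage-collected along with their keys. In this sketch `compile` is a stub standing in for `ajv.compile` composed with `translateObjectMapping`, which are not shown:

```javascript
// Cache compiled validators per input object so the same schema reference
// is reused across calls. WeakMap entries are dropped when the key object
// becomes unreachable, so this cannot grow without bound.
const validatorCache = new WeakMap()

function getValidator(compile, mappings) {
  let validate = validatorCache.get(mappings)
  if (!validate) {
    validate = compile(mappings)
    validatorCache.set(mappings, validate)
  }
  return validate
}

// Stub compile that counts invocations, for illustration only.
let compiles = 0
const compile = () => { compiles++; return () => true }

const mapping = {field: 'test'}
getValidator(compile, mapping)
getValidator(compile, mapping) // same object, so this is a cache hit
console.log(compiles) // 1
```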

@epoberezkin
Member

It is covered in this post: https://www.poberezkin.com/posts/2021-02-11-ajv-version-7-big-changes-and-improvements.html#caching-compiled-schemas

Given that this change highlights incorrect usage of Ajv, I think it should be kept as is for now, to be re-assessed as migration to v7 continues.

The argument presented in #1098 to remove serialisation was quite compelling...

@nrkn

nrkn commented Jun 21, 2021

@epoberezkin can you confirm that this is expected and desirable behaviour? From everything I've read in this and related issues, the incorrect usage is reusing the same id for two schemas that don't pass a deep-equal check, but my problem happens when two sibling properties share the same $id:

const childSchema = {
  $id: '#/child',
  type: 'string'
} as const

const parentSchema = {
  $id: '#/parent',
  type: 'object',
  properties: {
    child1: childSchema,
    child2: childSchema
  }
} as const

ajv.addSchema( parentSchema )
// Error: reference "#/child" resolves to more than one schema

This only happens when sibling properties share a schema - this works fine:

const childSchema = {
  $id: '#/child',
  type: 'string'
} as const

const parent1Schema = {
  $id: '#/parent-1',
  type: 'object',
  properties: {
    child: childSchema
  }
} as const

const parent2Schema = {
  $id: '#/parent-2',
  type: 'object',
  properties: {
    child: childSchema
  }
} as const

ajv.addSchema( parent1Schema )
ajv.addSchema( parent2Schema )

I can do some pre-processing, e.g. removing the $id from subschemas before passing them to ajv if necessary, but I would first like to confirm that ajv is working as expected in this specific instance.

Thank you!

@epoberezkin
Member

That is expected; you should not be assigning IDs like that. You can put the shared schema in definitions and reference it from both places; you do not really need $id in it.
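A sketch of that suggestion applied to the earlier example: keep the shared subschema under `definitions` once and reference it from both properties, so no duplicate `$id` exists (schema shape assumed from the snippets above):

```javascript
// Shared subschema lives under `definitions`; both properties point at it
// via $ref instead of inlining an object that carries its own $id.
const parentSchema = {
  $id: '#/parent',
  type: 'object',
  definitions: {
    child: {type: 'string'}
  },
  properties: {
    child1: {$ref: '#/definitions/child'},
    child2: {$ref: '#/definitions/child'}
  }
}

console.log(parentSchema.properties.child1.$ref === parentSchema.properties.child2.$ref)
```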

@epoberezkin
Member

... or you can reference it as an external schema (added to ajv instance with addSchema, not inlined)

@nrkn

nrkn commented Jun 21, 2021

OK, thank you. In our case I cannot do either of those, so I will have to traverse the schema first and remove $id from the subschemas; we recently moved from using $ref to inlining all our subschemas so we can use json-schema-to-ts.
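The pre-processing step described here could be sketched as a recursive traversal that drops `$id` everywhere except at the root. This is a hypothetical helper, not part of ajv, and a real implementation would need to avoid descending into keywords like `const` or `enum` whose values are plain data:

```javascript
// Return a copy of `schema` with `$id` removed from all subschemas,
// keeping only the root-level $id intact.
function stripSubschemaIds(schema, isRoot = true) {
  if (schema === null || typeof schema !== 'object') return schema
  if (Array.isArray(schema)) return schema.map((s) => stripSubschemaIds(s, false))
  const out = {}
  for (const [key, value] of Object.entries(schema)) {
    if (key === '$id' && !isRoot) continue // drop non-root $id
    out[key] = stripSubschemaIds(value, false)
  }
  return out
}

const cleaned = stripSubschemaIds({
  $id: '#/parent',
  type: 'object',
  properties: {
    child1: {$id: '#/child', type: 'string'},
    child2: {$id: '#/child', type: 'string'}
  }
})
console.log(cleaned.properties.child1) // { type: 'string' }
```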

@marcbachmann

marcbachmann commented Jun 23, 2021

if you pass the same schema but with a different reference there's an increment on the memory used leading this to a potential memory leak

The memory leak also occurs when there is no $id or id attribute, and also with the compile function. Using addUsedSchema: false or removing the schema explicitly doesn't fix it either.

We could switch to using a WeakMap in some places, which would allow automatic garbage collection.

Maybe we should create a new issue for it.

Here's a reproduction:

'use strict'
async function start () {
  const Ajv = require('ajv')

  const baseSchema = {
    type: 'object',
    required: ['first', 'second'],
    properties: {
      first: {type: 'string'},
      second: {type: 'string'},
      third: {type: 'string'},
      fourth: {type: 'string'},
      fifth: {type: 'string'},
      sixth: {type: 'string'},
      seventh: {type: 'string'}
    }
  }

  const total = 10000
  let ajv, i, remaining

  function fixedCompile (schema) {
    const validate = ajv.compile(schema)
    ajv.removeSchema(schema)

    // Force reset values that aren't reset with removeSchema
    ajv.scope._values.schema.delete(schema)
    ajv.scope._values.validate.delete(validate)
    const schemaIdx = ajv.scope._scope.schema.indexOf(schema)
    const validateIdx = ajv.scope._scope.validate.indexOf(validate)
    if (schemaIdx !== -1) ajv.scope._scope.schema.splice(schemaIdx, 1)
    if (validateIdx !== -1) ajv.scope._scope.validate.splice(validateIdx, 1)
    return validate
  }

  function brokenCompile (schema) {
    const validate = ajv.compile(schema)
    ajv.removeSchema(schema)
    return validate
  }

  setInterval(() => {
    global.gc?.()
    const used = process.memoryUsage().heapUsed / 1024 / 1024
    console.log(
      `${total - remaining}:`,
      `The script uses ${Math.round(used * 100) / 100} MB`
    )
  }, 500).unref()


  console.log('With leak\n----------')
  ajv = new Ajv({addUsedSchema: false})
  i = 1000
  remaining = total
  while (remaining--) {
    brokenCompile({...baseSchema})

    if (--i === 0) {
      i = 1000
      await new Promise((r) => setTimeout(r, 1))
    }
  }

  console.log('\n\nWithout leak\n----------')
  ajv = new Ajv({addUsedSchema: false})
  i = 1000
  remaining = total
  while (remaining--) {
    fixedCompile({...baseSchema})

    if (--i === 0) {
      i = 1000
      await new Promise((r) => setTimeout(r, 1))
    }
  }

}

start()
$ node --expose-gc ajv-compile.js
With leak
----------
1000: The script uses 9.3 MB
2000: The script uses 13.99 MB
3000: The script uses 18.77 MB
4000: The script uses 23.21 MB
5000: The script uses 27.87 MB
6000: The script uses 32.61 MB
7000: The script uses 37.02 MB
8000: The script uses 41.46 MB
9000: The script uses 46.34 MB
10000: The script uses 50.82 MB


Without leak
----------
1000: The script uses 5.66 MB
2000: The script uses 5.56 MB
3000: The script uses 5.44 MB
4000: The script uses 5.24 MB
5000: The script uses 5.11 MB
6000: The script uses 5.06 MB
7000: The script uses 5.07 MB
8000: The script uses 5.14 MB
9000: The script uses 5.08 MB
10000: The script uses 5.13 MB
@epoberezkin
Member

epoberezkin commented Jun 23, 2021

I understand the demo; this is incorrect usage. You should not allocate constant JavaScript objects inside functions: they will indeed be different object instances on every call, so they should be allocated at module level. When such schemas are used with Ajv, they are compiled every time.

As I wrote above, this tradeoff is explained in #1098: it penalises incorrect usage more than the previous choice in v6, but it reduces memory usage when schemas are allocated once.

@marcbachmann

marcbachmann commented Jun 23, 2021

In our case we need to generate validation functions for dynamic, user-defined schemas that are persisted in remote storage. The return values of compile get cached separately, outside of ajv and the validation function scope.

I expected that at least the removeSchema function would clean everything up. Is there anything we could help with to get it working again?

I would expect that behavior from .validate, but IMO .compile shouldn't persist anything.

In our specific case, only the first-level properties are custom. We could loop through all the keys on the first level and call a validate function for each structure.

But extending the removeSchema or compile function with the fix above could also solve the memory issue we're experiencing.

@epoberezkin
Member

In our case we need to generate validation functions for dynamic user-defined schemas which are persisted in a remote storage.

For better performance, you could generate "standalone" code, save it to storage as well, and then require it from a string, so a schema would only need to be compiled on creation/update.
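The "require it from a string" step could be sketched like this. The stored source here is a stub standing in for the output of Ajv's standalone code generation, which is not shown:

```javascript
// Evaluate stored module source without touching the filesystem.
// In practice `storedSource` would be code generated once at schema
// creation/update time and persisted alongside the schema.
function loadModuleFromString(source) {
  const module = {exports: {}}
  new Function('module', 'exports', source)(module, module.exports)
  return module.exports
}

const storedSource = `module.exports = (data) => typeof data === 'string'`
const validate = loadModuleFromString(storedSource)

console.log(validate('hello')) // true
console.log(validate(42))      // false
```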

I expected that behavior for .validate, but IMO .compile shouldn't persist anything.

there is an addUsedSchema option that allows disabling it.

@epoberezkin
Member

Looking at your code, you are right: the shared scope needs cleaning up.

@marcbachmann

marcbachmann commented Jun 24, 2021

I'd say we want to have the following behavior:

// Remove should always try to remove by Object and $id
ajv.removeSchema({type: 'object', properties: {first: {type: 'string'}}})

// With addUsedSchema: true
// Cached by Object
ajv.validate({type: 'object', properties: {first: {type: 'string'}}})

// Cache by Object and $id
ajv.validate({$id: 'something', type: 'object', properties: {first: {type: 'string'}}})

// No caching at all (not optimized for multiple calls, maybe we could still cache for consistency)
ajv.compile({type: 'object', properties: {first: {type: 'string'}}})

// Cache by Object and $id
ajv.compile({$id: 'something', type: 'object', properties: {first: {type: 'string'}}})


// With addUsedSchema: false
// Cached by Object
ajv.validate({type: 'object', properties: {first: {type: 'string'}}})

// Cache by Object
ajv.validate({$id: 'something', type: 'object', properties: {first: {type: 'string'}}})

// No caching at all (not optimized for multiple calls)
ajv.compile({type: 'object', properties: {first: {type: 'string'}}})

// No caching at all (not optimized for multiple calls)
ajv.compile({$id: 'something', type: 'object', properties: {first: {type: 'string'}}})

Maybe another option is to get rid of the addUsedSchema option and support an explicit uncached ajv.compile(schema, {cache: false}). Or addUsedSchema could just be used to automatically register the schema for lookup by $id, as documented; it shouldn't change the cache behavior.

@juanrgm

juanrgm commented Jun 29, 2021

I am facing the same problem: over 2 GB of memory leaked and still increasing :(

Either there is a bug, or I don't understand the ajv.removeSchema behavior.

const Ajv = require("ajv")

const ajv = new Ajv({
  allowUnionTypes: true
})

function printMemory(counter) {

  const memoryUsage = Object.entries(
    process.memoryUsage()
  ).reduce((object, [name, memory]) => {
    object[name] = Math.ceil(memory / 1024 / 1024)
    return object
  }, {})

  console.log({
    counter,
    ...memoryUsage
  })

}

function repeat(total, routine) {
  while (total--) routine(counter++)
}

function routine(counter) {
  ajv.compile({ $id: counter.toString(), type: "string" })
  ajv.removeSchema(counter.toString())
  ajv.removeSchema() // has no effect
  if (!(counter % 1000))
    printMemory(counter)
}

let counter = 0
// while (true) routine(counter++)
setInterval(() => repeat(1000, routine), 0)

How can I clear the memory that compile uses?

Meanwhile, I am going to downgrade the version.

@epoberezkin
Member

The problem that needs fixing is that when the addUsedSchema: false option is used, the shared scope should be cleaned up after compilation (or restored in some other way; cleaning would be non-trivial).

The ajv.removeSchema method does not attempt to fully clean the variables in the shared scope used by the schema; its main purpose is to ensure that the schema is re-compiled.

So the solution would be to use the addUsedSchema: false option if you indeed need to compile schemas every time they are used (which can be the case if they are dynamically generated or supplied by users).

A better option, whenever possible, is to compile schemas only once; see this page for some suggestions: https://ajv.js.org/guide/managing-schemas.html

@juanrgm

juanrgm commented Jun 30, 2021

Firstly, thanks for your reply and your awesome job.

The problem that needs fixing is that in case addUsedSchema: false option is used, the shared scope should be cleaned after the compilation (or restored in some other way - cleaning would be non-trivial).

ajv.removeSchema method does not attempt to fully clean the variables in the shared scope used by the schema, it's main purpose to make sure that the schema is re-compiled.

I don't understand why removeSchema leaves residual memory; is that not a bug? With Ajv v6 there are no memory leaks, it works fine, and I did not see any notice about this change in the migration docs.

So the solution would be to use addUsedSchema: false option, if you indeed need to compile schemas every time they are used (which can be the case if they are dynamically generated or supplied by the users).

A better option would be whenever possible to compile schemas only once - see this page on some suggestions: https://ajv.js.org/guide/managing-schemas.html

That is not possible, at least in my case. The JSON schemas must be reloadable without restarting the server, and other schemas are dynamic.

@epoberezkin
Member

epoberezkin commented Jun 30, 2021

I don't understand why removeSchema leaves residual memory, is that not a bug

The dependency graph that can exist between different schemas makes it impossible, in some cases, to fully remove one: while the top-level schema is removed, some of its fragments may still be used elsewhere.

If you really need a full clean-up, you need to discard the instance and create a new one, which is even more inefficient than re-compiling every time.

Another approach is not to use Ajv as a cache: you can serialise and cache schemas at the application level, doing what Ajv v6 was doing.

That is not possible, at least for my case. The JSON schemas must be reloaded without restarting the server and others schemas are dynamics.

I understand, but using the addUsedSchema: false option and caching in the application should be possible.
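Caching at the application level, keyed by the serialised schema as v6 did internally, could look like this. Here `compile` is a stub standing in for `ajv.compile` on an instance created with `addUsedSchema: false`:

```javascript
// Application-level cache keyed by JSON.stringify(schema). Note that
// JSON.stringify is property-order sensitive, so schemas must be built
// with a stable key order for this to hit reliably.
const compiled = new Map()

function compileCached(compile, schema) {
  const key = JSON.stringify(schema)
  let validate = compiled.get(key)
  if (!validate) {
    validate = compile(schema)
    compiled.set(key, validate)
  }
  return validate
}

// Stub compile counting invocations, for illustration only.
let calls = 0
const compile = () => { calls++; return () => true }

compileCached(compile, {type: 'string'})
compileCached(compile, {type: 'string'}) // structurally equal, so a cache hit
console.log(calls) // 1
```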
