quentin-jaquier-sonarsource edited this page Mar 22, 2022 · 3 revisions

Context

Since version 7.9 of the Java analyzer, source files are parsed in batches of a given size, instead of file by file as in previous versions. Note that this only applies to parsing: the analysis (rule execution) is still done file by file.

Batch size

This is the size of a batch, in kilobytes (KB). The analyzer consumes files until this size is reached, parses them as one batch, and repeats the operation until all files have been processed.
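The loop described above can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the class and method names are hypothetical, files are represented only by their size in bytes, 1 KB is assumed to be 1000 bytes, and the file that makes a batch reach the limit is assumed to stay in that batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of size-based batching; not taken from the analyzer.
public class BatchPartitionSketch {

  /**
   * Groups file sizes (in bytes) into batches. A batch is closed as soon as
   * its cumulated size reaches the limit, so the last file added may push
   * the batch above the limit (assumption).
   */
  public static List<List<Long>> partition(List<Long> fileSizesInBytes, long batchSizeInKB) {
    if (batchSizeInKB <= 0) {
      // Special values such as -1 are out of scope for this sketch.
      throw new IllegalArgumentException("batch size must be positive");
    }
    long limitInBytes = batchSizeInKB * 1000L; // assumption: 1 KB = 1000 bytes

    List<List<Long>> batches = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;

    for (long size : fileSizesInBytes) {
      current.add(size);
      currentSize += size;
      if (currentSize >= limitInBytes) {
        // Size reached: this batch would be parsed, then a new one started.
        batches.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
    }
    if (!current.isEmpty()) {
      // Remaining files form a last, smaller batch.
      batches.add(current);
    }
    return batches;
  }
}
```

With this sketch, files of 400, 700, 300 and 200 bytes and a batch size of 1 KB end up in two batches: the first two files, then the last two.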

Dynamic batch size computation

By default, an ideal batch size is dynamically computed, based on the total memory available. More precisely, the size is equal to 0.005% of the maximum memory (set through -Xmx). The dynamic computation is capped at 500 KB.
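As an illustration of this rule, here is a minimal sketch of the computation. It is not the analyzer's implementation: the class, method, and constant names are hypothetical, 1 KB is assumed to be 1000 bytes, and the maximum memory is assumed to be the value reported by Runtime.getRuntime().maxMemory().

```java
// Hypothetical sketch of the dynamic batch size rule; not taken from the analyzer.
public class DynamicBatchSizeSketch {

  // Cap stated in the documentation: 500 KB.
  static final long MAX_DYNAMIC_BATCH_SIZE_IN_KB = 500;

  /** Returns 0.005% of the given maximum memory, in KB, capped at 500 KB. */
  public static long dynamicBatchSizeInKB(long maxMemoryInBytes) {
    long sizeInBytes = maxMemoryInBytes / 20_000; // 0.005% is 1/20000
    long sizeInKB = sizeInBytes / 1000;           // assumption: 1 KB = 1000 bytes
    return Math.min(sizeInKB, MAX_DYNAMIC_BATCH_SIZE_IN_KB);
  }

  public static void main(String[] args) {
    // Maximum memory the JVM will attempt to use (typically driven by -Xmx).
    long maxMemory = Runtime.getRuntime().maxMemory();
    System.out.println("Batch size in KB: " + dynamicBatchSizeInKB(maxMemory));
  }
}
```

Under these assumptions, a maximum memory of 2,000,000,000 bytes gives a batch size of 100 KB, and the 500 KB cap is reached from 10,000,000,000 bytes onwards.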

Manual batch size settings

The dynamic computation of the batch size is deliberately conservative: we have observed empirically that the performance benefits are already visible with a relatively small batch size, without taking any risk on the memory side. In certain situations, you may want to set the batch size manually. You can do this with the property sonar.java.experimental.batchModeSizeInKB. Note that the ideal value depends on the project and the ecosystem setup. A bigger batch size will not necessarily improve performance, and can even slow things down if memory is a limiting factor.

Special values

When set to -1, the parsing is done in a single batch, whatever the available memory or the size of the project. This value is meant for internal use only and should not be used otherwise.

Deprecated key

sonar.java.internal.batchMode is deprecated and should no longer be used for batch mode related settings.

Parsing file by file

In certain situations, you may not want to run the analysis in batch mode. You can do this by setting the property sonar.java.fileByFile=true.

Analysis difference from batch mode

Apart from memory usage and speed, batch mode and file by file mode should yield the same analysis results. Still, we have identified small differences when the project is misconfigured (dependencies missing, source files not compiled, properties missing, ...). If you face differences, you should verify your configuration (check the analysis logs for "Unresolved" types).

Technical motivations

The main benefit of batch mode is to avoid computing the semantics of dependencies again and again. This means that the bigger the project, the bigger the possible performance gain, especially if it contains many dependencies. A secondary benefit can be observed when semantic information is missing (incorrectly configured projects): when multiple files are in the same batch, we can improve our partial knowledge about types.

Batch mode is unrelated to parallel execution. Files are still parsed sequentially; the performance improvements come from the fact that we can partially reuse the semantics already computed.