
[Fix](bloom filter) Fix bloom filter memory leak #34871

Merged
merged 4 commits into apache:master on May 23, 2024

Conversation

@Yukang-Lian (Collaborator) commented on May 14, 2024

Proposed changes

Issue Number: close #xxx

Issue: Doris occasionally runs into a situation where memory usage becomes exceptionally high and does not come back down. The leaked memory is occupied by Bloom filters held in memory.

Reason: The segment cache holds segment objects that have been read from files in memory. It is an LRU cache with the following eviction strategy: when the number of cached segments exceeds the configured maximum, or when the total memory occupied by the cached segment objects exceeds the configured limit, older segments are evicted. However, the current logic first reads a segment object into memory (say it occupies size A) and inserts it into the cache, at which point the cache records the object's size as A. Only afterwards does it read the segment's Bloom filter from the file and assign it to the segment's Bloom filter member (say the Bloom filter occupies size B), so the segment's real footprint becomes A + B. The cache never updates the recorded size, so the memory actually held by the cached segment (A + B) is larger than the size the cache accounts for (A). As the number of cached segments grows, memory usage surges, yet the cache never sees its accounted size reach the eviction threshold, so it never evicts. The result looks like a memory leak.

Solution: Since each segment object reads its Bloom filter only once, the fix is to reorder the logic: instead of reading the segment, inserting it into the cache, and then reading the Bloom filter, we now read the segment, read the Bloom filter, and only then insert the fully sized segment into the cache.
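
To make the accounting gap concrete, here is a minimal, self-contained C++ sketch of the old order versus the new order. `ToySegment` and `ToyCache` are hypothetical stand-ins for illustration only; they do not reflect the actual Segment/SegmentCache interfaces in the Doris BE, and the byte sizes are made up.

```cpp
// Toy illustration of the cache-accounting bug and the reordered fix.
// ToySegment/ToyCache are hypothetical; they are NOT the Doris BE API.
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

struct ToySegment {
    std::size_t base_bytes = 0;          // size "A": footprint right after opening
    std::size_t bloom_filter_bytes = 0;  // size "B": Bloom filter loaded later
    std::size_t footprint() const { return base_bytes + bloom_filter_bytes; }
    void load_bloom_filter() { bloom_filter_bytes = 4 << 20; }  // pretend 4 MiB
};

// The charge is captured once at insertion time and never updated,
// mirroring the LRU cache behavior described above.
struct ToyCache {
    std::size_t charged_bytes = 0;  // what the cache thinks it holds
    std::vector<std::shared_ptr<ToySegment>> entries;

    void insert(std::shared_ptr<ToySegment> seg) {
        charged_bytes += seg->footprint();
        entries.push_back(std::move(seg));
    }
    std::size_t actual_bytes() const {
        std::size_t total = 0;
        for (const auto& e : entries) total += e->footprint();
        return total;
    }
};

int main() {
    ToyCache cache;

    // Old order: insert first, then load the Bloom filter.
    auto leaky = std::make_shared<ToySegment>();
    leaky->base_bytes = 1 << 20;   // "A" = 1 MiB
    cache.insert(leaky);           // cache charges only A
    leaky->load_bloom_filter();    // real footprint grows to A + B, charge stays A

    // New order: load the Bloom filter first, then insert.
    auto fixed = std::make_shared<ToySegment>();
    fixed->base_bytes = 1 << 20;
    fixed->load_bloom_filter();    // footprint is already A + B
    cache.insert(fixed);           // cache charges A + B

    std::printf("charged by cache: %zu bytes, actually held: %zu bytes\n",
                cache.charged_bytes, cache.actual_bytes());
    // The gap (one Bloom filter, 4 MiB here) is memory the eviction
    // policy never sees; at scale this is the observed "leak".
    return 0;
}
```

With the reordered flow, every insertion charges the full A + B, so the cache's eviction limit reflects the real memory footprint.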

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See Doris Document.

@Yukang-Lian (Collaborator, Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian (Collaborator, Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"


@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.65% (8982/25193)
Line Coverage: 27.32% (74264/271812)
Region Coverage: 26.55% (38373/144512)
Branch Coverage: 23.37% (19569/83738)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ba50549f489159ad7df63b9249d4544d16f3f4ff_ba50549f489159ad7df63b9249d4544d16f3f4ff/report/index.html

@Yukang-Lian (Collaborator, Author)

run buildall


@github-actions github-actions bot left a comment


clang-tidy made some suggestions

be/src/olap/rowset/segment_v2/segment.cpp (outdated, resolved)
@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.71% (8999/25200)
Line Coverage: 27.39% (74481/271969)
Region Coverage: 26.61% (38488/144616)
Branch Coverage: 23.43% (19639/83806)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6d3664a77c67b4962e16f0da9292a96fda602213_6d3664a77c67b4962e16f0da9292a96fda602213/report/index.html

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian (Collaborator, Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian (Collaborator, Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.70% (9006/25226)
Line Coverage: 27.37% (74526/272261)
Region Coverage: 26.61% (38527/144782)
Branch Coverage: 23.44% (19658/83878)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a60b610b407158dc598e6e2792ae836894d6e368_a60b610b407158dc598e6e2792ae836894d6e368/report/index.html


@github-actions github-actions bot left a comment


clang-tidy made some suggestions

be/test/olap/segment_cache_test.cpp (3 review comments)
@zhannngchen previously approved these changes on May 20, 2024
Copy link
Contributor

@zhannngchen zhannngchen left a comment


LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on May 20, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@zhannngchen previously approved these changes on May 21, 2024
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on May 21, 2024
@Yukang-Lian (Collaborator, Author)

run buildall

@github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) on May 21, 2024
@Yukang-Lian (Collaborator, Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.67% (9013/25265)
Line Coverage: 27.33% (74537/272680)
Region Coverage: 26.57% (38555/145122)
Branch Coverage: 23.41% (19666/84004)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a3c78af2f63f0965b830aa39356524c05e48ca0b_a3c78af2f63f0965b830aa39356524c05e48ca0b/report/index.html

@Yukang-Lian (Collaborator, Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.68% (9015/25265)
Line Coverage: 27.34% (74551/272680)
Region Coverage: 26.58% (38571/145124)
Branch Coverage: 23.41% (19670/84006)
Coverage Report: http://coverage.selectdb-in.cc/coverage/505cd66bdfaeff63c763c04048142e18f5d929a4_505cd66bdfaeff63c763c04048142e18f5d929a4/report/index.html

@Yukang-Lian (Collaborator, Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.70% (9019/25265)
Line Coverage: 27.34% (74559/272683)
Region Coverage: 26.56% (38539/145122)
Branch Coverage: 23.41% (19664/84004)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c0c0f1f6fdef8973174af7d80e218127a391bf13_c0c0f1f6fdef8973174af7d80e218127a391bf13/report/index.html


@zhannngchen zhannngchen left a comment


LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on May 22, 2024

@dataroaring dataroaring left a comment


LGTM

@zhannngchen zhannngchen merged commit a62659a into apache:master May 23, 2024
38 of 42 checks passed
yiguolei pushed a commit that referenced this pull request May 24, 2024
dataroaring pushed a commit that referenced this pull request May 26, 2024
@yiguolei yiguolei mentioned this pull request Jun 1, 2024
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Jul 10, 2024
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Jul 15, 2024
Labels: approved (Indicates a PR has been approved by one committer.), dev/2.1.4-merged, reviewed

5 participants