[Fix](parquet-reader) Disable INT96 timestamp min-max statistics, which are incorrect when written by some old Parquet writers. #35041

Merged

Contributor

@kaka11chen kaka11chen commented May 20, 2024

Proposed changes

[Fix](parquet-reader) Disable INT96 timestamp min-max statistics, which are incorrect when written by some old Parquet writers.
Older Parquet writers compared INT96 timestamp values incorrectly when producing statistics, so PARQUET-1065 deprecated those statistics. As a result, any writer that produced such stats was producing unusable, incorrect values, except in the special case where min == max, where an incorrect ordering does not affect the result. PARQUET-1026 made binary stats available and valid in that special case.
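
The rule described above can be illustrated with a minimal Python sketch. This is not the code changed in this PR (the Doris reader is not written in Python). The function names `encode_int96`, `decode_int96`, and `usable_int96_stats` are hypothetical, and the byte layout (8 bytes of nanoseconds-of-day followed by 4 bytes of Julian day, little-endian) is the commonly used INT96 timestamp convention, assumed here. The sketch shows why ordering the raw bytes can disagree with chronological order, and why the stats are still safe when min == max.

```python
import struct
from typing import Optional, Tuple


def encode_int96(julian_day: int, nanos_of_day: int) -> bytes:
    """Pack a timestamp into 12 bytes: nanos-of-day (int64), then Julian day (int32)."""
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> Tuple[int, int]:
    """Return (julian_day, nanos_of_day); tuples in this order sort chronologically."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


def usable_int96_stats(
    min_raw: Optional[bytes], max_raw: Optional[bytes]
) -> Optional[Tuple[int, int]]:
    """Trust INT96 min/max only when both are present and identical.

    If min == max, every value in the chunk is that value, so a wrong
    ordering in the writer cannot have affected the result. Otherwise
    return None, meaning "ignore the statistics and read the data".
    """
    if min_raw is None or max_raw is None:
        return None
    if min_raw != max_raw:
        return None
    return decode_int96(min_raw)


# Two arbitrary timestamps: `later` is one day after `earlier`,
# but has a smaller nanos-of-day field.
earlier = encode_int96(2460451, 2)
later = encode_int96(2460452, 1)

# Comparing the raw bytes puts them in the wrong order ...
raw_order_says_earlier_is_bigger = earlier > later
# ... while comparing the decoded (day, nanos) pairs gives the right one.
decoded_order_is_correct = decode_int96(earlier) < decode_int96(later)
```

With this rule, a row group whose INT96 stats differ is never filtered on those stats, at the cost of reading some data that correct statistics could have skipped.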

…rect when was written by some old parquet writers by disable it.
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 20, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@kaka11chen
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.69% (9013/25251)
Line Coverage: 27.34% (74524/272571)
Region Coverage: 26.59% (38544/144943)
Branch Coverage: 23.42% (19662/83954)
Coverage Report: http://coverage.selectdb-in.cc/coverage/09ca04a6c3126b0f1945d9aa6d624ef85e6283d6_09ca04a6c3126b0f1945d9aa6d624ef85e6283d6/report/index.html

@morningman morningman merged commit bd6f5b6 into apache:master May 21, 2024
27 of 30 checks passed
yiguolei pushed a commit that referenced this pull request May 21, 2024
…rect when was written by some old parquet writers by disable it. (#35041)

morningman pushed a commit to morningman/doris that referenced this pull request May 21, 2024
…rect when was written by some old parquet writers by disable it. (apache#35041)

kaka11chen added a commit to kaka11chen/doris that referenced this pull request May 21, 2024
…rect when was written by some old parquet writers by disable it. (apache#35041)

morningman pushed a commit that referenced this pull request May 22, 2024
…rect when was written by some old parquet writers by disable it. (#35041) (#35160)

backport #35041
dataroaring pushed a commit that referenced this pull request May 26, 2024
…rect when was written by some old parquet writers by disable it. (#35041)

@morningman morningman mentioned this pull request Jun 1, 2024
5 participants