Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix](tvf) Fix that tvf reading empty files in compressed formats. #34926

Merged
merged 6 commits into from
May 21, 2024

Conversation

BePPPower
Copy link
Contributor

@BePPPower BePPPower commented May 15, 2024

Proposed changes

Issue Number: close #xxx

  1. Fix the issue with tvf reading empty compressed files.

  2. move two test cases (test_local_tvf_compression and test_s3_tvf_compression) from p2 to p0

docs: apache/doris-website#654

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@BePPPower
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 40153 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 79fa45d5c69215a0f574cafc8b34beae0e83e37d, data reload: false

------ Round 1 ----------------------------------
q1	6913	4370	4283	4283
q2	1627	186	191	186
q3	7251	1144	1132	1132
q4	6013	792	840	792
q5	2910	2930	2732	2732
q6	214	132	131	131
q7	1008	569	562	562
q8	2210	2090	2095	2090
q9	6639	6592	6549	6549
q10	8921	3685	3747	3685
q11	453	239	237	237
q12	388	223	212	212
q13	16900	2925	2963	2925
q14	258	224	215	215
q15	514	469	471	469
q16	505	371	382	371
q17	987	669	646	646
q18	7962	7559	7499	7499
q19	1622	1558	1511	1511
q20	666	313	311	311
q21	5079	3335	3931	3335
q22	352	282	280	280
Total cold run time: 79392 ms
Total hot run time: 40153 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4283	4285	4272	4272
q2	381	264	269	264
q3	3024	2795	2746	2746
q4	1862	1540	1583	1540
q5	5270	5288	5277	5277
q6	212	123	125	123
q7	2268	1890	1874	1874
q8	3186	3356	3376	3356
q9	8347	8309	8386	8309
q10	3914	3679	3693	3679
q11	586	494	490	490
q12	769	606	590	590
q13	16478	2956	2977	2956
q14	283	291	282	282
q15	502	473	466	466
q16	473	431	413	413
q17	1802	1481	1501	1481
q18	7766	7656	7534	7534
q19	5103	1520	1549	1520
q20	1955	1763	1759	1759
q21	4966	4900	4734	4734
q22	584	495	467	467
Total cold run time: 74014 ms
Total hot run time: 54132 ms
@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.71% (8998/25199)
Line Coverage: 27.38% (74462/271954)
Region Coverage: 26.62% (38494/144617)
Branch Coverage: 23.43% (19637/83806)
Coverage Report: http://coverage.selectdb-in.cc/coverage/79fa45d5c69215a0f574cafc8b34beae0e83e37d_79fa45d5c69215a0f574cafc8b34beae0e83e37d/report/index.html

@doris-robot
Copy link

TPC-DS: Total hot run time: 187624 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 79fa45d5c69215a0f574cafc8b34beae0e83e37d, data reload: false

query1	927	388	376	376
query2	6455	2548	2310	2310
query3	6662	217	216	216
query4	22471	21479	21267	21267
query5	4169	426	423	423
query6	266	177	182	177
query7	4597	313	284	284
query8	235	187	193	187
query9	8477	2415	2397	2397
query10	455	256	256	256
query11	14696	14213	14245	14213
query12	147	97	93	93
query13	1659	392	396	392
query14	11175	7710	7936	7710
query15	217	179	173	173
query16	7824	266	269	266
query17	1835	577	571	571
query18	1927	280	277	277
query19	205	156	157	156
query20	96	86	89	86
query21	200	124	126	124
query22	4981	4871	4778	4778
query23	34466	33653	33589	33589
query24	12020	2920	2993	2920
query25	661	372	362	362
query26	1784	165	157	157
query27	2985	325	331	325
query28	7364	2049	2035	2035
query29	1116	612	596	596
query30	310	173	172	172
query31	975	758	741	741
query32	90	52	56	52
query33	743	255	245	245
query34	1072	471	476	471
query35	793	692	687	687
query36	1061	883	907	883
query37	272	71	71	71
query38	2957	2787	2767	2767
query39	1609	1553	1584	1553
query40	274	122	127	122
query41	48	40	48	40
query42	104	96	99	96
query43	578	562	561	561
query44	1283	737	752	737
query45	268	261	253	253
query46	1076	750	730	730
query47	1966	1883	1893	1883
query48	369	303	301	301
query49	1189	395	401	395
query50	769	391	385	385
query51	6995	6789	6738	6738
query52	103	88	89	88
query53	357	280	289	280
query54	1031	433	430	430
query55	78	72	74	72
query56	247	222	224	222
query57	1277	1146	1146	1146
query58	232	210	217	210
query59	3425	3408	3178	3178
query60	261	236	254	236
query61	93	89	86	86
query62	647	498	468	468
query63	322	287	289	287
query64	9774	7397	7374	7374
query65	3165	3108	3081	3081
query66	1375	354	335	335
query67	15404	15009	14959	14959
query68	4534	531	537	531
query69	477	306	328	306
query70	1173	1100	1149	1100
query71	433	280	275	275
query72	7296	2573	2350	2350
query73	707	321	326	321
query74	6473	6219	6179	6179
query75	3545	2665	2634	2634
query76	2982	1020	969	969
query77	660	274	270	270
query78	10619	10047	10125	10047
query79	2088	513	532	513
query80	1041	426	430	426
query81	512	244	242	242
query82	1504	102	100	100
query83	191	175	172	172
query84	247	91	85	85
query85	1508	274	268	268
query86	449	320	308	308
query87	3264	3100	3122	3100
query88	4208	2367	2359	2359
query89	475	394	398	394
query90	2010	196	186	186
query91	126	99	98	98
query92	57	48	49	48
query93	2040	495	498	495
query94	1163	184	180	180
query95	397	303	307	303
query96	593	264	266	264
query97	3202	3007	2954	2954
query98	285	220	217	217
query99	1251	920	879	879
Total cold run time: 288174 ms
Total hot run time: 187624 ms
@doris-robot
Copy link

ClickBench: Total hot run time: 30.38 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 79fa45d5c69215a0f574cafc8b34beae0e83e37d, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.67	0.08	0.10
query5	0.50	0.48	0.51
query6	1.13	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.47	0.49
query10	0.54	0.56	0.54
query11	0.16	0.11	0.11
query12	0.15	0.12	0.11
query13	0.60	0.59	0.62
query14	0.78	0.77	0.78
query15	0.84	0.80	0.82
query16	0.38	0.38	0.36
query17	0.94	0.97	1.03
query18	0.24	0.24	0.24
query19	1.76	1.70	1.72
query20	0.01	0.01	0.01
query21	15.70	0.65	0.64
query22	4.59	6.78	1.78
query23	18.31	1.39	1.24
query24	1.70	0.31	0.21
query25	0.14	0.08	0.08
query26	0.27	0.18	0.18
query27	0.08	0.08	0.08
query28	13.27	1.03	1.02
query29	13.24	3.41	3.37
query30	0.24	0.06	0.05
query31	2.87	0.40	0.38
query32	3.28	0.48	0.48
query33	2.88	2.79	2.84
query34	17.15	4.42	4.38
query35	4.53	4.51	4.57
query36	0.65	0.45	0.47
query37	0.18	0.16	0.16
query38	0.15	0.15	0.15
query39	0.04	0.04	0.04
query40	0.16	0.14	0.15
query41	0.09	0.04	0.05
query42	0.05	0.05	0.05
query43	0.04	0.03	0.04
Total cold run time: 110.28 s
Total hot run time: 30.38 s
@BePPPower
Copy link
Contributor Author

run p0

@@ -484,7 +484,8 @@ private PFetchTableSchemaRequest getFetchTableStructureRequest() throws TExcepti
// get first file, used to parse table schema
TBrokerFileStatus firstFile = null;
for (TBrokerFileStatus fileStatus : fileStatuses) {
if (fileStatus.isIsDir() || fileStatus.size == 0) {
if (fileStatus.isIsDir() || fileStatus.size == 0
|| (compressionType != TFileCompressType.UNKNOWN && fileStatus.size <= 4)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure all compressed file size have at least 4 bytes?

@BePPPower
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.70% (9013/25250)
Line Coverage: 27.34% (74522/272529)
Region Coverage: 26.60% (38542/144920)
Branch Coverage: 23.42% (19657/83934)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6a8b540b717e2c89ee8b208f431da52414a2fcf0_6a8b540b717e2c89ee8b208f431da52414a2fcf0/report/index.html

@BePPPower
Copy link
Contributor Author

run p0

@BePPPower
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.69% (9011/25248)
Line Coverage: 27.34% (74512/272515)
Region Coverage: 26.58% (38525/144916)
Branch Coverage: 23.42% (19656/83934)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8e03cd8aa18dcc0f2a044534a7c6b164f192c48f_8e03cd8aa18dcc0f2a044534a7c6b164f192c48f/report/index.html

morningman
morningman previously approved these changes May 20, 2024
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 20, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label May 20, 2024
@morningman
Copy link
Contributor

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.69% (9013/25251)
Line Coverage: 27.34% (74531/272582)
Region Coverage: 26.59% (38541/144947)
Branch Coverage: 23.42% (19659/83956)
Coverage Report: http://coverage.selectdb-in.cc/coverage/0f9d0c4432b2c8733d8dc580a63df9cb4f36f30e_0f9d0c4432b2c8733d8dc580a63df9cb4f36f30e/report/index.html

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 21, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 07269c5 into apache:master May 21, 2024
25 of 28 checks passed
yiguolei pushed a commit that referenced this pull request May 21, 2024
…34926)

1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
morningman pushed a commit to morningman/doris that referenced this pull request May 21, 2024
…pache#34926)

1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
BePPPower added a commit to BePPPower/doris that referenced this pull request May 21, 2024
…pache#34926)

1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
morningman pushed a commit to BePPPower/doris that referenced this pull request May 22, 2024
…pache#34926)

1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
dataroaring pushed a commit that referenced this pull request May 26, 2024
…34926)

1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
@morningman morningman mentioned this pull request Jun 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants