[opt](external) ignore not find files #35319

Merged
morningman merged 2 commits into apache:master on May 28, 2024

Conversation

morningman
Contributor

Proposed changes

The file list is obtained from the external meta cache, so a listed file may already have been removed from storage by the time the query reads it.
We should ignore such not-found files and let the query continue.
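
The behavior described above can be illustrated with a minimal sketch. This is not the Doris BE code (which this page does not show); `scan_cached_files`, `ScanStats`, and `open_file` are hypothetical names, and a plain `FileNotFoundError` stands in for whatever "not found" status the storage layer reports.

```python
from dataclasses import dataclass


@dataclass
class ScanStats:
    files_read: int = 0
    not_found: int = 0


def scan_cached_files(cached_paths, open_file):
    """Read every file named in a possibly stale cached listing.

    Files that the cache still lists but storage no longer has are
    skipped instead of failing the whole scan.
    """
    stats = ScanStats()
    rows = []
    for path in cached_paths:
        try:
            data = open_file(path)
        except FileNotFoundError:
            # Stale cache entry: count it and move on to the next file.
            stats.not_found += 1
            continue
        rows.extend(data)
        stats.files_read += 1
    return rows, stats


# Hypothetical storage in which "part-1" was deleted after the listing was cached.
storage = {"part-0": ["a", "b"], "part-2": ["c"]}


def open_from_storage(path):
    if path not in storage:
        raise FileNotFoundError(path)
    return storage[path]


rows, stats = scan_cached_files(["part-0", "part-1", "part-2"], open_from_storage)
```

In this sketch the scan returns the rows of the two surviving files and records one skipped file, rather than raising an error for the query as a whole.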

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.65% (9018/25294)
Line Coverage: 27.31% (74567/273051)
Region Coverage: 26.54% (38601/145426)
Branch Coverage: 23.40% (19688/84132)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1f687ff59c9f14b2b15f455cfd32311c85304aad_1f687ff59c9f14b2b15f455cfd32311c85304aad/report/index.html

@doris-robot

TPC-H: Total hot run time: 41175 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1f687ff59c9f14b2b15f455cfd32311c85304aad, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17612	4286	4288	4286
q2	2022	195	197	195
q3	10439	1290	1231	1231
q4	10183	883	727	727
q5	7476	2717	2685	2685
q6	222	143	137	137
q7	961	623	615	615
q8	9217	2130	2101	2101
q9	9722	6674	6783	6674
q10	9428	3912	3945	3912
q11	427	240	241	240
q12	456	236	231	231
q13	17464	3198	3274	3198
q14	261	210	218	210
q15	509	490	472	472
q16	516	406	397	397
q17	987	650	735	650
q18	8433	7874	7842	7842
q19	5981	1567	1477	1477
q20	656	312	313	312
q21	5208	4058	3309	3309
q22	337	274	277	274
Total cold run time: 118517 ms
Total hot run time: 41175 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4556	4391	4427	4391
q2	371	268	276	268
q3	3176	2958	2877	2877
q4	1860	1615	1647	1615
q5	5499	5551	5510	5510
q6	216	133	129	129
q7	2172	1824	1860	1824
q8	3209	3391	3426	3391
q9	8659	8696	8700	8696
q10	3955	3792	3880	3792
q11	591	484	501	484
q12	807	645	619	619
q13	15928	3173	3193	3173
q14	304	268	277	268
q15	521	497	477	477
q16	499	440	434	434
q17	1770	1486	1474	1474
q18	7798	7639	7454	7454
q19	1675	1596	1495	1495
q20	2003	1807	1771	1771
q21	9200	4840	4840	4840
q22	557	475	490	475
Total cold run time: 75326 ms
Total hot run time: 55457 ms
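
The totals in these tables are consistent with a simple rule: the cold total is the sum of the first timing column, and the hot total is the sum of the smaller of the two hot runs (the last column). Summing all 22 Round 1 rows this way reproduces the reported 118517 ms and 41175 ms. The sketch below applies the rule to the first three Round 1 rows only; the function name `totals` is illustrative and is not taken from the tpch-tools scripts.

```python
# (query, cold_ms, hot1_ms, hot2_ms) copied from the first three Round 1 rows.
rows = [
    ("q1", 17612, 4286, 4288),
    ("q2", 2022, 195, 197),
    ("q3", 10439, 1290, 1231),
]


def totals(rows):
    """Return (cold_total, hot_total) in milliseconds."""
    cold = sum(cold_ms for _, cold_ms, _, _ in rows)
    # The hot total uses the better (smaller) of the two hot runs per query.
    hot = sum(min(hot1, hot2) for _, _, hot1, hot2 in rows)
    return cold, hot


cold_total, hot_total = totals(rows)
```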
@doris-robot

TPC-DS: Total hot run time: 172142 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1f687ff59c9f14b2b15f455cfd32311c85304aad, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	921	382	368	368
query2	6542	2475	2388	2388
query3	6634	211	212	211
query4	19863	17350	17266	17266
query5	4138	420	445	420
query6	246	156	152	152
query7	4584	304	286	286
query8	242	195	179	179
query9	8592	2349	2362	2349
query10	455	291	251	251
query11	10407	10126	10016	10016
query12	140	91	84	84
query13	1632	354	362	354
query14	10000	6816	7468	6816
query15	216	169	175	169
query16	7719	265	265	265
query17	1726	526	499	499
query18	1948	276	262	262
query19	199	158	152	152
query20	91	81	84	81
query21	193	139	128	128
query22	4227	3834	3849	3834
query23	33719	33027	32907	32907
query24	7011	2801	2921	2801
query25	566	373	362	362
query26	705	159	161	159
query27	2172	318	318	318
query28	4949	2030	2034	2030
query29	844	625	615	615
query30	264	171	174	171
query31	964	784	751	751
query32	89	52	57	52
query33	521	271	292	271
query34	855	483	472	472
query35	708	615	603	603
query36	1051	919	923	919
query37	108	68	79	68
query38	2900	2798	2775	2775
query39	849	804	806	804
query40	192	123	126	123
query41	45	43	43	43
query42	101	98	97	97
query43	579	542	562	542
query44	1067	734	734	734
query45	177	164	162	162
query46	1065	716	722	716
query47	1841	1749	1780	1749
query48	365	293	292	292
query49	828	385	386	385
query50	768	379	393	379
query51	6914	6696	6867	6696
query52	109	121	88	88
query53	353	286	298	286
query54	532	440	449	440
query55	73	74	77	74
query56	260	239	243	239
query57	1111	1009	1042	1009
query58	230	205	207	205
query59	3455	3229	3295	3229
query60	281	257	249	249
query61	97	89	87	87
query62	591	452	433	433
query63	314	291	287	287
query64	8573	2252	1736	1736
query65	3171	3098	3110	3098
query66	785	331	329	329
query67	15355	14844	14787	14787
query68	4538	535	555	535
query69	445	268	319	268
query70	1123	1143	1170	1143
query71	363	276	268	268
query72	7088	5687	5683	5683
query73	746	334	322	322
query74	6037	5650	5539	5539
query75	3300	2670	2608	2608
query76	2296	1005	1009	1005
query77	385	268	266	266
query78	10283	9827	9712	9712
query79	2440	508	512	508
query80	918	436	424	424
query81	538	245	244	244
query82	944	95	98	95
query83	244	168	173	168
query84	248	89	92	89
query85	1198	326	313	313
query86	436	300	319	300
query87	3272	3114	3149	3114
query88	4334	2354	2348	2348
query89	466	382	384	382
query90	2044	199	192	192
query91	140	107	110	107
query92	58	51	53	51
query93	1801	522	506	506
query94	1270	202	197	197
query95	417	324	316	316
query96	585	266	270	266
query97	3150	3020	3005	3005
query98	251	225	223	223
query99	1154	843	853	843
Total cold run time: 260734 ms
Total hot run time: 172142 ms
@doris-robot

ClickBench: Total hot run time: 31.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1f687ff59c9f14b2b15f455cfd32311c85304aad, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.08	0.07
query5	0.48	0.50	0.51
query6	1.12	0.73	0.71
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.50	0.48
query10	0.56	0.56	0.53
query11	0.16	0.12	0.12
query12	0.15	0.13	0.11
query13	0.60	0.58	0.60
query14	0.77	0.78	0.78
query15	0.83	0.83	0.81
query16	0.37	0.34	0.37
query17	1.02	0.96	1.04
query18	0.21	0.26	0.23
query19	1.78	1.65	1.69
query20	0.02	0.01	0.01
query21	15.45	0.69	0.67
query22	4.56	6.11	2.48
query23	18.33	1.45	1.25
query24	1.75	0.28	0.20
query25	0.12	0.08	0.08
query26	0.25	0.16	0.16
query27	0.08	0.08	0.08
query28	13.32	1.00	0.99
query29	13.19	3.37	3.32
query30	0.24	0.06	0.06
query31	2.86	0.38	0.38
query32	3.30	0.47	0.46
query33	2.96	2.89	2.89
query34	17.24	4.41	4.47
query35	4.44	4.55	4.55
query36	0.67	0.48	0.48
query37	0.18	0.15	0.16
query38	0.16	0.14	0.15
query39	0.04	0.04	0.03
query40	0.16	0.17	0.14
query41	0.09	0.05	0.04
query42	0.05	0.05	0.04
query43	0.03	0.04	0.04
Total cold run time: 110.17 s
Total hot run time: 31.09 s
@morningman
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40951 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 88096d0e39b32ce605f1fcab3dcf5ddff63a43ff, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17630	4399	4280	4280
q2	2017	183	198	183
q3	10502	1244	1206	1206
q4	10192	820	758	758
q5	7469	2684	2742	2684
q6	222	129	134	129
q7	970	616	597	597
q8	9215	2125	2087	2087
q9	9265	6690	6652	6652
q10	9233	3974	3888	3888
q11	467	240	242	240
q12	464	224	219	219
q13	17256	3135	3252	3135
q14	269	233	231	231
q15	517	469	463	463
q16	500	385	408	385
q17	1012	654	680	654
q18	8450	7785	7837	7785
q19	3061	1566	1532	1532
q20	629	310	316	310
q21	5142	4123	3251	3251
q22	364	291	282	282
Total cold run time: 114846 ms
Total hot run time: 40951 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4501	4405	4335	4335
q2	376	276	268	268
q3	3130	2946	3007	2946
q4	1920	1572	1599	1572
q5	5395	5512	5466	5466
q6	210	125	118	118
q7	2192	1803	1837	1803
q8	3236	3381	3358	3358
q9	8642	8641	8665	8641
q10	4010	3701	3822	3701
q11	598	479	490	479
q12	807	643	624	624
q13	15859	3162	3164	3162
q14	302	268	279	268
q15	544	492	493	492
q16	480	448	452	448
q17	1832	1525	1497	1497
q18	7648	7575	7427	7427
q19	1714	1552	1578	1552
q20	2008	1782	1813	1782
q21	10042	4769	4695	4695
q22	585	503	501	501
Total cold run time: 76031 ms
Total hot run time: 55135 ms
@doris-robot

TPC-DS: Total hot run time: 168976 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 88096d0e39b32ce605f1fcab3dcf5ddff63a43ff, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	930	379	385	379
query2	6457	2442	2293	2293
query3	6639	208	203	203
query4	19251	17511	17157	17157
query5	4142	430	427	427
query6	256	165	156	156
query7	4583	294	285	285
query8	245	190	184	184
query9	8457	2360	2350	2350
query10	453	279	273	273
query11	10569	10106	10163	10106
query12	138	94	92	92
query13	1647	384	371	371
query14	10349	7017	7519	7017
query15	256	161	164	161
query16	8038	259	269	259
query17	1727	511	501	501
query18	2083	287	271	271
query19	198	155	156	155
query20	91	81	81	81
query21	195	128	147	128
query22	4196	3930	3804	3804
query23	33701	32962	32971	32962
query24	9935	2852	2766	2766
query25	570	355	363	355
query26	704	154	161	154
query27	2176	324	328	324
query28	5736	2038	2034	2034
query29	845	612	596	596
query30	235	146	157	146
query31	953	769	760	760
query32	94	52	58	52
query33	658	274	292	274
query34	866	478	485	478
query35	703	585	591	585
query36	1096	902	920	902
query37	103	66	67	66
query38	2912	2750	2822	2750
query39	832	795	829	795
query40	194	130	123	123
query41	47	46	43	43
query42	103	95	96	95
query43	589	567	541	541
query44	1087	741	759	741
query45	178	158	156	156
query46	1056	717	726	717
query47	1837	1726	1752	1726
query48	384	297	313	297
query49	850	390	418	390
query50	773	401	397	397
query51	6896	6724	6711	6711
query52	102	96	90	90
query53	349	294	294	294
query54	823	436	444	436
query55	75	73	71	71
query56	263	246	247	246
query57	1129	1059	1024	1024
query58	234	211	217	211
query59	3414	3125	3104	3104
query60	285	260	269	260
query61	96	92	91	91
query62	586	471	457	457
query63	312	287	294	287
query64	8460	2152	1709	1709
query65	3153	3089	3108	3089
query66	780	328	331	328
query67	15251	14837	14721	14721
query68	4470	526	570	526
query69	443	268	276	268
query70	1194	1147	1140	1140
query71	374	270	273	270
query72	7463	5725	2687	2687
query73	736	332	322	322
query74	6035	5634	5561	5561
query75	3331	2667	2631	2631
query76	2309	957	954	954
query77	371	273	265	265
query78	10434	9659	9709	9659
query79	1932	516	508	508
query80	1031	443	429	429
query81	516	222	218	218
query82	669	94	91	91
query83	242	173	173	173
query84	252	90	92	90
query85	1526	333	324	324
query86	464	306	318	306
query87	3314	3108	3153	3108
query88	4236	2438	2440	2438
query89	468	395	390	390
query90	2045	192	192	192
query91	123	96	99	96
query92	58	51	50	50
query93	2319	515	493	493
query94	1278	187	187	187
query95	414	312	306	306
query96	586	284	267	267
query97	3169	2952	3010	2952
query98	248	225	218	218
query99	1114	869	850	850
Total cold run time: 265489 ms
Total hot run time: 168976 ms
@doris-robot

ClickBench: Total hot run time: 30.56 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 88096d0e39b32ce605f1fcab3dcf5ddff63a43ff, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.06	0.05
query4	1.66	0.10	0.09
query5	0.50	0.56	0.49
query6	1.13	0.73	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.54	0.49	0.48
query10	0.53	0.55	0.53
query11	0.16	0.11	0.11
query12	0.16	0.13	0.12
query13	0.59	0.59	0.60
query14	0.76	0.79	0.78
query15	0.82	0.81	0.82
query16	0.35	0.36	0.37
query17	1.01	1.01	1.02
query18	0.22	0.23	0.22
query19	1.82	1.80	1.75
query20	0.01	0.02	0.01
query21	15.45	0.69	0.67
query22	4.29	7.28	1.87
query23	18.33	1.32	1.18
query24	1.45	0.31	0.26
query25	0.13	0.10	0.09
query26	0.25	0.16	0.16
query27	0.08	0.08	0.09
query28	13.40	1.00	0.99
query29	12.63	3.34	3.29
query30	0.23	0.05	0.06
query31	2.88	0.40	0.38
query32	3.26	0.47	0.47
query33	2.93	2.88	2.91
query34	17.03	4.43	4.49
query35	4.48	4.47	4.52
query36	0.68	0.50	0.45
query37	0.18	0.16	0.15
query38	0.15	0.15	0.15
query39	0.04	0.04	0.03
query40	0.16	0.15	0.14
query41	0.08	0.04	0.05
query42	0.05	0.05	0.05
query43	0.04	0.03	0.04
Total cold run time: 108.88 s
Total hot run time: 30.56 s
@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.74% (9009/25205)
Line Coverage: 27.37% (74560/272458)
Region Coverage: 26.59% (38576/145100)
Branch Coverage: 23.44% (19667/83902)
Coverage Report: http://coverage.selectdb-in.cc/coverage/88096d0e39b32ce605f1fcab3dcf5ddff63a43ff_88096d0e39b32ce605f1fcab3dcf5ddff63a43ff/report/index.html

Contributor

@kaka11chen kaka11chen left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved label (indicates a PR has been approved by one committer) May 28, 2024
@morningman morningman merged commit 8c8169b into apache:master May 28, 2024
27 of 28 checks passed
yiguolei pushed a commit that referenced this pull request May 28, 2024
The file list is obtained from the external meta cache, so a listed file may
already have been removed from storage.
We should ignore such not-found files and let the query continue.
dataroaring pushed a commit that referenced this pull request May 31, 2024
@morningman morningman mentioned this pull request Jun 1, 2024
morningman added a commit that referenced this pull request Jul 5, 2024
PR #35319 ignores not-found files in external tables by default.
This PR adds a BE config `ignore_not_found_file_in_external_table` to
control this behavior;
the default value is still `true`.

It also adds a new metric `NotFoundFileNum`, separate from `EmptyFileNum`,
to record the number of not-found files in a query.
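
As a rough illustration of the follow-up change described in this commit message, the sketch below gates the skip on a flag and keeps two separate counters. Only the names `ignore_not_found_file_in_external_table`, `NotFoundFileNum`, and `EmptyFileNum` come from the commit message; the function `read_files`, the `ScanCounters` class, and the use of `FileNotFoundError` are assumptions for illustration and do not reflect the actual BE code.

```python
from dataclasses import dataclass


@dataclass
class ScanCounters:
    not_found_file_num: int = 0  # stands in for the NotFoundFileNum metric
    empty_file_num: int = 0      # stands in for the EmptyFileNum metric


def read_files(paths, open_file, ignore_not_found=True):
    """Read files, optionally tolerating files that no longer exist.

    `ignore_not_found` plays the role of the BE config
    `ignore_not_found_file_in_external_table` (default true).
    """
    counters = ScanCounters()
    rows = []
    for path in paths:
        try:
            data = open_file(path)
        except FileNotFoundError:
            if not ignore_not_found:
                raise  # strict mode: a missing file fails the query
            counters.not_found_file_num += 1
            continue
        if not data:
            # Present but empty: counted separately from missing files.
            counters.empty_file_num += 1
            continue
        rows.extend(data)
    return rows, counters


# Hypothetical storage: "b" exists but is empty, "c" has been deleted.
storage = {"a": [1, 2], "b": []}


def open_from_storage(path):
    if path not in storage:
        raise FileNotFoundError(path)
    return storage[path]


rows, counters = read_files(["a", "b", "c"], open_from_storage)

try:
    read_files(["a", "b", "c"], open_from_storage, ignore_not_found=False)
    strict_mode_failed = False
except FileNotFoundError:
    strict_mode_failed = True
```

Keeping the two counters apart makes it possible to tell a stale cache listing (missing files) from files that exist but contain no data.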
morningman added a commit to morningman/doris that referenced this pull request Jul 14, 2024
dataroaring pushed a commit that referenced this pull request Jul 17, 2024