[opt](routine-load) optimize routine load task allocation algorithm #34778

Merged (1 commit) on May 17, 2024


@sollhui sollhui commented May 13, 2024

The current task allocation algorithm takes the object pool cache into account: if the BE that previously executed a task still has more than half of its slots idle, the task is allocated to that BE again.

In production, however, this has led to serious load imbalance.

We therefore optimized the algorithm based on the following reasoning:

  1. We believe the benefit of load balance outweighs the benefit of the object pool cache, so we try to pick the BE with the most idle slots wherever possible.
  2. While still selecting a BE with the maximum number of idle slots, we try to reuse the object cache as much as possible.
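
The two points above can be sketched roughly as follows. This is a minimal illustration written for this description, not the code changed in this PR: the function name `choose_be`, the `idle_slots` mapping, and the lowest-id tie-break are all assumptions.

```python
from typing import Dict, Optional


def choose_be(idle_slots: Dict[int, int], previous_be: Optional[int] = None) -> Optional[int]:
    """Pick a BE for the next task (hypothetical helper, not the Doris API).

    idle_slots maps a BE id to its current number of idle task slots.
    """
    # Only BEs with at least one idle slot can accept a task.
    candidates = {be: n for be, n in idle_slots.items() if n > 0}
    if not candidates:
        return None

    # Point 1: load balance comes first, so find the largest idle-slot count.
    max_idle = max(candidates.values())

    # Point 2: reuse the cached objects only if the previous BE ties for that maximum.
    if previous_be is not None and candidates.get(previous_be) == max_idle:
        return previous_be

    # Otherwise take a BE with the most idle slots.
    # Breaking ties by lowest BE id is an assumption of this sketch.
    return min(be for be, n in candidates.items() if n == max_idle)
```

For example, with `idle_slots={1: 6, 2: 4}` and `previous_be=2`, the sketch picks BE 1, because BE 2 no longer has the most idle slots. With `idle_slots={1: 5, 2: 5}` it keeps BE 2, since reusing the cache costs nothing in balance.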
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@sollhui sollhui force-pushed the opt_task_allocate branch 4 times, most recently from 5b9e6b7 to d55ed99 Compare May 13, 2024 09:55
@sollhui

sollhui commented May 13, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 41060 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c7f0641935947a0d0bd170fe31740aba30e0a82f, data reload: false

------ Round 1 ----------------------------------
q1	6999	4270	4238	4238
q2	1080	185	191	185
q3	6755	1122	1222	1122
q4	1005	862	717	717
q5	2625	2620	2734	2620
q6	268	153	160	153
q7	1148	645	583	583
q8	2009	2171	2138	2138
q9	6961	6816	6713	6713
q10	4052	4105	3776	3776
q11	362	247	237	237
q12	376	219	217	217
q13	17511	3170	3189	3170
q14	269	234	231	231
q15	531	485	489	485
q16	520	402	407	402
q17	1005	728	737	728
q18	8245	7930	7850	7850
q19	2627	1562	1528	1528
q20	516	319	306	306
q21	5214	3376	4298	3376
q22	370	300	285	285
Total cold run time: 70448 ms
Total hot run time: 41060 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4423	4385	4451	4385
q2	385	265	273	265
q3	3090	2780	2763	2763
q4	1909	1641	1608	1608
q5	5365	5360	5356	5356
q6	212	123	123	123
q7	1759	1381	1426	1381
q8	3265	3421	3485	3421
q9	8431	8399	8444	8399
q10	3929	3647	3647	3647
q11	569	492	498	492
q12	778	581	589	581
q13	7896	3025	2948	2948
q14	283	263	258	258
q15	523	474	468	468
q16	468	429	416	416
q17	1823	1521	1496	1496
q18	7611	7572	7493	7493
q19	1677	1569	1587	1569
q20	1949	1758	1766	1758
q21	5141	4949	4950	4949
q22	608	502	513	502
Total cold run time: 62094 ms
Total hot run time: 54278 ms
@doris-robot

TPC-DS: Total hot run time: 187205 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c7f0641935947a0d0bd170fe31740aba30e0a82f, data reload: false

query1	917	366	347	347
query2	6425	2446	2379	2379
query3	6641	206	215	206
query4	25026	21176	21170	21170
query5	4164	426	430	426
query6	257	171	178	171
query7	4580	291	290	290
query8	247	190	189	189
query9	8495	2419	2402	2402
query10	435	255	279	255
query11	14758	14165	14273	14165
query12	120	93	92	92
query13	1645	368	362	362
query14	10647	7802	8303	7802
query15	216	182	178	178
query16	7815	268	259	259
query17	1745	567	548	548
query18	1955	274	266	266
query19	191	148	149	148
query20	94	84	88	84
query21	192	125	125	125
query22	5065	4863	4834	4834
query23	34256	33489	33551	33489
query24	6378	2911	2895	2895
query25	468	367	363	363
query26	695	163	155	155
query27	1838	324	328	324
query28	3811	2076	2053	2053
query29	830	634	595	595
query30	229	160	152	152
query31	967	758	742	742
query32	59	55	54	54
query33	474	250	248	248
query34	870	476	490	476
query35	761	681	686	681
query36	1052	947	918	918
query37	105	67	68	67
query38	2907	2755	2751	2751
query39	1597	1564	1571	1564
query40	197	128	123	123
query41	44	37	38	37
query42	107	95	100	95
query43	588	555	577	555
query44	1093	736	741	736
query45	273	252	253	252
query46	1101	744	710	710
query47	1968	1871	1848	1848
query48	370	294	304	294
query49	770	428	402	402
query50	770	388	397	388
query51	6884	6864	6724	6724
query52	105	93	94	93
query53	350	288	289	288
query54	526	434	429	429
query55	75	72	73	72
query56	237	225	223	223
query57	1207	1128	1161	1128
query58	217	226	204	204
query59	3371	3253	3273	3253
query60	263	250	260	250
query61	92	87	84	84
query62	582	463	473	463
query63	312	282	277	277
query64	7680	7345	7386	7345
query65	3153	3092	3141	3092
query66	814	339	340	339
query67	15413	14716	15084	14716
query68	4490	543	533	533
query69	477	300	312	300
query70	1110	1151	1161	1151
query71	371	270	268	268
query72	7484	2594	2345	2345
query73	701	326	329	326
query74	6547	6123	6097	6097
query75	3321	2666	2569	2569
query76	2257	1044	1024	1024
query77	425	265	266	265
query78	10689	10184	9984	9984
query79	2242	519	521	519
query80	1055	439	434	434
query81	502	221	218	218
query82	660	95	100	95
query83	257	165	172	165
query84	252	89	89	89
query85	1679	325	323	323
query86	491	325	333	325
query87	3275	3093	3170	3093
query88	3842	2358	2354	2354
query89	475	375	388	375
query90	2070	200	198	198
query91	136	111	110	110
query92	65	51	50	50
query93	2858	522	511	511
query94	1245	192	197	192
query95	403	325	311	311
query96	597	265	276	265
query97	3174	2997	2982	2982
query98	246	221	216	216
query99	1195	873	889	873
Total cold run time: 272080 ms
Total hot run time: 187205 ms

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 15, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.


@XuJianxu XuJianxu left a comment

LGTM

@doris-robot

TPC-DS: Total hot run time: 185241 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c7f0641935947a0d0bd170fe31740aba30e0a82f, data reload: false

query1	914	385	371	371
query2	6154	2499	2179	2179
query3	6651	232	238	232
query4	23018	22303	22198	22198
query5	3836	442	418	418
query6	267	179	186	179
query7	4535	304	308	304
query8	241	190	193	190
query9	8526	2389	2361	2361
query10	414	243	249	243
query11	11916	11237	11329	11237
query12	116	87	85	85
query13	1673	356	371	356
query14	11891	8663	8667	8663
query15	262	194	174	174
query16	8211	271	257	257
query17	1976	581	535	535
query18	1387	278	267	267
query19	339	151	152	151
query20	92	85	86	85
query21	190	137	133	133
query22	5211	4874	4827	4827
query23	34192	33394	33504	33394
query24	10616	2926	2864	2864
query25	582	363	352	352
query26	1141	153	148	148
query27	2325	316	308	308
query28	6909	2012	2002	2002
query29	850	613	589	589
query30	257	174	178	174
query31	947	720	730	720
query32	94	50	50	50
query33	722	246	235	235
query34	1018	489	474	474
query35	820	671	660	660
query36	1050	861	931	861
query37	147	74	70	70
query38	2914	2772	2792	2772
query39	1594	1592	1558	1558
query40	189	120	126	120
query41	42	40	40	40
query42	105	97	96	96
query43	564	507	525	507
query44	1183	722	735	722
query45	271	257	257	257
query46	1104	722	686	686
query47	1948	1869	1889	1869
query48	367	295	288	288
query49	846	382	379	379
query50	745	390	388	388
query51	6846	6761	6814	6761
query52	103	96	98	96
query53	360	280	283	280
query54	870	413	417	413
query55	80	73	70	70
query56	246	226	217	217
query57	1220	1118	1139	1118
query58	234	196	199	196
query59	3315	3047	3079	3047
query60	265	231	229	229
query61	86	87	88	87
query62	682	459	492	459
query63	306	288	281	281
query64	8723	7404	7326	7326
query65	3141	3105	3083	3083
query66	899	339	327	327
query67	15517	14920	14856	14856
query68	5432	524	535	524
query69	541	293	308	293
query70	1134	1143	1119	1119
query71	461	267	265	265
query72	7904	2553	2388	2388
query73	711	314	327	314
query74	6585	6150	6003	6003
query75	3729	2641	2577	2577
query76	3614	989	976	976
query77	591	274	265	265
query78	10578	10016	10132	10016
query79	2812	514	527	514
query80	1638	422	425	422
query81	524	251	240	240
query82	1092	92	99	92
query83	206	166	168	166
query84	260	85	84	84
query85	1353	272	283	272
query86	440	322	325	322
query87	3299	3150	3129	3129
query88	3905	2328	2327	2327
query89	461	370	368	368
query90	1956	194	187	187
query91	131	107	108	107
query92	62	51	50	50
query93	2369	494	488	488
query94	1123	184	187	184
query95	406	303	298	298
query96	588	267	272	267
query97	3208	3001	2999	2999
query98	242	220	214	214
query99	1272	897	928	897
Total cold run time: 282882 ms
Total hot run time: 185241 ms
@doris-robot

ClickBench: Total hot run time: 30.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c7f0641935947a0d0bd170fe31740aba30e0a82f, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.03	0.03
query3	0.24	0.05	0.04
query4	1.68	0.07	0.07
query5	0.51	0.52	0.51
query6	1.14	0.72	0.73
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.54	0.49	0.48
query10	0.55	0.55	0.55
query11	0.14	0.11	0.11
query12	0.15	0.12	0.11
query13	0.60	0.58	0.60
query14	0.76	0.78	0.78
query15	0.84	0.82	0.81
query16	0.36	0.37	0.36
query17	1.04	1.04	1.03
query18	0.21	0.24	0.25
query19	1.82	1.77	1.82
query20	0.01	0.00	0.01
query21	15.60	0.66	0.64
query22	4.45	7.04	1.99
query23	18.27	1.34	1.27
query24	1.63	0.26	0.21
query25	0.14	0.08	0.08
query26	0.28	0.17	0.17
query27	0.09	0.08	0.08
query28	13.46	1.01	1.01
query29	13.30	3.32	3.24
query30	0.24	0.06	0.06
query31	2.86	0.38	0.39
query32	3.29	0.47	0.46
query33	2.84	2.87	2.85
query34	17.14	4.36	4.45
query35	4.50	4.51	4.59
query36	0.70	0.48	0.46
query37	0.17	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.03	0.03
query40	0.16	0.14	0.14
query41	0.09	0.05	0.05
query42	0.05	0.05	0.04
query43	0.03	0.03	0.04
Total cold run time: 110.25 s
Total hot run time: 30.62 s
@liaoxin01

run performance

@liaoxin01 liaoxin01 merged commit d4c6ae5 into apache:master May 17, 2024
28 of 30 checks passed
@yiguolei yiguolei mentioned this pull request Jun 1, 2024
Labels
approved Indicates a PR has been approved by one committer. reviewed
4 participants