[feature](mtmv) Support agg_state roll-up and optimize the roll-up code #35026

Merged: 8 commits, merged on May 24, 2024


@seawinde (Contributor) commented May 17, 2024

Proposed changes

agg_state is the intermediate state of an aggregate function; for details, see the state combinator documentation.
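
To make the combinator semantics concrete, here is a minimal Java sketch that models the intermediate state of avg as a (sum, count) pair. That representation is an assumption for illustration, not Doris's internal format, and the class AvgState is hypothetical: state builds a state from raw values, union combines states and still returns a state, and merge combines states and returns the final value.

```java
import java.util.List;

/**
 * Hypothetical model of avg's intermediate state as a (sum, count) pair.
 * Illustrates state / union / merge semantics only; not Doris's implementation.
 */
final class AvgState {
    final double sum;
    final long count;

    AvgState(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Like avg_state(x): turn raw values into an intermediate state. */
    static AvgState state(List<Double> values) {
        double s = 0;
        for (double v : values) {
            s += v;
        }
        return new AvgState(s, values.size());
    }

    /** Like avg_union(...): combine several states; the result is still a state. */
    static AvgState union(List<AvgState> states) {
        double s = 0;
        long c = 0;
        for (AvgState st : states) {
            s += st.sum;
            c += st.count;
        }
        return new AvgState(s, c);
    }

    /** Like avg_merge(...): combine several states and finalize them into the average. */
    static double merge(List<AvgState> states) {
        AvgState combined = union(states);
        return combined.sum / combined.count;
    }

    public static void main(String[] args) {
        AvgState a = state(List.of(1.0, 2.0, 3.0)); // sum 6, count 3
        AvgState b = state(List.of(10.0));           // sum 10, count 1
        System.out.println(merge(List.of(a, b)));    // 16 / 4 = 4.0
        // Averaging the two partial averages instead would give (2.0 + 10.0) / 2 = 6.0, which is wrong.
    }
}
```

Keeping the (sum, count) state rather than a finished average is what lets partial results be combined correctly later.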

This PR supports the following aggregate function roll-ups:

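+----------------------+----------------------------------------------+----------------------+
| query                | materialized view                            | roll up              |
| -------------------- | -------------------------------------------- | -------------------- |
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |
+----------------------+----------------------------------------------+----------------------+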
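
The roll-up table can be read as a small decision function. The sketch below is hypothetical: the enums and method are illustrative names rather than Doris planner classes, and it encodes only the three rows of the table.

```java
import java.util.Optional;

/** Hypothetical encoding of the roll-up table. Names are illustrative, not Doris classes. */
final class RollUpRule {
    /** Form of the aggregate call in the query. */
    enum QueryForm { PLAIN, UNION, MERGE }

    /** Form of the aggregate expression stored in the MV column. */
    enum MvForm { PLAIN, UNION, STATE }

    /** Combinator applied to the MV column when rolling up. */
    enum RollUpForm { UNION, MERGE }

    static Optional<RollUpForm> rollUp(QueryForm query, MvForm mv) {
        if (mv == MvForm.PLAIN) {
            // A plain aggregate in the MV is outside this table; other roll-up rules may still apply.
            return Optional.empty();
        }
        switch (query) {
            case UNION:
                // The query still wants an intermediate state, so keep combining states.
                return Optional.of(RollUpForm.UNION);
            case PLAIN:
            case MERGE:
                // The query wants a final value, so merge the states.
                return Optional.of(RollUpForm.MERGE);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // sum(o_shippriority) in the query over sum_union(sum_state(o_shippriority)) in the MV
        System.out.println(rollUp(QueryForm.PLAIN, MvForm.UNION)); // Optional[MERGE]
    }
}
```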

For example, the query below can be successfully rewritten to use the following materialized view (MV).

MV definition:

    select
        o_orderstatus,
        l_partkey,
        l_suppkey,
        sum_union(sum_state(o_shippriority)),
        group_concat_union(group_concat_state(l_shipinstruct)),
        avg_union(avg_state(l_linenumber)),
        max_by_union(max_by_state(l_shipmode, l_suppkey)),
        count_union(count_state(l_orderkey)),
        multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
    from lineitem
    left join orders
        on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
    group by
        o_orderstatus,
        l_partkey,
        l_suppkey;

Query (it groups only by o_orderstatus and l_suppkey, so the MV's per-l_partkey groups must be rolled up):

    select
        o_orderstatus,
        l_suppkey,
        sum(o_shippriority),
        group_concat(l_shipinstruct),
        avg(l_linenumber),
        max_by(l_shipmode, l_suppkey),
        count(l_orderkey),
        multi_distinct_count(l_shipmode)
    from lineitem
    left join orders
        on l_orderkey = o_orderkey and l_shipdate = o_orderdate
    group by
        o_orderstatus,
        l_suppkey;
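
Why the rewrite has to merge states rather than add up finished values can be seen with multi_distinct_count. In this minimal sketch the distinct-count state is modeled as the set of distinct l_shipmode values per MV group (an assumption; Doris's real state format differs), and the class, fields, and sample rows are hypothetical. Rolling up from (o_orderstatus, l_partkey, l_suppkey) to (o_orderstatus, l_suppkey) by unioning sets gives the right answer, while summing per-group counts double-counts values shared across l_partkey groups.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of rolling up multi_distinct_count states from MV groups to query groups. */
final class DistinctRollUp {
    /** One pre-aggregated MV row, grouped by (o_orderstatus, l_partkey, l_suppkey). */
    static final class MvRow {
        final String orderStatus;
        final int partKey;
        final int suppKey;
        final Set<String> shipModes; // hypothetical state: distinct l_shipmode values in this group

        MvRow(String orderStatus, int partKey, int suppKey, Set<String> shipModes) {
            this.orderStatus = orderStatus;
            this.partKey = partKey;
            this.suppKey = suppKey;
            this.shipModes = shipModes;
        }
    }

    /** Query grouping key (o_orderstatus, l_suppkey); l_partkey is rolled up away. */
    static String queryKey(MvRow r) {
        return r.orderStatus + "|" + r.suppKey;
    }

    /** Correct roll-up: union the states per query group, then finalize (count). */
    static Map<String, Integer> mergeStates(List<MvRow> rows) {
        Map<String, Set<String>> byGroup = new HashMap<>();
        for (MvRow r : rows) {
            byGroup.computeIfAbsent(queryKey(r), k -> new HashSet<>()).addAll(r.shipModes);
        }
        Map<String, Integer> result = new HashMap<>();
        byGroup.forEach((k, v) -> result.put(k, v.size()));
        return result;
    }

    /** Incorrect alternative: add up per-row distinct counts. */
    static Map<String, Integer> sumCounts(List<MvRow> rows) {
        Map<String, Integer> result = new HashMap<>();
        for (MvRow r : rows) {
            result.merge(queryKey(r), r.shipModes.size(), Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<MvRow> mv = List.of(
                new MvRow("O", 1, 7, Set.of("AIR", "RAIL")),
                new MvRow("O", 2, 7, Set.of("AIR", "SHIP")));
        System.out.println(mergeStates(mv)); // {O|7=3}
        System.out.println(sumCounts(mv));   // {O|7=4}, AIR counted twice
    }
}
```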


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@seawinde
Contributor Author

run buildall

@seawinde
Contributor Author

run buildall

@seawinde
Contributor Author

run buildall

Comment on lines 375 to 380 (non-printable bytes are shown as <binary>):

    o 3 \  o,o,o,o,o,o,  mi<binary>  <binary>
    o 4  o,o,  yy\r  <binary>

    -- !query34_0_after --
    o 3 \  o,o,o,o,o,o,  mi<binary>  <binary>
    o 4  o,o,  yy\r  <binary>
Contributor


Is the agg_state result stable? I don't think so.
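
One plausible reason for the reviewer's doubt, offered as an assumption rather than a diagnosis of this test: if a state's serialized form depends on the order in which rows arrive, the raw agg_state bytes written to the expected-output file can differ between runs even though the finalized value is identical. The sketch below is hypothetical; the naive comma-joined serialization is not Doris's format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of order-dependent state bytes vs. an order-independent final value. */
final class StateStability {
    /** Build a distinct-value state that preserves arrival order (stand-in for order-dependent internals). */
    static Set<String> state(List<String> rowsInArrivalOrder) {
        return new LinkedHashSet<>(rowsInArrivalOrder);
    }

    /** Naive serialization: join the values in iteration order and encode them as bytes. */
    static byte[] serialize(Set<String> state) {
        return String.join(",", state).getBytes(StandardCharsets.UTF_8);
    }

    /** Finalize the state into the distinct count. */
    static int finalizeCount(Set<String> state) {
        return state.size();
    }

    public static void main(String[] args) {
        Set<String> run1 = state(List.of("AIR", "RAIL", "SHIP"));
        Set<String> run2 = state(List.of("SHIP", "AIR", "RAIL"));
        System.out.println(Arrays.equals(serialize(run1), serialize(run2))); // false
        System.out.println(finalizeCount(run1) == finalizeCount(run2));      // true
    }
}
```

Under that assumption, comparing finalized values in a test instead of raw state bytes would avoid the flakiness.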

@@ -72,92 +61,15 @@
     */
    public abstract class AbstractMaterializedViewAggregateRule extends AbstractMaterializedViewRule {

        protected static final Multimap<Function, Expression>
                AGGREGATE_ROLL_UP_EQUIVALENT_FUNCTION_MAP = ArrayListMultimap.create();
        public static List<AggFunctionRollUpHandler> ROLL_UP_HANDLERS =
public static List<AggFunctionRollUpHandler> ROLL_UP_HANDLERS =
Contributor


Should ROLL_UP_HANDLERS be final?

        protected static final Multimap<Function, Expression>
                AGGREGATE_ROLL_UP_EQUIVALENT_FUNCTION_MAP = ArrayListMultimap.create();
        public static List<AggFunctionRollUpHandler> ROLL_UP_HANDLERS =
                Lists.newArrayList(DirectRollupHandler.INSTANCE,
Contributor


Use ImmutableList.
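
The two review suggestions on ROLL_UP_HANDLERS (make the field final, make the list immutable) can be sketched as follows. This is a hypothetical, simplified handler chain: the RollUpHandler interface and both lambdas are illustrative stand-ins, not the real AggFunctionRollUpHandler API. The reviewer suggested Guava's ImmutableList; this sketch uses the JDK's List.of to stay dependency-free, which gives the same unmodifiable guarantee.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified handler contract; the real AggFunctionRollUpHandler API may differ. */
@FunctionalInterface
interface RollUpHandler {
    /** Returns the rolled-up expression, or empty if this handler does not apply. */
    Optional<String> tryRollUp(String queryFunction, String mvFunction, String mvColumn);
}

final class RollUpHandlers {
    // final: the field cannot be reassigned; List.of(...): the list itself cannot be mutated.
    static final List<RollUpHandler> ROLL_UP_HANDLERS = List.of(
            // Hypothetical "direct" handler: reuse the same function (fine for sum/min/max, not count).
            (q, mv, col) -> q.equals(mv)
                    ? Optional.of(q + "(" + col + ")")
                    : Optional.<String>empty(),
            // Hypothetical agg_state handler: MV holds f_union(...) or f_state(...), so apply f_merge.
            (q, mv, col) -> mv.equals(q + "_union") || mv.equals(q + "_state")
                    ? Optional.of(q + "_merge(" + col + ")")
                    : Optional.<String>empty());

    /** Ask each handler in order and use the first one that applies. */
    static Optional<String> rollUp(String queryFunction, String mvFunction, String mvColumn) {
        for (RollUpHandler handler : ROLL_UP_HANDLERS) {
            Optional<String> result = handler.tryRollUp(queryFunction, mvFunction, mvColumn);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(rollUp("sum", "sum_union", "mv_col")); // Optional[sum_merge(mv_col)]
        System.out.println(rollUp("max", "max", "mv_col"));       // Optional[max(mv_col)]
    }
}
```

Calling ROLL_UP_HANDLERS.add(...) on this list throws UnsupportedOperationException, so the handler order cannot be changed at runtime by accident.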

Comment on lines 292 to 293

    Pair<Expression, Expression> mvExprToMvScanExprQueryBasedPair = Pair.of(expressionEntry.getKey(),
            expressionEntry.getValue());
Contributor


Initialize this outside the inner for loop for better performance.
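
The suggestion is ordinary loop-invariant hoisting: a value that depends only on the outer loop variable can be built once per outer iteration. A minimal sketch, assuming a hypothetical outer loop over map entries and an inner loop over candidate expressions (the full surrounding code is not shown in this thread); java.util.AbstractMap.SimpleImmutableEntry stands in for the Pair class in the snippet, and the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of hoisting a loop-invariant pair out of an inner loop. */
final class HoistPair {
    /** Counts how many pair objects were built, to make the saving visible. */
    static int pairsCreated = 0;

    /** Stand-in for Pair.of(entry.getKey(), entry.getValue()). */
    static Map.Entry<String, String> pairOf(Map.Entry<String, String> entry) {
        pairsCreated++;
        return new SimpleImmutableEntry<>(entry.getKey(), entry.getValue());
    }

    /** Before: the pair is rebuilt on every inner iteration although it depends only on the outer entry. */
    static int before(Map<String, String> mvExprToScanExpr, List<String> candidates) {
        int matches = 0;
        for (Map.Entry<String, String> entry : mvExprToScanExpr.entrySet()) {
            for (String candidate : candidates) {
                Map.Entry<String, String> pair = pairOf(entry);
                if (pair.getKey().equals(candidate)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    /** After: the pair is created once per outer iteration, outside the inner loop. */
    static int after(Map<String, String> mvExprToScanExpr, List<String> candidates) {
        int matches = 0;
        for (Map.Entry<String, String> entry : mvExprToScanExpr.entrySet()) {
            Map.Entry<String, String> pair = pairOf(entry);
            for (String candidate : candidates) {
                if (pair.getKey().equals(candidate)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> mvExprToScanExpr = Map.of("a", "x", "b", "y");
        List<String> candidates = List.of("a", "b", "c");
        before(mvExprToScanExpr, candidates);
        System.out.println(pairsCreated); // 6: one pair per (entry, candidate)
        pairsCreated = 0;
        after(mvExprToScanExpr, candidates);
        System.out.println(pairsCreated); // 2: one pair per entry
    }
}
```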

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40088 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6f79381a820a6c7dea789521915e2bfa84dee92d, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17733	4423	4361	4361
q2	2668	203	198	198
q3	11013	1334	1122	1122
q4	11103	837	842	837
q5	7518	2696	2696	2696
q6	221	138	140	138
q7	959	603	615	603
q8	9260	2111	2106	2106
q9	8777	6509	6493	6493
q10	8969	3725	3755	3725
q11	442	238	239	238
q12	416	218	217	217
q13	17775	2962	2981	2962
q14	257	219	223	219
q15	529	469	478	469
q16	488	375	381	375
q17	960	601	691	601
q18	8048	7406	7472	7406
q19	4112	1552	1533	1533
q20	659	312	306	306
q21	4928	3227	3209	3209
q22	345	276	274	274
Total cold run time: 117180 ms
Total hot run time: 40088 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4337	4178	4182	4178
q2	379	263	266	263
q3	2986	2724	2775	2724
q4	1890	1622	1586	1586
q5	5283	5251	5313	5251
q6	217	123	127	123
q7	2135	1761	1724	1724
q8	3155	3352	3302	3302
q9	8300	8397	8369	8369
q10	3874	3745	3710	3710
q11	592	502	493	493
q12	783	599	577	577
q13	17546	2965	2995	2965
q14	286	271	275	271
q15	510	481	475	475
q16	471	420	420	420
q17	1771	1482	1454	1454
q18	7705	7620	7663	7620
q19	1682	1569	1543	1543
q20	1982	1747	1780	1747
q21	4902	4799	4746	4746
q22	551	489	491	489
Total cold run time: 71337 ms
Total hot run time: 54030 ms
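
The bot output has no column headers, so here is a small check of how the totals relate to the columns. Reading the columns as cold run, hot run 1, hot run 2, and best hot run (an interpretation, not stated by the bot), the Round 1 "Total cold run time" is the sum of the first column and "Total hot run time" is the sum of the faster of the two hot runs. The class name TpchTotals is hypothetical; the numbers are copied from the Round 1 table.

```java
/** Recomputes the Round 1 TPC-H totals from {cold, hot 1, hot 2} per query, in ms. */
final class TpchTotals {
    static final long[][] ROUND_1 = {
        {17733, 4423, 4361}, {2668, 203, 198}, {11013, 1334, 1122}, {11103, 837, 842},
        {7518, 2696, 2696}, {221, 138, 140}, {959, 603, 615}, {9260, 2111, 2106},
        {8777, 6509, 6493}, {8969, 3725, 3755}, {442, 238, 239}, {416, 218, 217},
        {17775, 2962, 2981}, {257, 219, 223}, {529, 469, 478}, {488, 375, 381},
        {960, 601, 691}, {8048, 7406, 7472}, {4112, 1552, 1533}, {659, 312, 306},
        {4928, 3227, 3209}, {345, 276, 274},
    };

    /** Sum of the cold-run column. */
    static long totalCold(long[][] rows) {
        long total = 0;
        for (long[] r : rows) {
            total += r[0];
        }
        return total;
    }

    /** Sum of the best (minimum) of the two hot runs per query. */
    static long totalHot(long[][] rows) {
        long total = 0;
        for (long[] r : rows) {
            total += Math.min(r[1], r[2]);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Total cold run time: " + totalCold(ROUND_1) + " ms"); // 117180
        System.out.println("Total hot run time: " + totalHot(ROUND_1) + " ms");   // 40088
    }
}
```

Both values match the totals the bot reported for Round 1.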

@doris-robot

TPC-DS: Total hot run time: 169528 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6f79381a820a6c7dea789521915e2bfa84dee92d, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	907	365	378	365
query2	6459	2286	2224	2224
query3	6646	211	210	210
query4	19795	17391	17405	17391
query5	4231	412	410	410
query6	252	156	155	155
query7	4596	306	308	306
query8	238	198	181	181
query9	8507	2379	2389	2379
query10	458	271	261	261
query11	10570	10261	10187	10187
query12	137	89	88	88
query13	1662	359	362	359
query14	9961	6888	6672	6672
query15	205	170	168	168
query16	7577	259	264	259
query17	1513	522	529	522
query18	1225	281	290	281
query19	204	151	145	145
query20	93	88	85	85
query21	196	129	130	129
query22	4360	4167	3993	3993
query23	33628	33139	33102	33102
query24	11292	2937	2915	2915
query25	669	357	384	357
query26	1356	154	153	153
query27	2787	324	339	324
query28	7131	2021	2044	2021
query29	926	617	591	591
query30	304	174	178	174
query31	945	754	757	754
query32	89	52	53	52
query33	761	293	262	262
query34	979	481	472	472
query35	722	592	581	581
query36	1073	904	907	904
query37	131	72	70	70
query38	2929	2833	2808	2808
query39	848	813	786	786
query40	264	125	122	122
query41	46	45	46	45
query42	99	98	94	94
query43	591	539	549	539
query44	1192	716	731	716
query45	177	165	163	163
query46	1080	701	715	701
query47	1880	1817	1825	1817
query48	359	293	304	293
query49	1097	380	412	380
query50	759	398	382	382
query51	6835	6736	6797	6736
query52	102	90	91	90
query53	349	288	278	278
query54	987	425	420	420
query55	77	72	73	72
query56	264	248	248	248
query57	1139	1078	1059	1059
query58	232	211	204	204
query59	3319	3450	3118	3118
query60	275	257	252	252
query61	92	120	87	87
query62	649	454	434	434
query63	304	278	271	271
query64	9958	2214	1704	1704
query65	3153	3193	3113	3113
query66	1284	351	332	332
query67	15337	14847	15225	14847
query68	4590	543	550	543
query69	487	275	285	275
query70	1210	1153	1080	1080
query71	433	272	268	268
query72	7116	2704	2595	2595
query73	713	326	323	323
query74	6127	5792	5737	5737
query75	3357	2603	2621	2603
query76	3116	945	965	945
query77	663	264	262	262
query78	10329	9944	9696	9696
query79	3163	501	499	499
query80	1351	442	430	430
query81	508	244	241	241
query82	831	96	96	96
query83	215	172	171	171
query84	259	88	83	83
query85	1453	276	328	276
query86	449	289	307	289
query87	3368	3130	3187	3130
query88	4606	2318	2324	2318
query89	493	390	370	370
query90	1987	183	183	183
query91	126	109	99	99
query92	56	49	49	49
query93	4435	514	489	489
query94	1194	181	181	181
query95	403	303	309	303
query96	604	259	259	259
query97	3153	3056	3053	3053
query98	242	220	215	215
query99	1205	854	858	854
Total cold run time: 276504 ms
Total hot run time: 169528 ms

@doris-robot

ClickBench: Total hot run time: 30.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6f79381a820a6c7dea789521915e2bfa84dee92d, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.09	0.04	0.04
query3	0.23	0.06	0.05
query4	1.66	0.06	0.07
query5	0.50	0.49	0.50
query6	1.12	0.72	0.72
query7	0.01	0.01	0.01
query8	0.05	0.04	0.04
query9	0.54	0.48	0.49
query10	0.53	0.55	0.55
query11	0.15	0.11	0.11
query12	0.15	0.12	0.13
query13	0.59	0.59	0.60
query14	0.75	0.76	0.78
query15	0.82	0.82	0.81
query16	0.37	0.38	0.35
query17	0.96	0.98	1.02
query18	0.20	0.27	0.23
query19	1.78	1.67	1.70
query20	0.02	0.01	0.01
query21	15.70	0.67	0.66
query22	4.21	8.14	1.59
query23	18.28	1.43	1.23
query24	1.64	0.22	0.28
query25	0.14	0.08	0.08
query26	0.28	0.17	0.17
query27	0.07	0.08	0.08
query28	13.34	1.01	1.00
query29	13.20	3.24	3.26
query30	0.24	0.06	0.06
query31	2.85	0.39	0.37
query32	3.29	0.47	0.47
query33	2.89	2.94	2.90
query34	17.16	4.50	4.46
query35	4.47	4.51	4.63
query36	0.66	0.46	0.47
query37	0.17	0.15	0.16
query38	0.16	0.15	0.15
query39	0.05	0.04	0.03
query40	0.16	0.14	0.14
query41	0.09	0.05	0.04
query42	0.06	0.05	0.05
query43	0.04	0.03	0.03
Total cold run time: 109.71 s
Total hot run time: 30.19 s

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on May 24, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@morrySnow merged commit 2d94142 into apache:master on May 24, 2024 (26 of 28 checks passed).
yiguolei pushed a commit that referenced this pull request May 24, 2024
…de (#35026)

agg_state is the intermediate state of an aggregate function; for details, see the
state combinator documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state

This PR supports the following aggregate function roll-ups:

+----------------------+----------------------------------------------+----------------------+
| query                | materialized view                            | roll up              |
| -------------------- | -------------------------------------------- | -------------------- |
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |
+----------------------+----------------------------------------------+----------------------+

For example, the query below can be successfully rewritten to use the following MV.

MV definition:

```
            select
            o_orderstatus,
            l_partkey,
            l_suppkey,
            sum_union(sum_state(o_shippriority)),
            group_concat_union(group_concat_state(l_shipinstruct)),
            avg_union(avg_state(l_linenumber)),
            max_by_union(max_by_state(l_shipmode, l_suppkey)),
            count_union(count_state(l_orderkey)),
            multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
            from lineitem
            left join orders
            on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_partkey,
            l_suppkey;
```

Query:

```
            select
            o_orderstatus,
            l_suppkey,
            sum(o_shippriority),
            group_concat(l_shipinstruct),
            avg(l_linenumber),
            max_by(l_shipmode,l_suppkey),
            count(l_orderkey),
            multi_distinct_count(l_shipmode)
            from lineitem
            left join orders 
            on l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_suppkey;
```
dataroaring pushed a commit that referenced this pull request May 26, 2024
…de (#35026)

seawinde added a commit to seawinde/doris that referenced this pull request May 27, 2024
…de (apache#35026)

Labels: approved (Indicates a PR has been approved by one committer.), dev/2.1.4-merged, reviewed
4 participants