(Fix)[hive-writer] Fixed the issue when partition values contain spaces when writing to s3. #35645

Merged
merged 1 commit into from
May 31, 2024

Conversation

@kaka11chen kaka11chen commented May 30, 2024

Proposed changes

Issue Number: close #31442

(Fix) [hive-writer] Fixed a failure that occurred when writing to S3 if partition values contain spaces.

Error msg

org.apache.doris.common.UserException: errCode = 2, detailMessage = java.net.URISyntaxException: Illegal character in path at index 114: oss://xxxxxxxxxxx/hive/tpcds1000_partition_oss/call_center/cc_call_center_sk=1/cc_mkt_class=A bit narrow forms matter animals. Consist/cc_market_manager=Daniel Weller/cc_rec_end_date=2001-12-31/f6b5ff4253414b06-9fd365ef68e5ddc5_133f02fb-a7e0-4109-9100-fb748a28259e-0.zlib.orc
        at org.apache.doris.common.util.S3URI.validateUri(S3URI.java:134) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.S3URI.parseUri(S3URI.java:120) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.S3URI.<init>(S3URI.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.S3URI.create(S3URI.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.fs.obj.S3ObjStorage.deleteObject(S3ObjStorage.java:194) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.fs.remote.ObjFileSystem.delete(ObjFileSystem.java:150) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.fs.remote.SwitchingFileSystem.delete(SwitchingFileSystem.java:92) ~[doris-fe.jar:1.2-

Root Cause

Hadoop partition names escape some special characters but not spaces, so an escaped partition name is not necessarily a valid URI-encoded path. Constructing a URI from such a path therefore fails with a URISyntaxException.
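
As a minimal sketch of the failure mode (not code from this PR), the snippet below passes a location to the single-argument `java.net.URI` constructor with and without a space in the partition value. The class name, method name, bucket, and table path are hypothetical and chosen only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SpaceInPartitionPath {
    // Returns true when the single-argument URI constructor accepts the string.
    // This constructor parses the string as-is and does not encode anything.
    static boolean parsesAsUri(String location) {
        try {
            new URI(location);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical locations; only the space in the partition value differs.
        String noSpace = "s3://bucket/tbl/cc_market_manager=Daniel/file.orc";
        String withSpace = "s3://bucket/tbl/cc_market_manager=Daniel Weller/file.orc";

        System.out.println(parsesAsUri(noSpace));   // true
        System.out.println(parsesAsUri(withSpace)); // false
    }
}
```

The second call fails because a literal space is not a legal character in a URI path, which matches the "Illegal character in path" message in the stack trace above.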

Solution

The fix parses the location string with a regular expression and then passes each component separately to a URI constructor that accepts the parts individually. That constructor encodes illegal characters in each component.
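
The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the class and method names are hypothetical, and the pattern is the generic URI-splitting regular expression from RFC 3986 Appendix B, which may differ from the one the PR uses. The five-argument `java.net.URI` constructor is a standard JDK API that quotes illegal characters in each component.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedUriBuilder {
    // Generic URI-splitting pattern from RFC 3986, Appendix B (an assumption;
    // the PR's actual pattern is not shown in the conversation).
    private static final Pattern URI_PATTERN = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Splits the location into components and lets the multi-argument
    // constructor encode illegal characters such as spaces.
    static URI toEncodedUri(String location) {
        Matcher m = URI_PATTERN.matcher(location);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized location: " + location);
        }
        String scheme = m.group(2);
        String authority = m.group(4);
        String path = m.group(5);
        String query = m.group(7);
        String fragment = m.group(9);
        try {
            return new URI(scheme, authority, path, query, fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical bucket and table path.
        URI uri = toEncodedUri("s3://bucket/tbl/cc_market_manager=Daniel Weller/file.orc");
        System.out.println(uri);           // s3://bucket/tbl/cc_market_manager=Daniel%20Weller/file.orc
        System.out.println(uri.getPath()); // /tbl/cc_market_manager=Daniel Weller/file.orc
    }
}
```

One design point worth noting: the multi-argument `URI` constructors always quote the percent character, so a sequence that Hadoop has already escaped (for example `%3A`) appears as `%253A` in the raw URI, while `getPath()` decodes it back to the original text. Whether that is the desired behavior depends on how the caller uses the result.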

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen kaka11chen force-pushed the fix_s3_partition_name_uri_error branch 2 times, most recently from 0c78614 to 129701a Compare May 30, 2024 10:11
@kaka11chen

run buildall

@kaka11chen

run buildall

@kaka11chen kaka11chen force-pushed the fix_s3_partition_name_uri_error branch from 129701a to 76d8518 Compare May 30, 2024 10:41
@kaka11chen

run buildall

@kaka11chen kaka11chen marked this pull request as ready for review May 30, 2024 10:41
@doris-robot

TPC-H: Total hot run time: 41410 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 76d85185546dfbbe4ccaabeb4535a87137b9f363, data reload: false

------ Round 1 ----------------------------------
q1	17619	4370	4241	4241
q2	2031	189	196	189
q3	10499	1338	1210	1210
q4	10207	862	848	848
q5	7540	2728	2747	2728
q6	231	132	137	132
q7	970	665	639	639
q8	9224	2179	2128	2128
q9	9827	6733	6753	6733
q10	9545	3930	3888	3888
q11	462	248	266	248
q12	448	240	234	234
q13	17480	3201	3338	3201
q14	256	208	220	208
q15	511	465	472	465
q16	490	408	425	408
q17	1016	680	734	680
q18	8481	7925	7697	7697
q19	5461	1622	1645	1622
q20	648	339	323	323
q21	5227	3253	4195	3253
q22	389	335	335	335
Total cold run time: 118562 ms
Total hot run time: 41410 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4505	4425	4426	4425
q2	374	278	277	277
q3	3187	2964	3009	2964
q4	1903	1610	1628	1610
q5	5452	5551	5520	5520
q6	213	126	130	126
q7	2225	1870	1800	1800
q8	3274	3430	3433	3430
q9	8637	8807	8714	8714
q10	4074	3758	3827	3758
q11	594	485	515	485
q12	788	641	628	628
q13	15885	3172	3144	3144
q14	311	277	278	277
q15	517	500	481	481
q16	498	451	445	445
q17	1818	1529	1510	1510
q18	7813	7537	7345	7345
q19	4619	1616	1588	1588
q20	2039	1775	1797	1775
q21	13811	4890	4854	4854
q22	577	541	531	531
Total cold run time: 83114 ms
Total hot run time: 55687 ms
@doris-robot

TPC-DS: Total hot run time: 171580 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 76d85185546dfbbe4ccaabeb4535a87137b9f363, data reload: false

query1	916	404	366	366
query2	6454	2471	2361	2361
query3	6644	217	213	213
query4	19243	17275	17288	17275
query5	4138	427	431	427
query6	250	163	161	161
query7	4592	310	291	291
query8	324	290	285	285
query9	8479	2403	2399	2399
query10	461	288	276	276
query11	10452	10154	10095	10095
query12	140	92	90	90
query13	1630	369	365	365
query14	8522	7749	6116	6116
query15	234	192	192	192
query16	7631	268	262	262
query17	1304	525	520	520
query18	1956	273	271	271
query19	201	158	155	155
query20	93	93	87	87
query21	218	133	130	130
query22	4215	3937	3802	3802
query23	33668	33100	33181	33100
query24	6997	2819	2944	2819
query25	534	350	388	350
query26	706	157	155	155
query27	1928	345	330	330
query28	3808	2099	2124	2099
query29	862	611	596	596
query30	247	151	152	151
query31	945	742	771	742
query32	93	56	54	54
query33	506	285	265	265
query34	845	479	505	479
query35	738	624	638	624
query36	1065	956	917	917
query37	111	65	69	65
query38	2923	2825	2766	2766
query39	880	798	792	792
query40	195	128	126	126
query41	58	51	51	51
query42	106	95	98	95
query43	615	579	555	555
query44	1103	742	752	742
query45	190	181	171	171
query46	1056	733	715	715
query47	1840	1756	1760	1756
query48	360	300	304	300
query49	773	395	389	389
query50	768	395	396	395
query51	6808	6821	6688	6688
query52	101	94	95	94
query53	356	289	293	289
query54	542	436	437	436
query55	76	74	77	74
query56	277	285	246	246
query57	1118	1079	1021	1021
query58	225	213	215	213
query59	3439	3367	3322	3322
query60	282	264	263	263
query61	94	87	91	87
query62	556	446	442	442
query63	316	290	289	289
query64	8462	2261	1689	1689
query65	3206	3099	3125	3099
query66	809	325	339	325
query67	15230	14643	14659	14643
query68	4570	546	532	532
query69	443	275	268	268
query70	1177	1125	1156	1125
query71	401	284	270	270
query72	7645	5822	5338	5338
query73	748	335	328	328
query74	6005	5590	5671	5590
query75	3307	2618	2637	2618
query76	2231	926	965	926
query77	389	273	274	273
query78	11774	10278	9708	9708
query79	2341	520	529	520
query80	1496	446	431	431
query81	527	226	215	215
query82	613	92	91	91
query83	291	175	176	175
query84	267	88	90	88
query85	1010	286	282	282
query86	485	320	309	309
query87	3307	3143	3111	3111
query88	3969	2363	2384	2363
query89	471	408	401	401
query90	2170	193	193	193
query91	200	97	104	97
query92	62	51	53	51
query93	2013	524	510	510
query94	1142	193	186	186
query95	407	317	313	313
query96	595	272	278	272
query97	3154	2995	2987	2987
query98	253	222	218	218
query99	1137	866	833	833
Total cold run time: 258514 ms
Total hot run time: 171580 ms
@doris-robot

ClickBench: Total hot run time: 30.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 76d85185546dfbbe4ccaabeb4535a87137b9f363, data reload: false

query1	0.04	0.03	0.03
query2	0.09	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.07
query5	0.53	0.49	0.49
query6	1.14	0.73	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.49	0.48
query10	0.55	0.54	0.54
query11	0.15	0.12	0.12
query12	0.15	0.12	0.11
query13	0.60	0.59	0.60
query14	0.79	0.76	0.78
query15	0.82	0.81	0.82
query16	0.36	0.38	0.35
query17	0.96	0.94	0.97
query18	0.23	0.23	0.26
query19	1.75	1.69	1.71
query20	0.02	0.01	0.01
query21	15.43	0.74	0.69
query22	4.29	6.86	2.31
query23	18.28	1.34	1.26
query24	1.88	0.28	0.22
query25	0.16	0.09	0.08
query26	0.27	0.17	0.17
query27	0.08	0.08	0.09
query28	13.28	1.00	0.99
query29	13.31	3.28	3.27
query30	0.24	0.05	0.05
query31	2.88	0.38	0.40
query32	3.29	0.46	0.47
query33	2.90	2.86	2.88
query34	16.98	4.41	4.43
query35	4.49	4.45	4.62
query36	0.67	0.46	0.48
query37	0.17	0.16	0.16
query38	0.15	0.15	0.15
query39	0.05	0.04	0.04
query40	0.17	0.14	0.14
query41	0.09	0.04	0.04
query42	0.05	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.85 s
Total hot run time: 30.83 s
@morningman

run buildall

@doris-robot

TPC-H: Total hot run time: 41160 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c886dcbb260656b1bf1ef3ff8895a76a2ab941df, data reload: false

------ Round 1 ----------------------------------
q1	17605	4331	4282	4282
q2	2023	196	202	196
q3	10453	1213	991	991
q4	10193	821	872	821
q5	7445	2752	2721	2721
q6	224	140	141	140
q7	954	633	617	617
q8	9214	2113	2111	2111
q9	9197	6684	6610	6610
q10	9195	3978	3837	3837
q11	462	263	269	263
q12	448	230	232	230
q13	17339	3245	3230	3230
q14	279	248	238	238
q15	524	482	486	482
q16	522	405	402	402
q17	1001	685	675	675
q18	8271	7805	7853	7805
q19	4443	1595	1528	1528
q20	645	323	327	323
q21	5082	3307	4086	3307
q22	409	369	351	351
Total cold run time: 115928 ms
Total hot run time: 41160 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4514	4418	4467	4418
q2	405	278	277	277
q3	3142	2924	2882	2882
q4	2001	1746	1596	1596
q5	5292	5511	5507	5507
q6	214	123	128	123
q7	2216	1813	1811	1811
q8	3234	3420	3375	3375
q9	8614	8594	8687	8594
q10	4123	3781	3740	3740
q11	595	509	526	509
q12	811	648	636	636
q13	16397	3117	3168	3117
q14	303	271	279	271
q15	537	480	484	480
q16	511	435	440	435
q17	1801	1535	1510	1510
q18	7713	7546	7389	7389
q19	1702	1532	1642	1532
q20	2003	1780	1763	1763
q21	10676	4776	4675	4675
q22	630	519	542	519
Total cold run time: 77434 ms
Total hot run time: 55159 ms
@doris-robot

TPC-DS: Total hot run time: 169370 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c886dcbb260656b1bf1ef3ff8895a76a2ab941df, data reload: false

query1	941	387	366	366
query2	6444	2344	2418	2344
query3	6630	206	211	206
query4	19127	17297	17043	17043
query5	4109	430	422	422
query6	280	156	153	153
query7	4586	302	309	302
query8	316	292	285	285
query9	8497	2403	2391	2391
query10	442	282	267	267
query11	10582	10172	10098	10098
query12	132	89	88	88
query13	1646	363	365	363
query14	10100	7367	6845	6845
query15	231	188	194	188
query16	7879	266	268	266
query17	1748	530	507	507
query18	1993	286	275	275
query19	202	151	154	151
query20	91	87	85	85
query21	202	131	128	128
query22	4290	3888	3876	3876
query23	33559	32939	33211	32939
query24	6688	2902	2816	2816
query25	543	364	362	362
query26	703	157	158	157
query27	2006	329	321	321
query28	3642	2070	2063	2063
query29	890	604	597	597
query30	226	152	152	152
query31	957	754	748	748
query32	94	52	54	52
query33	506	277	266	266
query34	852	475	485	475
query35	700	592	589	589
query36	1054	930	925	925
query37	99	70	66	66
query38	2893	2764	2763	2763
query39	854	779	809	779
query40	190	125	122	122
query41	53	50	56	50
query42	100	98	93	93
query43	589	551	556	551
query44	1079	724	755	724
query45	192	171	165	165
query46	1059	718	729	718
query47	1837	1741	1796	1741
query48	368	297	299	297
query49	846	393	386	386
query50	769	390	383	383
query51	6839	6601	6749	6601
query52	105	88	90	88
query53	351	286	286	286
query54	554	440	438	438
query55	73	70	73	70
query56	260	242	245	242
query57	1097	1051	1065	1051
query58	229	217	208	208
query59	3281	3257	3250	3250
query60	289	254	261	254
query61	92	93	90	90
query62	555	441	455	441
query63	312	292	292	292
query64	8482	2228	1746	1746
query65	3146	3119	3105	3105
query66	778	338	334	334
query67	15225	14808	14703	14703
query68	4578	543	543	543
query69	499	267	268	267
query70	1127	1146	1067	1067
query71	420	270	271	270
query72	7517	2863	2687	2687
query73	732	337	327	327
query74	6057	5639	5596	5596
query75	3538	2671	2667	2667
query76	2878	1064	1128	1064
query77	598	270	272	270
query78	10215	9736	9743	9736
query79	2160	518	513	513
query80	1000	458	453	453
query81	531	223	218	218
query82	1282	93	94	93
query83	215	179	180	179
query84	246	89	95	89
query85	1210	343	318	318
query86	462	317	292	292
query87	3278	3092	3113	3092
query88	3981	2396	2382	2382
query89	482	397	403	397
query90	2066	192	196	192
query91	136	107	111	107
query92	61	51	51	51
query93	2401	512	506	506
query94	1197	204	198	198
query95	419	322	322	322
query96	595	273	272	272
query97	3214	3031	2993	2993
query98	246	230	219	219
query99	1135	848	842	842
Total cold run time: 259960 ms
Total hot run time: 169370 ms
@doris-robot

ClickBench: Total hot run time: 30.71 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c886dcbb260656b1bf1ef3ff8895a76a2ab941df, data reload: false

query1	0.04	0.03	0.03
query2	0.09	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.09	0.08
query5	0.51	0.47	0.48
query6	1.12	0.72	0.72
query7	0.02	0.02	0.02
query8	0.06	0.04	0.04
query9	0.55	0.49	0.49
query10	0.54	0.57	0.55
query11	0.16	0.12	0.12
query12	0.14	0.12	0.12
query13	0.59	0.58	0.59
query14	0.77	0.80	0.77
query15	0.82	0.83	0.82
query16	0.36	0.38	0.36
query17	0.96	1.03	1.03
query18	0.23	0.25	0.23
query19	1.76	1.77	1.72
query20	0.02	0.01	0.01
query21	15.77	0.65	0.65
query22	4.45	6.83	2.02
query23	18.25	1.38	1.24
query24	1.99	0.22	0.21
query25	0.15	0.09	0.09
query26	0.26	0.17	0.17
query27	0.08	0.08	0.07
query28	13.34	1.01	0.99
query29	13.14	3.32	3.28
query30	0.24	0.05	0.05
query31	2.87	0.38	0.39
query32	3.31	0.46	0.46
query33	2.93	2.90	2.90
query34	17.21	4.39	4.42
query35	4.47	4.50	4.51
query36	0.68	0.52	0.50
query37	0.18	0.15	0.15
query38	0.15	0.14	0.15
query39	0.04	0.04	0.03
query40	0.16	0.13	0.14
query41	0.08	0.04	0.05
query42	0.05	0.04	0.04
query43	0.04	0.04	0.03
Total cold run time: 110.49 s
Total hot run time: 30.71 s

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved label May 31, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit adfcbe8 into apache:master May 31, 2024
27 of 29 checks passed
yiguolei pushed a commit that referenced this pull request May 31, 2024
@morningman morningman mentioned this pull request Jun 1, 2024
dataroaring pushed a commit that referenced this pull request Jun 4, 2024
seawinde pushed a commit to seawinde/doris that referenced this pull request Jun 5, 2024
w41ter pushed a commit to w41ter/incubator-doris that referenced this pull request Jul 18, 2024
…es when writing to s3.

Cherry-pick apache#35645.

Labels: approved, dev/2.1.4-merged, reviewed
4 participants