Questions tagged [hive-partitions]
Use for questions about partitions in Hive.
146 questions
0 votes, 0 answers, 24 views
Hive: Dropping partitions not satisfying the condition
I'm trying to come up with a solution to my problem:
I need to delete all data in a partitioned (managed/external) Hive table that doesn't satisfy a condition applied to another, non-partition column.
...
0 votes, 0 answers, 33 views
How do I retrieve the schema fields that pyarrow has inferred from the Hive partitioning of a Dataset?
Once I've loaded a hive-partitioned Dataset, how do I retrieve the fields that pyarrow has inferred as being the partitioning fields?
I know I can ask that to a Fragment (by using the fragment's ...
0 votes, 0 answers, 215 views
Delta Tables...do we need partitions for concurrent write/update?
I am starting to use Databricks in AWS. I have a delta table that contains KPIs, with each KPI having a KPI ID (1000, 1001, 1002, etc...). We want to have concurrent processes that update those KPIs ...
0 votes, 0 answers, 18 views
Why does the Hive EXPLAIN command always show the same result for different conditions?
I run the command below to analyze a table in Hive.
analyze table my_partitioned_table partition(day='20230101') COMPUTE STATISTICS FOR COLUMNS
and when it finishes I try this to see the explain output before running ...
5 votes, 1 answer, 6k views
How to read filtered partitioned parquet files efficiently using pandas's read_parquet?
Let's say my data is stored in object storage, say S3, with date-time partitions like this:
s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet
...
s3://my-bucket/year=2022/month=12/day=31/SOME-...
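A minimal sketch of the layout described in this question, assuming hive-style `key=value` directory names. It parses the partition values out of each object path and keeps only the paths that match a filter, which is roughly the pruning a partition-aware reader can do before opening any file. The helper names and the sample file names are hypothetical, and the sketch does not call pandas or pyarrow.

```python
def parse_hive_partitions(path):
    """Return {key: value} for every `key=value` directory segment in `path`."""
    partitions = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value  # values stay strings, e.g. "01"
    return partitions


def prune(paths, **wanted):
    """Keep only the paths whose partition values equal every wanted value."""
    return [
        p for p in paths
        if all(parse_hive_partitions(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical file names standing in for the hashed names in the question.
paths = [
    "s3://my-bucket/year=2021/month=01/day=03/a.parquet",
    "s3://my-bucket/year=2022/month=12/day=31/b.parquet",
]
print(prune(paths, year="2022", month="12"))
```

Partition values are compared as strings here, so `month="01"` and `month="1"` are different values in this sketch.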
1 vote, 1 answer, 2k views
Hive insert into partitioned table with column list from select
I want to insert into a partitioned Hive table tb_1(a, b, c, d, p1) only columns (a, b) from a select statement.
Ex: insert into table tb_1 partition (p1) (a, b) select a, b from tb_2;
How can I ...
2 votes, 1 answer, 319 views
In Foundry, how can I Hive partition with only 1 parquet file per value?
I'm looking to improve the performance of running filtering logic. To accomplish this, the idea is to use Hive partitioning by setting the partition column to a column in the dataset (called ...
1 vote, 1 answer, 746 views
Databricks / Spark storage mechanism for Delta Tables, Delta Logs, Partitions etc
I am trying to understand how data is stored and managed in the Databricks environment. I have a fairly decent understanding of what is going on under the hood but have seen some conflicting ...
0 votes, 1 answer, 568 views
Need to merge multiple hive partitions into one partition in spark
I have around 50 partitions in a Hive table. I need to merge each set of partitions into one partition. I tried to use the rename partition command, but I get an error message.
Need help in merging multiple ...
2 votes, 1 answer, 1k views
How to automatically update the Hive external table metadata partitions for streaming data
I am writing Spark streaming data into HDFS partitions using PySpark.
Please find the code below:
data = (spark.readStream.format("json").schema(fileSchema).load(inputDirectoryOfJsonFiles))
...
3 votes, 1 answer, 872 views
Performance of pyspark + hive when a table has many partition columns
I am trying to understand the performance impact of the partitioning scheme when Spark is used to query a Hive table. As an example:
Table 1 has 3 partition columns, and data is stored in paths like
...
1 vote, 1 answer, 412 views
Hive - incomplete rows in select from managed partitioned table
I need to copy data from a CSV file to a managed partitioned table in Hive.
CSV file rows are:
id,nome,cognome,ruolo
16,Mike,Maignan,Portiere
23,Fikayo,Tomori,Centrale
24,Simon,Kjaer,Centrale ...
1 vote, 1 answer, 202 views
How to retain last N partitions for a hive external table?
I need to retain, say, the last 7 partitions of a given Hive external table, along with their data.
This can be done via either a shell script or a Hive HQL script.
The table is partitioned by intgestion_date=YYYY-MM-DD
...
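One way the retention step could be sketched, assuming the partition names arrive in the `intgestion_date=YYYY-MM-DD` form that `SHOW PARTITIONS` prints. The code sorts the dates, keeps the newest N, and builds `ALTER TABLE ... DROP IF EXISTS PARTITION` statements for the rest. The function names and the table name `my_table` are hypothetical, and the statements are only generated, not executed.

```python
from datetime import date


def partitions_to_drop(partition_names, keep=7, column="intgestion_date"):
    """Return the partition values older than the `keep` newest, oldest first."""
    prefix = column + "="
    values = set()
    for name in partition_names:
        if name.startswith(prefix):
            value = name[len(prefix):]
            date.fromisoformat(value)  # raises ValueError on a malformed date
            values.add(value)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    ordered = sorted(values)
    return ordered[:-keep] if keep > 0 else ordered


def drop_statements(table, partition_names, keep=7, column="intgestion_date"):
    """Build one ALTER TABLE ... DROP PARTITION statement per expired partition."""
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({column}='{value}');"
        for value in partitions_to_drop(partition_names, keep, column)
    ]


# Hypothetical input: ten daily partitions, of which the newest seven are kept.
names = [f"intgestion_date=2023-01-{day:02d}" for day in range(1, 11)]
for statement in drop_statements("my_table", names, keep=7):
    print(statement)
```

For an external table, dropping a partition typically removes only the metastore entry, so the underlying files may need a separate cleanup step depending on the table's properties.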
0 votes, 0 answers, 1k views
HIVE: Exception: Partition Already Exists while ADDING a NEW Partition to an EXISTING EXTERNAL Table
I am getting the error below when the application (Java) tries to execute an 'ADD partition' command after a 'DROP partition IF EXISTS' command in Hive:
"""
Caused by: java.sql.SQLException: ...
3 votes, 1 answer, 1k views
Repartition in Hadoop
My question is mostly theoretical, but I have some tables that already follow some sort of partition scheme. Let's say my table is partitioned by day, but after working with the data for some time we ...