Questions tagged [hive-partitions]

Use for questions about partitions in Hive.

0 votes
0 answers
24 views

Hive: Dropping partitions not satisfying the condition

I'm trying to come up with a solution to my problem: I need to delete all data in a partitioned (managed/external) table in Hive that doesn't satisfy a condition applied to another, non-partition column. ...
Kiki
  • 11
0 votes
0 answers
33 views

How do I retrieve the schema fields that pyarrow has inferred from the Hive partitioning of a Dataset?

Once I've loaded a hive-partitioned Dataset, how do I retrieve the fields that pyarrow has inferred as being the partitioning fields? I know I can ask that to a Fragment (by using the fragment's ...
Gabriele Giuseppini
0 votes
0 answers
215 views

Delta Tables...do we need partitions for concurrent write/update?

I am starting to use Databricks in AWS. I have a delta table that contains KPIs, with each KPI having a KPI ID (1000, 1001, 1002, etc...). We want to have concurrent processes that update those KPIs ...
chulo
  • 63
0 votes
0 answers
18 views

Why does the Hive EXPLAIN command always show the same result for different conditions?

I run the command below to analyze a table in Hive: analyze table my_partitioned_table partition(day='20230101') COMPUTE STATISTICS FOR COLUMNS. When it finishes, I try this to see the explain output before running ...
CompEng
  • 7,291
5 votes
1 answer
6k views

How to read filtered partitioned parquet files efficiently using pandas's read_parquet?

Let's say my data is stored in object storage, say S3, with date-time partitions like this: s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet ... s3://my-bucket/year=2022/month=12/day=31/SOME-...
user3595632
  • 5,590
1 vote
1 answer
2k views

Hive insert into partitioned table with column list from select

I want to insert into a partitioned Hive table tb_1(a, b, c, d, p1) only columns (a, b) from a select statement. Ex: insert into table tb_1 partition (p1) (a, b) select a, b from tb_2; How can I ...
david vallet
2 votes
1 answer
319 views

In Foundry, how can I Hive partition with only 1 parquet file per value?

I'm looking to improve the performance of running filtering logic. To accomplish this, the idea is to use Hive partitioning by setting the partition column to a column in the dataset (called ...
Andrew Andrade
1 vote
1 answer
746 views

Databricks / Spark storage mechanism for Delta Tables, Delta Logs, Partitions etc

I am trying to understand how data is stored and managed in the Databricks environment. I have a fairly decent understanding of what is going on under the hood but have seen some conflicting ...
guacacholay
0 votes
1 answer
568 views

Need to merge multiple hive partitions into one partition in spark

I have around 50 partitions in a Hive table. I need to merge each set of partitions into one partition. I tried to use the rename partition command, but I get an error message. I need help merging multiple ...
Arvinth
  • 60
2 votes
1 answer
1k views

How to automatically update the Hive external table metadata partitions for streaming data

I am writing Spark streaming data into HDFS partitions using PySpark. Here is the code: data = (spark.readStream.format("json").schema(fileSchema).load(inputDirectoryOfJsonFiles)) ...
nani
  • 21
3 votes
1 answer
872 views

Performance of PySpark + Hive when a table has many partition columns

I am trying to understand the performance impact of the partitioning scheme when Spark is used to query a Hive table. As an example: Table 1 has 3 partition columns, and data is stored in paths like ...
GeorgeWilson
1 vote
1 answer
412 views

Hive - incomplete rows in select from managed partitioned table

I need to copy data from a CSV file to a managed partitioned table in Hive. CSV file rows are: id,nome,cognome,ruolo 16,Mike,Maignan,Portiere 23,Fikayo,Tomori,Centrale 24,Simon,Kjaer,Centrale ...
Moreno
  • 25
1 vote
1 answer
202 views

How to retain last N partitions for a hive external table?

I need to retain, say, the last 7 partitions and their data for a given Hive external table. This can be done via either a shell script or a Hive HQL script. The table is partitioned by intgestion_date=YYYY-MM-DD ...
Gaurav Bhatnagar
0 votes
0 answers
1k views

HIVE: Exception: Partition Already Exists while ADDING a NEW Partition to an EXISTING EXTERNAL Table

I am getting the below error when the application (java) tries to execute an 'ADD partition' after 'DROP partition IF EXISTS' command in Hive:- """ Caused by: java.sql.SQLException: ...
Priyabrata Behera
3 votes
1 answer
1k views

Repartition in Hadoop

My question is mostly theoretical, but I have some tables that already follow some sort of partition scheme. Let's say my table is partitioned by day, but after working with the data for some time we ...
frammnm
  • 537
