Newest 'hive-partitions' Questions

0 votes

0 answers

24 views

Hive: Dropping partitions not satisfying the condition

I'm trying to come up with a solution to my problem: I need to delete all data in a partitioned (managed/external) table in Hive that doesn't satisfy a condition applied to another, non-partition column. ...

Kiki

11

asked Apr 17 at 18:41

0 votes

0 answers

33 views

How do I retrieve the schema fields that pyarrow has inferred from the Hive partitioning of a Dataset?

Once I've loaded a hive-partitioned Dataset, how do I retrieve the fields that pyarrow has inferred as being the partitioning fields? I know I can ask that to a Fragment (by using the fragment's ...

Gabriele Giuseppini

1,581

asked Feb 16 at 10:01

0 votes

0 answers

215 views

Delta Tables...do we need partitions for concurrent write/update?

I am starting to use Databricks in AWS. I have a delta table that contains KPIs, with each KPI having a KPI ID (1000, 1001, 1002, etc...). We want to have concurrent processes that update those KPIs ...

chulo

63

asked Jan 29 at 12:13

0 votes

0 answers

18 views

Why does the Hive explain command always show the same result under different conditions?

I run the command below to analyze a table in Hive: analyze table my_partitioned_table partition(day='20230101') COMPUTE STATISTICS FOR COLUMNS. When it finishes, I try this to see the explain output before running ...

CompEng

7,291

asked Oct 3, 2023 at 11:17

5 votes

1 answer

6k views

How to read filtered partitioned parquet files efficiently using pandas's read_parquet?

Let's say my data is stored in object storage, say S3, with a date-time partition layout like this: s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet ... s3://my-bucket/year=2022/month=12/day=31/SOME-...

user3595632

5,590

asked Aug 31, 2022 at 6:38

1 vote

1 answer

2k views

Hive insert into partitioned table with column list from select

I want to insert into a partitioned Hive table tb_1(a, b, c, d, p1) only columns (a, b) from a select statement. Ex: insert into table tb_1 partition (p1) (a, b) select a, b from tb_2; How can I ...

david vallet

38

asked Jul 8, 2022 at 1:30

2 votes

1 answer

319 views

In Foundry, how can I Hive partition with only 1 parquet file per value?

I'm looking to improve the performance of running filtering logic. To accomplish this, the idea is to set up Hive partitioning by setting the partition column to a column in the dataset (called ...

Andrew Andrade

2,768

asked Jun 29, 2022 at 17:48

1 vote

1 answer

746 views

Databricks / Spark storage mechanism for Delta Tables, Delta Logs, Partitions etc

I am trying to understand how data is stored and managed in the Databricks environment. I have a fairly decent understanding of what is going on under the hood but have seen some conflicting ...

guacacholay

11

asked May 20, 2022 at 13:44

0 votes

1 answer

568 views

Need to merge multiple hive partitions into one partition in spark

I have around 50 partitions in a Hive table. I need to merge each set of partitions into one partition. I tried to use the rename partition command, but I get an error message. I need help merging multiple ...

Arvinth

60

asked Apr 19, 2022 at 11:12

2 votes

1 answer

1k views

How to automatically update the Hive external table metadata partitions for streaming data

I am writing Spark streaming data into HDFS partitions using PySpark. Here is the code: data = (spark.readStream.format("json").schema(fileSchema).load(inputDirectoryOfJsonFiles)) ...

nani

21

asked Feb 13, 2022 at 18:13

3 votes

1 answer

872 views

Performance of pyspark + hive when a table has many partition columns

I am trying to understand the performance impact of the partitioning scheme when Spark is used to query a Hive table. As an example: Table 1 has 3 partition columns, and data is stored in paths like ...

GeorgeWilson

592

asked Dec 19, 2021 at 7:34

1 vote

1 answer

412 views

Hive - incomplete rows in select from managed partitioned table

I need to copy data from a CSV file to a managed partitioned table in Hive. CSV file rows are: id,nome,cognome,ruolo 16,Mike,Maignan,Portiere 23,Fikayo,Tomori,Centrale 24,Simon,Kjaer,Centrale ...

Moreno

25

asked Dec 13, 2021 at 10:49

1 vote

1 answer

202 views

How to retain last N partitions for a hive external table?

I need to retain, say, the last 7 partitions and their data for a given Hive external table. This can be done via either a shell script or a Hive HQL script. The table is partitioned by intgestion_date=YYYY-MM-DD ...

Gaurav Bhatnagar

45

asked Nov 30, 2021 at 15:28

0 votes

0 answers

1k views

HIVE: Exception: Partition Already Exists while ADDING a NEW Partition to an EXISTING EXTERNAL Table

I am getting the below error when the application (java) tries to execute an 'ADD partition' after 'DROP partition IF EXISTS' command in Hive:- """ Caused by: java.sql.SQLException: ...

Priyabrata Behera

11

asked Sep 7, 2021 at 15:49

3 votes

1 answer

1k views

Repartition in Hadoop

My question is mostly theoretical, but I have some tables that already follow some sort of partition scheme. Let's say my table is partitioned by day, but after working with the data for some time we ...

frammnm

537

asked Aug 11, 2021 at 10:16
