Questions tagged [apache-hudi]


Apache Hudi is a transactional data lake platform with a focus on batch and event processing (with ACID support). Use this tag for questions specific to Apache Hudi. Do not use this tag for general data lake or Delta Lake questions.

190 questions

0 votes

0 answers

38 views

Apache Hudi: Ingesting protobuf data from Kafka

I am exploring using Apache Hudi HoodieStreamer to ingest protobuf messages from Kafka into Hudi. Despite a lot of attempts I have hit a roadblock. I get an exception while the HoodieStreamer tries ...

Gaurav

asked Jul 9 at 3:53

0 votes

0 answers

35 views

Unable to sync non-partitioned Hudi table with BigQuery

I'm trying to write my structured streaming data to Apache Hudi in a non-partitioned table and then sync it with BigQuery. But even though it is a new table and I've set no partitioning ...

Vinayak Gupta

asked Jul 1 at 8:20

0 votes

1 answer

57 views

Issues while writing xml data to hudi table in azure synapse notebook

I've successfully read blob data (XML) from a container in an Azure Synapse notebook and displayed the dataframe df as needed. However, while writing it as a Hudi table in Azure Data Lake Storage Gen2 I've ...

Vishal Patwardhan

asked Jun 25 at 14:59

0 votes

0 answers

16 views

How to detect and create an alarm for a hudi job failure using hoodie metrics via Prometheus

Problem: While using multi delta streamer for Kafka ingestion across many tables, if ingestion fails for one of the tables, the job still succeeds. There is no way to check for success/failure for a particular ...

Roobal Jindal

asked Jun 10 at 10:36

1 vote

3 answers

90 views

Unexplained s3 slowdowns when ingesting data to hudi tables using spark/python Glue jobs

I'm using AWS Glue Spark/python jobs to ingest data into hudi tables in an s3 bucket. I'm hitting major s3 slowdown issues, well beyond what seems reasonable, but I'm unable to pin down the root cause. ...

Aamit

asked Jun 2 at 0:34

1 vote

0 answers

67 views

Spark-Hudi: Unable to perform Hard delete using Pyspark on HUDI table from AWS Glue

I am trying to perform a hard delete operation on a HUDI table, but I am unable to delete the data in the table. My setup is pretty straightforward: I use a normal Glue job to create the hudi tables and use ...

Yashaswi Dondapati

asked May 27 at 15:59

0 votes

0 answers

26 views

Apache Hudi - MOR | Getting same number of records in table and table_rt after each run

We are running an ingestion job on AWS-Glue using Pyspark which reads data from the source and writes it in HUDI | MOR. The HUDI configurations that we are using are as follows: "hoodie.table....

Harsh Kumar

asked May 13 at 9:50

0 votes

0 answers

14 views

Spark session unable to handle multiple APIs hitting at the same time

I have an API named getVisualier. When I hit it multiple times within milliseconds, I do not get any response, but when I hit the same API once, using Replay XHR, it is working ...

Shikhar Malviya

asked May 9 at 7:50

0 votes

0 answers

34 views

Spark application running on AWS EMR throws error "this.fileSystem" is null while writing to hudi

We have a Spark application running on AWS EMR, which computes results, but while writing to hudi it throws the error below. I am also not sure whether it happens while writing to hudi or while executing at the end (but ...

Gokul S

asked May 7 at 3:35

1 vote

1 answer

48 views

Is it possible to specifically handle Hudi exceptions in Pyspark

I am reading Hudi tables from s3 and sometimes the bucket or prefix may be empty and org.apache.hudi.exception.TableNotFoundException is thrown. Is there a way for me to import and handle these ...

lollerskates

1,114

asked Apr 20 at 23:10

0 votes

1 answer

44 views

Unsupported options found for 'hudi'

I'm testing Apache Hudi with Flink SQL Client on a Yarn cluster. When I try to create a Hudi catalog (as described) I get an error telling me that the hive.conf.dir and mode options are not ...

Niko

asked Mar 29 at 10:04

0 votes

0 answers

70 views

How to print hudi logs in aws emr serverless application

I have created an EMR Serverless application to run a hudi spark job, but neither the driver nor the executor logs contain any logs related to hudi. I tried setting applicationProperties of emr serverless app ...

Roobal Jindal

asked Mar 18 at 10:24

0 votes

1 answer

47 views

"hoodie.parquet.max.file.size" and "hoodie.parquet.small.file.limit" Property is Being Ignored

I want my hoodie file size to be between small=50MB and max=100MB. The following configs are being used as map options for upsert: val hudiOptions = Map[String, String]( HoodieWriteConfig....

Amit Kumar

asked Mar 7 at 8:43

1 vote

0 answers

101 views

pySpark hudi table partial updating with org.apache.hudi.common.model.PartialUpdateAvroPayload not working

I have two tables in S3: tableA with columns id, col1, col2 and col3, and tableB with columns id, col4 and col5. I want to write this data into another s3 in Hudi format as tableC with columns id, col1, ...

JanakaRao

asked Feb 29 at 7:19

0 votes

1 answer

212 views

Using Minio, how to authenticate amazon s3 endpoint in java

I have a Java app, java -jar utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig onetable.yaml, and I want it to connect to Minio: export AWS_ACCESS_KEY_ID=admin export AWS_SECRET_ACCESS_KEY=password ...

Albert T. Wong

1,593

asked Feb 22 at 22:29
