All Questions
109 questions
1 vote · 0 answers · 31 views
More Parallelism Than Expected in Glue ETL Spark Job
I am using Glue ETL Spark jobs to run some tests. I am trying to understand why I am getting more parallel processing than the available cores on a single executor.
Here's my job config:
I'm setting ...
0 votes · 0 answers · 37 views
py4j.protocol.Py4JJavaError: An error occurred while calling o1593.saveAsTable. : java.lang.StackOverflowError
I am reading a file that has about 725 columns as a Data Frame (df); I then do some light transformation and append a couple of columns (about four) to the final Data Frame (df_final). I then write or ...
0 votes · 1 answer · 34 views
How to replace the datasource with the processed result in the same Glue task
I want to process some data from A and replace A by the processed result.
Is there any "place" where I can do something after a write() action has completed? Or is there a way to replace the original dir ...
0 votes · 0 answers · 203 views
AWS Glue job executors dying during shuffle write operations (writing parquets to S3)
I'm currently experiencing some issues with an AWS Glue Job that does some Spark SQL left joins of various datasets, and some help would be appreciated to understand the cause.
The issue:
The Glue Job ...
0 votes · 1 answer · 51 views
How to join two dataframes based on start and end timestamps using spark
I have two dataframes like the ones below that contain each trip's start and end timestamps
For example, consider a source dataframe where BUS1 departs from CITY1 at 2023-12-17 07:27:00. In a second dataframe, ...
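Questions like this usually come down to a range (non-equi) join: keep each pair of rows whose timestamp falls inside the other row's start/end window. As a rough illustration only, with made-up trip and event values (the Spark equivalent would express the same predicate as a join condition such as `df1.start <= df2.ts AND df2.ts <= df1.end`), the logic can be sketched in plain Python:

```python
# Sketch of an interval-overlap join in plain Python. All names and
# timestamps are hypothetical; they are not taken from the question.
from datetime import datetime

trips = [  # (bus, start, end)
    ("BUS1", datetime(2023, 12, 17, 7, 27), datetime(2023, 12, 17, 9, 0)),
]
events = [  # (city, timestamp)
    ("CITY1", datetime(2023, 12, 17, 7, 27)),
    ("CITY2", datetime(2023, 12, 17, 10, 0)),
]

# Keep each (trip, event) pair whose event timestamp falls inside the
# trip's [start, end] window -- the same predicate a Spark range join uses.
joined = [
    (bus, city)
    for (bus, start, end) in trips
    for (city, ts) in events
    if start <= ts <= end
]
```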
0 votes · 1 answer · 166 views
AWS Glue Scala - split script into several scala files
I don't see how I can split the Glue script into several Scala files. I am aware that one prerequisite is to reference the "other Scala file" in the "Referenced files path" and ...
0 votes · 1 answer · 105 views
java.lang.StackOverflowError when adding columns to a dataframe with a for loop and withColumn function in Spark Scala
I have Spark code that adds columns to a dataframe from a configuration file and finally selects only the columns present in the configuration file to create a new dataframe.
When I have less than ...
0 votes · 1 answer · 61 views
AWS Glue version 0.9 Python and Scala scripts testing
We will be working on an AWS Glue 0.9 to 4.0 upgrade. As part of the analysis, we were checking the changes to be done. For testing purposes we have created some sample AWS Glue 0.9 Python and ...
1 vote · 0 answers · 267 views
Not able to write to AWS Glue catalog metastore from spark jobs running on EMR
I am writing a simple Spark job running on EMR to create a table stored in the Glue catalog, but it fails to recognize the Glue catalog databases and writes to the Spark default metastore.
EMR configurations:
...
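For reference, AWS documents pointing Spark on EMR at the Glue Data Catalog via the `spark-hive-site` classification in the cluster's configuration JSON. A minimal fragment (whether this matches the asker's actual EMR setup is unknown):

```json
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
```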
0 votes · 1 answer · 417 views
Error while upgrading AWS Glue from 2.0 to 3.0
While upgrading an existing job from AWS Glue 2.0 to 3.0: the current Scala version is 2.11.8 and Spark is 3.1
Exception in User Class: java.lang.NoSuchMethodError : scala.Predef$.refArrayOps([Ljava/...
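A `NoSuchMethodError` on `scala.Predef$.refArrayOps` is the classic symptom of a Scala binary-version mismatch: AWS Glue 3.0 runs Spark 3.1 on Scala 2.12, so a jar compiled against 2.11.8 will fail at runtime. A minimal build.sbt sketch (the exact patch versions shown are illustrative):

```scala
// build.sbt -- Glue 3.0 targets Spark 3.1 on Scala 2.12,
// so the job jar must be built against a 2.12.x Scala version.
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.1.1" % Provided
)
```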
0 votes · 1 answer · 203 views
How to call AWS Glue crawler from AWS Glue job using Scala API?
I want to call GlueCrawler from the Glue job. I see there is an API https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-crawling.html#aws-glue-api-crawler-crawling-StartCrawler
But I ...
5 votes · 1 answer · 2k views
Unable to read json files in AWS Glue using Apache Spark
For our use case we need to load JSON files from an S3 bucket. As a processing tool we are using AWS Glue. But because we will soon be migrating to Amazon EMR, we are already developing our Glue jobs ...
0 votes · 1 answer · 691 views
AWS Glue - AWSGlueETL dependency not resolved
I am trying to run Glue locally using Scala, so I added the below dependency as per the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html)
...
1 vote · 0 answers · 109 views
AWS Glue Terraform - Specify map as an input argument
Is there any way to specify a map/JSON structure as an input argument for an AWS Glue job? I'm doing it this way in Terraform:
glue_jobs = [
  {
    name = "SampleGlueJob"
    default_arguments = {...
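One relevant detail: the `aws_glue_job` resource's `default_arguments` is a map of strings, so a nested map/JSON structure has to be JSON-encoded into a single string value. A sketch using Terraform's built-in `jsonencode()` (all names, paths, and values here are hypothetical, not taken from the question):

```hcl
# Sketch: default_arguments must be map(string), so nested structure
# is passed as one JSON-encoded string the job parses at runtime.
resource "aws_glue_job" "sample" {
  name     = "SampleGlueJob"
  role_arn = var.glue_role_arn # hypothetical variable

  command {
    script_location = "s3://my-bucket/scripts/sample.py" # hypothetical path
  }

  default_arguments = {
    "--my_config" = jsonencode({
      input_path  = "s3://my-bucket/in/"
      output_path = "s3://my-bucket/out/"
    })
  }
}
```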
0 votes · 1 answer · 225 views
How can we read an invalid date column in Spark Scala from a MySQL server using a JDBC driver URL (connection)
I am getting an error while reading this column from the MySQL server:
id    date
1     0000-00-00
2     0000-00-01
In the above data set we can handle 0000-00-00 by using a MySQL server additional parameter
...
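The "additional parameter" the excerpt refers to is presumably MySQL Connector/J's `zeroDateTimeBehavior` connection property, which maps zero dates like `0000-00-00` to NULL instead of raising an error (the value is spelled `convertToNull` in Connector/J 5.x and `CONVERT_TO_NULL` in 8.x). A sketch of building such a JDBC URL (host and database names are made up):

```python
# Sketch: a JDBC URL that tells MySQL Connector/J to return NULL for
# zero dates rather than failing. Host/database names are hypothetical.
jdbc_url = (
    "jdbc:mysql://db.example.com:3306/mydb"
    "?zeroDateTimeBehavior=convertToNull"
)

# Hypothetical Spark usage (requires a running SparkSession):
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)
#       .option("dbtable", "my_table")
#       .load())
```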