All Questions

1 vote · 0 answers · 31 views

More Parallelism Than Expected in Glue ETL Spark Job

I am using Glue ETL Spark jobs to run some tests. I am trying to understand why I am getting more parallel processing than the number of cores available on a single executor. Here's my job config: I am setting ...
asked by Yar (7,328 reputation)
0 votes · 0 answers · 37 views

py4j.protocol.Py4JJavaError: An error occurred while calling o1593.saveAsTable. : java.lang.StackOverflowError

I am reading a file with about 725 columns into a DataFrame (df), then I apply some light transformations and append a couple of columns (about four) to produce the final DataFrame (df_final). I then write or ...
asked by lubabalo
0 votes · 1 answer · 34 views

How to replace the data source with the processed result in the same Glue task

I want to process some data from A and replace A with the processed result. Is there any "place" where I can do something after a write() action has completed? Or is there a way to replace the original dir ...
asked by PuHsiu (1 reputation)
0 votes · 0 answers · 203 views

AWS Glue job executors dying during shuffle write operations (writing parquets to S3)

I'm currently experiencing some issues with an AWS Glue job that performs Spark SQL left joins across various datasets, and I would appreciate some help understanding the cause. The issue: the Glue job ...
asked by Otamad (3 reputation)
0 votes · 1 answer · 51 views

How to join two DataFrames based on start and end timestamps using Spark

I have two DataFrames, like the ones below, that contain the start and end timestamps of each trip. For example, consider a source DataFrame where BUS1 departs from CITY1 at 2023-12-17 07:27:00. In a second DataFrame, ...
asked by RMK (17 reputation)
0 votes · 1 answer · 166 views

AWS Glue Scala - split a script into several Scala files

I don't understand how to split the Glue script into several Scala files. I am aware that one prerequisite is to reference the "other scala file" in the "Referenced files path" and ...
asked by Mouse On Mars
0 votes · 1 answer · 105 views

java.lang.StackOverflowError when adding columns to a DataFrame with a for loop and the withColumn function in Spark Scala

I have Spark code that adds columns to a DataFrame based on a configuration file, and finally selects only the columns listed in the configuration file to create a new DataFrame. When I have fewer than ...
asked by Mame Silmang Diouf
0 votes · 1 answer · 61 views

Testing AWS Glue version 0.9 Python and Scala scripts

We will be working on upgrading AWS Glue from version 0.9 to 4.0. As part of the analysis, we were checking which changes need to be made. For testing purposes, we have created some sample AWS Glue 0.9 Python and ...
asked by Adigkar (13 reputation)
1 vote · 0 answers · 267 views

Not able to write to AWS Glue catalog metastore from spark jobs running on EMR

I am writing a simple Spark job, running on EMR, that creates a table stored in the Glue catalog, but it fails to recognize the Glue catalog databases and writes to the Spark default metastore instead. EMR configurations: ...
asked by karthik (36 reputation)
0 votes · 1 answer · 417 views

Error while upgrading AWS Glue from 2.0 to 3.0

While upgrading an existing job from AWS Glue 2.0 to 3.0 (the current Scala version is 2.11.8 and Spark is 3.1), I get: Exception in User Class: java.lang.NoSuchMethodError : scala.Predef$.refArrayOps([Ljava/...
asked by vvazza (397 reputation)
0 votes · 1 answer · 203 views

How to call AWS Glue crawler from AWS Glue job using Scala API?

I want to call GlueCrawler from the Glue job. I see there is an API https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-crawling.html#aws-glue-api-crawler-crawling-StartCrawler But I ...
asked by Andrei Markhel
5 votes · 1 answer · 2k views

Unable to read JSON files in AWS Glue using Apache Spark

For our use case, we need to load JSON files from an S3 bucket. We are using AWS Glue as the processing tool, but because we will soon be migrating to Amazon EMR, we are already developing our Glue jobs ...
asked by RudyVerboven (1,264 reputation)
0 votes · 1 answer · 691 views

AWS Glue - AWSGlueETL dependency not resolved

I am trying to run Glue locally using Scala, so I added the dependency below, as per the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html) ...
asked by Chandrasekar S
1 vote · 0 answers · 109 views

AWS Glue Terraform - Specify map as an input argument

Is there any way to specify a map/JSON structure as an input argument for an AWS Glue job? I'm doing it this way in Terraform: glue_jobs =[ { name ="SampleGlueJob" default_arguments ={...
asked by Kamil W (2,358 reputation)
0 votes · 1 answer · 225 views

How can we read an invalid date column in Spark Scala from a MySQL server using a JDBC driver URL (connection)?

I am getting an error while reading this column from the MySQL server:

id  date
1   0000-00-00
2   0000-00-01

In the above data set, we can handle 0000-00-00 by using a MySQL server additional parameter ...
asked by Ajay Makkar
