All Questions
109 questions
1 vote · 0 answers · 31 views
More Parallelism Than Expected in Glue ETL Spark Job
I am using Glue ETL Spark jobs to run some tests. I am trying to understand why I am getting more parallel processing than the available cores on a single executor.
Here's my job config:
I'm setting ...
0 votes · 0 answers · 37 views
py4j.protocol.Py4JJavaError: An error occurred while calling o1593.saveAsTable. : java.lang.StackOverflowError
I am reading a file that has about 725 columns as a Data Frame (df); I then do some light transformation and append a couple of columns (about four) to the final Data Frame (df_final). I then write or ...
0 votes · 1 answer · 34 views
How to replace the datasource with the processed result in the same Glue task
I want to process some data from A and replace A by the processed result.
Is there any "place" where I can do something after a write() action has completed? Or is there a way to replace the original dir ...
0 votes · 0 answers · 203 views
AWS Glue job executors dying during shuffle write operations (writing parquets to S3)
I'm currently experiencing some issues with an AWS Glue Job that does some Spark SQL left joins of various datasets, and some help would be appreciated to understand the cause.
The issue:
The Glue Job ...
0 votes · 1 answer · 51 views
How to join two dataframes based on start and end timestamps using spark
I have two dataframes like the ones below that contain each trip's start and end timestamps
For example, consider a source dataframe where BUS1 departs from CITY1 at 2023-12-17 07:27:00. In a second dataframe, ...
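Questions like this usually come down to a range (non-equi) join: keep each pair of rows whose timestamp falls inside the other row's start/end window. As a rough illustration only, with made-up trip and event values (the Spark equivalent would express the same predicate as a join condition such as `df1.start <= df2.ts AND df2.ts <= df1.end`), the logic can be sketched in plain Python:

```python
# Sketch of an interval-overlap join in plain Python. All names and
# timestamps are hypothetical; they are not taken from the question.
from datetime import datetime

trips = [  # (bus, start, end)
    ("BUS1", datetime(2023, 12, 17, 7, 27), datetime(2023, 12, 17, 9, 0)),
]
events = [  # (city, timestamp)
    ("CITY1", datetime(2023, 12, 17, 7, 27)),
    ("CITY2", datetime(2023, 12, 17, 10, 0)),
]

# Keep each (trip, event) pair whose event timestamp falls inside the
# trip's [start, end] window -- the same predicate a Spark range join uses.
joined = [
    (bus, city)
    for (bus, start, end) in trips
    for (city, ts) in events
    if start <= ts <= end
]
```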
0 votes · 1 answer · 166 views
AWS Glue Scala - split script into several scala files
I don't see how I can split the Glue script into several Scala files. I am aware that one prerequisite is to reference the "other Scala file" in the "Referenced files path" and ...
0 votes · 1 answer · 105 views
java.lang.StackOverflowError when adding columns to a dataframe with a for loop and withColumn function in Spark Scala
I have Spark code that adds columns to a dataframe from a configuration file and finally selects only the columns present in the configuration file to create a new dataframe.
When I have less than ...
0 votes · 1 answer · 61 views
AWS Glue version 0.9 Python and Scala scripts testing
We will be working on an AWS Glue 0.9 to 4.0 upgrade. As part of the analysis, we were checking the changes to be done. For testing purposes we have created some sample AWS Glue 0.9 Python and ...
1 vote · 0 answers · 267 views
Not able to write to AWS Glue catalog metastore from spark jobs running on EMR
I am writing a simple Spark job running on EMR to create a table stored in the Glue catalog, but it fails to recognize the Glue catalog databases and writes to the Spark default metastore.
EMR configurations:
...
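For reference, AWS documents pointing Spark on EMR at the Glue Data Catalog via the `spark-hive-site` classification in the cluster's configuration JSON. A minimal fragment (whether this matches the asker's actual EMR setup is unknown):

```json
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
```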
0 votes · 1 answer · 417 views
Error while upgrading AWS Glue from 2.0 to 3.0
While upgrading an existing job from AWS Glue 2.0 to 3.0: the current Scala version is 2.11.8 and Spark is 3.1
Exception in User Class: java.lang.NoSuchMethodError : scala.Predef$.refArrayOps([Ljava/...
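A `NoSuchMethodError` on `scala.Predef$.refArrayOps` is the classic symptom of a Scala binary-version mismatch: AWS Glue 3.0 runs Spark 3.1 on Scala 2.12, so a jar compiled against 2.11.8 will fail at runtime. A minimal build.sbt sketch (the exact patch versions shown are illustrative):

```scala
// build.sbt -- Glue 3.0 targets Spark 3.1 on Scala 2.12,
// so the job jar must be built against a 2.12.x Scala version.
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.1.1" % Provided
)
```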
0 votes · 1 answer · 203 views
How to call AWS Glue crawler from AWS Glue job using Scala API?
I want to call GlueCrawler from the Glue job. I see there is an API https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-crawling.html#aws-glue-api-crawler-crawling-StartCrawler
But I ...
5 votes · 1 answer · 2k views
Unable to read json files in AWS Glue using Apache Spark
For our use case we need to load JSON files from an S3 bucket. As a processing tool we are using AWS Glue. But because we will soon be migrating to Amazon EMR, we are already developing our Glue jobs ...
0 votes · 1 answer · 691 views
AWS Glue - AWSGlueETL dependency not resolved
I am trying to run Glue locally using Scala, so I added the below dependency as per the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html)
...
1 vote · 0 answers · 109 views
AWS Glue Terraform - Specify map as an input argument
Is there any way to specify a map/JSON structure as an input argument for an AWS Glue job? I'm doing it this way in Terraform:
glue_jobs = [
  {
    name = "SampleGlueJob"
    default_arguments = {...
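One relevant detail: the `aws_glue_job` resource's `default_arguments` is a map of strings, so a nested map/JSON structure has to be JSON-encoded into a single string value. A sketch using Terraform's built-in `jsonencode()` (all names, paths, and values here are hypothetical, not taken from the question):

```hcl
# Sketch: default_arguments must be map(string), so nested structure
# is passed as one JSON-encoded string the job parses at runtime.
resource "aws_glue_job" "sample" {
  name     = "SampleGlueJob"
  role_arn = var.glue_role_arn # hypothetical variable

  command {
    script_location = "s3://my-bucket/scripts/sample.py" # hypothetical path
  }

  default_arguments = {
    "--my_config" = jsonencode({
      input_path  = "s3://my-bucket/in/"
      output_path = "s3://my-bucket/out/"
    })
  }
}
```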
0 votes · 1 answer · 225 views
How can we read an invalid date column in Spark Scala from a MySQL server using a JDBC driver URL (connection)
I am getting an error while reading this column from the MySQL server:
id    date
1     0000-00-00
2     0000-00-01
In the above data set we can handle 0000-00-00 by using a MySQL server additional parameter
...
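The "additional parameter" the excerpt refers to is presumably MySQL Connector/J's `zeroDateTimeBehavior` connection property, which maps zero dates like `0000-00-00` to NULL instead of raising an error (the value is spelled `convertToNull` in Connector/J 5.x and `CONVERT_TO_NULL` in 8.x). A sketch of building such a JDBC URL (host and database names are made up):

```python
# Sketch: a JDBC URL that tells MySQL Connector/J to return NULL for
# zero dates rather than failing. Host/database names are hypothetical.
jdbc_url = (
    "jdbc:mysql://db.example.com:3306/mydb"
    "?zeroDateTimeBehavior=convertToNull"
)

# Hypothetical Spark usage (requires a running SparkSession):
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)
#       .option("dbtable", "my_table")
#       .load())
```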