Questions tagged [aws-glue]

AWS Glue is a fully managed ETL (extract, transform, and load) service that can categorize your data, clean it, enrich it, and move it between various data stores. AWS Glue consists of a central data repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python code, and a scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there's no infrastructure to manage.

0 votes
0 answers
12 views

Delta table column mapping support in Athena/Glue

I'm confused by the AWS documentation regarding compatibility with Delta tables. We need to delete a column, which relies on the "column mapping" feature supported in delta-lake 1.2.0, and we do it ...
Sergii V.'s user avatar
  • 301
0 votes
2 answers
19 views

Glue: Extracting Bucket Name and Key from AWS Event Triggered NotifyEvent payload in CloudTrail

I have an EventBridge trigger set on an S3 bucket, and every time we upload an object, it triggers a NotifyEvent in CloudTrail. I am trying to extract the bucket name and key from the payload
Manish's user avatar
  • 33
0 votes
1 answer
33 views

Is there a better way to optimize my AWS Glue Script?

I am a novice with AWS Glue and PySpark. I am unable to resolve a problem I am facing and would appreciate the community's help. Task: I was tasked with creating a script on AWS Glue using PySpark ...
Gokul Subramanian's user avatar
0 votes
1 answer
30 views

AWS Glue Python script doesn't install wheel from S3 when adding a Glue connection

I'm running a Glue python-shell script, and I include extra-py-files that are S3 paths to wheels I've built for the script. These are installed as expected. When I attach a Glue Connection to the ...
Nevermore's user avatar
  • 7,319
1 vote
1 answer
37 views

PySpark: efficient ways to iterate over 1M columns

I have a PySpark DataFrame as below:
+--------+-------------+---------+---------+---------+
|    code|    updatedAt|S0x223433|S1yd33333|S4r256467|
+--------+-------------+---------+---------+---------+...
datawiz879's user avatar
0 votes
0 answers
22 views

Apache Iceberg - long merge time

I have an AWS Glue job that is trying to merge data into an Apache Iceberg table partitioned by product_id. What I'm trying to achieve is to be able to run concurrent merge operations using AWS Glue jobs ...
P.Zaw's user avatar
  • 53
3 votes
0 answers
38 views

Getting "An error occurred while calling o110.pyWriteDynamicFrame. Exception thrown in awaitResult:" in AWS Glue Job

I am getting "An error occurred while calling o110.pyWriteDynamicFrame. Exception thrown in awaitResult:" in an AWS Glue job. The size of my source data in S3 is around 60 GB. I am reading ...
Nikhil Khandelwal's user avatar
0 votes
0 answers
26 views

Botocore.exceptions.DataNotFoundError: unable to load data points for : glue

I'm working on migrating AWS Glue ETL jobs from v2 to v4. While creating the Glue client, I'm getting a botocore.exceptions.DataNotFoundError ("unable to load data points"). Failing at botocore—> loaders....
Malvika Garg's user avatar
0 votes
0 answers
25 views

AWS Athena Error: Modifying Hive table rows is only supported for transactional tables

I am not able to perform a delete operation on rows in AWS Athena tables. It throws the error below: NOT_SUPPORTED: Modifying Hive table rows is only supported for transactional tables This query ran ...
Lakshay's user avatar
  • 594
1 vote
0 answers
26 views

Flattening a DataFrame in Spark after resolving nested field datatypes

I'm having issues flattening a nested DataFrame in Spark, which I have solved using a custom function, but I am wondering if there is a better way to go about this. The workflow is simple: the files ...
BurgerTown's user avatar
3 votes
1 answer
48 views

How to format a string date so that an AWS Glue crawler/DataFrame correctly identifies it as a date field?

I have some JSON data (sample below). An AWS Glue crawler reads this data, creates a Glue catalog database with a table, and sets the date field as a string field. Is there a way I can format the date ...
kishi's user avatar
  • 125
0 votes
0 answers
20 views

What are the character limitations of a SchemaName in the AWS Schema Registry?

What are the limitations of the SchemaName within the AWS Glue Schema Registry? In particular, I would like to know: What is the maximum character limit for a schema name? I ask because the ...
vab2048's user avatar
  • 1,195
0 votes
0 answers
24 views

How to check logs for a Glue trigger job?

I have AWS Glue resources set up which work if I run them manually from the browser. To automate this, I added an aws_glue_trigger resource with a condition that, if the crawler succeeds, then I fire off ...
kishi's user avatar
  • 125
0 votes
0 answers
35 views

How to execute SQL statements against on premise Oracle DB in AWS Glue

I am trying to update a table in my Oracle DB using AWS Glue. I know that PySpark is not really meant for updating or inserting into tables; it is more for reading. Still, I am trying to update all rows for a ...
rz01's user avatar
  • 61
0 votes
1 answer
33 views

Invalid identifier error while using the standard_hash function in Oracle 11g

I'm trying to generate a hash based on a field but got the following error: Query: select standard_hash(pk_time) from schema.table Error: "STANDARD_HASH": invalid identifier The column type ...
Gocht's user avatar
  • 10.2k