
All Questions

0 votes
0 answers
31 views

Simultaneous overwrite and read causing file not found exception in S3

As part of my requirement, we have a raw S3 bucket where the data is initially dumped as binary files; the files are then separately processed and consolidated into a JSON file. The processor job which ...
Ashit_Kumar
0 votes
1 answer
44 views

Speed up the data save to S3 buckets using spark scala

I am looking for pointers on how to increase the speed at which data is persisted to S3. I am currently persisting data to S3 buckets based on the example path below: s3://...
Ashit_Kumar
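The excerpt above is truncated, but the usual levers for faster S3 persistence are producing fewer, larger files and writing a compressed, splittable format. A minimal sketch under those assumptions (the object name, byte estimate, and output path are illustrative, not the asker's code):

```scala
// Sketch: the common wins for S3 writes are fewer, larger files and a
// splittable compressed format. Names and paths here are illustrative.
object S3WriteTuning {
  // Pure helper: pick a partition count so each output file lands near targetBytes.
  def targetPartitions(totalBytes: Long, targetBytes: Long = 128L * 1024 * 1024): Int =
    math.max(1, math.ceil(totalBytes.toDouble / targetBytes).toInt)
}

// With Spark (assumed, not compiled here):
//   df.repartition(S3WriteTuning.targetPartitions(estimatedBytes))
//     .write.mode("overwrite")
//     .option("compression", "snappy")     // cheaper uploads than uncompressed
//     .parquet("s3a://bucket/prefix/")     // avoid thousands of tiny objects
```

Writing through the `s3a://` connector with its cloud-optimized committers also avoids slow rename-based commit phases.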
0 votes
0 answers
27 views

Spark Streaming java.io.InvalidClassException: projectname.clasname; local class incompatible: stream classdesc serialVersionUID

I created a Spark Streaming app with Spark 2.0.2 and Scala 2.11.8. In my app, a class extends Serializable, but when I add new fields to this class, this error is displayed when running on Spark. ...
Truong-Giang Nguyen
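When a Serializable class gains fields, Java serialization recomputes its default serialVersionUID, so old serialized bytes no longer match and deserialization fails with exactly this InvalidClassException. Pinning the UID explicitly avoids that; a minimal sketch (class and field names are illustrative):

```scala
// Sketch: pin serialVersionUID so adding fields stays binary-compatible
// under Java serialization. Class/field names are illustrative.
@SerialVersionUID(1L)
class Event(val id: String, val payload: String) extends Serializable {
  // Fields added later (with the UID pinned) deserialize from old bytes
  // to their defaults instead of throwing InvalidClassException.
}
```

Old checkpoint or shuffle data then deserializes against the evolved class, with newly added fields left at their default values.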
0 votes
0 answers
18 views

Get all records within a window in spark structured streaming

I have a streaming application in Structured Streaming, and I want to get all the records within a defined window (an event-time window). When I try to use groupBy(window) and use aggregate as ...
Amit Shahwal
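Keeping every record of an event-time window is commonly done with groupBy(window(...)) plus collect_list. A sketch under that assumption (column names are illustrative); the pure helper mirrors how a tumbling window bucketizes a timestamp:

```scala
// Sketch: mirror of the per-record bucketing that window($"ts", ...) performs.
object WindowMath {
  // Tumbling-window start for an epoch-millis timestamp.
  def windowStart(epochMs: Long, windowMs: Long): Long =
    epochMs - (epochMs % windowMs)
}

// With Spark Structured Streaming (assumed; "ts"/"payload" are illustrative):
//   import org.apache.spark.sql.functions.{window, collect_list}
//   val perWindow = events
//     .withWatermark("ts", "10 minutes")
//     .groupBy(window($"ts", "10 minutes"))
//     .agg(collect_list($"payload").as("records"))
```

With a watermark in place, each window's full record list is emitted once the window closes (e.g. in append output mode).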
0 votes
0 answers
36 views

Latency discrepancy in Spark application with Synapse ML package between jar execution and Docker containerization

I have a Spark application leveraging the Synapse ML package for inference. When I run the application as a standalone jar using java -jar my.jar, the p99 latency averages around 800 ms. ...
Kundan Kumar • 1,994
0 votes
1 answer
55 views

Derive Dataframe having uncommon rows based on a single column of another Dataframe

I have to derive the uncommon rows of a DataFrame based on a single column of another DataFrame. For example, the first DataFrame, i.e. df1: _id name 12 abc 56 def 90 jkl ...
Ashit_Kumar
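What the question describes is Spark's left anti join: keep rows of df1 whose key has no match in df2. A sketch (the `_id` column name comes from the excerpt; the pure-Scala helper just mirrors the join's semantics):

```scala
// Sketch: pure-Scala mirror of Spark's "left_anti" join semantics.
object AntiJoin {
  def uncommon[K, V](left: Seq[(K, V)], rightKeys: Set[K]): Seq[(K, V)] =
    left.filterNot { case (k, _) => rightKeys.contains(k) }
}

// With Spark (assumed), the whole task is one line:
//   val uncommonRows = df1.join(df2, Seq("_id"), "left_anti")
```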
0 votes
0 answers
351 views

How can I readStream from an Iceberg table as a source whose snapshots expire?

I am reading an Iceberg table as a source with Spark streaming. This Iceberg table receives its data from Kafka and has compaction and old-snapshot-expiration maintenance options enabled. I am using this ...
Emilio • 11
1 vote
1 answer
112 views

Spark changelog issue: changelog doesn't exist

I am facing this issue with my Spark job. The job was running fine for a long time, but now I see this issue and I am not able to find a solution; please help. We are running this on Kubernetes. Caused ...
Aditya Verma
1 vote
2 answers
205 views

Error when trying to write from Spark to MongoDB

I'm trying to append data to MongoDB using Spark streaming and I am facing an issue. Here is my code: def write_to_db(df, epoch_id): df.select("id", "production_companies").show() ...
Nguyễn Quốc Nhật Minh
-1 votes
1 answer
126 views

How to convert column rows into a string variable using Spark Dataframe

I need to convert single-column rows into a string variable for use in a WHERE condition while loading from a DB table, instead of loading the entire table. A sample DataFrame is shown below. ...
RMK • 17
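One common approach is to collect the single column to the driver and join the values into a quoted IN-list for the pushed-down WHERE clause. A sketch, with the quoting helper in plain Scala (the column name `id` is an illustrative assumption):

```scala
// Sketch: build a SQL IN-list from a sequence of column values.
// The Spark part below is assumed context, not the asker's code.
object InListBuilder {
  // Quote and escape each value, then join: Seq("a","b") -> "'a', 'b'".
  def toInList(values: Seq[String]): String =
    values.map(v => "'" + v.replace("'", "''") + "'").mkString(", ")
}

// With Spark (assumed column name "id"):
//   val ids   = df.select("id").collect().map(_.getString(0)).toSeq
//   val where = s"id IN (${InListBuilder.toInList(ids)})"
//   // pass `where` as a predicate when loading from the DB table
```

This only makes sense when the key set is small enough to collect to the driver; the escaping guards against values containing single quotes.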
0 votes
0 answers
29 views

Spark Streaming LOCAL run doesn't start when using foreachBatch or flatMapGroupsWithState

I'm facing a strange issue while running some tests on Structured Streaming. I'm creating several main classes executed in LOCAL mode; everything was working fine until I tested flatMapGroupsWithState,...
D. belvedere
1 vote
1 answer
482 views

Scala Spark Iceberg writeStream. How to set bucket?

I'm trying to write data to an Iceberg table in Spark streaming (written in Scala). Writer code: val streamResult = joined.writeStream .format("iceberg") .partitionBy("...
Netrunner
1 vote
0 answers
58 views

How to convert a sequence into a streaming DataFrame?

Given a sequence of strings Seq[String], how can I convert it into a streaming DataFrame? The SparkSession.createDataFrame(...) functions create static DataFrames only, and the class MemoryStream ...
pgrandjean
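MemoryStream is the standard way to turn an in-memory Seq into a true streaming source, typically for tests. A minimal sketch, assuming a local SparkSession (the object and query names are illustrative); note it needs an implicit SQLContext and an Encoder in scope:

```scala
import org.apache.spark.sql.{SparkSession, SQLContext}
import org.apache.spark.sql.execution.streaming.MemoryStream

// Sketch, assuming Spark is on the classpath: feed a Seq[String] into a
// streaming DataFrame via MemoryStream and drain it with a memory sink.
object SeqAsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("seq-stream").getOrCreate()
    import spark.implicits._                          // Encoder[String]
    implicit val sqlCtx: SQLContext = spark.sqlContext

    val source = MemoryStream[String]
    source.addData(Seq("a", "b", "c"))

    val streamingDf = source.toDF()                   // isStreaming == true
    val query = streamingDf.writeStream
      .format("memory").queryName("out")
      .outputMode("append").start()
    query.processAllAvailable()                       // drain the batch
    spark.table("out").show()
    query.stop()
    spark.stop()
  }
}
```

More data can be pushed with further addData calls between micro-batches, which is what makes this useful for exercising streaming logic deterministically.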
0 votes
0 answers
35 views

Kafka consumer fetches new messages only after restarting the consumer

I'm facing an issue with my Kafka consumer job written in Scala. When we start the consumer, it fetches all messages available in the broker from the last consumed offset and processes those JSON messages ...
Mani Ganesh
1 vote
0 answers
74 views

Does Spark Structured Streaming support the concept of a "tombstone", a.k.a. deletion?

I have been developing with Kafka Streams for several years. Recently, I joined a project that relies on Spark Structured Streaming. Going through the documentation, to my surprise I could not ...
MaatDeamon • 9,694
