
All Questions

1 vote · 0 answers · 18 views

Encrypt Spark Libsvm Dataframe

I have a libsvm file that I want to load into Spark and then encrypt it. I want to iterate over every element in the features to apply my encrypt function, but there doesn't seem to be any way to ...
asked by Landor3000
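
One way to approach this: wrap the per-element function in a UDF that maps over each ml.linalg.Vector. A minimal sketch, where the "+ 1.0" stands in for the real encrypt function and the file path is hypothetical:

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object EncryptLibsvm {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("encrypt-libsvm").master("local[*]").getOrCreate()

    // The element-wise cipher below is a stand-in; substitute the real encrypt function.
    val encryptVector = udf((v: Vector) => Vectors.dense(v.toArray.map(x => x + 1.0)))

    val df = spark.read.format("libsvm").load("data.libsvm") // hypothetical path
    df.withColumn("features", encryptVector(col("features"))).show(5, truncate = false)
  }
}
```
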
0 votes · 1 answer · 17 views

Adding new Rows to Spark Partition while using forEachPartition

I am trying to add a new Row to each Partition in my Spark Job. I am using the following code to achieve this: StructType rowType = new StructType(); rowType.add(DataTypes.createStructField("...
asked by Sateesh K · 1,081
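
Worth noting that foreachPartition is an action and cannot return new rows, whereas mapPartitions can. A minimal Scala sketch of that idea for Spark 3.x (schema and the appended row are assumptions):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object AddRowPerPartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("add-row").master("local[*]").getOrCreate()

    val schema = StructType(Seq(StructField("value", StringType)))
    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row("a"), Row("b"), Row("c")), 2), schema)

    // mapPartitions can emit extra rows; foreachPartition is an action and cannot.
    val withExtra = df.mapPartitions(rows => rows ++ Iterator(Row("extra-row")))(RowEncoder(schema))
    withExtra.show()
  }
}
```
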
0 votes · 0 answers · 21 views

Scala Spark Dataframe creation from Seq of tuples doesn't work in Scala 3, but does in Scala 2

When trying to test something locally with Scala Spark, I noticed the following problem and was wondering what causes it, and whether there exists a workaround. Consider the following build ...
asked by Maurycyt · 718
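
Spark publishes artifacts only for Scala 2.12/2.13, and toDF on a Seq of tuples relies on Scala 2 TypeTag-derived encoders, which Scala 3 cannot produce. One workaround (a sketch, not the only option) is to bypass encoder derivation with an explicit schema and Rows:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object Scala3Workaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scala3-demo").master("local[*]").getOrCreate()

    // An explicit schema replaces the TypeTag-derived encoder that toDF needs.
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)))

    val rows = Seq((1, "a"), (2, "b")).map { case (id, name) => Row(id, name) }
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema).show()
  }
}
```

On the build side, Scala 3 projects typically consume the 2.13 Spark artifacts via sbt's `("org.apache.spark" %% "spark-sql" % sparkVersion).cross(CrossVersion.for3Use2_13)`.
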
-1 votes · 0 answers · 39 views

Using spark 3.4.1 lib in Java when extending StringRegexExpression to a java class

I am using Spark 3.4.1 in a Maven project where I have also configured Scala (2.13.8). I am trying to create a class Like.java in the project by extending Spark's StringRegexExpression package com....
asked by Manoj Kumar
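
StringRegexExpression is an internal Catalyst class, so subclassing it ties the code to exact Spark/Scala binary versions. When a custom Catalyst expression isn't strictly required, the public Column API gives the same LIKE/RLIKE behaviour without touching internals; a minimal sketch:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object LikeWithoutCatalyst {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("like-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq("spark", "scala", "maven").toDF("name")

    // Public, stable equivalents of the internal regex expressions.
    df.filter(col("name").like("s%")).show()      // SQL LIKE pattern
    df.filter(col("name").rlike("^s.*a$")).show() // Java regular expression
  }
}
```
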
1 vote · 1 answer · 27 views

Can I use same SparkSession in different threads

In my Spark app I use many temp views to read datasets and then use them in a huge SQL expression, like that: for (view <- cfg.views) spark.read.format(view.format).load(view.path).createTempView(view....
asked by Vladimir Shadrin
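
SparkSession is thread-safe, so temp views can be registered from several threads against the same session, as long as each view name is unique. A sketch with scala.concurrent Futures; the ViewCfg shape and the final query are assumptions inferred from the excerpt:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.sql.SparkSession

object ParallelTempViews {
  // Hypothetical stand-in for the cfg.views entries in the question.
  case class ViewCfg(name: String, format: String, path: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("views").master("local[*]").getOrCreate()
    val views = Seq(ViewCfg("v1", "parquet", "/data/v1"), ViewCfg("v2", "parquet", "/data/v2"))

    // SparkSession is thread-safe; each Future registers its own temp view.
    val loads = views.map { v =>
      Future(spark.read.format(v.format).load(v.path).createTempView(v.name))
    }
    Await.result(Future.sequence(loads), Duration.Inf)

    spark.sql("SELECT * FROM v1 UNION ALL SELECT * FROM v2").show() // hypothetical query
  }
}
```
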
1 vote · 0 answers · 27 views

Spark scala transformations

I have a Spark input dataframe like below:

Emp_ID  Cricket  Chess  Swim
11      Y        N      N
12      Y        Y      Y
13      N        N      Y

Need an output dataframe like below:

Hobbies  Emp_id_list
Cricket  11,12
Chess    12
Swim     12,13

Any way to ...
asked by srinivas gowda
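
One way to get this shape (a sketch): unpivot the hobby columns with stack, keep the 'Y' rows, and collect the ids per hobby:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

object HobbiesUnpivot {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hobbies").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((11, "Y", "N", "N"), (12, "Y", "Y", "Y"), (13, "N", "N", "Y"))
      .toDF("Emp_ID", "Cricket", "Chess", "Swim")

    // stack() unpivots the hobby columns into (Hobbies, flag) rows.
    val out = df
      .selectExpr("Emp_ID",
        "stack(3, 'Cricket', Cricket, 'Chess', Chess, 'Swim', Swim) as (Hobbies, flag)")
      .filter(col("flag") === "Y")
      .groupBy("Hobbies")
      .agg(concat_ws(",", collect_list(col("Emp_ID").cast("string"))).as("Emp_id_list"))

    out.show()
  }
}
```
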
-1 votes · 0 answers · 25 views

udf to transform a json string into multiple rows based on first level of nesting

I am trying to transform a df based on the first-level nesting in the JSON string. Input dataframe: ...
asked by Shibu · 1,490
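
Depending on the payload shape, this often needs no UDF at all: from_json with a map<string,string> schema keeps each first-level value as raw JSON, and explode then yields one row per first-level key. A sketch under that assumption:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{MapType, StringType}

object ExplodeFirstLevel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("explode-json").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, """{"a": {"x": 1}, "b": {"y": 2}}""")).toDF("id", "json")

    // Parse the first level as map<string,string>: each value keeps its raw JSON,
    // then explode yields one row per first-level key.
    df.withColumn("m", from_json(col("json"), MapType(StringType, StringType)))
      .select(col("id"), explode(col("m")))
      .show(false)
  }
}
```
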
0 votes · 1 answer · 48 views

spark.sql() giving error : org.apache.spark.sql.catalyst.parser.ParseException: Syntax error at or near '('(line 2, pos 52)

I have a class LowerCaseColumn.scala where one function is defined as below: override def registerSQL(): Unit = spark.sql( """ |CREATE OR REPLACE TEMPORARY ...
asked by Chandra Prakash
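
Spark's SQL parser accepts far less DDL than many other engines, so hand-written CREATE FUNCTION statements often hit exactly this ParseException. A common workaround is to register the function through the Scala API and reference it from SQL; the function body below is an assumption based on the class name:

```scala
import org.apache.spark.sql.SparkSession

object RegisterLowerCase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udf-demo").master("local[*]").getOrCreate()

    // Registering via the API avoids hand-writing CREATE FUNCTION DDL.
    spark.udf.register("lower_case_column",
      (s: String) => if (s == null) null else s.toLowerCase)

    spark.sql("SELECT lower_case_column('ABC') AS lowered").show()
  }
}
```
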
0 votes · 1 answer · 58 views · +50 bounty

How to create data-frame on rocks db (SST files)

We hold our documents in RocksDB. We will be syncing these RocksDB SST files to S3. I would like to create a dataframe on the SST files and later run SQL. When I googled, I was not able to find any ...
asked by chendu · 729
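
There is no off-the-shelf Spark datasource for SST files, but RocksDB's Java API (rocksdbjni) ships an SstFileReader that can iterate a file's key/value pairs, from which a DataFrame can be built. A sketch, assuming string-encoded keys and values and a file already downloaded from S3 to a local path:

```scala
import org.apache.spark.sql.SparkSession
import org.rocksdb.{Options, ReadOptions, RocksDB, SstFileReader}
import scala.collection.mutable.ArrayBuffer

object SstToDataFrame {
  def main(args: Array[String]): Unit = {
    RocksDB.loadLibrary()
    val spark = SparkSession.builder().appName("sst").master("local[*]").getOrCreate()
    import spark.implicits._

    // Iterate one SST file with RocksDB's SstFileReader, collecting key/value pairs.
    def readSst(path: String): Seq[(String, String)] = {
      val reader = new SstFileReader(new Options())
      reader.open(path)
      val it = reader.newIterator(new ReadOptions())
      it.seekToFirst()
      val buf = ArrayBuffer.empty[(String, String)]
      while (it.isValid) {
        buf += ((new String(it.key()), new String(it.value())))
        it.next()
      }
      reader.close()
      buf.toSeq
    }

    val df = readSst("/tmp/000001.sst").toDF("key", "value") // hypothetical local path
    df.createOrReplaceTempView("docs")
    spark.sql("SELECT count(*) FROM docs").show()
  }
}
```

This reads on the driver; for many files one would distribute the S3 paths and run the same reader inside mapPartitions on the executors.
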
0 votes · 0 answers · 22 views

Flattening nested json with back slash in apache spark scala Dataframe

{ "messageBody": "{\"task\":{\"taskId\":\"c6d9fb0e-42ba-4a3e-bd39-f2a32a6958c1\",\"serializedTaskData\":\"{\\\"clientId\\\":\\\&...
asked by Vanshaj Singh
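
The backslashes mean the payload is JSON encoded as a string inside JSON, so it usually takes one from_json pass per layer: parse messageBody, then parse the serializedTaskData string it contains. A sketch with a minimal assumed schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

object DoubleEscapedJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-json").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(
      """{"messageBody": "{\"task\":{\"taskId\":\"t1\",\"serializedTaskData\":\"{\\\"clientId\\\":\\\"c1\\\"}\"}}"}"""
    ).toDF("raw")

    val bodySchema = new StructType().add("messageBody", StringType)
    val taskSchema = new StructType()
      .add("task", new StructType()
        .add("taskId", StringType)
        .add("serializedTaskData", StringType))
    val dataSchema = new StructType().add("clientId", StringType)

    // Each from_json peels one layer of string-encoded JSON.
    val out = df
      .withColumn("body", from_json(col("raw"), bodySchema))
      .withColumn("task", from_json(col("body.messageBody"), taskSchema))
      .withColumn("data", from_json(col("task.task.serializedTaskData"), dataSchema))

    out.select(col("task.task.taskId"), col("data.clientId")).show(false)
  }
}
```
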
0 votes · 0 answers · 33 views

Spark : Read special characters from the content of dat file without corrupting it in scala

I have to read all the special characters in a .dat file (e.g. testdata.dat) without corrupting them and load it into a dataframe in Scala using Spark. I have one dat file (e.g. testdata.dat),...
asked by Prantik Banerjee
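
Corruption like this is usually the reader decoding the file as UTF-8 when it was written in another charset; Spark's CSV source takes an encoding option. A sketch where both the charset and the delimiter are assumptions about this particular .dat file:

```scala
import org.apache.spark.sql.SparkSession

object ReadDatWithEncoding {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dat-read").master("local[*]").getOrCreate()

    // "encoding" tells the CSV reader how to decode the bytes; both the charset
    // and the delimiter below are assumptions about this particular .dat file.
    val df = spark.read
      .option("encoding", "ISO-8859-1")
      .option("delimiter", "|")
      .csv("testdata.dat")

    df.show(truncate = false)
  }
}
```
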
1 vote · 0 answers · 28 views

Creating a custom aggregator in spark with window rowsBetween?

What I'm trying to do is use a window function to get the last and current row and do some computation on a couple of the columns with a custom aggregator. I have time series data with points that are ...
asked by Adrian Corey
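
If only the previous and current rows are involved, lag over an ordered window is often simpler than a full custom Aggregator with rowsBetween. A sketch with assumed column names for the time series:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag}

object PrevRowComputation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("window-lag").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical time series: (series id, timestamp, value).
    val df = Seq(("s1", 1L, 10.0), ("s1", 2L, 13.0), ("s1", 3L, 11.5))
      .toDF("id", "ts", "value")

    val w = Window.partitionBy("id").orderBy("ts")

    // lag() exposes the previous row's value alongside the current row.
    df.withColumn("prev_value", lag(col("value"), 1).over(w))
      .withColumn("delta", col("value") - col("prev_value"))
      .show()
  }
}
```
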
1 vote · 0 answers · 31 views

More Parallelism Than Expected in Glue ETL Spark Job

I am using Glue ETL Spark jobs to run some tests. I am trying to understand why I am getting more parallel processing than the available cores on a single executor. Here's my job config: I am setting ...
asked by Yar · 7,328
0 votes · 2 answers · 38 views

Determine if a condition is ever true in an aggregated dataset with Scala spark sql library

I'm trying to aggregate a dataset and determine if a condition is ever true for a row in the dataset. Suppose I have a dataset with these values:

cust_id  travel_type  distance_travelled
1        car          10
1        ...
asked by Darragh.McL
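
A common pattern for "ever true" per group is max over a when expression, which is 1 exactly when the condition held for at least one row. A sketch using the columns from the excerpt (the extra rows are made up to complete the example):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, when}

object EverTrueAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ever-true").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "car", 10), (1, "train", 50), (2, "plane", 300))
      .toDF("cust_id", "travel_type", "distance_travelled")

    // max() over a 0/1 flag is 1 iff the condition held for any row in the group.
    df.groupBy("cust_id")
      .agg(max(when(col("travel_type") === "car", 1).otherwise(0)).as("ever_took_car"))
      .show()
  }
}
```
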
0 votes · 0 answers · 23 views

How to use Apache Ignite RDD with Apache Spark RDD in a web application

I created a Scala web application using Apache Ignite and Spark. The application uses Ignite RDD and Spark RDD. I was only able to successfully run the application by using spark-submit. But I wonder ...
asked by Chana · 1
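
spark-submit mostly just assembles the classpath and the SparkConf, so a web application can do the same by configuring the session programmatically at startup. A sketch where the master URL and jar path are assumptions and the Ignite side is left unchanged:

```scala
import org.apache.spark.sql.SparkSession

object EmbeddedSparkSession {
  // Built once when the web application starts, instead of via spark-submit.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("ignite-web-app")
    .master("spark://spark-master:7077")              // assumption: your cluster URL
    .config("spark.jars", "/opt/app/ignite-deps.jar") // assumption: Ignite + app jars
    .getOrCreate()
}
```
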
