
All Questions

0 votes · 1 answer · 48 views

spark.sql() giving error: org.apache.spark.sql.catalyst.parser.ParseException: Syntax error at or near '(' (line 2, pos 52)

I have a class LowerCaseColumn.scala in which one function is defined as below: override def registerSQL(): Unit = spark.sql( """ |CREATE OR REPLACE TEMPORARY ...
— asked by Chandra Prakash
0 votes · 2 answers · 38 views

Determine if a condition is ever true in an aggregated dataset with the Scala Spark SQL library

I'm trying to aggregate a dataset and determine whether a condition is ever true for any row in a group. Suppose I have a dataset with these values: cust_id travel_type distance_travelled 1 car 10 1 ...
— asked by Darragh.McL
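In Spark the usual trick for "ever true" is to aggregate with something like max(when(cond, 1).otherwise(0)) per group. The underlying logic, sketched with plain Scala collections and hypothetical sample rows (no Spark session needed):

```scala
// Hypothetical sample rows shaped like the question's (cust_id, travel_type, distance_travelled) data.
case class Trip(custId: Int, travelType: String, distance: Int)

val trips = Seq(
  Trip(1, "car", 10),
  Trip(1, "train", 50),
  Trip(2, "bike", 5)
)

// Per customer: is the condition (here, travelType == "car") ever true for any row?
val everByCar: Map[Int, Boolean] =
  trips.groupBy(_.custId)
       .map { case (id, rows) => id -> rows.exists(_.travelType == "car") }
```

`exists` short-circuits on the first match, which is exactly the "ever true" semantics the question asks for.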
0 votes · 0 answers · 42 views

Spark recomputes cached DataFrames

I'm working on a Spark application written in Scala. It has six functions; each takes two DataFrames as input, processes them, and emits one result DataFrame. I am caching the result of each function's ...
— asked by Karthik (63)
0 votes · 1 answer · 133 views

java.lang.OutOfMemoryError: UTF16 String size exceeding default value

I was trying to load TSV files from URLs (the largest was 1.05 GB, or 1129672402 bytes) using java.net.URL. It threw the error below for the largest one: java.lang.OutOfMemoryError:...
— asked by prisoner
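This error typically appears when a single JVM String (or StringBuilder) is asked to hold the entire response body, since a UTF-16 String is capped near Integer.MAX_VALUE chars. A hedged sketch of the streaming alternative, processing the body line by line; the InputStream here is an in-memory stand-in for what java.net.URL#openStream() would return:

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source

// Read the stream line by line instead of materialising the whole body as one String.
// In the real case `in` would come from new java.net.URL(...).openStream() (hypothetical here).
def processLines(in: InputStream): Long = {
  val src = Source.fromInputStream(in, "UTF-8")
  try src.getLines().foldLeft(0L)((count, _) => count + 1) // e.g. count rows of the TSV
  finally src.close()
}

val demo = new ByteArrayInputStream("a\tb\nc\td\n".getBytes("UTF-8"))
val lineCount = processLines(demo)
```

Each line can be parsed and discarded (or written out) as it arrives, so memory stays bounded regardless of file size.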
0 votes · 2 answers · 39 views

Spark DataFrame: check whether all elements of an array column match a given value

I have created a Spark DataFrame using Scala; here is sample data: emp_id|result 1000 | [true,true,true] 1001 | [true,false,true] 1002 | [true,true,true] The result column is an array, and I would like to ...
— asked by N9909 (225)
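In Spark 3.x the `forall` higher-order function (SQL: `forall(result, x -> x)`) answers this directly on an array column. The per-row logic, sketched in plain Scala with hypothetical rows mirroring the question's sample:

```scala
// Hypothetical rows mirroring the question's emp_id | result sample.
val rows = Seq(
  (1000, Seq(true, true, true)),
  (1001, Seq(true, false, true)),
  (1002, Seq(true, true, true))
)

// Per employee: do all elements of the result array equal true?
val allTrue: Map[Int, Boolean] =
  rows.map { case (id, results) => id -> results.forall(identity) }.toMap
```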
1 vote · 1 answer · 36 views

Adding a new column of Array[String] type to a DataFrame based on a condition, in Spark Scala

I have the following DataFrame: colA colB A1 B1 A2 B2 A3 B3 (colA: String, colB: String). I also have a Map[String, Array[String]] and want to add a new column colC containing values of the Map ...
— asked by prisoner
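In Spark this is commonly done with a small UDF that closes over the map (or by building a literal map column and using `element_at`). The lookup logic itself, sketched in plain Scala; the row type and the lookup map contents are hypothetical:

```scala
// DataFrame rows and the lookup map are hypothetical stand-ins for the question's data.
case class Row2(colA: String, colB: String)

val rows = Seq(Row2("A1", "B1"), Row2("A2", "B2"), Row2("A3", "B3"))

val lookup: Map[String, Array[String]] = Map(
  "A1" -> Array("x1", "x2"),
  "A2" -> Array("y1")
)

// colC: look colA up in the map, falling back to an empty array when the key is absent.
val withColC: Seq[(String, String, Array[String])] =
  rows.map(r => (r.colA, r.colB, lookup.getOrElse(r.colA, Array.empty[String])))
```

The `getOrElse` fallback decides what missing keys become; returning an empty array keeps the column type uniform, which is what a typed Array[String] column requires.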
0 votes · 0 answers · 74 views

Compare two Lists/Arrays in Scala Spark

I have two lists: x1 = List(10, 20, 30, 40, 50) (time taken) and y1 = List(15, 30) (time allotted). Here are some more examples: +-------------------+----------------------+----------+ | time_taken |...
— asked by nagraj036 (175)
0 votes · 0 answers · 62 views

Convert nested Avro structures to a flat schema in Apache Spark

I have a use case where I have to read data from Kafka and write it to a sink. The data in Kafka is in Avro, and the fields are wrapped in an Avro map. The map will not always have the same keys and will vary ...
— asked by user3679686
0 votes · 0 answers · 30 views

Scala Spark: average of differences

Given an input DataFrame with the structure: | machine_id | process_id | activity_type | timestamp | | ---------- | ---------- | ------------- | --------- | | 0 | 0 | start | ...
— asked by Jelly (1,178)
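A common Spark approach is to pair each process's start and end rows per (machine_id, process_id), take the timestamp difference, then average per machine. The same pipeline sketched with plain Scala collections; the timestamps are hypothetical since the question's table is truncated:

```scala
// Hypothetical activity log shaped like the question's table.
case class Activity(machineId: Int, processId: Int, activityType: String, timestamp: Double)

val log = Seq(
  Activity(0, 0, "start", 0.712), Activity(0, 0, "end", 1.520),
  Activity(0, 1, "start", 3.140), Activity(0, 1, "end", 4.120),
  Activity(1, 0, "start", 0.550), Activity(1, 0, "end", 1.550)
)

// One (machine, process) pair -> duration = end timestamp - start timestamp.
val durations: Seq[(Int, Double)] =
  log.groupBy(a => (a.machineId, a.processId)).toSeq.map { case ((machine, _), events) =>
    val start = events.find(_.activityType == "start").get.timestamp
    val end   = events.find(_.activityType == "end").get.timestamp
    machine -> (end - start)
  }

// Average duration per machine.
val avgByMachine: Map[Int, Double] =
  durations.groupBy(_._1).map { case (m, ds) => m -> ds.map(_._2).sum / ds.size }
```

Note the intermediate `toSeq`: mapping a Map's (machine, process) keys straight to machine keys would silently collapse duplicate machines before averaging.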
0 votes · 0 answers · 24 views

Spark SQL: performance degradation after adding a new column

My code is in Scala, and I'm using Spark SQL syntax to union three DataFrames. I'm currently adding a new field that applies to only one of the DataFrames, so the ...
— asked by nelyanne (106)
-1 votes · 1 answer · 41 views

Not able to create a CSV using a Spark DataFrame and Scala; instead it creates a folder with `.csv` in the folder name

I am not able to write or create a CSV using a Spark DataFrame; instead it creates a directory. This is my code: package com.package.dssupplier import org.apache.spark.sql.{SaveMode, SparkSession} ...
— asked by Braham Shakti
2 votes · 0 answers · 50 views

How can I replace values in an array of structs with other values using Spark?

I have a Hive table named student_details with the format below: | Date | Name | Age | Subject | Students ...
— asked by Ambivert
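Since Spark 2.4 the `transform` higher-order function can rewrite each struct of an array column in place. The per-element logic in plain Scala; the struct's fields are hypothetical, as the question's table is truncated:

```scala
// Hypothetical shape for one element of the Students array-of-struct column.
case class Student(name: String, grade: String)

val students = Seq(Student("alice", "A"), Student("bob", "B"))

// Replace a value inside each struct: copy the struct, swapping only the matching field.
val updated = students.map(s => if (s.grade == "B") s.copy(grade = "B+") else s)
```

`copy` keeps every other field untouched, which mirrors rebuilding the struct field-by-field inside a `transform` lambda.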
0 votes · 1 answer · 46 views

How to call a class inside another Scala object?

I have a class DFHelper which helps with getting the DataFrame keys. I want to keep it as generic code and call it from another main Scala object. E.g. in the first code section I am defining the generic ...
— asked by Shankar Panda
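The general pattern is simply to construct the class with `new` inside the object and call its methods; the names below are hypothetical stand-ins for the question's DFHelper and main object:

```scala
// Hypothetical stand-in for the DFHelper class from the question.
class DFHelper(keys: Seq[String]) {
  def keyList: List[String] = keys.toList
}

// Instantiate the class from another object with `new`, then call its methods.
object Main {
  def run(): List[String] = {
    val helper = new DFHelper(Seq("customerID", "name"))
    helper.keyList
  }
}

val keys = Main.run()
```

If the helper needs no per-instance state, a companion object with an `apply` (or making the helper itself an object) is the usual alternative.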
0 votes · 1 answer · 31 views

How to get the keys from an org.apache.spark.sql.Column in Scala and put them into a list variable?

I am trying to get the keys from a variable of type org.apache.spark.sql.Column and put them into a list so that I can do some schema comparison. inputFieldMap: org.apache.spark.sql.Column = keys:[customerID,...
— asked by Shankar Panda
-1 votes · 1 answer · 25 views

Filter out and log null values from a Spark DataFrame

I have this DataFrame: +------+-------------------+-----------+ |brand |original_timestamp |weight | +------+-------------------+-----------+ |BR1 |1632899456 |4.0 | |BR2 |...
— asked by Nab (138)
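In Spark the two halves are usually obtained with `df.filter($"weight".isNull)` (to count or log the bad rows) and `.isNotNull` (to keep the rest). The same split sketched in plain Scala, modelling the nullable column as an Option; the row shape is hypothetical since the question's table is truncated:

```scala
// Hypothetical rows; weight is Option[Double] to model the nullable column.
case class Reading(brand: String, originalTimestamp: Long, weight: Option[Double])

val readings = Seq(
  Reading("BR1", 1632899456L, Some(4.0)),
  Reading("BR2", 1632899457L, None)
)

// Split into rows with a weight (kept) and rows without one (logged / routed elsewhere).
val (withWeight, missingWeight) = readings.partition(_.weight.isDefined)
```

`partition` makes one pass and returns both sides, so the rows being dropped are available for logging instead of silently disappearing.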
