All Questions
183
questions
0
votes
0
answers
33
views
Spark: Read special characters from the content of a .dat file without corrupting it in Scala
I have to read all the special characters in a .dat file (e.g. testdata.dat) without corrupting them and load the data into a DataFrame in Scala using Spark.
I have one .dat file (e.g. testdata.dat),...
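When "corruption" shows up at load time, it is usually an encoding mismatch rather than Spark mangling bytes. One way to verify (a sketch; the charset is an assumption) is to read the file with an explicit single-byte charset first, then pass the same charset to Spark's CSV reader via `.option("encoding", ...)`:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object CharsetCheck {
  // ISO-8859-1 maps every byte to a character, so nothing gets replaced
  // with U+FFFD (the usual sign of decoding damage). If the text looks
  // right here, tell Spark the same thing:
  //   spark.read.option("encoding", "ISO-8859-1").csv(path)
  def readWithCharset(path: String): String =
    new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.ISO_8859_1)
}
```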
0
votes
0
answers
61
views
How to create an empty TSV file and how to write a DataFrame to a TSV file in Spark Scala?
The logic is: I have a set of sets, let's suppose it's val Set = {{1,1,1}, {}, {2,2,2}}. I want to loop over this set; if the inner set is empty, I want to create an empty TSV file, but if it is not empty, I want ...
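Note that a Scala `Set[Set[Int]]` would collapse `{1,1,1}` to `{1}`, so the sketch below uses `Seq[Seq[Int]]`; names and layout are placeholders. For the DataFrame side, `df.write.option("sep", "\t").csv(dir)` produces TSV output.

```scala
import java.nio.file.{Files, Path, Paths}

object TsvWriter {
  // One .tsv per inner collection: an empty file when it is empty,
  // otherwise its elements as a single tab-separated line.
  def writeAll(rows: Seq[Seq[Int]], dir: String): Seq[Path] = {
    Files.createDirectories(Paths.get(dir))
    rows.zipWithIndex.map { case (row, i) =>
      val target = Paths.get(dir, s"part-$i.tsv")
      val bytes =
        if (row.isEmpty) Array.emptyByteArray
        else (row.mkString("\t") + "\n").getBytes("UTF-8")
      Files.write(target, bytes)
    }
  }
}
```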
0
votes
1
answer
271
views
How to read JSON file in Spark Scala?
I have a JSON file I want to read using Spark Scala, but when I read that file as a DataFrame it shows a "_corrupt_record" column, and I have tried all possible ways.
val df = spark.read
.format("...
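A `_corrupt_record` column usually means the file is pretty-printed JSON while Spark expects JSON Lines (one object per line) by default. A sketch of the common fix, assuming that is the cause:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonReader {
  // multiLine lets a single JSON document span multiple lines; without
  // it, a pretty-printed file parses as one corrupt record per line.
  def readMultiLine(spark: SparkSession, path: String): DataFrame =
    spark.read.option("multiLine", "true").json(path)
}
```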
-2
votes
1
answer
83
views
An elegant way to extract the 'Date' attribute from a file [closed]
So I have multiple files (on the Windows 10 operating system and NTFS file system) with 'Date' attributes, as here.
'Date' attributes are available in Windows File Explorer in the 'Details' view after ...
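Explorer's 'Date' column is derived (often the modification time, or 'Date taken' for media), but the plain NTFS timestamps are available through `java.nio` without shelling out. A sketch:

```scala
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes

object FileDates {
  // Returns the three basic filesystem timestamps in millis since epoch:
  // (creation, last modified, last access).
  def timestamps(path: String): (Long, Long, Long) = {
    val a = Files.readAttributes(Paths.get(path), classOf[BasicFileAttributes])
    (a.creationTime.toMillis, a.lastModifiedTime.toMillis, a.lastAccessTime.toMillis)
  }
}
```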
1
vote
2
answers
1k
views
Read a file in Spark Scala that has the special characters '{' and '}' in its filename
I wanted to read a file in Spark Scala named: monthlyPurchaseFile{202205}-May.TXT
I am using the code below:
val df = spark.read.text("handel_special_ch/monthlyPurchaseFile{202205}-May.TXT"...
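Spark resolves input paths through Hadoop's glob syntax, where `{...}` means alternation, so the braces are interpreted rather than matched literally. Escaping the glob metacharacters with a backslash is one workaround (`escapeGlob` is a hypothetical helper):

```scala
object GlobEscape {
  // '{', '}', '[', ']', '?', '*' are special in Hadoop path globbing;
  // prefix each with a backslash so the filename is matched literally.
  def escapeGlob(path: String): String =
    path.replaceAll("([{}\\[\\]?*])", "\\\\$1")
}

// Usage with Spark (assumes a SparkSession named `spark`):
// val df = spark.read.text(
//   GlobEscape.escapeGlob("handel_special_ch/monthlyPurchaseFile{202205}-May.TXT"))
```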
-1
votes
1
answer
80
views
Writing to a file goes wrong
Hello, I am trying to add a line to a file using Scala.
I tried this
val pw = new FileWriter("src/test/resources/config")
pw.write("file contents")
pw.append("keke")
...
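A likely cause: `new FileWriter(path)` without the append flag truncates the file, and without `close()` buffered writes may never reach disk. A minimal sketch (`appendLine` is a hypothetical helper):

```scala
import java.io.{FileWriter, PrintWriter}

object AppendLine {
  def appendLine(path: String, line: String): Unit = {
    // The second argument opens the file in append mode; without it,
    // new FileWriter(path) wipes the existing contents.
    val pw = new PrintWriter(new FileWriter(path, true))
    try pw.print(line + "\n")
    finally pw.close() // close() flushes; unflushed writes are lost otherwise
  }
}
```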
1
vote
1
answer
809
views
Decompress data in Scala using gzip
When I try to decompress a gzip file I get an error:
My code:
val file_inp = new FileInputStream("Textfile.txt.gzip")
val file_out = new FileOutputStream("Textfromgzip.txt")
val gzInp =...
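The truncated snippet suggests a `GZIPInputStream` copy loop; a frequent cause of the error is a file that was merely renamed to `.gzip` without actually being compressed, which makes `GZIPInputStream` throw "Not in GZIP format". A self-contained sketch of the decompression loop:

```scala
import java.io.{FileInputStream, FileOutputStream}
import java.util.zip.GZIPInputStream

object Gunzip {
  // Streams the gzip payload to a plain file in 4 KiB chunks.
  def decompress(in: String, out: String): Unit = {
    val gzIn = new GZIPInputStream(new FileInputStream(in))
    val fOut = new FileOutputStream(out)
    try {
      val buf = new Array[Byte](4096)
      var n = gzIn.read(buf)
      while (n != -1) { fOut.write(buf, 0, n); n = gzIn.read(buf) }
    } finally { gzIn.close(); fOut.close() }
  }
}
```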
0
votes
2
answers
967
views
Read local/linux files in Spark Scala code executing in Yarn Cluster Mode
How do I access and read local file data in Spark executing in YARN cluster mode?
local/linux file: /home/test_dir/test_file.csv
spark-submit --class "" --master yarn --deploy-mode cluster --...
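In cluster mode the driver runs on an arbitrary cluster node, so a path that exists only on the submitting machine is not visible to it. One common approach (a sketch of the submission setup, with placeholder class and jar names) is to ship the file with `--files` and resolve the shipped copy by name:

```scala
// Ship the local file along with the application:
//   spark-submit --master yarn --deploy-mode cluster \
//     --files /home/test_dir/test_file.csv --class my.Main app.jar
import org.apache.spark.SparkFiles

// Inside the job, the shipped copy is resolved by its bare name:
val path = SparkFiles.get("test_file.csv")
val df = spark.read.csv("file://" + path) // assumes a SparkSession `spark`
```

Alternatives are copying the file to HDFS first, or running in client deploy mode so the driver stays on the machine that has the file.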
0
votes
1
answer
320
views
NullPointerException when reading a file with RowCsvInputFormat in Flink
I am a beginner at Flink streaming.
When reading a file with RowCsvInputFormat, the code in which the Kryo serializer creates a Row does not work properly.
The code is below.
val readLocalCsvFile = new ...
0
votes
2
answers
273
views
Is it possible to read a file using a SparkSession object in Scala on Windows?
I've been trying to read from a .csv file in many ways, using the SparkContext object. I found it possible through the scala.io.Source.fromFile function, but I want to use the spark object. Every time I run ...
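If the failures come from Hadoop's Windows shims rather than Spark itself, the usual missing piece is `winutils.exe`. A sketch assuming Spark in local mode and a winutils install under `C:\hadoop` (a placeholder path):

```scala
import org.apache.spark.sql.SparkSession

// hadoop.home.dir must point at a folder containing bin\winutils.exe;
// C:\hadoop is an assumed install location, not a Spark default.
System.setProperty("hadoop.home.dir", "C:\\hadoop")

val spark = SparkSession.builder()
  .appName("windows-csv")
  .master("local[*]")
  .getOrCreate()

val df = spark.read.option("header", "true").csv("C:\\data\\input.csv")
```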
0
votes
2
answers
309
views
Better way to create a Scala array from the lines of a file
Given a file with a string on each line, I want to create an array with each line of the file as an element of the array. I know I can do it like so:
import scala.io.Source
val path: String = "...
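Beyond `Source.fromFile(path).getLines().toArray`, closing the source matters; on Scala 2.13+, `scala.util.Using` makes the read-and-close pattern one line:

```scala
import scala.io.Source
import scala.util.Using

object LinesToArray {
  // Using.resource closes the Source even if reading throws.
  def linesOf(path: String): Array[String] =
    Using.resource(Source.fromFile(path))(_.getLines().toArray)
}
```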
1
vote
0
answers
100
views
Java/Scala: Create a test directory for testing write failures
I am unit testing I/O code roughly of the following form. I want to test that the fallback logic works if the write fails initially. How can I create a file path such that it will fail the initial ...
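One portable way to make the initial write fail deterministically is a path whose "parent directory" is actually a regular file, since no OS can create children under it (`unwritablePath` is a hypothetical helper):

```scala
import java.io.File

object FailingPath {
  // blocker is a plain file, so blocker/child.txt can never be created;
  // any FileWriter/OutputStream opened on it throws IOException.
  def unwritablePath(): String = {
    val blocker = File.createTempFile("blocker", ".tmp")
    blocker.deleteOnExit()
    new File(blocker, "child.txt").getPath
  }
}
```

Unlike removing directory write permission, this also fails when the tests run as root.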
1
vote
0
answers
191
views
Running a fat jar and getting a path-not-found exception in SBT
I have a Scala project which refers to a file created at src/main/resources/MyFile.csv.
I created a fat jar; the following is my build.sbt:
name := "my-project"
version := "0.1"...
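Inside a fat jar, `src/main/resources/MyFile.csv` is no longer a filesystem path; it becomes a classpath entry (`/MyFile.csv`), so `java.io.File` lookups fail. Reading it as a classpath resource avoids that (`readResource` is a hypothetical helper):

```scala
import scala.io.Source

object ResourceReader {
  // Resources bundled under src/main/resources live inside the jar,
  // so they must be opened as streams, not as File paths.
  def readResource(name: String): String = {
    val stream = getClass.getResourceAsStream(name) // e.g. "/MyFile.csv"
    require(stream != null, s"resource $name not on classpath")
    try Source.fromInputStream(stream).mkString
    finally stream.close()
  }
}
```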
1
vote
1
answer
3k
views
How to read a file via SFTP in Scala
I am looking for a simple way to read a file (and maybe a directory) via the SFTP protocol in Scala.
My attempts:
I've looked at the Alpakka library, which is part of Akka.
But this works with streams, ...
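One lightweight option outside Alpakka is the JSch library (`"com.jcraft" % "jsch"`). The sketch below uses placeholder host, credentials, and path, and disables host-key checking, which is only acceptable for experiments:

```scala
import com.jcraft.jsch.{ChannelSftp, JSch}
import scala.io.Source

val jsch = new JSch()
val session = jsch.getSession("user", "host.example.com", 22)
session.setPassword("secret")
session.setConfig("StrictHostKeyChecking", "no") // experiments only
session.connect()

val channel = session.openChannel("sftp").asInstanceOf[ChannelSftp]
channel.connect()
// channel.get(path) returns an InputStream of the remote file
val contents = Source.fromInputStream(channel.get("/remote/path/file.txt")).mkString
channel.disconnect()
session.disconnect()
```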
0
votes
1
answer
494
views
Scala - How to merge incremental files at an HDFS location
My requirement is that I have multiple HDFS locations which ingest files from Kafka every hour. So for each directory, how do I merge all the files from a particular timestamp to the current timestamp as a ...
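A simple (if not the most scalable) pattern is to read a whole ingest directory and rewrite it as a single file; paths and session below are placeholders. `coalesce(1)` funnels all data through one task, so for large volumes a byte-level concatenation via the Hadoop FileSystem API may be preferable:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("merge-hourly").getOrCreate()

// Read every part file in one hourly directory and write one file back.
val dir = "hdfs:///ingest/topic_a/2024-01-01-00" // placeholder path
spark.read.text(dir)
  .coalesce(1)               // single partition => single output file
  .write.mode("overwrite")
  .text(dir + "_merged")
```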