
All Questions

Tagged with
0 votes
0 answers
12 views

"hadoop archive -archiveName directoryname.har -p /source/hdfs/path /destination/hdfs/path" doesn't work via spark-submit but works in spark-shell

I am developing Spark Scala code in IntelliJ. I built it with mvn package and ran spark-submit on the cluster, but the YARN log shows a warning saying "warn ioc.client: exception encountered while ...
Prateek Sharma
0 votes
0 answers
20 views

Spark job hangs for a while at a "Sending RPC" log message

I have a Spark job. When I run it on a YARN cluster (HDP 3.1), after a long time (about 1 hour) I get this message in the TRACE log while the job does nothing. After that, the job creates executors and runs ...
Willywonka
0 votes
1 answer
94 views

How to delete a key from all commits in a Hudi table (history)?

For a Hudi table, the goal is to comply with GDPR by deleting a key from the table. I'm only able to delete data from the latest commit of the table. How can I make sure the key is deleted from all commits on the ...
jensb
  • 11
0 votes
0 answers
40 views

How to get the name of the file that was just written by a Spark job?

I have this simple Hadoop application in Scala. I'm repartitioning and writing to 2 files. However, I need to know the name of the file that was just written. package com.scala.sparkscalaplayground import ...
Sujan
  • 93
0 votes
0 answers
94 views

Unable to run a Scala test file due to a problem installing a Java package

When running the Scala test code in IntelliJ, I got this error: Testing started ... Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 24/01/03 22:20:24 ...
Dhanush Babu
0 votes
1 answer
237 views

spark-submit in yarn-client mode hangs even though the Spark task completed (PySpark 3.4.1)

I recently set up a Dockerized environment with CDP for submitting yarn-client mode Spark jobs to a Kerberized Hadoop cluster, and I'm seeing inconsistent behavior in the application lifecycle. Scenario: ...
StrangerThinks
0 votes
2 answers
117 views

Alternative to InMemoryFileIndex to list files in a folder using Spark Scala

The task I would like to solve: I have a constant influx of files into a specific folder on Azure Storage. I would like to periodically list the files in this folder in order to copy them to a ...
Tamás Godányi
-1 votes
1 answer
99 views

How to copy a zip file from HDFS to an SFTP server

I have a zip file named "FileName.zip" in an HDFS location. I want to copy this zip file to an SFTP server. The zip folder structure is below (when downloaded locally): FileName.zip - file....
Glarixon
1 vote
0 answers
207 views

NoSuchMethodError while trying to save Parquet files to an S3 bucket

I am trying to save data to an S3 bucket using Scala code, but I always get the following error. How can I resolve it? After changing the hadoop-aws jar version from 3.3.0 to 3.3.1, the ...
Omkar Gaikwad
0 votes
0 answers
126 views

Reading an ORC file from a GCS bucket

To read an ORC file from a GCS bucket, I'm using the code snippet below, where I create a Hadoop configuration and set the required file system properties to use the GCS bucket: val hadoopConf = new ...
Nitish N Banakar
4 votes
4 answers
1k views

How to resolve the harmless "java.nio.file.NoSuchFileException: xxx/hadoop-client-api-3.3.4.jar" error in Spark when running `sbt run`?

I have a simple Spark application in Scala 2.12. My App find-retired-people-scala/project/build.properties sbt.version=1.8.2 find-retired-people-scala/src/main/scala/com/hongbomiao/FindRetiredPeople....
Hongbo Miao
  • 48.6k
1 vote
1 answer
615 views

java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir while trying to save a table to Hive

I am trying to read a Kafka stream and save it to Hive as a table. The consumer code is: import org.apache.spark.sql.{DataFrame, Dataset, SaveMode, SparkSession} import org.apache.spark.sql.functions....
sami JIMOUH
0 votes
1 answer
104 views

Submitting Multiple Jobs in Sequence

I'm having some trouble understanding how Spark allows for scheduling of jobs. I have a series of jobs I'd like to run in sequence. From what I've read, I can submit any number of jobs to spark-submit ...
maxwellray
0 votes
1 answer
1k views

Spark shell on Kubernetes with a Kerberos-enabled cluster

I'm having a hard time getting the Spark shell (3.3.1) on Kubernetes to work with Kerberos. Submitting works in both cluster mode and client mode. Here is what we did to get it to work: cluster mode (works ...
jonas.hartwig
1 vote
0 answers
150 views

Dropping an external table in Spark also drops the location or data

import org.apache.hadoop.fs.{Path,FileSystem} import org.apache.hadoop.conf.Configuration import org.apache.spark.sql.{SaveMode, SparkSession} import org.apache.spark.sql.functions.{current_date, ...
Neo
  • 11
