Newest 'scala+hadoop' Questions

0 votes

0 answers

12 views

"Hadoop archive -archiveName directoryname.har -p /source/hdfs/path /destination/hdfs/path" doesn't work via spark-submit but working in spark-shell

I am developing Spark code in Scala in IntelliJ. I built it with mvn package and ran spark-submit on the cluster, but the YARN log shows a warning saying "warn ioc.client: exception encountered while ...

Prateek Sharma

1

asked Feb 24 at 16:42

0 votes

0 answers

20 views

Spark Job Hold a While in "Sending RPC" Log

I have a Spark job. When I run it on a YARN cluster (HDP 3.1), after a long time (about 1 hour) I get this message in the trace log and the job does nothing. After that, the job creates executors and starts running ...

Willywonka

1

asked Feb 6 at 9:38

0 votes

1 answer

94 views

How to delete key for all commits in HUDI Table (history)?

For a HUDI table the goal is to apply GDPR and delete a key of a table. I'm only able to delete data for the latest commit of the table. How can I make sure the key is deleted for all commits on the ...

jensb

11

asked Jan 29 at 12:30

0 votes

0 answers

40 views

How to get the name of the file that was just written by a Spark Job?

I have this simple Hadoop application in Scala. I'm repartitioning and writing to 2 files. However, I need to know the file name that was just written. package com.scala.sparkscalaplayground import ...

Sujan

93

asked Jan 5 at 22:51

0 votes

0 answers

94 views

Unable to run Scala test file due to a problem installing a Java package

When I was running the Scala test code in IntelliJ, I was troubled by this error: Testing started ... Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 24/01/03 22:20:24 ...

Dhanush Babu

1

asked Jan 4 at 8:04

0 votes

1 answer

237 views

Spark-submit yarn-client mode hangs even though spark task completed (pyspark 3.4.1)

Recently set up a dockerized env along with CDP for submitting yarn-client mode spark jobs on a kerberized hadoop cluster and seeing inconsistent behavior with application lifecycle. Scenario: ...

StrangerThinks

248

asked Nov 20, 2023 at 21:54

0 votes

2 answers

117 views

Alternative to InMemoryFileIndex to list files in folder using spark scala

The task I would like to solve: I have a constant influx of files in a specific folder held on azure storage. I would like to periodically list the files in this folder in order to copy them to a ...

Tamás Godányi

105

asked Oct 25, 2023 at 15:21

-1 votes

1 answer

99 views

How to copy zip file from hdfs to sftp server

I have a zip file named "FileName.zip" in an HDFS location. I want to copy this zip file to an SFTP server. The zip folder structure is below (when downloaded to local)- FileName.zip - file....

Glarixon

59

asked Oct 24, 2023 at 13:55

1 vote

0 answers

207 views

NoSuchMethodError while trying to save parquet files on s3 bucket

I am trying to save data into an S3 bucket using Scala code. I always get the following error. How can I resolve this error? After changing jar version of hadoop-aws from 3.3.0 to 3.3.1 version, the ...

Omkar Gaikwad

69

asked Sep 26, 2023 at 9:08

0 votes

0 answers

126 views

To read orc file from GCS bucket

To read an ORC file from a GCS bucket I'm using the code snippet below, where I create a Hadoop configuration and set the required file system attributes to use the GCS bucket: val hadoopConf = new ...

Nitish N Banakar

149

asked Jun 9, 2023 at 3:58

4 votes

4 answers

1k views

How to resolve harmless "java.nio.file.NoSuchFileException: xxx/hadoop-client-api-3.3.4.jar" error in Spark when run `sbt run`?

I have a simple Spark application in Scala 2.12. My App find-retired-people-scala/project/build.properties sbt.version=1.8.2 find-retired-people-scala/src/main/scala/com/hongbomiao/FindRetiredPeople....

Hongbo Miao

48.6k

asked Apr 19, 2023 at 23:55

1 vote

1 answer

615 views

java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir while trying to save a table to Hive

I am trying to read a kafka stream and save it to Hive as a table. The consumer code is : import org.apache.spark.sql.{DataFrame, Dataset, SaveMode, SparkSession} import org.apache.spark.sql.functions....

sami JIMOUH

13

asked Mar 29, 2023 at 17:33

0 votes

1 answer

104 views

Submitting Multiple Jobs in Sequence

I'm having some trouble understanding how Spark allows for scheduling of jobs. I have a series of jobs I'd like to run in sequence. From what I've read, I can submit any number of jobs to spark-submit ...

maxwellray

121

asked Feb 25, 2023 at 6:45

0 votes

1 answer

1k views

Spark Shell on Kubernetes with Kerberos enabled Cluster

I am having a hard time getting the Spark shell (3.3.1) on Kubernetes to work with Kerberos. It works in cluster mode and client mode for submit. Here is what we did to get it to work: cluster mode (works ...

jonas.hartwig

897

asked Feb 24, 2023 at 7:14

1 vote

0 answers

150 views

Dropping external table in spark is dropping the location or data too

import org.apache.hadoop.fs.{Path,FileSystem} import org.apache.hadoop.conf.Configuration import org.apache.spark.sql.{SaveMode, SparkSession} import org.apache.spark.sql.functions.{current_date, ...

Neo

11

asked Jan 17, 2023 at 19:18
