
All Questions
0 votes
1 answer
30 views

Use Jedis `echo` in pipeline

The examples use Scala code, but the issue would be the same with Java. Way back in version 2 of Jedis, you could use echo in a pipeline: import redis.clients.jedis._ object Main { def main(args: ...
Amaterasu • 341
0 votes
0 answers
146 views

Spark-MongoDB Connector Aggregation Pipeline error: not found: value Document

I am trying to create an aggregation pipeline: val rdd = MongoSpark.load(sc) val aggregatedRdd = rdd.withPipeline(Seq(Document.parse("[{$project: {Country: 1, Region: 1, 'Unit Price': 1, 'Unit ...
Antonis Pervanas
0 votes
1 answer
154 views

Training/Test data with SparkML in Scala

I've been facing an issue for the past couple of hours. In theory, when we split data for training and testing, we should standardize the training data independently, so as not to introduce ...
Aron Latis
0 votes
0 answers
41 views

Getting a Task not Serializable error when trying to create a Decision Tree using Spark with Scala on my local machine

I am trying to create a fraud transaction detector using Spark with Scala. My code works fine with normal Spark logic. However, when I try the solution using a decision tree approach I get a Task not ...
vinit ratan
1 vote
1 answer
194 views

Training a KMeans algorithm fails on Spark

I have created a pipeline and tried to train a KMeans clustering algorithm in Spark, but it fails and I am unable to find what the exact error is. Here is the code: import org.apache.spark.ml.Pipeline import org....
Asif • 733
1 vote
2 answers
1k views

Spark Error: java.io.NotSerializableException: scala.runtime.LazyRef

I am new to Spark; can you please help with this? The simple pipeline below, which performs a logistic regression, produces an exception. The code: package pipeline.tutorial.com import org.apache.log4j.Level ...
Moha • 71
1 vote
0 answers
418 views

Importing Pyspark PipelineModel with custom transformers into Scala

I recently created a pyspark PipelineModel with a few custom transformers to generate features not doable with the native Spark transformers. Here's an example of one of my transformers. It takes an ...
Octoflague
0 votes
1 answer
32 views

About model training results from the Spark Scala ML API

I'm new to the Spark Scala ML package. After assembling a pipeline and fitting a regression model to the training dataset (using the command val model = pipeline.fit(training)), how can I check/print out ...
hub • 1
6 votes
1 answer
2k views

Initializing Apache Beam Test Pipeline in Scala fails

When I try to run a test pipeline it raises an error. Here is the source code that creates the test pipeline: val p: TestPipeline = TestPipeline.create() and here is the error: java.lang....
Saeed Mohtasham
1 vote
1 answer
382 views

Scala Passing Sequence of Functions as Argument Type

In order to pipeline a variety of data transformation functions, I want to iterate through a sequence of functions and apply each to the initial input. For a single input it would be something like ...
Layman • 938
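The pattern this question describes — threading one input through a sequence of same-typed transformation functions — is usually written as a fold over a `Seq[A => A]`. A minimal sketch (the object and method names are hypothetical, chosen for illustration):

```scala
// Thread an initial value through a sequence of transformations:
// each stage consumes the previous stage's output.
object FunctionPipeline {
  // Requiring every stage to be `A => A` is what lets the stages
  // compose in order, whatever their number.
  def runPipeline[A](input: A, stages: Seq[A => A]): A =
    stages.foldLeft(input)((acc, f) => f(acc))
}
```

Equivalently, the stages could be fused first with `stages.reduce(_ andThen _)` and applied once; `foldLeft` has the advantage of handling an empty sequence, in which case it simply returns the input unchanged. For example, `FunctionPipeline.runPipeline(10, Seq[Int => Int](_ + 1, _ * 2))` evaluates to `22`.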
1 vote
1 answer
74 views

Debug a custom Pipeline Transformer in Flink

I am trying to implement a custom Transformer in Flink following the indications in its documentation, but when I try to execute it, it seems the fit operation is never called. Here is what I've ...
Alejandro Alcalde
0 votes
1 answer
2k views

java.lang.NoSuchMethodException: <Class>.<init>(java.lang.String) when copying custom Transformer

I'm currently playing with custom transformers in my spark-shell using both Spark 2.0.1 and 2.2.1. While writing a custom ML transformer, in order to add it to a pipeline, I noticed that there is an issue ...
y-_-t • 115
0 votes
1 answer
1k views

Where is the withPipeline function in the MongoDB Spark connector?

I am trying to load some data from MongoDB into Spark. I have defined a ReadConfig to specify the database and collection. I also want to apply a filter, to avoid loading the whole collection. I am ...
yashar • 71
6 votes
2 answers
5k views

How to use Spark QuantileDiscretizer on multiple columns

All, I have an ML pipeline set up as below: import org.apache.spark.ml.feature.QuantileDiscretizer import org.apache.spark.sql.types.{StructType, StructField, DoubleType} import org.apache.spark.ml....
sramalingam24
1 vote
1 answer
730 views

Spark ML pipeline that works in 1.6 doesn't in 2.0: type mismatch error

All, I have the following code that works in Spark 1.6. import org.apache.spark.ml.feature.{ChiSqSelectorModel,QuantileDiscretizer,VectorAssembler,ChiSqSelector} import org.apache.spark.sql.types.{...
sramalingam24
