All Questions (28 questions)
0 votes · 1 answer · 30 views
Use Jedis `echo` in pipeline
The examples use Scala code, but the issue would be the same with Java.
Way back in version 2 of Jedis, you could use echo in a pipeline:
import redis.clients.jedis._
object Main {
def main(args: ...
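For reference, a minimal sketch of issuing `echo` inside a Jedis pipeline against a local Redis; the host and port are assumptions:

```scala
import redis.clients.jedis.Jedis

object EchoPipeline {
  def main(args: Array[String]): Unit = {
    val jedis = new Jedis("localhost", 6379) // assumed local Redis instance
    val pipeline = jedis.pipelined()
    // Queue the command; the Response is not populated until sync()
    val reply = pipeline.echo("hello")
    pipeline.sync()
    println(reply.get()) // the echoed message, available after sync()
    jedis.close()
  }
}
```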
0 votes · 0 answers · 146 views
Spark-MongoDB Connector Aggregation Pipeline error: not found: value Document
I am trying to create an aggregation pipeline:
val rdd = MongoSpark.load(sc)
val aggregatedRdd = rdd.withPipeline(Seq(Document.parse("[{$project: {Country: 1, Region: 1, 'Unit Price': 1, 'Unit ...
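A `not found: value Document` error at this call usually just means the BSON `Document` class was never imported; a hedged sketch, with `sc` and the pipeline contents as in the question:

```scala
import com.mongodb.spark.MongoSpark
import org.bson.Document // this import is what resolves `Document`

val rdd = MongoSpark.load(sc)
// Document.parse expects a single JSON document per pipeline stage
val aggregatedRdd = rdd.withPipeline(
  Seq(Document.parse("{ $project: { Country: 1, Region: 1, 'Unit Price': 1 } }")))
```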
0 votes · 1 answer · 154 views
Training/Test data with SparkML in Scala
I've been facing an issue for the past couple of hours.
In theory, when we split data for training and testing, we should standardize the data for training independently, so as not to introduce ...
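The usual remedy is to fit the scaler on the training split only and reuse the fitted statistics on the test split; a sketch assuming a DataFrame `df` with a `features` vector column:

```scala
import org.apache.spark.ml.feature.StandardScaler

val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42L)

val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)

// Mean and stddev are computed from the training split only...
val scalerModel = scaler.fit(train)
val trainScaled = scalerModel.transform(train)
// ...and reused on the test split, so no test information leaks in
val testScaled = scalerModel.transform(test)
```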
0 votes · 0 answers · 41 views
Getting Task not Serializable error on trying to create Decision Tree using Spark with Scala while executing in my local machine
I am trying to create a Fraud Transaction Detector using Spark with Scala. My code works fine with normal Spark logic. However, when I try the solution using a decision tree approach, I get Task not ...
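`Task not serializable` almost always means a closure shipped to the executors captures a non-serializable enclosing object. A generic sketch of the usual fix, copying the needed field into a local `val` first (all names here are hypothetical):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

class FraudDetector(spark: SparkSession) { // SparkSession itself is not serializable
  val threshold: Double = 0.9

  def score(amounts: RDD[Double]): RDD[Boolean] = {
    // Copy the field into a local val so the closure captures only the
    // Double, not `this` (which would drag in the SparkSession)
    val t = threshold
    amounts.map(a => a > t)
  }
}
```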
1 vote · 1 answer · 194 views
Training of Kmeans algorithm failed on Spark
I have created a pipeline and tried to train a KMeans clustering algorithm in Spark, but it fails and I am unable to find what the exact error is. Here is the code:
import org.apache.spark.ml.Pipeline
import org....
1 vote · 2 answers · 1k views
Spark Error: java.io.NotSerializableException: scala.runtime.LazyRef
I am new to Spark; can you please help with this?
The simple pipeline below, which does a logistic regression, produces an exception:
The Code:
package pipeline.tutorial.com
import org.apache.log4j.Level
...
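A non-serializable `scala.runtime.LazyRef` is a classic symptom of running against Spark artifacts compiled for a different Scala version (for example, a Scala 2.12 project against Spark builds for 2.11). A hedged `build.sbt` sketch of the fix; the versions shown are examples, not prescriptions:

```scala
// build.sbt: keep scalaVersion aligned with the Scala version of the
// Spark distribution you run against (versions here are examples)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "2.4.5",
  "org.apache.spark" %% "spark-mllib" % "2.4.5"
)
```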
1 vote · 0 answers · 418 views
Importing Pyspark PipelineModel with custom transformers into Scala
I recently created a pyspark PipelineModel with a few custom transformers to generate features not doable with the native Spark transformers. Here's an example of one of my transformers. It takes an ...
0 votes · 1 answer · 32 views
about model training results from Spark Scala ML API
I'm new to the Spark Scala ML package.
After assembling a pipeline and fitting a regression model to the training dataset (using the command: val model = pipeline.fit(training)), how can I check/print out ...
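After `pipeline.fit(training)`, the fitted stages live in `model.stages`; casting the relevant stage to its model class exposes coefficients and a training summary. A sketch, assuming the last stage is a `LinearRegression`:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.LinearRegressionModel

val model: PipelineModel = pipeline.fit(training)

// The fitted estimator is the last stage of the fitted pipeline
val lrModel = model.stages.last.asInstanceOf[LinearRegressionModel]
println(s"coefficients: ${lrModel.coefficients}  intercept: ${lrModel.intercept}")
println(s"RMSE: ${lrModel.summary.rootMeanSquaredError}  r2: ${lrModel.summary.r2}")
```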
6 votes · 1 answer · 2k views
Initializing Apache Beam Test Pipeline in Scala fails
When I try to run a test pipeline it raises an error.
Here is the source code that creates the test pipeline:
val p: TestPipeline = TestPipeline.create()
and here is the error:
java.lang....
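`TestPipeline` is designed to be used as a JUnit rule; creating it outside a `@Rule` commonly fails at `create()`/`run()` time. A sketch of the JUnit 4 wiring in Scala (the test class and method names are hypothetical):

```scala
import org.apache.beam.sdk.testing.TestPipeline
import org.junit.{Rule, Test}

class MyPipelineTest {
  // Expose the pipeline through a @Rule so Beam can inject
  // PipelineOptions and enforce that the pipeline is actually run
  val pipeline: TestPipeline = TestPipeline.create()
  @Rule def pipelineRule: TestPipeline = pipeline

  @Test def runsEmptyPipeline(): Unit = {
    pipeline.run().waitUntilFinish()
  }
}
```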
1 vote · 1 answer · 382 views
Scala Passing Sequence of Functions as Argument Type
In order to pipeline a variety of data transformation functions I want to iterate through a sequence of functions and apply each to the initial input. For a single input it would be something like ...
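Chaining a `Seq` of same-typed functions is a fold over the sequence; a minimal self-contained sketch:

```scala
object FunctionPipeline {
  // Thread the input through every function in order
  def applyAll[A](input: A, steps: Seq[A => A]): A =
    steps.foldLeft(input)((acc, f) => f(acc))

  def main(args: Array[String]): Unit = {
    val steps: Seq[Int => Int] = Seq(_ + 1, _ * 2, _ - 3)
    println(applyAll(5, steps)) // ((5 + 1) * 2) - 3 = 9
  }
}
```

For the `A => A` case specifically, the standard library's `Function.chain` performs the same composition.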
1 vote · 1 answer · 74 views
Debug a custom Pipeline Transformer in Flink
I am trying to implement a custom Transformer in Flink following the indications in its documentation, but when I try to execute it, it seems the fit operation is never called. Here is what I've ...
0 votes · 1 answer · 2k views
java.lang.NoSuchMethodException: <Class>.<init>(java.lang.String) when copying custom Transformer
Currently playing with custom transformers in my spark-shell using both Spark 2.0.1 and 2.2.1.
While writing a custom ML transformer, in order to add it to a pipeline, I noticed that there is an issue ...
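That `NoSuchMethodException` on `<init>(java.lang.String)` typically appears because `defaultCopy` reflectively looks for a public single-`String` (uid) constructor on the transformer. A hedged skeleton of a custom transformer that copies cleanly (the class name is hypothetical):

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

class NoopTransformer(override val uid: String)
    extends Transformer with DefaultParamsWritable {

  // defaultCopy() looks up this String-argument constructor reflectively,
  // so it must exist and be public
  def this() = this(Identifiable.randomUID("noopTransformer"))

  override def transform(dataset: Dataset[_]): DataFrame = dataset.toDF()
  override def transformSchema(schema: StructType): StructType = schema
  override def copy(extra: ParamMap): NoopTransformer = defaultCopy(extra)
}
```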
0 votes · 1 answer · 1k views
Where is the withPipeline function in MongoDB Spark connector
I am trying to load some data from MongoDB into Spark. I have defined a ReadConfig to specify the database and collection. I also want to apply a filter, to avoid loading the whole collection. I am ...
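`withPipeline` is defined on `MongoRDD`, not on a DataFrame, so it is only available on the value returned by `MongoSpark.load(sc)`. A sketch with an assumed ReadConfig and filter (database, collection, and match criteria are placeholders):

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.bson.Document

val readConfig = ReadConfig(
  Map("database" -> "mydb", "collection" -> "mycoll"), // assumed names
  Some(ReadConfig(sc)))

// MongoSpark.load(sc, ...) returns a MongoRDD, which defines withPipeline
val rdd = MongoSpark.load(sc, readConfig)
val filtered = rdd.withPipeline(Seq(Document.parse("{ $match: { status: 'A' } }")))
```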
6 votes · 2 answers · 5k views
How to use spark quantilediscretizer on multiple columns
All,
I have an ML pipeline set up as below:
import org.apache.spark.ml.feature.QuantileDiscretizer
import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
import org.apache.spark.ml....
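In older Spark versions, `QuantileDiscretizer` is single-column, so the common workaround is to build one stage per column and chain them in a `Pipeline` (column names here are hypothetical; newer Spark versions add `setInputCols` for this):

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.QuantileDiscretizer

val cols = Seq("price", "quantity", "discount") // hypothetical input columns
val stages = cols.map { c =>
  new QuantileDiscretizer()
    .setInputCol(c)
    .setOutputCol(s"${c}_binned")
    .setNumBuckets(5)
}

// One discretizer per column, fitted together in a single pipeline;
// the element type annotation keeps setStages happy
val pipeline = new Pipeline().setStages(stages.toArray[PipelineStage])
// val model = pipeline.fit(df)
```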
1 vote · 1 answer · 730 views
Spark ml pipeline that works in 1.6 doesn't in 2.0. Type mismatch error
All,
I have the following code that works in Spark 1.6.
import org.apache.spark.ml.feature.{ChiSqSelectorModel,QuantileDiscretizer,VectorAssembler,ChiSqSelector}
import org.apache.spark.sql.types.{...
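In Spark 2.0, `Pipeline.setStages` takes `Array[PipelineStage]`, and because Scala arrays are invariant, an array whose element type is inferred more specifically no longer converts as it did in 1.6; annotating the element type is the standard fix (the stage names here are hypothetical):

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}

// Force the array's element type to PipelineStage so it matches
// the Array[PipelineStage] parameter exactly
val pipeline = new Pipeline().setStages(
  Array[PipelineStage](assembler, discretizer, selector))
```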