Questions tagged [imputation]
Missing data imputation is the process of replacing missing data with substituted, 'best guess', values. Because missing data can create problems for analyzing data and can lead to missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (ignoring all observations with any missing values).
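To make the contrast with listwise deletion concrete, here is a minimal pandas sketch on a made-up four-row frame. It uses column-mean filling only because it is the simplest imputation strategy, not because it is a recommended one.

```python
import numpy as np
import pandas as pd

# Toy data with scattered missing values (illustrative only)
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "income": [50.0, 60.0, np.nan, 80.0],
})

# Listwise deletion: every row with at least one missing value is dropped
complete_cases = df.dropna()

# Single mean imputation: each missing value is replaced by its column mean
mean_imputed = df.fillna(df.mean())

print(len(complete_cases), len(mean_imputed))  # 2 4
```

Deletion keeps only half the rows here, while imputation keeps all four at the cost of inventing values.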
959 questions
0 votes, 0 answers, 26 views
Is there a way to model a partial predictor in a classification problem with an unbalanced target?
I'd like to share a classification issue I ran into while modelling. I have to build a model for an unbalanced binary target using 4 predictors, one of which has 45% of wrong ...
0 votes, 0 answers, 19 views
My IV summary in R reports as NA after imputing with mice and matching with Amelia
After imputing and matching, my IV of interest returns NAs.
I have a dataset that is mostly complete but for a couple of variables - coord1D and cinc. I used the following code to create my ...
0 votes, 0 answers, 25 views
Python sklearn Iterative Imputer - How to impute with mixed numerical and categorical features and keep the format of categorical columns intact?
Say we have a dataframe with a mix of categorical and numerical features, containing missing values, that will be used for binary classification.
import pandas as pd
import numpy as np
from sklearn....
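One common workaround (a sketch, not the only approach) is to encode each categorical column as integer codes, run IterativeImputer on everything, then round the imputed codes back to valid categories. The frame and column names below are hypothetical. Treating codes as continuous imposes an arbitrary order on the categories, so rounding is a heuristic.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

# Hypothetical frame: two numeric columns and one categorical column with gaps
df = pd.DataFrame({
    "num_a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "num_b": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0],
    "color": ["red", "blue", np.nan, "red", "blue", "red"],
})
cat_cols = ["color"]

# 1) Encode categories as integer codes. pandas marks NaN with code -1,
#    which is turned back into NaN so the imputer treats it as missing.
encoded = df.copy()
categories = {}
for col in cat_cols:
    cat = df[col].astype("category")
    categories[col] = cat.cat.categories
    codes = cat.cat.codes
    encoded[col] = codes.where(codes >= 0).astype(float)

# 2) Impute all columns together.
imputer = IterativeImputer(random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(encoded),
                       columns=df.columns, index=df.index)

# 3) Round imputed codes to the nearest valid category and restore labels,
#    keeping the column as a pandas categorical.
for col in cat_cols:
    n_levels = len(categories[col])
    codes = imputed[col].round().clip(0, n_levels - 1).astype(int)
    imputed[col] = pd.Categorical.from_codes(codes.to_numpy(), categories=categories[col])
```

Observed values pass through unchanged; only the gaps are filled.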
0 votes, 1 answer, 44 views
Interpolate zero values only if there is a single zero and the surrounding values are greater than zero
I want to interpolate zero values in a time series dataframe, but only if: 1) there is only one missing value, so the preceding and subsequent values are non-zero, 2) the surrounding non-zero values are ...
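A minimal pandas sketch of the first condition (a lone zero with strictly positive neighbours on both sides). It assumes evenly spaced observations, so linear interpolation over a one-point gap is simply the neighbours' mean. The truncated second condition is not implemented, and the function name is hypothetical.

```python
import pandas as pd

def interpolate_isolated_zeros(s: pd.Series) -> pd.Series:
    """Replace a zero only when both immediate neighbours are > 0."""
    prev_val = s.shift(1)
    next_val = s.shift(-1)
    # NaN at the series ends compares as False, so edge zeros are left alone
    isolated = (s == 0) & (prev_val > 0) & (next_val > 0)
    filled = s.astype(float).copy()
    # For a single-point gap, linear interpolation equals the neighbours' mean
    filled[isolated] = (prev_val[isolated] + next_val[isolated]) / 2
    return filled

s = pd.Series([5, 0, 7, 0, 0, 3, 4, 0])
print(interpolate_isolated_zeros(s).tolist())
# [5.0, 6.0, 7.0, 0.0, 0.0, 3.0, 4.0, 0.0]
```

The run of two zeros and the trailing zero are untouched because they fail the "single zero, positive on both sides" test.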
0 votes, 0 answers, 34 views
Using fine-gray regression on mids object created with mice()
I am trying to fit a Fine-Gray regression model on a multiple imputed dataset created with mice() and was wondering how to do it with the finegray() function.
I used code found in cant get crr() Fine-...
0 votes, 0 answers, 16 views
Differences Between IterativeImputer with RandomForestRegressor and the MissForest Imputer
If I use IterativeImputer with the estimator "RandomForestRegressor()" on the one hand and the MissForest imputer on the other, what is the difference?
Iterative imputer will use tree-based methods to ...
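For reference, this is roughly what the "IterativeImputer + RandomForestRegressor" side of the comparison looks like: a round-robin regression of each feature on the others, with a random forest as the per-feature model and a mean initial fill. The data are synthetic and the hyperparameters are arbitrary. In practice the differences from MissForest usually come down to stopping rules and to how categorical columns are handled, and those details should be checked against each package's documentation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # column 3 depends on column 0
mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

# Random-forest-based iterative imputation (MissForest-like, not identical)
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    initial_strategy="mean",
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
```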
1 vote, 0 answers, 28 views
After using ga.lasso from the miselect package how do I pool results?
I've been running multiple imputation on a dataset using the mice package, creating 5 mids objects. Using those objects, I've performed variable selection using the cv.galasso function from the miselect ...
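If the selected variables are refitted with an ordinary model on each of the 5 imputed datasets, the per-dataset estimates of a coefficient can be combined with Rubin's rules. The sketch below pools one scalar estimate. Whether this is the right step for miselect output specifically is an assumption and is not established by the question. The function name is hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, math.sqrt(t)     # estimate and its pooled standard error

est, se = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
```

Here W = 0.04 and B = 0.025, so T = 0.04 + 1.2 × 0.025 = 0.07.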
1 vote, 0 answers, 36 views
How do I use multiple imputation only for intermittent missing values?
I have a dataset with time-ordered variables where I distinguish between a continuous series of missing values including the final value (monotone missing) and missing values where at least one non-...
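One way to separate the two cases is to flag a cell as intermittent when it is missing but some later time point in the same row is observed. Trailing gaps are then monotone. This is a pandas sketch that assumes one row per subject with the time-ordered variables as columns in chronological order. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def intermittent_mask(df: pd.DataFrame) -> pd.DataFrame:
    """True where a value is missing AND a later column in the same row is observed."""
    observed = df.notna().astype(int)
    # Scan right-to-left: a running max marks every position at or before
    # the last observed value of the row; reverse back to the original order.
    observed_at_or_after = observed.iloc[:, ::-1].cummax(axis=1).iloc[:, ::-1].astype(bool)
    return df.isna() & observed_at_or_after

df = pd.DataFrame({
    "t1": [1.0, 1.0, np.nan],
    "t2": [np.nan, 2.0, 3.0],
    "t3": [3.0, np.nan, np.nan],
})
mask = intermittent_mask(df)
```

After imputing, cells outside this mask that were originally missing (the monotone tails) could be set back to NaN, so only intermittent gaps keep imputed values.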
0 votes, 0 answers, 18 views
Error with parallelize='variables' using "missForest" in R
I've started using missForest to potentially replace rfImpute and while doing some testing with both synthetic and real data and the different flavours of parallelization strategies offered by ...
0 votes, 0 answers, 21 views
Pooling Levene’s Test in R: Why is D1 method not working? [duplicate]
I want to perform a Levene's Test on multiply imputed datasets (m=5) using the pool_leventest function in R.
First, I followed the example code to understand the procedure:
imp_data <- mice(...
0 votes, 0 answers, 18 views
Imputation Strategy on Boston Housing Dataset Delivers Same Results
I'm following some tutorials on data engineering and feature engineering using the Boston dataset, and here is an example where I'm trying different imputation strategies with cross-validation ...
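One plausible cause, assuming the copy of the dataset being used has no missing values, is that the imputer never changes anything, so every strategy feeds identical data to the model. The sketch below uses synthetic data instead, since `load_boston` was removed in scikit-learn 1.2. It shows identical cross-validation scores on complete data and scores that can differ once NaNs are injected.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

def cv_scores(X, y):
    """Mean 5-fold R^2 for each SimpleImputer strategy."""
    scores = {}
    for strategy in ["mean", "median", "most_frequent"]:
        pipe = make_pipeline(SimpleImputer(strategy=strategy), LinearRegression())
        scores[strategy] = cross_val_score(pipe, X, y, cv=5).mean()
    return scores

complete = cv_scores(X, y)  # no NaNs: imputation is a no-op

rng = np.random.default_rng(0)
X_nan = X.copy()
X_nan[rng.random(X.shape) < 0.1] = np.nan
with_missing = cv_scores(X_nan, y)  # strategies now actually fill values

print(complete)
print(with_missing)
```

A quick `df.isna().sum()` on the real data would confirm whether there is anything to impute.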
0 votes, 0 answers, 14 views
How do I solve the "module 'numpy' has no attribute 'float'" error while using MICE?
Here's my code: [images 1 and 2]
Here's some data: [image 3: data sample]
Here's the KNN imputation result: [image 4: KNN imputation]
Hi, I'm a beginner in machine learning. While filling in the missing values in the data using MICE, ...
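This error appears because `np.float` was only an alias for Python's built-in `float`. It was deprecated in NumPy 1.20 and removed in 1.24. If the call is in your own code, replace it as shown below. If it sits inside the MICE package you are using (which package is not stated), the options are upgrading that package or, as a temporary workaround, pinning `numpy<1.24`.

```python
import numpy as np

values = np.array([1, 2, 3])

# Old, now-broken spelling: values.astype(np.float)
as_builtin = values.astype(float)       # replacement 1: built-in float
as_float64 = values.astype(np.float64)  # replacement 2: explicit NumPy type
```

Both replacements produce a `float64` array, which is what `np.float` used to mean.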
0 votes, 0 answers, 39 views
How to create my own custom imputer to impute constant values seamlessly in pyspark.ml pipelines
I would like to optimize the imputation of missing values on my dataset through a CV search. This is trivial to do in sklearn, with which I am familiar -- however, I am for the first time working with ...
1 vote, 1 answer, 52 views
Number of observations changing significantly after imputing using mice() in R
My data has a significant number of missing values, so I can't use the na.omit() default in order to conduct downstream analysis on my dataset, as this removes the whole row if there is even one value ...
0 votes, 1 answer, 34 views
Using a for loop to run multiple imputation in R
I suspect that there are parallels with other questions but I haven't been able to find a combination which works in this situation.
In essence I am trying to use a for loop to do multiple imputation (...
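For comparison, this is a Python analogue of looping to produce several imputed datasets. It is not the asker's R code. With `sample_posterior=True`, scikit-learn's IterativeImputer draws imputations from its default BayesianRidge model's predictive distribution, so a different seed on each pass yields a different completed dataset. The data are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
X[rng.random(X.shape) < 0.15] = np.nan

m = 5
imputed_sets = []
for i in range(m):
    # A new seed per pass gives a different posterior draw for the gaps
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    imputed_sets.append(imputer.fit_transform(X))
```

Each element of `imputed_sets` would then be analysed separately and the results pooled.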