
cristinelpopescu/Resampling-strategies-for-imbalanced-datasets


Index

  • Imbalanced datasets
  • The metric trap
  • Confusion matrix
  • Resampling
  • Random under-sampling
  • Random over-sampling
  • Python imbalanced-learn module
  • Random under-sampling and over-sampling with imbalanced-learn
  • Under-sampling: Tomek links
  • Under-sampling: Cluster Centroids
  • Over-sampling: SMOTE
  • Over-sampling followed by under-sampling
  • Recommended reading

Imbalanced datasets

https://www.kaggle.com/code/rafjaa/resampling-strategies-for-imbalanced-datasets/notebook

In this kernel we will explore some techniques for handling highly imbalanced datasets, with a focus on resampling. The Porto Seguro's Safe Driver Prediction competition, used in this kernel, is a classic imbalanced-class problem, because insurance claims are rare events compared with the total number of clients. Other classic examples of imbalanced classes are financial fraud detection and the detection of attacks on computer networks.
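
To make "imbalanced" concrete, the usual first step is to count how many samples fall into each class and compare the largest class with the smallest. The sketch below is a minimal illustration using only the Python standard library; the helper name `class_distribution` and the toy labels are hypothetical and are not taken from the Porto Seguro data or from the original notebook.

```python
from collections import Counter

def class_distribution(labels):
    """Return per-class counts and the majority-to-minority imbalance ratio."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return dict(counts), majority / minority

# Hypothetical toy labels: 1 = filed a claim (rare), 0 = no claim (common).
# These numbers are illustrative only, not the real competition distribution.
labels = [0] * 96 + [1] * 4
counts, ratio = class_distribution(labels)
print(counts)  # {0: 96, 1: 4}
print(ratio)   # 24.0
```

A ratio far above 1 signals that the rare class is heavily outnumbered, which is the situation the resampling techniques listed in the index are meant to address.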

This notebook was created by Rafael Alencar during the Porto Seguro's Safe Driver Prediction competition. This copy was made only for educational purposes, to show how we can deal with imbalanced datasets in machine learning.