Bozhong Liu, "Three effective ways to deal with domain gap and imbalanced data in multi-class classification" (May 28, 2021). How to deal with domain gap and imbalanced data in a large dataset.
Kris Ograbek, in Heartbeat, "Cost-Sensitive Learning for Imbalanced Data" (Oct 8, 2021). Training good models that deal with imbalanced data is challenging. Cost-sensitive training is one way of doing it.
Mridul Katta, "Handling Imbalanced Dataset" (Aug 10, 2021). Imbalanced datasets are normal in real life. This article talks about a few good strategies to create & evaluate the balance in the…
Susan Currie Sivek, Ph.D., in TDS Archive, "Balancing Act: Classification with Imbalanced Data" (Nov 4, 2021). Help your data find balance to make sure you create a quality classification model.
Samuele Mazzanti, in TDS Archive, "You Can Compute ROC Curve Also for Regression Models" (Sep 16, 2021). They probably told you that Area under ROC curve cannot be computed for a continuous target variable. Well, they were wrong. Here is how to…
Samuele Mazzanti, in TDS Archive, "Your Dataset Is Imbalanced? Do Nothing!" (Aug 24, 2022). Class imbalance is not a problem. Debunking one of the most widespread misconceptions in the ML community.
Samuele Mazzanti, in TDS Archive, "Data Scientists Need to Know Just One Statistical Test" (Jun 30, 2022). After you read this, you will be able to test any possible statistical hypothesis. With a unique algorithm.
Scott Lundberg, in TDS Archive, "Interpretable Machine Learning with XGBoost" (Apr 17, 2018). This is a story about the danger of interpreting your machine learning model incorrectly, and the value of interpreting it correctly…
Steven Dye, in TDS Archive, "How to Handle SMOTE Data in Imbalanced Classification Problems" (May 2, 2020). Know where the pitfalls are and how to avoid them.
Becca R, in TDS Archive, "Imbalanced Class Sizes and Classification Models: A Cautionary Tale Part 2" (Mar 27, 2019). Recently, I wrote this post about imbalanced class sizes in classification models might lead to overestimation of a classification model’s…
Kanellis Georgios, in TDS Archive, "Proper Balancing for Cross Validation" (Oct 11, 2019). Who hasn’t come across the need of applying the cross validation technique, while the dataset in hand is imbalanced, in regards to the…
KSV Muralidhar, in TDS Archive, "The right way of using SMOTE with Cross-validation" (Mar 29, 2021). This article discusses the right way to use SMOTE to avoid inaccurate evaluation metrics while using cross-validation.
Ayobami Akiode, in Analytics Vidhya, "How to carry out k-fold cross-validation on an imbalanced classification problem" (Jul 10, 2020). An imbalance classification has its OWN rules. Know them, else you violate their rights.
David B Rosen (PhD) (Jun 28, 2021): Good article! For classification evaluation metrics, I think the first and foremost distinction that must be drawn among them is those that evaluate a…
David B Rosen (PhD) (Aug 24, 2021): There's an easier way to vectorize your example in pandas in just a single assignment of a single… Or alternatively you could nest two np.where() in a single assignment as follows, which is about twice as fast as above (about 1300 times…
Lumiata, in Lumiata, "Cross-Validation for Imbalanced Datasets" (Mar 5, 2019). Anusha Mohan, Data Scientist.
David B Rosen (PhD) (Jan 24, 2020): To @Panos V.’s question, “How to implement this with a GridSearchCV though?”: I’ve done it this way:
Jonathan Grandperrin, in TDS Archive, "How to use confidence scores in machine learning models" (Jan 19, 2021). As humans, machine learning models sometimes make mistakes when predicting a value from an input data point. But also like humans, most…
David B Rosen (PhD) (Jun 28, 2021): R² (R-squared) is a very useful transformation of MSE which simply scales and offsets it relative…
João Paulo Figueira, in TDS Archive, "Stratified Splitting of Grouped Datasets Using Optimization" (Feb 23, 2021). This article explains how to perform a stratified split of a grouped dataset into train and validation sets.