Pinned · David B Rosen (PhD) in Towards Data Science · How To Deal With Imbalanced Classification, Without Re-balancing the Data · Before considering over-sampling your data, try simply tuning your classification decision threshold · 9 min read · Aug 2, 2020
Pinned · David B Rosen (PhD) in Towards Data Science · How to calculate confidence intervals for performance metrics using an automatic bootstrap method · Are your model performance measurements very precise due to a “large” test set, or very uncertain due to a “small” or imbalanced test… · 11 min read · Sep 8, 2021
Pinned · David B Rosen (PhD) in Towards Data Science · How to Auto-Detect the Date/Datetime Columns & Set Their Datatype When Reading a CSV File in Pandas · When read_csv( ) reads “2021-03-04” and “2021-03-04 21:37:01.123” as mere “object” types, often you can auto-convert them all to datetimes. · 5 min read · Aug 16, 2021
David B Rosen (PhD) · Three more tips: 1. Find and copy Unicode math symbols on "Mathematical operators and symbols in Unicode - Wikipedia" · 1 min read · Dec 16, 2023
David B Rosen (PhD) · R² is completely equivalent to MSE in that it is simply a (negatively) scaled and offset form of… · 1 min read · Aug 29, 2022
David B Rosen (PhD) · Good article! How To Deal With Imbalanced Classification, Without Re-balancing the Data (Before considering over-sampling your data, try simply tuning… · 1 min read · Aug 29, 2022
David B Rosen (PhD) · Good article! As an alternative to using a specific set of functional forms for the relationships, we could use the ROC area of each feature as a "score"… · 1 min read · Jun 7, 2022
David B Rosen (PhD) · Good article -- I was not aware of how flawed the mean decrease in feature impurity method could be! One pitfall of permutation feature importance could be if there are some important subsets of the features that are highly-correlated or… · 1 min read · Jun 7, 2022
David B Rosen (PhD) · The cost to the business of false positives vs false negatives is unlikely to be equal, so you need… Also it is incorrect to apply ordinary cross-validation to an already-oversampled training set (even oversampled by ADASYN or SMOTE)… · 1 min read · Jun 3, 2022
David B Rosen (PhD) · You mention the average squared error -- how is this defined if it is different from mean squared… Also it is worth mentioning that R-squared (coeff. of determination) is a rescaled version of MSE such that 100% is perfection and 0%… · 1 min read · Nov 1, 2021
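
The relationship between R² and MSE mentioned in the comments above can be written out as a short sketch. It assumes both quantities are computed on the same n observations y_i with predictions ŷ_i, and that ȳ is the mean of those same observations; these symbols are introduced here for illustration and do not appear in the original comments.

```latex
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{MSE}}{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}.
\]
```

Under these assumptions, R² is MSE multiplied by a negative constant (minus one over the variance of y) and offset by 1, so MSE = 0 gives R² = 100%, and a model that always predicts ȳ gives R² = 0%.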