Good article!

As an alternative to assuming a specific set of functional forms for the relationships, we could use the ROC area (AUC) of each feature as a "score" predicting, positively or negatively, each other feature. This is a type of nonparametric ordinal correlation: it depends only on the rank order of the values, so it is very effective at finding features that are related to others by a monotonic (increasing or decreasing) but otherwise arbitrary function.
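To make the idea concrete, here is a minimal sketch (the function name `concordance_auc` is my own, hypothetical). It counts, over all pairs of rows whose target values differ, how often the "score" feature orders the pair the same way as the target, with ties in the score counted as half. For a binary target this is exactly the ROC AUC (the Mann-Whitney formulation). Extending it to a non-binary target feature by comparing every pair with different target values is my assumption: that generalization is usually called a concordance index. A value near 0.5 means no ordinal relationship, near 1 a positive monotonic one, and near 0 a negative one. The O(n²) loop is written for clarity, not speed.

```python
from itertools import combinations
from typing import Sequence


def concordance_auc(score: Sequence[float], target: Sequence[float]) -> float:
    """Fraction of pairs with different target values that `score` ranks
    in the same order as `target` (ties in `score` count as 0.5).
    For a 0/1 target this equals the ROC AUC of `score`."""
    if len(score) != len(target):
        raise ValueError("score and target must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(score)), 2):
        if target[i] == target[j]:
            continue  # this pair carries no ordering information
        comparable += 1
        # Orient the pair so that target[hi] > target[lo].
        hi, lo = (i, j) if target[i] > target[j] else (j, i)
        if score[hi] > score[lo]:
            concordant += 1.0
        elif score[hi] == score[lo]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("target has no variation, so the score is undefined")
    return concordant / comparable


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    print(concordance_auc(x, [0, 0, 1, 1]))      # increasing relationship
    print(concordance_auc(x, [v ** 3 for v in x]))  # monotonic but nonlinear
    print(concordance_auc(x, [1, 1, 0, 0]))      # decreasing relationship
```

Because only the ordering of values matters, `x` and `x ** 3` score a perfect 1.0 even though their relationship is far from linear; a reversed relationship scores 0.0.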

You mention the average squared error -- how is this defined if it is different from mean squared error?

Also, it is worth mentioning that R-squared (the coefficient of determination) is a rescaled version of MSE such that 100% is perfection and 0% means the same MSE you would get by simply always predicting the overall mean of the dataset. This gives it an interpretability advantage over MSE: when it is positive, it is the % reduction in MSE compared with the naive predict-the-mean baseline. Apart from that, the two are equivalent, in the sense that, on the same data, whichever model has the best MSE also has the best R-squared and vice versa. When R-squared is negative, your model is *worse* than a model that merely predicts the mean, which isn't necessarily obvious from looking at the MSE itself.
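A short sketch of that rescaling, assuming plain Python lists (the helper names `mse` and `r_squared` are illustrative): R-squared = 1 - MSE(model) / MSE(always predict the mean). The denominator depends only on the data, not on the model, which is why ranking models by MSE and by R-squared on the same data gives the same order.

```python
from statistics import fmean
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - MSE(model) / MSE(predict-the-mean baseline).
    Raises ZeroDivisionError if y_true is constant (baseline MSE is 0)."""
    baseline = [fmean(y_true)] * len(y_true)
    return 1.0 - mse(y_true, y_pred) / mse(y_true, baseline)


if __name__ == "__main__":
    y = [1, 2, 3, 4]
    print(r_squared(y, [1, 2, 3, 4]))          # perfect model
    print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # same as predicting the mean
    print(r_squared(y, [4, 3, 2, 1]))          # worse than the mean: negative
```

Here the baseline MSE is 1.25 and the reversed predictions have an MSE of 5.0, so R-squared is 1 - 5.0 / 1.25 = -3.0. The raw MSE of 5.0 alone doesn't tell you the model is worse than the mean.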