You mention the average squared error -- how is this defined if it is different from mean squared error?

Also it is worth mentioning that R-squared (the coefficient of determination) is a rescaled version of MSE: 100% is perfection, and 0% corresponds to the same MSE you would get by always predicting the overall mean of the dataset. This gives it an interpretability advantage over MSE: when positive, it represents the percentage reduction in MSE relative to the naive predict-the-mean baseline. The two metrics are equivalent for ranking models, though: whichever model has the best MSE also has the best R-squared, and vice versa. A negative R-squared implies that your model is *worse* than one that merely predicts the mean, which isn't necessarily obvious from looking at the MSE itself.
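To make the relationship concrete, here is a minimal sketch with toy numbers (the arrays are made up for illustration) showing R-squared computed directly from the two MSEs, including a negative case:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_good = np.array([2.5, 5.5, 7.5, 8.5])   # decent predictions
y_bad = np.array([9.0, 7.0, 5.0, 3.0])    # anti-correlated predictions

def r_squared(y_true, y_pred):
    # MSE of the model's predictions
    mse_model = np.mean((y_true - y_pred) ** 2)
    # MSE of the naive baseline that always predicts the overall mean
    mse_baseline = np.mean((y_true - y_true.mean()) ** 2)
    # R-squared is the fractional reduction in MSE vs. that baseline
    return 1.0 - mse_model / mse_baseline

print(r_squared(y_true, y_good))  # positive: better than predicting the mean
print(r_squared(y_true, y_bad))   # negative: worse than predicting the mean
```

The good predictions give R-squared = 0.95 here (a 95% reduction in MSE vs. the baseline), while the anti-correlated ones give a negative value, even though their MSE alone (20.0) would not immediately tell you the model underperforms the mean.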