FIX instability of _RidgeGCV
#33020
Conversation
ogrisel left a comment:
I haven't checked the code and the math yet, but the fact that the tests pass and the examples still look good is very promising.
I also ran some quick interactive timeit experiments and the performance seems good (including with torch/MPS) compared to main.
However, I found a performance regression for the linear regression panel of the examples/ensemble/plot_stack_predictors.py example:
in main, it runs in less than 0.01s: https://scikit-learn.org/dev/auto_examples/ensemble/plot_stack_predictors.html
Maybe this is because of a bad "auto" policy for that case?
Please also find a first pass of feedback below.
```
@@ -0,0 +1,6 @@
- The leave-one-out errors of :class:`linear_model.RidgeCV` are now numerically
```
It's more than a numerical stability fix for the estimation of the LOO error values. It's also, and more importantly, an improvement in the estimation of the model parameters themselves.
But what is it then? If you suggest an enhancement, what is enhanced apart from the fix? (I ask for my own understanding)
I should have said "a numerical stability fix in the estimation of the model parameters themselves" instead.
sklearn/linear_model/_ridge.py
Outdated
```python
    # svd not implemented for sparse X, fallback to eigen
    if gcv_mode == "svd" and sparse.issparse(X):
        gcv_mode = "eigen"
```
I would rather raise a ValueError here. Most users should stick to the auto mode anyway.
Ah actually, passing gcv_mode="svd" explicitly in main works with sparse inputs but would no longer work in this branch.
So maybe we could instead raise a FutureWarning when gcv_mode="svd" is set explicitly by the user while fitting on sparse data. The warning message could state that this now uses the "eigen" solver instead, and that in the future this combination will raise a ValueError.
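A minimal sketch of what that deprecation path could look like, assuming a helper similar to the existing `_check_gcv_mode`; the exact wording of the message is illustrative, not the PR's actual code:

```python
import warnings

from scipy import sparse


def _check_gcv_mode(X, gcv_mode):
    # Hypothetical handling of the sparse + "svd" combination discussed above.
    if gcv_mode == "svd" and sparse.issparse(X):
        warnings.warn(
            "gcv_mode='svd' is not implemented for sparse X and falls back "
            "to the 'eigen' solver. In a future version, this combination "
            "will raise a ValueError.",
            FutureWarning,
        )
        gcv_mode = "eigen"
    return gcv_mode
```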
Could you please extend the test test_ridge_cv_results_predictions to check that it works both for n_samples > n_features and n_samples < n_features with the auto/eigen solver? See the sketch of one possible parametrization below.
EDIT: and similarly for test_ridge_cv_multioutput_sample_weight and test_ridge_cv_custom_multioutput_scorer.
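For illustration, one hypothetical way to parametrize the shapes; the values are made up, not taken from the PR, and the existing test body is elided:

```python
import pytest


# Hypothetical shape parametrization covering both regimes.
@pytest.mark.parametrize("n_samples, n_features", [(40, 10), (10, 40)])
@pytest.mark.parametrize("gcv_mode", ["auto", "eigen"])
def test_ridge_cv_results_predictions(n_samples, n_features, gcv_mode):
    ...  # existing test body, elided
```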
```python
    return "eigen"

# All other cases ("auto", "eigen", "svd" with sparse X)
# fallbacks to gram (n <= p) or cov (p < n)
```
I think we should raise a warning if the user explicitly asks for svd and fits on sparse data (see the previous discussion on the outdated code here: #33020 (comment)).
```python
@pytest.mark.parametrize("X_shape", [(100, 50), (50, 50), (50, 100)])
@pytest.mark.parametrize("X_container", [np.asarray] + CSR_CONTAINERS)
def test_ridge_gcv_noiseless(alpha, gcv_mode, fit_intercept, X_shape, X_container):
    # Ridge should recover LinearRegression in the noiseless case
```
Suggested change:
```diff
-    # Ridge should recover LinearRegression in the noiseless case
+    # Ridge should recover LinearRegression in the noiseless case and
+    # near-zero alpha.
```
```python
    gcv_ridge = RidgeCV(alphas=alphas, gcv_mode=gcv_mode, fit_intercept=fit_intercept)
    gcv_ridge.fit(X, y)
    assert_allclose(gcv_ridge.coef_, lin_reg.coef_, atol=1e-10)
    assert_allclose(gcv_ridge.intercept_, lin_reg.intercept_, atol=1e-10)
```
Let's update this test to expect a FutureWarning (and later a ValueError) for the gcv_mode="svd" / sparse X combinations.
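A hedged sketch of what that update could look like, reusing the names from the test snippet above and assuming the FutureWarning discussed earlier is implemented; the match pattern is illustrative:

```python
import pytest
from scipy import sparse

if gcv_mode == "svd" and sparse.issparse(X):
    # Sparse X with an explicit "svd" request should warn about the fallback.
    with pytest.warns(FutureWarning, match="eigen"):
        gcv_ridge.fit(X, y)
else:
    gcv_ridge.fit(X, y)
```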
Reference Issues/PRs
Fixes #32459 and supersedes #32506
What does this implement/fix? Explain your changes.
This fixes numerical errors in the small-alpha regime by refactoring the solve/decompose methods:

- `looe` and the primal coefficient `coef` are computed instead of the `c` and `G_inverse_diag`.
- `_compute_gram` and `_compute_cov` reuse the precomputed `X_mean`.
- The computation of `looe` and `coef` is adapted to each method for best numerical stability.
- New `gcv_mode` option `cov` to force use of the eigendecomposition of the covariance matrix.
- New `gcv_mode` option `gram` to force use of the eigendecomposition of the Gram matrix.
The `gcv_mode` options have been modified to always pick the cheaper option:

- `eigen` now switches to the cheaper option: `gram` if n < p and `cov` if p < n.
- `svd`, not implemented for sparse X, falls back to `eigen` when X is sparse.
- In main, `eigen` always picks `gram` and `svd` always falls back to `cov` for sparse X (whatever the shape of X), which can explain the poor benchmark observed in FIX instability of gcv_mode="svd" in RidgeCV #32506.

A small usage sketch follows.
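For illustration only, a quick sketch of the dispatch described above; the shapes and alphas are arbitrary, and the comments state this PR's intended behavior, not a guarantee of the final implementation:

```python
import numpy as np

from sklearn.linear_model import RidgeCV

rng = np.random.RandomState(0)
X = rng.randn(50, 100)  # n_samples < n_features: the Gram matrix is cheaper
y = X @ rng.randn(100)

# With this PR, gcv_mode="auto" (and "eigen") would dispatch to the Gram
# decomposition here; for n_features < n_samples it would use the covariance.
model = RidgeCV(alphas=np.logspace(-8, 2, 11), gcv_mode="auto").fit(X, y)
print(model.alpha_)
```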
TODO

- `_RidgeGCV` docstring, especially math explanations.