
Conversation

@antoinebaker antoinebaker (Contributor) commented Jan 7, 2026

Reference Issues/PRs

Fixes #32459 and supersedes #32506

What does this implement/fix? Explain your changes.

This fixes numerical errors in the small alpha regime by refactoring the solve/decompose methods:

  • solve now returns the leave-one-out errors looe and the primal coefficient coef instead of c and G_inverse_diag
  • the helpers _compute_gram and _compute_cov reuse the precomputed X_mean
  • the estimation of looe and coef is adapted to each method for the best numerical stability
  • new option cov to force using the eigendecomposition of the covariance matrix
  • new option gram to force using the eigendecomposition of the Gram matrix

The gcv_mode options have been modified to always pick the cheaper option (see the sketch after this list):

  • option eigen now switches to the cheaper option: gram if n < p and cov if p < n
  • option svd, which is not implemented for sparse X, falls back to eigen when X is sparse
  • in main, eigen always picks gram and svd always falls back to cov for sparse X (whatever the shape of X), which may explain the poor benchmark observed in FIX instability of gcv_mode="svd" in RidgeCV #32506
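
For illustration, a minimal sketch of the dispatch described above; the helper name `_resolve_gcv_mode` is made up for this description, the actual logic lives in the private helpers of `sklearn/linear_model/_ridge.py`:

```python
from scipy import sparse


def _resolve_gcv_mode(gcv_mode, X):
    # Hypothetical sketch of the mode selection described in this PR.
    n_samples, n_features = X.shape
    if gcv_mode == "svd" and sparse.issparse(X):
        # svd is not implemented for sparse X, fall back to eigen
        gcv_mode = "eigen"
    if gcv_mode in ("auto", "eigen"):
        # always pick the cheaper eigendecomposition
        return "gram" if n_samples <= n_features else "cov"
    return gcv_mode  # "svd", or "gram"/"cov" requested explicitly
```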

TODO

  • make it array API compliant
  • benchmark memory/time compared to main
  • update the _RidgeGCV docstring, especially the math explanations

@ogrisel ogrisel (Member) left a comment

I haven't checked the code and the math yet, but the fact that the tests pass and the examples still look good is very promising.

I also ran some quick interactive timeit experiments and the performance seems good (including with torch/MPS) compared to main.

However, I found a performance regression for the linear regression panel of the examples/ensemble/plot_stack_predictors.py example:

[screenshot: the example's rendered timing output, showing the linear regression panel running noticeably slower than on main]

in main, it runs in less than 0.01s: https://scikit-learn.org/dev/auto_examples/ensemble/plot_stack_predictors.html

Maybe this is because of a bad "auto" policy for that case?
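
For reference, here is roughly how such a timing check can be reproduced on both branches; the shape below is made up, the actual example fits RidgeCV inside a StackingRegressor on the Ames housing data:

```python
from time import perf_counter

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Illustrative shape only, not the example's real data.
X, y = make_regression(n_samples=1000, n_features=100, random_state=0)

tic = perf_counter()
RidgeCV().fit(X, y)
print(f"RidgeCV fit time: {perf_counter() - tic:.3f}s")
```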

Please also find a first pass of feedback below.

@@ -0,0 +1,6 @@
- The leave-one-out errors of :class:`linear_model.RidgeCV` are now numerically
Member

It's more than a numerical stability fix for the estimation of the LOO error values. It's also, and more importantly, an improvement in the estimation of the model parameters themselves.

Member

But what is it then? If you suggest an enhancement, what is enhanced apart from the fix? (I ask for my own understanding)

Member

I should have said "a numerical stability fix in the estimation of the model parameters themselves" instead.


# svd is not implemented for sparse X, fall back to eigen
if gcv_mode == "svd" and sparse.issparse(X):
    gcv_mode = "eigen"
Member

I would rather raise a ValueError here. Most users should stick to the auto mode anyway.

@ogrisel ogrisel (Member) commented Jan 9, 2026

Ah actually, passing gcv_mode="svd" explicitly in main works with sparse inputs but would no longer work in this branch.

So maybe we could instead raise a FutureWarning when gcv_mode="svd" is set explicitly by the user when fitting on sparse data. The warning message could state that this now uses the "eigen" solver instead, and that in the future this combination will raise a ValueError.
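
Something along these lines (the helper name and the message wording below are only a suggestion):

```python
import warnings

from scipy import sparse


def _fallback_svd_sparse(gcv_mode, X):
    # Suggested deprecation path: warn now, raise ValueError later.
    if gcv_mode == "svd" and sparse.issparse(X):
        warnings.warn(
            "gcv_mode='svd' is not implemented for sparse X; falling back "
            "to the 'eigen' solver. In a future version this combination "
            "will raise a ValueError.",
            FutureWarning,
        )
        gcv_mode = "eigen"
    return gcv_mode
```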

@ogrisel ogrisel (Member) left a comment

Could you please extend the test test_ridge_cv_results_predictions to check that it works both for n_samples > n_features and n_samples < n_features with the auto/eigen solver?

EDIT: and similarly for test_ridge_cv_multioutput_sample_weight and test_ridge_cv_custom_multioutput_scorer.
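
For instance, a parametrization along these lines (the test name and shapes below are illustrative, the real tests live in `sklearn/linear_model/tests/test_ridge.py`):

```python
import numpy as np
import pytest

from sklearn.linear_model import RidgeCV


# Covers both the tall (n_samples > n_features) and wide
# (n_samples < n_features) regimes.
@pytest.mark.parametrize("n_samples, n_features", [(100, 50), (50, 100)])
@pytest.mark.parametrize("gcv_mode", ["auto", "eigen"])
def test_ridge_cv_shapes(n_samples, n_features, gcv_mode):
    rng = np.random.RandomState(0)
    X = rng.randn(n_samples, n_features)
    y = X @ rng.randn(n_features)
    ridge = RidgeCV(alphas=[0.1, 1.0, 10.0], gcv_mode=gcv_mode).fit(X, y)
    assert ridge.coef_.shape == (n_features,)
```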

return "eigen"

# All other cases ("auto", "eigen", "svd" with sparse X)
# fall back to gram (n <= p) or cov (p < n)
Member

I think we should raise a warning if the user explicitly asks for svd and fits on sparse data (see the previous discussion on the outdated code here: #33020 (comment)).

@pytest.mark.parametrize("X_shape", [(100, 50), (50, 50), (50, 100)])
@pytest.mark.parametrize("X_container", [np.asarray] + CSR_CONTAINERS)
def test_ridge_gcv_noiseless(alpha, gcv_mode, fit_intercept, X_shape, X_container):
# Ridge should recover LinearRegression in the noiseless case
Member

Suggested change
# Ridge should recover LinearRegression in the noiseless case
# Ridge should recover LinearRegression in the noiseless case and
# near-zero alpha.

gcv_ridge = RidgeCV(alphas=alphas, gcv_mode=gcv_mode, fit_intercept=fit_intercept)
gcv_ridge.fit(X, y)
assert_allclose(gcv_ridge.coef_, lin_reg.coef_, atol=1e-10)
assert_allclose(gcv_ridge.intercept_, lin_reg.intercept_, atol=1e-10)
Member

Let's update this test to expect a FutureWarning (and later a ValueError) for the gcv_mode="svd" / sparse X combinations.
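
Perhaps something like this inside the test, reusing the names from the excerpt above (the exact match pattern should follow whatever message the implementation ends up emitting):

```python
import pytest
from scipy import sparse

# Only the gcv_mode="svd" / sparse X combination should warn.
if gcv_mode == "svd" and sparse.issparse(X):
    with pytest.warns(FutureWarning, match="gcv_mode='svd'"):
        gcv_ridge.fit(X, y)
else:
    gcv_ridge.fit(X, y)
```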
