-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Warn and ignore NaN values in violinplot #30932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6fd59a8 to
73d4a38
Compare
73d4a38 to
2144f11
Compare
|
Reconsidering: Do we need to warn at all? This is a bit of a philosophic question: Is NaN considered just a placeholder for "no value"? In that case it is reasonably to silently drop NaNs. Or is NaN marking "undefined value" in which case a warning would be reasonable. Since |
|
My preference would also be to not warn and instead skip NaN values, like we do for hist. Sorry @codingabhiroop, I think this should get discussed more in the original issue before we commit to a solution |
|
Thanks for the clarification — that makes sense. Happy to hold off on further changes until there’s agreement in the original issue, and I’m also happy to update this PR like removing the warning entirely once a direction is decided. |
|
I'll decide here: silently dropping NaNs is the way to go. |
lib/matplotlib/cbook.py
Outdated
| raise ValueError("List of violinplot statistics and quantiles values" | ||
| " must have the same length") | ||
|
|
||
| has_nan = any(np.isnan(np.asarray(x)).any() for x in X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| has_nan = any(np.isnan(np.asarray(x)).any() for x in X) | |
| has_nan = any(np.isnan(x).any() for x in X) |
I suspect asarray() does not have any benefit here as isnan() accepts array-like.
|
|
||
|
|
||
| def test_violinplot_nan_values_warn_and_plot(): | ||
| data = [[0, 1, 2, 3], [1, np.nan, 2, 3]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also test the case of a single 1D input data = [1, np.nan, 2, 3]
|
|
||
| # If all values are NaN, skip this violin | ||
| if x.size == 0: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would swallow the vpstats.append() at the end, which means you loose all-nan data columns. Instead, we must create a stats entry with nan for all values.
PR summary
violinplotcurrently fails silently when a dataset contains NaN values, asidefrom a low-level NumPy warning, which can be confusing for users.
This PR filters NaN values in
cbook.violin_stats, emits aRuntimeWarning,and allows violins with valid data to be plotted. Datasets consisting entirely
of NaNs are skipped.
This behavior is consistent with other plotting functions such as
hist.CI Note: Failing tests (WebAgg, GTK/Tk/Qt backends, Windows_py312) are unrelated to this PR.
This PR only touches core plotting logic and
test_axes.py. Failures are due to headless/ARM/Windows CI environments.PR checklist