Optimise overload consistency checks for large numbers of literal-containing overloads#21218
Open
pelson wants to merge 3 commits intopython:masterfrom
Open
Optimise overload consistency checks for large numbers of literal-containing overloads#21218pelson wants to merge 3 commits intopython:masterfrom
pelson wants to merge 3 commits intopython:masterfrom
Conversation
…n^2) consistency check. Add a fast-path at the top of the inner loop.
…ecking by preparing the list of valid items
bbac93f to
8125e31
Compare
This comment has been minimized.
This comment has been minimized.
…ents build_literal_fingerprint included optional arguments (ARG_OPT / ARG_NAMED_OPT) in the fingerprint used to detect disjoint overloads. When two overloads differed only in an optional Literal arg (e.g. as_index: Literal[True] vs Literal[False]), the fast-path treated them as mutually exclusive and skipped the full overlap check - even though a caller can omit the argument entirely, matching both overloads through their other arguments. Fix by only fingerprinting required arguments (ARG_POS and ARG_NAMED). Optional arguments can always be omitted by the caller, so they can never prove two overloads are disjoint. The real-world symptom was pandas-stubs gaining spurious unused-ignore errors: overloads that master correctly flagged as overlapping were silently skipped by the fast-path, making existing type: ignore redundant.
Author
This was a genuine issue - there was no existing test coverage of the case where optional arguments overlapped (e.g. Note that the test that I added to cover this is also highlighting a problem with the |
Contributor
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #10004
Introduce an optimisation to reduce the number of calls to (the expensive)
overload_can_never_matchwithin the inner (O(N^2)) loop ofcheck_overlapping_overloadsin the case where the overloads contain literals.The approach is to track all exclusively literal argument types, and to be able to quickly determine if two signatures can't possibly overlap.
For example, given:
We can know by looking at the literals alone that there are no overlapping overloads, so the expensive pairwise check is skipped except for the last override. When the optimisation doesn't apply, we fall back to the existing (and complete) logic.
Please note that I have used Claude to help me find the key parts to change, and to help with the test validation. That said, this was a significant effort on my part to produce something that was tightly scoped and consistent with the approach I saw in
checker.py.In terms of performance:
Script to compute the runtime
Here is the script I used to compute the above runtime (fully Claude generated)
I had also prototyped an algorithmic simplification to reduce the
O(N^2)pairwise search in the case where there are many literals of the same type. This did have a reduction in the runtime, but it was not sufficiently large to justify the additional complexity, and I consider the implementation here to strike a good balance between the two.