-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[update_lib] show deps #6821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[update_lib] show deps #6821
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughThis PR introduces a comprehensive library dependency checking system via a new GitHub Actions workflow and supporting Python tooling. It adds a Changes
Sequence DiagramsequenceDiagram
participant GH as GitHub Actions
participant WF as Workflow
participant Git as Git
participant Py as Python Script
participant CP as CPython Repo
participant PR as PR Comments API
GH->>WF: Trigger on PR to Lib/
WF->>Git: Checkout base + fetch PR head
WF->>CP: Clone CPython v3.14.2
WF->>Py: Detect changed files in Lib/
Py-->>WF: Return module list
alt Modules Detected
WF->>Py: Run dependency check
Py->>CP: Analyze imports & dependencies
Py-->>WF: Return deps_output
WF->>PR: Post sticky comment with details
else No Changes
WF->>PR: Delete existing comment
end
PR-->>GH: Comment posted/removed
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin show-deps |
7bc4ef4 to
a9d231f
Compare
|
probably we need the opposite graph. finding the bottom-most library and solve one by one from there. Then everything could be updated smoothly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @.github/workflows/lib-deps-check.yaml:
- Around line 67-85: The workflow step "Run deps check" currently expands
untrusted module names directly in the shell via the command starting with
"python scripts/update_lib deps ${{ steps.changed-files.outputs.modules }}
--depth 2", which allows command injection; change this to stop interpolating
the raw variable and instead stream the modules through a safe tool (e.g., use
printf or echo to produce one module per line and pipe into xargs) so each
module is passed as a positional argument and include a "--" marker before
module names to prevent option injection, invoking "python scripts/update_lib
deps -- <module> --depth 2" for each module; keep the rest of the step (writing
/tmp/deps_output.txt and populating GITHUB_OUTPUT and has_output) unchanged.
- Around line 35-60: The duplicate-module check uses a regex substring test in
the condition 'if [[ -n "$module" && ! " $modules " =~ " $module " ]]; then'
which misdetects modules like "os" when "posix" exists; replace this with
exact-match deduplication by tracking seen modules with an associative array
(e.g., declare -A seen) inside the same loop (for file in $changed) and only
append when the module key is not present (check via [[ -z "${seen[$module]}"
]]); after appending set seen[$module]=1 and keep producing the trimmed $modules
and GITHUB_OUTPUT as before.
In `@scripts/update_lib/deps.py`:
- Around line 12-13: Remove the unused safe_read_text import from the import
statement in update_lib/deps.py: update the line that imports from
update_lib.io_utils so it only imports read_python_files and safe_parse_ast,
eliminating safe_read_text to satisfy lint rule F401 and avoid unused-import
warnings.
- Around line 333-369: The directory comparison in is_up_to_date uses
filecmp.dircmp but only inspects the top-level diff_files/left_only/right_only,
missing nested differences; add a recursive check (e.g., a helper like
_dircmp_has_diff) that walks dcmp.subdirs (dcmp.subdirs.values()) and returns
True if any dcmp.diff_files, dcmp.left_only, dcmp.right_only is non-empty at any
level, and use that helper in is_up_to_date when handling the directory branch
so nested subpackage mismatches are detected; reference symbols: is_up_to_date,
get_lib_paths, filecmp.dircmp, and dcmp.subdirs.
In `@scripts/update_lib/tests/test_deps.py`:
- Around line 10-14: The new assertions/logic added to the test file violate the
rule that test files under **/*test*.py may only change expectedFailure
annotations; revert the behavioral changes in tests like get_soft_deps,
get_test_dependencies, get_test_paths, parse_lib_imports, and parse_test_imports
and either (a) move any new validation logic into a non-test helper/module
(e.g., scripts/update_lib/helpers.py) and import it from the tests, or (b) if
the test must change behavior because of a known failing condition, mark the
test with `@pytest.mark.xfail` (or expectedFailure) instead of changing
assertions; ensure only annotation changes remain in the test file.
🧹 Nitpick comments (4)
scripts/update_lib/__main__.py (1)
52-56: Consider updating the usage docstring to includedeps.The top-level usage block doesn’t mention the new subcommand, which can confuse CLI users.
.github/workflows/lib-deps-check.yaml (1)
20-33: Consider pinning external actions and CPython clone to commit SHAs for supply-chain hardening.The external actions at lines 21 and 63 (actions/checkout@v6.0.1 and actions/setup-python@v6.1.0) and the CPython clone at line 33 (v3.14.2) use floating tags rather than commit SHAs. While these versions are valid and current, GitHub Actions security best practices recommend pinning to full-length commit SHAs and including the tag in a comment for human traceability (e.g.,
uses: actions/checkout@<sha> # v6.0.1). This reduces supply-chain risk by preventing mutations to the pinned reference.scripts/update_lib/path.py (1)
112-144: Validatepreferto avoid silent fallbacks.A typo currently routes to the dir-first branch; consider rejecting unknown values up-front.
♻️ Proposed guard
def resolve_module_path( name: str, prefix: str = "cpython", prefer: str = "file" ) -> pathlib.Path: @@ - file_path = pathlib.Path(f"{prefix}/Lib/{name}.py") + if prefer not in {"file", "dir"}: + raise ValueError(f"prefer must be 'file' or 'dir' (got: {prefer})") + + file_path = pathlib.Path(f"{prefix}/Lib/{name}.py")scripts/update_lib/show_deps.py (1)
200-223: Avoid duplicate output whenallis combined with explicit names.Expansion can repeat modules; de-dupe while preserving order.
♻️ Proposed de-duplication
expanded_names = [] for name in names: if name == "all": expanded_names.extend(get_all_modules(cpython_prefix)) else: expanded_names.append(name) + # Preserve order but remove duplicates + expanded_names = list(dict.fromkeys(expanded_names))
| - name: Get changed Lib files | ||
| id: changed-files | ||
| run: | | ||
| # Get the list of changed files under Lib/ | ||
| changed=$(git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }} -- 'Lib/*.py' 'Lib/**/*.py' | head -50) | ||
| echo "Changed files:" | ||
| echo "$changed" | ||
| # Extract unique module names (top-level only, skip test/) | ||
| modules="" | ||
| for file in $changed; do | ||
| # Skip test files | ||
| if [[ "$file" == Lib/test/* ]]; then | ||
| continue | ||
| fi | ||
| # Extract module name: Lib/foo.py -> foo, Lib/foo/__init__.py -> foo | ||
| module=$(echo "$file" | sed -E 's|^Lib/||; s|/__init__\.py$||; s|\.py$||; s|/.*||') | ||
| if [[ -n "$module" && ! " $modules " =~ " $module " ]]; then | ||
| modules="$modules $module" | ||
| fi | ||
| done | ||
| modules=$(echo "$modules" | xargs) # trim whitespace | ||
| echo "Detected modules: $modules" | ||
| echo "modules=$modules" >> $GITHUB_OUTPUT | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix duplicate-module detection (substring match can skip modules).
Line 52 uses a regex substring check against a concatenated string, which can treat os as already present when posix is in the list. This can silently drop modules from the deps run.
🔧 Proposed fix (exact-match dedup)
- modules=""
+ modules=""
+ declare -A seen=()
@@
- if [[ -n "$module" && ! " $modules " =~ " $module " ]]; then
- modules="$modules $module"
- fi
+ if [[ -n "$module" && -z "${seen[$module]:-}" ]]; then
+ seen["$module"]=1
+ modules="$modules $module"
+ fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Get changed Lib files | |
| id: changed-files | |
| run: | | |
| # Get the list of changed files under Lib/ | |
| changed=$(git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }} -- 'Lib/*.py' 'Lib/**/*.py' | head -50) | |
| echo "Changed files:" | |
| echo "$changed" | |
| # Extract unique module names (top-level only, skip test/) | |
| modules="" | |
| for file in $changed; do | |
| # Skip test files | |
| if [[ "$file" == Lib/test/* ]]; then | |
| continue | |
| fi | |
| # Extract module name: Lib/foo.py -> foo, Lib/foo/__init__.py -> foo | |
| module=$(echo "$file" | sed -E 's|^Lib/||; s|/__init__\.py$||; s|\.py$||; s|/.*||') | |
| if [[ -n "$module" && ! " $modules " =~ " $module " ]]; then | |
| modules="$modules $module" | |
| fi | |
| done | |
| modules=$(echo "$modules" | xargs) # trim whitespace | |
| echo "Detected modules: $modules" | |
| echo "modules=$modules" >> $GITHUB_OUTPUT | |
| - name: Get changed Lib files | |
| id: changed-files | |
| run: | | |
| # Get the list of changed files under Lib/ | |
| changed=$(git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }} -- 'Lib/*.py' 'Lib/**/*.py' | head -50) | |
| echo "Changed files:" | |
| echo "$changed" | |
| # Extract unique module names (top-level only, skip test/) | |
| modules="" | |
| declare -A seen=() | |
| for file in $changed; do | |
| # Skip test files | |
| if [[ "$file" == Lib/test/* ]]; then | |
| continue | |
| fi | |
| # Extract module name: Lib/foo.py -> foo, Lib/foo/__init__.py -> foo | |
| module=$(echo "$file" | sed -E 's|^Lib/||; s|/__init__\.py$||; s|\.py$||; s|/.*||') | |
| if [[ -n "$module" && -z "${seen[$module]:-}" ]]; then | |
| seen["$module"]=1 | |
| modules="$modules $module" | |
| fi | |
| done | |
| modules=$(echo "$modules" | xargs) # trim whitespace | |
| echo "Detected modules: $modules" | |
| echo "modules=$modules" >> $GITHUB_OUTPUT |
🤖 Prompt for AI Agents
In @.github/workflows/lib-deps-check.yaml around lines 35 - 60, The
duplicate-module check uses a regex substring test in the condition 'if [[ -n
"$module" && ! " $modules " =~ " $module " ]]; then' which misdetects modules
like "os" when "posix" exists; replace this with exact-match deduplication by
tracking seen modules with an associative array (e.g., declare -A seen) inside
the same loop (for file in $changed) and only append when the module key is not
present (check via [[ -z "${seen[$module]}" ]]); after appending set
seen[$module]=1 and keep producing the trimmed $modules and GITHUB_OUTPUT as
before.
| - name: Run deps check | ||
| if: steps.changed-files.outputs.modules != '' | ||
| id: deps-check | ||
| run: | | ||
| # Run deps for all modules at once | ||
| python scripts/update_lib deps ${{ steps.changed-files.outputs.modules }} --depth 2 > /tmp/deps_output.txt 2>&1 || true | ||
| # Read output for GitHub Actions | ||
| echo "deps_output<<EOF" >> $GITHUB_OUTPUT | ||
| cat /tmp/deps_output.txt >> $GITHUB_OUTPUT | ||
| echo "EOF" >> $GITHUB_OUTPUT | ||
| # Check if there's any meaningful output | ||
| if [ -s /tmp/deps_output.txt ]; then | ||
| echo "has_output=true" >> $GITHUB_OUTPUT | ||
| else | ||
| echo "has_output=false" >> $GITHUB_OUTPUT | ||
| fi | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against command injection from PR-controlled module names.
Line 72 expands untrusted module names directly into a shell command in a pull_request_target workflow. A crafted filename could inject shell tokens. Pipe modules through xargs and include -- to avoid option injection.
🔒 Proposed fix (no shell eval of module names)
- python scripts/update_lib deps ${{ steps.changed-files.outputs.modules }} --depth 2 > /tmp/deps_output.txt 2>&1 || true
+ printf '%s\n' ${{ steps.changed-files.outputs.modules }} | \
+ xargs -r python scripts/update_lib deps --depth 2 -- > /tmp/deps_output.txt 2>&1 || true🤖 Prompt for AI Agents
In @.github/workflows/lib-deps-check.yaml around lines 67 - 85, The workflow
step "Run deps check" currently expands untrusted module names directly in the
shell via the command starting with "python scripts/update_lib deps ${{
steps.changed-files.outputs.modules }} --depth 2", which allows command
injection; change this to stop interpolating the raw variable and instead stream
the modules through a safe tool (e.g., use printf or echo to produce one module
per line and pipe into xargs) so each module is passed as a positional argument
and include a "--" marker before module names to prevent option injection,
invoking "python scripts/update_lib deps -- <module> --depth 2" for each module;
keep the rest of the step (writing /tmp/deps_output.txt and populating
GITHUB_OUTPUT and has_output) unchanged.
| from update_lib.io_utils import read_python_files, safe_parse_ast, safe_read_text | ||
| from update_lib.path import construct_lib_path, resolve_module_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused safe_read_text import.
This is unused and will trip lint (F401).
🧹 Proposed cleanup
-from update_lib.io_utils import read_python_files, safe_parse_ast, safe_read_text
+from update_lib.io_utils import read_python_files, safe_parse_ast🧰 Tools
🪛 Flake8 (7.3.0)
[error] 12-12: 'update_lib.io_utils.safe_read_text' imported but unused
(F401)
🤖 Prompt for AI Agents
In `@scripts/update_lib/deps.py` around lines 12 - 13, Remove the unused
safe_read_text import from the import statement in update_lib/deps.py: update
the line that imports from update_lib.io_utils so it only imports
read_python_files and safe_parse_ast, eliminating safe_read_text to satisfy lint
rule F401 and avoid unused-import warnings.
| get_soft_deps, | ||
| get_test_dependencies, | ||
| get_test_paths, | ||
| parse_lib_imports, | ||
| parse_test_imports, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test logic changes violate repo test-editing rules.
New test logic/assertions were added in a **/*test*.py file, which conflicts with the rule that only expectedFailure annotations may be changed. Please confirm an exception or move these checks to a non-test module. As per coding guidelines, ...
Also applies to: 236-343
🤖 Prompt for AI Agents
In `@scripts/update_lib/tests/test_deps.py` around lines 10 - 14, The new
assertions/logic added to the test file violate the rule that test files under
**/*test*.py may only change expectedFailure annotations; revert the behavioral
changes in tests like get_soft_deps, get_test_dependencies, get_test_paths,
parse_lib_imports, and parse_test_imports and either (a) move any new validation
logic into a non-test helper/module (e.g., scripts/update_lib/helpers.py) and
import it from the tests, or (b) if the test must change behavior because of a
known failing condition, mark the test with `@pytest.mark.xfail` (or
expectedFailure) instead of changing assertions; ensure only annotation changes
remain in the test file.
cc @ShaharNaveh @moreal @terryluan12
To understand better what lib is related to its dependencies
$ python3 scripts/update_lib deps dis [+] lib: cpython/Lib/dis.py [+] test: cpython/Lib/test/test_dis.py soft_deps: - [ ] dis (native: _opcode, sys) - [ ] collections (native: _collections, _weakref, itertools, sys) - [ ] copy - [ ] copyreg - [ ] operator (native: _operator, builtins) - [x] functools - [x] typing - [ ] types (native: _types, sys) - [x] _collections_abc, functools - [ ] weakref (native: _weakref, atexit, gc, itertools, sys) - [ ] _weakrefset (native: _weakref) - [ ] types - [ ] copy - [x] _collections_abc - [ ] operator (native: _operator, builtins) - [x] functools - [ ] reprlib (native: _thread, builtins, itertools, math, sys) - [x] _collections_abc, heapq, keyword - [ ] io (native: _io, _thread, errno, msvcrt, sys) - [ ] codecs (native: _codecs, builtins, sys) - [ ] encodings (native: _codecs_cn, _codecs_hk, _codecs_iso2022, _codecs_jp, _codecs_kr, _codecs_tw, _multibytecodec, _win_cp_codecs, binascii, sys, unicodedata, zlib) - [ ] quopri (native: binascii, sys) - [ ] getopt (native: sys) - [ ] gettext (native: builtins, errno, sys) - [ ] locale (native: _locale, builtins, sys) - [ ] os (native: nt, posix, sys) - [ ] ntpath (native: _winapi, nt, sys) - [ ] posixpath (native: errno, posix, pwd, sys) - [ ] subprocess (native: _posixsubprocess, _winapi, builtins, errno, fcntl, grp, msvcrt, pwd, select, sys, time) - [ ] io - [x] _collections_abc, abc, stat, warnings - [ ] encodings - [x] _collections_abc, functools, re, warnings - [ ] os (native: nt, posix, sys) - [ ] io, ntpath, posixpath, subprocess - [x] _collections_abc, abc, stat, warnings - [ ] struct (native: _struct) - [ ] copy, operator - [x] re, warnings - [ ] os (native: nt, posix, sys) - [ ] io, ntpath, posixpath, subprocess - [x] _collections_abc, abc, stat, warnings - [ ] io - [ ] codecs, io - [x] base64, bz2, re, stringprep - [x] warnings - [ ] locale (native: _locale, builtins, sys) - [ ] encodings, os - [x] _collections_abc, functools, re, warnings - [ ] os (native: nt, posix, sys) - [ ] io, ntpath, posixpath, subprocess - [x] _collections_abc, abc, stat, warnings - [x] _collections_abc, abc, stat, warnings - [ ] opcode (native: _opcode, builtins) - [ ] _opcode_metadata - [ ] types (native: _types, sys) - [x] _collections_abc, functools - [x] argparseSummary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.