
gh-146044: Fix ctrl-w (unix-word-rubout) to use whitespace word boundaries #146174

Open
kimimgo wants to merge 5 commits into python:main from kimimgo:fix/pyrepl-ctrl-w-146044

Conversation

@kimimgo

@kimimgo kimimgo commented Mar 19, 2026

Fixes #146044

Summary

The unix-word-rubout command (ctrl-w) was using bow() which treats
punctuation as word separators via the syntax table. This differs from
bash/readline's unix-word-rubout which uses only whitespace as boundaries.

Changes

  • reader.py: Add bow_whitespace() method — same as bow() but only
    considers spaces and newlines as word boundaries
  • commands.py: unix_word_rubout now calls bow_whitespace() instead of bow()
  • test_reader.py: Add TestBowWhitespace class with 4 tests verifying
    the whitespace-only behavior and the difference from bow()

Example

With buffer foo.bar baz and cursor at end:

  • Before: ctrl-w deletes baz, then bar, then foo (3 operations)
  • After: ctrl-w deletes baz, then foo.bar (2 operations, matching bash)

The existing bow() method used by backward-kill-word (M-Backspace) is unchanged.
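To make the before/after difference concrete, here is a minimal standalone sketch that simulates repeated Ctrl-W presses on a plain list of characters. It does not use pyrepl. The names `bow_syntax`, `bow_whitespace_sketch` and `rubout_all` are hypothetical, and `bow_syntax` approximates the default syntax table by treating `str.isalnum()` characters as word characters, which is an assumption rather than pyrepl's exact table. The whitespace set (space, tab, newline) follows the set adopted later in this PR.

```python
from collections.abc import Callable


def bow_syntax(buffer: list[str], pos: int) -> int:
    """Rough stand-in for bow(): alphanumerics are word chars, all else separates."""
    p = pos - 1
    # Skip separators (spaces, punctuation) directly left of the cursor.
    while p >= 0 and not buffer[p].isalnum():
        p -= 1
    # Then skip the run of word characters.
    while p >= 0 and buffer[p].isalnum():
        p -= 1
    return p + 1


def bow_whitespace_sketch(buffer: list[str], pos: int, ws: str = " \t\n") -> int:
    """Whitespace-only word boundaries, as in readline's unix-word-rubout."""
    p = pos - 1
    while p >= 0 and buffer[p] in ws:
        p -= 1
    while p >= 0 and buffer[p] not in ws:
        p -= 1
    return p + 1


def rubout_all(text: str, bow: Callable[[list[str], int], int]) -> list[str]:
    """Press Ctrl-W until the buffer is empty and return each deleted chunk."""
    buffer = list(text)
    pos = len(buffer)
    deleted = []
    while pos > 0:
        start = bow(buffer, pos)
        deleted.append("".join(buffer[start:pos]))
        del buffer[start:pos]
        pos = start
    return deleted


print(rubout_all("foo.bar baz", bow_syntax))             # ['baz', 'bar ', 'foo.']
print(rubout_all("foo.bar baz", bow_whitespace_sketch))  # ['baz', 'foo.bar ']
```

That is three presses versus two, as in the Example. Each press also removes the separator that follows the word, which is why the chunks are `'bar '` and `'foo.bar '` rather than bare words.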

…on#146044)

The unix-word-rubout command (ctrl-w) was using syntax_table-based
word boundaries (bow()), which treats punctuation as word separators.
This differs from bash/readline's unix-word-rubout which uses only
whitespace as word boundaries.

Add bow_whitespace() method that uses whitespace-only boundaries,
and use it in unix_word_rubout instead of bow(). The existing bow()
method (used by backward-kill-word/M-Backspace) is unchanged.

Example: with 'foo.bar baz' and cursor at end:
- Before (bow): ctrl-w deletes 'baz', then 'bar', then 'foo'
- After (bow_whitespace): ctrl-w deletes 'baz', then 'foo.bar'
@python-cla-bot

python-cla-bot bot commented Mar 19, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app

bedevere-app bot commented Mar 19, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.


okiemute04 added a commit to okiemute04/cpython that referenced this pull request Mar 19, 2026
Add a check in Parse() to prevent calls when in_callback is true,
as this violates expat's requirements and can cause crashes. Now raises
RuntimeError with a clear message.

Add tests to verify the fix and ensure normal parsing still works.
@picnixz
Member

picnixz commented Mar 20, 2026

I personally prefer stopping at separators rather than whitespaces. I think zsh stops at punctuations. So I would first open a DPO thread about this before changing anything.

@kimimgo
Author

kimimgo commented Mar 20, 2026

Thanks for the feedback @picnixz!

The distinction I'm making is between two different readline commands:

  • unix-word-rubout (Ctrl-W): POSIX/bash definition uses whitespace-only boundaries. From the GNU Readline docs: "Kill the word behind point, using white space as a word boundary."
  • backward-kill-word (M-Backspace): Uses syntax-table boundaries (stops at punctuation), which is the current bow() behavior.

The current code maps Ctrl-W to unix_word_rubout but uses bow() (syntax-table/punctuation boundaries), which matches backward-kill-word behavior instead.

This PR keeps bow() unchanged for backward-kill-word — it only changes unix_word_rubout to use whitespace boundaries, matching the readline specification.

That said, I'm happy to open a Discourse thread if you think this needs broader discussion first.

@picnixz
Member

picnixz commented Mar 20, 2026

I am a bit torn here... I would say M-backspace is... less convenient to type but at the same time if ctrl+w is meant to emulate the Unix one then I guess it may be better to ensure this.

What is the behavior on the old REPL? (in 3.12?)

@injust
Contributor

injust commented Mar 21, 2026

> I think zsh stops at punctuations.

ctrl-w (unix-word-rubout) uses whitespace boundaries on bash, zsh, and fish on macOS.

@injust
Contributor

injust commented Mar 21, 2026

> What is the behavior on the old REPL? (in 3.12?)

ctrl-w stops on whitespace boundaries in the 3.12 REPL.

unix_word_rubout and backward_kill_word have had the same implementation since 3.13, so this looks like a bug that was just missed until now.

@kimimgo
Author

kimimgo commented Mar 21, 2026

Thanks @injust for confirming the cross-shell behavior and the 3.12 regression!

To summarize: this is a regression introduced in 3.13's pyrepl where unix_word_rubout and backward_kill_word were given the same bow() implementation. In 3.12 (GNU readline), ctrl-w correctly used whitespace boundaries.

This PR restores the 3.12 behavior by adding bow_whitespace() for unix_word_rubout while keeping bow() (syntax-table based) for backward_kill_word.

@picnixz picnixz added the "needs backport to 3.13" and "needs backport to 3.14" labels (bugs and security fixes) Mar 21, 2026
Member

@picnixz picnixz left a comment


Thanks but please make tests less LLM-esque.

            p -= 1
        return p + 1

    def bow_whitespace(self, p: int | None = None) -> int:
Member


Maybe unix_bow() instead? or bow_ws()? since we use quite short names.


class TestBowWhitespace(TestCase):
    def test_bow_whitespace_stops_at_whitespace(self):
        # GH#146044
Member


        # See https://github.com/cpython/issues/146044

Use a link looking like that so that I can click on it in my IDE. I created the link from memory, so just check that it is the correct URL.

reader.pos = len(reader.buffer)

result = reader.bow()
self.assertEqual(result, 8) # same — "baz" is all word chars
Member


Suggested change
self.assertEqual(result, 8) # same "baz" is all word chars
self.assertEqual(result, 8) # same: "baz" is all word chars

Avoid LLM long dashes and use regular english please

        reader.buffer = list("foo.bar")
        reader.pos = len(reader.buffer)

        # bow() stops at '.' → returns index of 'b' in "bar"
Member


Suggested change
        # bow() stops at '.' → returns index of 'b' in "bar"
        # bow() stops at '.' so we return the index of 'b' in "bar"

@@ -0,0 +1,3 @@
Fix ``unix-word-rubout`` (Ctrl-W) in the REPL to use whitespace-only word
boundaries, matching bash/readline behavior. Previously it used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
boundaries, matching bash/readline behavior. Previously it used
boundaries, matching behavior of the basic REPL. Previously it used

self.assert_screen_equal(reader, 'flag {o}={z} {s}"🏳️\\u200d🌈"{z}'.format(**colors))


class TestBowWhitespace(TestCase):
Member


This entire test case can have one single reference to the GH issue as a comment and we can remove them from the methods.

However I would prefer that we extend the existing test case with the bow tests and place them where existing ones are.

@bedevere-app

bedevere-app bot commented Mar 21, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

        p -= 1
        while p >= 0 and b[p] in (" ", "\n"):
            p -= 1
        while p >= 0 and b[p] not in (" ", "\n"):
Member


Suggested change
        while p >= 0 and b[p] not in (" ", "\n"):
        while p >= 0 and b[p] not in " \n":
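The two spellings are interchangeable here as long as every buffer element is a single character, as in the tests that build buffers with `list(...)` (an assumption about all callers). `x in " \n"` is a substring test, while `x in (" ", "\n")` compares against each tuple element, so they only disagree for strings that are not exactly one character long. A quick sketch with hypothetical helper names:

```python
def in_tuple(ch: str) -> bool:
    # Membership in a tuple: equality against each element.
    return ch in (" ", "\n")


def in_string(ch: str) -> bool:
    # Membership in a string: substring test.
    return ch in " \n"


# For single characters the two tests always agree.
for ch in [" ", "\n", "\t", "a", ".", "("]:
    assert in_tuple(ch) == in_string(ch)

# They differ only on inputs that are not single characters.
print(in_tuple(""), in_string(""))        # False True
print(in_tuple(" \n"), in_string(" \n"))  # False True
```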

p = self.pos
b = self.buffer
p -= 1
while p >= 0 and b[p] in (" ", "\n"):
Member


Suggested change
        while p >= 0 and b[p] in (" ", "\n"):
        while p >= 0 and b[p] in " \n":

        p -= 1
        while p >= 0 and b[p] in (" ", "\n"):
            p -= 1
        while p >= 0 and b[p] not in (" ", "\n"):
Member


OOC: newlines are also counted, but what about tabs? Do we also convert them to 4-space indents in the REPL, or not?

- Rename bow_whitespace() to bow_ws() for shorter naming convention
- Move tests from separate TestBowWhitespace class into TestReader
- Add tab character to whitespace set (space, newline, tab)
- Add test_bow_ws_with_tabs for tab handling
- Fix link format to python#146044
- Remove LLM-style long dashes from comments
- Update NEWS wording per review
@kimimgo
Author

kimimgo commented Mar 21, 2026

Thanks @picnixz! Updated:

  • Renamed bow_whitespace() to bow_ws() for shorter naming
  • Moved all tests into TestReader (removed separate class)
  • Added \t to the whitespace set and a test_bow_ws_with_tabs test
  • Fixed link format to https://github.com/python/cpython/issues/146044
  • Removed long dashes, cleaned up comment style
  • Updated NEWS wording


        p defaults to self.pos; only whitespace is considered a word
        boundary, matching the behavior of unix-word-rubout in bash/readline.
        See https://github.com/python/cpython/issues/146044"""
Member


Suggested change
        See https://github.com/python/cpython/issues/146044"""
        """

Comment on lines +387 to +388
        self.assertEqual(reader.bow_ws(), 4)

Member


Suggested change
        self.assertEqual(reader.bow_ws(), 4)
        self.assertEqual(reader.bow_ws(), 4)

        reader.setpos_from_xy(8, 0)
        self.assertEqual(reader.pos, 7)

    def test_bow_ws_stops_at_whitespace(self):
Member


Where are the existing tests for bow()? If there are some, please put those tests next to them.

    def test_bow_ws_includes_punctuation_in_word(self):
        reader = prepare_reader(prepare_console([]))
        reader.buffer = list("foo.bar(baz) qux")
        reader.pos = 12
Member


Use the index() method to get the index of )
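Deriving the cursor position with `list.index()` ties it to the text instead of a hard-coded 12. A standalone sketch, with a hypothetical plain-function `bow_ws(buffer, pos)` standing in for the `Reader` method and the `prepare_reader` helpers:

```python
WHITESPACE = " \t\n"  # space, tab and newline, per the updated whitespace set


def bow_ws(buffer: list[str], pos: int) -> int:
    # Whitespace-only word start: skip whitespace left of pos,
    # then skip the preceding run of non-whitespace characters.
    p = pos - 1
    while p >= 0 and buffer[p] in WHITESPACE:
        p -= 1
    while p >= 0 and buffer[p] not in WHITESPACE:
        p -= 1
    return p + 1


buffer = list("foo.bar(baz) qux")
pos = buffer.index(")") + 1  # cursor just after ")", which is index 12
# Punctuation does not split words, so all of "foo.bar(baz)" is one word.
print(bow_ws(buffer, pos))  # 0
```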


    def test_bow_ws_with_tabs(self):
        reader = prepare_reader(prepare_console([]))
        reader.buffer = list("foo\tbar")
Member


Add tests with \n and ensure that you also properly jump lines.
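A sketch of what tab and newline tests could look like, including the case where the cursor sits right after a newline and the boundary search has to cross onto the previous line. It uses a standalone `bow_ws(buffer, pos)` function and a `BowWsNewlineTests` class, both hypothetical, instead of the PR's `Reader` method and `TestReader` helpers, so it runs on its own:

```python
import unittest

WHITESPACE = " \t\n"  # space, tab and newline


def bow_ws(buffer: list[str], pos: int) -> int:
    # Same whitespace-only logic as unix-word-rubout: skip trailing
    # whitespace, then skip the preceding run of non-whitespace.
    p = pos - 1
    while p >= 0 and buffer[p] in WHITESPACE:
        p -= 1
    while p >= 0 and buffer[p] not in WHITESPACE:
        p -= 1
    return p + 1


class BowWsNewlineTests(unittest.TestCase):
    def test_tab_is_a_boundary(self):
        buffer = list("foo\tbar")
        self.assertEqual(bow_ws(buffer, len(buffer)), buffer.index("b"))

    def test_stops_at_newline(self):
        buffer = list("foo\nbar")
        self.assertEqual(bow_ws(buffer, len(buffer)), buffer.index("b"))

    def test_jumps_to_previous_line(self):
        # The trailing newline is skipped and the word on the
        # previous line ("foo.bar") is found as a whole.
        buffer = list("foo.bar\n")
        self.assertEqual(bow_ws(buffer, len(buffer)), 0)

    def test_indented_continuation_line(self):
        buffer = list("if x:\n    y.z")
        self.assertEqual(bow_ws(buffer, len(buffer)), buffer.index("y"))
```

Run it with `python -m unittest` on the file that contains it.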

Comment on lines +589 to +590


Member


Remove those extra blanks


Labels

awaiting changes, needs backport to 3.13 (bugs and security fixes), needs backport to 3.14 (bugs and security fixes)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ctrl-w deletes words with punctuation instead of whitespace boundary

3 participants