
Repeated Python Thread State creation and destruction for foreign threads is slow #146219

@gpshead

Description


Feature or enhancement

Proposal:

Terminology: a "foreign thread" is an OS thread not created by CPython itself, such as one started from Rust or C code.

1.) "Don't do that." Sure, and that's the documentation PR I'll put up to start with: code can manually work around this today and improve its performance, and the workaround works on existing Python versions. It matters more on free-threading builds due to their additional allocation overheads, but it is a meaningful improvement on traditional (GIL) builds as well.

2.) Simple C API patterns for calling into the CPython interpreter trip over this performance problem. The classic PyGILState_Ensure() + PyGILState_Release() pair surrounding a call into CPython on a foreign worker thread creates a new Python thread state in Ensure and destroys it again in Release. So if you're a foreign thread repeatedly calling into the interpreter (such as a worker in a thread pool)... it hurts.

I'm filing this issue rather than only adding documentation because I think we might be able to make this work more smoothly by default in many cases, without the complications code currently needs to deal with to avoid the problem.

One (not well fleshed out) idea to explore: instead of destroying thread states upon Release, what if we kept a cache of recently used thread states that are still fully initialized but no longer referenced, so they could be trivially resurrected?
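To make the idea concrete, here is a toy sketch of such a cache as a bounded LIFO free list. Everything in it (fake_tstate, tstate_cache, TSTATE_CACHE_SIZE, and the acquire/release helpers) is hypothetical and is not CPython code: it ignores locking, rebinding a cached state to a different OS thread, and which fields would need resetting, all of which a real design would have to answer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an expensive-to-build thread state. */
typedef struct fake_tstate {
    int ready;          /* fully initialized and reusable */
    char scratch[256];  /* stand-in for per-thread allocations */
} fake_tstate;

#define TSTATE_CACHE_SIZE 8

/* Hypothetical cache of released-but-ready states. Not thread-safe:
   a real version would need a lock or per-interpreter atomics. */
typedef struct {
    fake_tstate *slots[TSTATE_CACHE_SIZE];
    int count;          /* number of cached states */
    long allocations;   /* times the full creation cost was paid */
} tstate_cache;

/* Reuse a cached state if one is available, otherwise build a new one. */
static fake_tstate *tstate_acquire(tstate_cache *c)
{
    if (c->count > 0)
        return c->slots[--c->count];  /* trivial resurrection */

    fake_tstate *ts = malloc(sizeof *ts);
    if (ts == NULL)
        return NULL;
    memset(ts, 0, sizeof *ts);
    ts->ready = 1;
    c->allocations++;
    return ts;
}

/* Park the state in the cache instead of destroying it, if there is room. */
static void tstate_release(tstate_cache *c, fake_tstate *ts)
{
    if (ts == NULL)
        return;
    if (c->count < TSTATE_CACHE_SIZE) {
        /* Per-use fields would be reset here (none in this toy). */
        c->slots[c->count++] = ts;
    } else {
        free(ts);  /* cache full: fall back to destroy-on-release */
    }
}

/* Free everything still cached, e.g. at interpreter shutdown. */
static void tstate_cache_clear(tstate_cache *c)
{
    while (c->count > 0)
        free(c->slots[--c->count]);
}
```

A bounded cache keeps memory use predictable: bursts of more than TSTATE_CACHE_SIZE concurrently released states simply fall back to today's destroy-on-release behavior.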

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response
