Skip to content

Frame.co_code implementation.#2149

Draft
alanjds wants to merge 12 commits intoRustPython:mainfrom
alanjds:_frame_co_code
Draft

Frame.co_code implementation.#2149
alanjds wants to merge 12 commits intoRustPython:mainfrom
alanjds:_frame_co_code

Conversation

@alanjds
Copy link
Contributor

@alanjds alanjds commented Aug 25, 2020

TBD

@alanjds
Copy link
Contributor Author

alanjds commented Nov 19, 2020

When testing pdb/bdb with this branch, I understood that some source of tooling confusion is that the AST generator (or parser, idk) thing is marking "line 0" for lots of bytecodes, even far down ones. This is leading pdb to not reliably print the current line or perceiving where to stop after a step. I consider this a bug, even being an interpreter-dependent one.

For some reason, I got to know PEP 626, accepted on CPython 3.10. It adds a Frame.co_lines iterator of tuples with bytecode offsets & source line, to be used inside Frame.co_lnotab changed to a lazy object generated from co_lines.

I was having trouble trying to implement the co_lnotab for RustPython, and this can ease the whole implementation yet making the thing future-proof.

Could I get some guidance on where/how to implement it?

@alanjds
Copy link
Contributor Author

alanjds commented Nov 19, 2020

@coolreader18
Copy link
Member

Yeah, I think co_lines seems better, especially if we're having issues with bytecode locations. For implementing, I think we could iterate through the locations vec and do something like group_by to find the ranges of where the bytecode locations are identical.

@coolreader18
Copy link
Member

Also, what are you planning on using the co_code attribute for? Our bytecode format isn't really stable, and all the existing code that reads co_code probably expects the cpython bytecode format. I think I'd prefer to keep it unimplemented if possible, just so we avoid things reading co_code as "corrupted" because it isn't the format it expects.

@alanjds
Copy link
Contributor Author

alanjds commented Nov 20, 2020

Also, what are you planning on using the co_code attribute for? Our bytecode format isn't really stable, and all the existing code that reads co_code probably expects the cpython bytecode format. I think I'd prefer to keep it unimplemented if possible, just so we avoid things reading co_code as "corrupted" because it isn't the format it expects.

Well, not really.

I needed co_code it available because pdb.py uses dis and inspect modules to get the source lines, via co_lnotab + co_code. Is not nice, but is how the debuggers work today :/
The internal dis.rs is ok for demonstrations but not enough for inspect and pdb modules.

Looking at the code of pdb and bdb, I saw nothing very specific to the bytecode format, which makes sense as the same pdb.py is used on PyPy for example, that have a different bytecode set than CPython.

Implement co_code + co_lnotab seemed a more stable approach, as pdb.py, inspect and dis are kinda coupled around it. I started by patching pdb.py replacing the parts touching co_code+co_lnotab, but the rabbit hole showed itself when I started digging.

Another thing is that, after pdb working, my next targets would be IPython, wdb and debugpy. This ones will probable need patches and maintenance. Implementing co_code+co_lnotab seems like less work now and on the future.

@alanjds
Copy link
Contributor Author

alanjds commented Nov 20, 2020

Ah, I also plan to implement PEP 3147 (aka __pycache__ folder) in the near future. This will need direct access to the bytecode to be dumped to a file on disk.

@alanjds
Copy link
Contributor Author

alanjds commented Nov 20, 2020

This is where pdb.py indirectly uses co_code:
https://github.com/python/cpython/blob/022bc7572f061e1d1132a4db9d085b29707701e7/Lib/pdb.py#L107-L122

def getsourcelines(obj):
    lines, lineno = inspect.findsource(obj)
    if inspect.isframe(obj) and obj.f_globals is obj.f_locals:
        # must be a module frame: do not try to cut a block out of it
        return lines, 1
    elif inspect.ismodule(obj):
        return lines, 1
    return inspect.getblock(lines[lineno:]), lineno+1

def lasti2lineno(code, lasti):
    linestarts = list(dis.findlinestarts(code))
    linestarts.reverse()
    for i, lineno in linestarts:
        if lasti >= i:
            return lineno
    return 0

Also, pdb.py sets f_lasti when jumping around. This will need a way to translate the line number to the bytecode offset, if I understood correctly.

@coolreader18
Copy link
Member

I also plan to implement PEP 3147

I think that's already implemented, just by the importlib frozen module.

Also, it looks like the way findlinestarts is implemented assumes that co_code is actually bytecode, i.e. f_lasti corresponds to an index into f_code.co_code, or at least len(co_code) represents the max that f_lasti can be. I think because the dis module is so tied to the bytecode format of the interpreter, it would make sense to implement findlinestarts in the dis.rs module we already have.

@alanjds
Copy link
Contributor Author

alanjds commented Nov 20, 2020

I was playing with an implementation of findlinestarts on dis.rs when let it half-baked here.

Will explore this path first. Thanks for the guidance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants