gh-145742: Fix cross-block adrp+ldr to movz optimization on AArch64 #148436

Open
corona10 wants to merge 2 commits into python:main from corona10:gh-145742-aarch64
corona10 (Member) commented Apr 12, 2026

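I found a possible optimization for the current AArch64 stencils while working on #148217. When a stencil loads a 16-bit value such as __JIT_OPARG_16 through the GOT, the adrp+ldr pair can be replaced by a single movz, which saves one instruction per load.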

_LOAD_FAST_r01

// Before (8 instructions):
adrp    x8, __JIT_OPARG_16@GOTPAGE
ldr     x8, [x8, __JIT_OPARG_16@GOTPAGEOFF]
add     x8, x21, w8, uxth #3
ldr     x24, [x8, #0x50]
tbnz    w24, #0x0, ...
ldr     w8, [x24]
add     w8, w8, #0x1
str     w8, [x24]
// After (7 instructions):
movz    x8, #0x0
add     x8, x21, w8, uxth #3
ldr     x24, [x8, #0x50]
tbnz    w24, #0x0, ...
ldr     w8, [x24]
add     w8, w8, #0x1
str     w8, [x24]
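
For reference, the movz that replaces the pair has a fixed 32-bit encoding with the immediate in bits 20:5, which is what makes it straightforward to patch. The sketch below is illustrative only. It assumes the #0x0 immediate shown above is a placeholder that is later filled with the 16-bit oparg, and the helper names encode_movz_x and patch_imm16 are hypothetical, not part of CPython's JIT.

```python
# Base opcode of "MOVZ Xd, #imm16" with no shift (sf=1, hw=0).
MOVZ_X_BASE = 0xD2800000
IMM16_SHIFT = 5
IMM16_MASK = 0xFFFF << IMM16_SHIFT


def encode_movz_x(rd: int, imm16: int) -> int:
    """Encode `movz x<rd>, #imm16` as a 32-bit AArch64 instruction word."""
    if not 0 <= rd <= 31:
        raise ValueError("register number must be in 0..31")
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    return MOVZ_X_BASE | (imm16 << IMM16_SHIFT) | rd


def patch_imm16(instruction: int, imm16: int) -> int:
    """Overwrite the imm16 field (bits 20:5) of an existing instruction word."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    return (instruction & ~IMM16_MASK) | (imm16 << IMM16_SHIFT)


# Hypothetical usage: the stencil carries movz x8, #0x0 and the
# oparg (3 here, chosen arbitrarily) is written in afterwards.
placeholder = encode_movz_x(8, 0)
patched = patch_imm16(placeholder, 3)
```

The adrp+ldr form needs a GOT entry and a memory load to obtain the same 16-bit value, so the single-instruction form also avoids a data access.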

_LOAD_FAST_BORROW_r01

// Before (5 instructions):
adrp    x8, __JIT_OPARG_16@GOTPAGE
ldr     x8, [x8, __JIT_OPARG_16@GOTPAGEOFF]
add     x8, x21, w8, uxth #3
ldr     x8, [x8, #0x50]
orr     x24, x8, #0x1
// After (4 instructions):
movz    x8, #0x0
add     x8, x21, w8, uxth #3
ldr     x8, [x8, #0x50]
orr     x24, x8, #0x1
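
The rewrite in the two examples above can be sketched as a small peephole pass over assembly text. This is a simplified illustration, not the code in this PR. It assumes that symbols ending in _16 always hold values that fit in 16 bits, it only handles an adrp that is immediately followed by its ldr, and it ignores the cross-block case named in the title. The function name fold_got_loads is hypothetical.

```python
import re

# adrp xN, <sym>_16@GOTPAGE
_ADRP = re.compile(r"^\s*adrp\s+(?P<reg>x\d+),\s*(?P<sym>\w+_16)@GOTPAGE\s*$")
# ldr xD, [xB, <sym>_16@GOTPAGEOFF]
_LDR = re.compile(
    r"^\s*ldr\s+(?P<dst>x\d+),\s*\[(?P<base>x\d+),\s*(?P<sym>\w+_16)@GOTPAGEOFF\]\s*$"
)


def fold_got_loads(lines: list[str]) -> list[str]:
    """Replace adjacent adrp+ldr GOT loads of 16-bit symbols with one movz."""
    out = []
    i = 0
    while i < len(lines):
        adrp = _ADRP.match(lines[i])
        ldr = _LDR.match(lines[i + 1]) if i + 1 < len(lines) else None
        if (
            adrp
            and ldr
            # Same register is used as page base and as destination.
            and adrp["reg"] == ldr["base"] == ldr["dst"]
            # Both halves refer to the same symbol.
            and adrp["sym"] == ldr["sym"]
        ):
            # Placeholder immediate; the real value is patched in later.
            out.append(f"movz    {ldr['dst']}, #0x0")
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return out


before = [
    "adrp    x8, __JIT_OPARG_16@GOTPAGE",
    "ldr     x8, [x8, __JIT_OPARG_16@GOTPAGEOFF]",
    "add     x8, x21, w8, uxth #3",
    "ldr     x8, [x8, #0x50]",
    "orr     x24, x8, #0x1",
]
after = fold_got_loads(before)
```

Requiring the same register and the same symbol on both instructions keeps the sketch from folding unrelated loads. A real pass would also have to prove that nothing else reads the adrp result.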

Affected uops on AArch64

_CHECK_ATTR_METHOD_LAZY_DICT_r11
_CHECK_ATTR_METHOD_LAZY_DICT_r22
_CHECK_ATTR_METHOD_LAZY_DICT_r33
_COPY_r01
_GUARD_BIT_IS_SET_POP_r00
_GUARD_BIT_IS_SET_POP_r10
_GUARD_BIT_IS_SET_POP_r21
_GUARD_BIT_IS_SET_POP_r32
_GUARD_BIT_IS_UNSET_POP_r00
_GUARD_BIT_IS_UNSET_POP_r10
_GUARD_BIT_IS_UNSET_POP_r21
_GUARD_BIT_IS_UNSET_POP_r32
_GUARD_CODE_VERSION__PUSH_FRAME_r00
_GUARD_CODE_VERSION__PUSH_FRAME_r11
_GUARD_CODE_VERSION__PUSH_FRAME_r22
_GUARD_CODE_VERSION__PUSH_FRAME_r33
_GUARD_TYPE_VERSION_LOCKED_r11
_GUARD_TYPE_VERSION_LOCKED_r22
_GUARD_TYPE_VERSION_LOCKED_r33
_GUARD_TYPE_VERSION_r11
_GUARD_TYPE_VERSION_r22
_GUARD_TYPE_VERSION_r33
_LOAD_COMMON_CONSTANT_r01
_LOAD_COMMON_CONSTANT_r12
_LOAD_COMMON_CONSTANT_r23
_LOAD_CONST_r01
_LOAD_CONST_r12
_LOAD_CONST_r23
_LOAD_FAST_AND_CLEAR_r01
_LOAD_FAST_AND_CLEAR_r12
_LOAD_FAST_AND_CLEAR_r23
_LOAD_FAST_BORROW_r01
_LOAD_FAST_BORROW_r12
_LOAD_FAST_BORROW_r23
_LOAD_FAST_r01
_LOAD_FAST_r12
_LOAD_FAST_r23
_LOAD_SMALL_INT_r01
_LOAD_SMALL_INT_r12
_LOAD_SMALL_INT_r23
_PUSH_NULL_CONDITIONAL_r00
_SAVE_RETURN_OFFSET_r00
_SAVE_RETURN_OFFSET_r11
_SAVE_RETURN_OFFSET_r22
_SAVE_RETURN_OFFSET_r33
_SWAP_FAST_r01
_SWAP_FAST_r11
_SWAP_FAST_r22
_SWAP_FAST_r33
_SWAP_r11

corona10 added the performance label on Apr 12, 2026
corona10 (Member, Author) commented Apr 12, 2026

cc @diegorusso and @markshannon, who work at ARM :)
