CLI to control iOS and Android devices for AI agents, influenced by Vercel’s agent-browser.
The project is in early development and considered experimental. Pull requests are welcome!
- Platforms: iOS (simulator + limited device support) and Android (emulator + device).
- Core commands: `open`, `back`, `home`, `app-switcher`, `press`, `long-press`, `swipe`, `focus`, `type`, `fill`, `scroll`, `scrollintoview`, `pinch`, `wait`, `alert`, `screenshot`, `close`, `reinstall`.
- Inspection commands: `snapshot` (accessibility tree).
- Device tooling: `adb` (Android), `simctl`/`devicectl` (iOS via Xcode).
- Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).
```bash
npm install -g agent-device
```

Or use it without installing:

```bash
npx agent-device open SampleApp
```

Use refs for agent-driven exploration and normal automation flows.

```bash
agent-device open Contacts --platform ios # creates session on iOS Simulator
agent-device snapshot
agent-device click @e5
agent-device fill @e6 "John"
agent-device fill @e7 "Doe"
agent-device click @e3
agent-device close
```

Usage:

```bash
agent-device <command> [args] [--json]
```

Basic flow:
```bash
agent-device open SampleApp
agent-device snapshot
agent-device click @e7
agent-device fill @e8 "hello"
agent-device close SampleApp
```

Debug flow:
```bash
agent-device trace start
agent-device snapshot -s "Sample App"
agent-device find label "Wi-Fi" click
agent-device trace stop ./trace.log
```

Coordinates:
- All coordinate-based commands (`press`, `long-press`, `swipe`, `focus`, `fill`) use device coordinates with the origin at the top-left.
- X increases to the right; Y increases downward.
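If an element's frame is known from some other source, a small helper can turn it into a tap point in this top-left-origin space. The sketch below assumes a hypothetical `Rect` shape `{ x, y, width, height }` and hypothetical helper names; none of this is part of agent-device's API, and the screen size passed to `fromFraction` must come from elsewhere.

```typescript
// Hypothetical frame shape in device coordinates:
// origin at top-left, X grows to the right, Y grows downward.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Center of a frame, rounded to whole device units for CLI arguments.
function centerOf(rect: Rect): Point {
  return {
    x: Math.round(rect.x + rect.width / 2),
    y: Math.round(rect.y + rect.height / 2),
  };
}

// Convert fractional positions (0..1 on each axis) into device coordinates.
function fromFraction(fx: number, fy: number, screen: { width: number; height: number }): Point {
  if (fx < 0 || fx > 1 || fy < 0 || fy > 1) {
    throw new RangeError("fractions must be within 0..1");
  }
  return { x: Math.round(fx * screen.width), y: Math.round(fy * screen.height) };
}

// Argument list for `agent-device press <x> <y>`.
function pressArgs(p: Point): string[] {
  return ["press", String(p.x), String(p.y)];
}
```

For example, a frame of `{ x: 100, y: 400, width: 200, height: 80 }` yields `agent-device press 200 440`.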
Gesture series examples:

```bash
agent-device press 300 500 --count 12 --interval-ms 45
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
```

Command index:
- `boot`, `open`, `close`, `reinstall`, `home`, `back`, `app-switcher`
- `snapshot`, `find`, `get`
- `click`, `focus`, `type`, `fill`, `press`, `long-press`, `swipe`, `scroll`, `scrollintoview`, `pinch`, `is`
- `alert`, `wait`, `screenshot`
- `trace start`, `trace stop`
- `settings wifi|airplane|location on|off`
- `appstate`, `apps`, `devices`, `session list`
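The gesture-series examples above combine `--count`, `--hold-ms`, `--interval-ms`, and `--jitter-px`. The sketch below estimates how long such a series takes and shows one way a deterministic jitter could be produced. It assumes each press holds for `hold-ms` and that `interval-ms` separates consecutive presses (no trailing wait); the jitter pattern is an illustrative choice, not agent-device's actual algorithm.

```typescript
// Hypothetical description of a press series, mirroring the CLI flags.
interface PressSeries {
  count: number; // --count
  holdMs: number; // --hold-ms
  intervalMs: number; // --interval-ms
  jitterPx: number; // --jitter-px
}

// Assumed timing model: each press holds for holdMs, and intervalMs separates
// consecutive presses (no trailing interval). Device latency is ignored.
function estimateDurationMs(s: PressSeries): number {
  if (s.count <= 0) return 0;
  return s.count * s.holdMs + (s.count - 1) * s.intervalMs;
}

// Illustrative deterministic jitter: cycle through fixed unit offsets scaled by
// jitterPx, so identical inputs always produce identical coordinates.
const JITTER_PATTERN: ReadonlyArray<readonly [number, number]> = [
  [0, 0],
  [1, 0],
  [0, 1],
  [-1, 0],
  [0, -1],
];

function pressPoints(x: number, y: number, s: PressSeries): Array<[number, number]> {
  const points: Array<[number, number]> = [];
  for (let i = 0; i < s.count; i++) {
    const [dx, dy] = JITTER_PATTERN[i % JITTER_PATTERN.length];
    points.push([x + dx * s.jitterPx, y + dy * s.jitterPx]);
  }
  return points;
}
```

With the second example's flags (`--count 6 --hold-ms 120 --interval-ms 30`), this model estimates 6 × 120 + 5 × 30 = 870 ms.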
iOS snapshot backends:

| Backend | Speed | Accuracy | Requirements |
|---|---|---|---|
| `xctest` | Fast | High | No Accessibility permission required |
| `ax` | Fast | Medium | Accessibility permission for the terminal app; not recommended |
Notes:
- Default backend is `xctest` on iOS.
- Scope snapshots with `-s "<label>"` or `-s @ref`.
- If XCTest returns 0 nodes (e.g., because the foreground app changed), agent-device falls back to AX when available.
Flags:
- `--version`, `-V` print version and exit
- `--platform ios|android`
- `--device <name>`
- `--udid <udid>` (iOS)
- `--serial <serial>` (Android)
- `--activity <component>` (Android app launch only; `package/Activity` or `package/.Activity`; not for URL opens)
- `--session <name>`
- `--count <n>` repeat count for `press`/`swipe`
- `--interval-ms <ms>` delay between `press` iterations
- `--hold-ms <ms>` hold duration per `press` iteration
- `--jitter-px <n>` deterministic coordinate jitter for `press`
- `--pause-ms <ms>` delay between `swipe` iterations
- `--pattern one-way|ping-pong` repeat pattern for `swipe`
- `--verbose` for daemon and runner logs
- `--json` for structured output
- `--backend ax|xctest` (snapshot only; defaults to `xctest` on iOS)
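When driving agent-device from a Node script instead of a shell, it can help to assemble the target flags in one place and always request `--json`. The sketch below shells out with Node's `execFile`; the `DeviceTarget` shape and the function names are hypothetical, and because the JSON schema is not documented here, the parsed result is left as `unknown`.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical options object mapping onto the global flags.
interface DeviceTarget {
  platform?: "ios" | "android";
  device?: string;
  udid?: string; // iOS only
  serial?: string; // Android only
  session?: string;
}

// Build argv for `agent-device <command> [args] ... --json`.
function buildArgs(command: string, args: string[], target: DeviceTarget = {}): string[] {
  const argv = [command, ...args];
  if (target.platform) argv.push("--platform", target.platform);
  if (target.device) argv.push("--device", target.device);
  if (target.udid) argv.push("--udid", target.udid);
  if (target.serial) argv.push("--serial", target.serial);
  if (target.session) argv.push("--session", target.session);
  argv.push("--json");
  return argv;
}

// Run the CLI and parse stdout as JSON; no particular schema is assumed.
async function runAgentDevice(command: string, args: string[], target?: DeviceTarget): Promise<unknown> {
  const { stdout } = await execFileAsync("agent-device", buildArgs(command, args, target));
  return JSON.parse(stdout);
}
```

Note that `execFile` rejects when the process exits non-zero; whether the JSON error body is then on stdout or stderr should be checked against the CLI's actual behavior.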
Pinch:
- `pinch` is supported on iOS simulators.
- On Android, `pinch` currently returns `UNSUPPORTED_OPERATION` in the adb backend.
Swipe timing:
- `swipe` accepts an optional `durationMs` (default `250`, range `16..10000`).
- Android uses the requested swipe duration directly.
- iOS uses a safe normalized duration to avoid long-press side effects.
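The swipe timing rules, together with the `--count`/`--pattern` flags, can be modeled in a few lines. This is a sketch under two assumptions not stated in this README: out-of-range durations are rejected rather than clamped, and `ping-pong` reverses direction on every other stroke. It also ignores the iOS duration normalization described above.

```typescript
type Point = { x: number; y: number };
type Segment = { from: Point; to: Point };
type SwipePattern = "one-way" | "ping-pong";

// Documented values: durationMs defaults to 250 and must lie in 16..10000.
const DEFAULT_SWIPE_MS = 250;
const MIN_SWIPE_MS = 16;
const MAX_SWIPE_MS = 10_000;

// Assumption: out-of-range (or NaN) values are rejected rather than clamped.
function resolveSwipeDuration(durationMs?: number): number {
  if (durationMs === undefined) return DEFAULT_SWIPE_MS;
  if (!(durationMs >= MIN_SWIPE_MS && durationMs <= MAX_SWIPE_MS)) {
    throw new RangeError(`durationMs must be within ${MIN_SWIPE_MS}..${MAX_SWIPE_MS}`);
  }
  return durationMs;
}

// Expand one swipe into `count` strokes. "ping-pong" reverses every other
// stroke (A->B, B->A, A->B, ...); "one-way" repeats A->B.
function expandSwipes(from: Point, to: Point, count: number, pattern: SwipePattern): Segment[] {
  const strokes: Segment[] = [];
  for (let i = 0; i < count; i++) {
    const reversed = pattern === "ping-pong" && i % 2 === 1;
    strokes.push(reversed ? { from: to, to: from } : { from, to });
  }
  return strokes;
}
```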
Install the automation skills listed in SKILL.md.
```bash
npx skills add https://github.com/callstackincubator/agent-device --skill agent-device
```

Sessions:
- `open` starts a session. Without arguments, it boots/activates the target device/simulator without launching an app.
- All interaction commands require an open session.
- If a session is already open, `open <app|url>` switches the active app or opens a deep link URL.
- `close` stops the session and releases device resources. Pass an app to close it explicitly, or omit it to just close the session.
- Use `--session <name>` to manage multiple sessions.
- Session scripts are written to `~/.agent-device/sessions/<session>-<timestamp>.ad` when recording is enabled with `--save-script`.
- Deterministic replay is `.ad`-based; use `replay --update` (`-u`) to update selector drift and rewrite the replay file in place.
Navigation helpers:
- `boot --platform ios|android` ensures the target is ready without launching an app.
- Use `boot` mainly when starting a new session and `open` fails because no booted simulator/emulator is available.
- `open [app|url]` already boots/activates the selected target when needed.
- `reinstall <app> <path>` uninstalls and installs the app binary in one command (Android + iOS simulator in v1).
- `reinstall` accepts package/bundle id style app names and supports `~` in paths.
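A shell expands an unquoted leading `~` itself, but a quoted path or one read from a config file reaches the program literally, which is where in-tool expansion matters. If your own tooling builds such paths, an equivalent expansion might look like the sketch below; `expandHome` is a hypothetical helper, and it handles only `~` and `~/...`, not `~otheruser` forms.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Expand a leading "~" to the given home directory (defaults to the current user's).
// "~user/..." forms are intentionally left untouched.
function expandHome(p: string, home: string = homedir()): string {
  if (p === "~") return home;
  if (p.startsWith("~/")) return join(home, p.slice(2));
  return p;
}
```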
Deep links:
- `open <url>` supports deep links with `scheme://...`.
- Android opens deep links via the `VIEW` intent.
- iOS deep link open is simulator-only in v1.
- `--activity` cannot be combined with URL opens.
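The deep-link rules boil down to: an argument with a `scheme://` prefix is treated as a URL, and `--activity` is rejected for URLs. A sketch of that validation as hypothetical helpers (not agent-device's own code) follows; the scheme pattern uses the RFC 3986 grammar (a letter, then letters, digits, `+`, `-`, or `.`), and `com.example/.MainActivity` is a made-up component.

```typescript
// RFC 3986 scheme followed by "://".
const URL_LIKE = /^[A-Za-z][A-Za-z0-9+.-]*:\/\//;

function isDeepLink(target: string): boolean {
  return URL_LIKE.test(target);
}

interface OpenRequest {
  target: string; // app name, bundle/package id, or URL
  activity?: string; // Android --activity component, e.g. "com.example/.MainActivity"
}

// Mirror the documented constraint: --activity cannot be combined with URL opens.
function validateOpen(req: OpenRequest): void {
  if (req.activity !== undefined && isDeepLink(req.target)) {
    throw new Error("--activity cannot be combined with URL opens");
  }
}
```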
```bash
agent-device open "myapp://home" --platform android
agent-device open "https://example.com" --platform ios
```

Find (semantic):
- `find <text> <action> [value]` finds by any text (label/value/identifier) using a scoped snapshot.
- `find text|label|value|role|id <value> <action> [value]` for specific locators.
- Actions: `click` (default), `fill`, `type`, `focus`, `get text`, `get attrs`, `wait [timeout]`, `exists`.
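A rough model of the `find` lookup over snapshot nodes is sketched below. The `SnapshotNode` shape, the case-insensitive substring match, and the first-match rule are assumptions for illustration; agent-device's real matching rules may differ. Only the locator names and the label/value/identifier search for plain text come from this README.

```typescript
// Hypothetical flattened snapshot node.
interface SnapshotNode {
  ref: string; // e.g. "@e5"
  role?: string;
  label?: string;
  value?: string;
  identifier?: string;
}

type Locator = "text" | "label" | "value" | "role" | "id";

// Fields consulted per locator; "text" searches label, value, and identifier.
const FIELDS: Record<Locator, Array<keyof SnapshotNode>> = {
  text: ["label", "value", "identifier"],
  label: ["label"],
  value: ["value"],
  role: ["role"],
  id: ["identifier"],
};

// First node whose selected fields contain the query (case-insensitive).
function findNode(nodes: SnapshotNode[], query: string, locator: Locator = "text"): SnapshotNode | undefined {
  const q = query.toLowerCase();
  return nodes.find((node) =>
    FIELDS[locator].some((field) => node[field]?.toLowerCase().includes(q)),
  );
}
```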
Assertions:
- `is` predicates: `visible`, `hidden`, `exists`, `editable`, `selected`, `text`.
- `is text` uses exact equality.
Replay update:
- `replay <path>` runs deterministic replay from `.ad` scripts.
- `replay -u <path>` attempts selector updates on failures and atomically rewrites the same file.
- Refs are the default/core mechanism for interactive agent flows.
- Update targets: `click`, `fill`, `get`, `is`, `wait`.
- Selector matching is internal to replay updates: replay parses `.ad` lines into actions, tries them, snapshots on failure, resolves a better selector, then rewrites the failing line.
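The selectors written by `replay -u` are `||`-separated alternatives such as `id="auth_continue" || label="Continue"`. A minimal sketch of parsing such a chain and resolving it against snapshot nodes follows. The grammar (`key="value"` terms joined by `||`, first matching alternative wins) is inferred from the update examples in this README rather than from a spec, and `UiNode` is a hypothetical node shape.

```typescript
interface SelectorTerm {
  key: string; // e.g. "id", "label"
  value: string;
}

// Hypothetical node: attribute name -> value.
type UiNode = Record<string, string | undefined>;

// Parse `id="a" || label="b"` into ordered alternatives.
// Simplification: values containing `||` or escaped quotes are not supported.
function parseSelectorChain(chain: string): SelectorTerm[] {
  return chain.split("||").map((part) => {
    const match = /^\s*(\w+)="([^"]*)"\s*$/.exec(part);
    if (!match) throw new SyntaxError(`Unrecognized selector term: ${part.trim()}`);
    return { key: match[1], value: match[2] };
  });
}

// Try alternatives in order; return the first node matched by a term.
function resolveSelector(chain: string, nodes: UiNode[]): UiNode | undefined {
  for (const term of parseSelectorChain(chain)) {
    const hit = nodes.find((n) => n[term.key] === term.value);
    if (hit) return hit;
  }
  return undefined;
}
```

In this model, the stale `id="old_continue"` term misses and `label="Continue"` hits, which is the situation where an update would rewrite the `id` term to the element's current identifier.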
Update examples:
```
# Before (stale selector)
click "id=\"old_continue\" || label=\"Continue\""
# After replay -u (rewritten in place)
click "id=\"auth_continue\" || label=\"Continue\""
```

```
# Before (ref-based action from discovery)
snapshot -i -c -s "Continue"
click @e13 "Continue"
# After replay -u (upgraded to selector-based action)
snapshot -i -c -s "Continue"
click "id=\"auth_continue\" || label=\"Continue\""
```

Android fill reliability:
- `fill` clears the current value, then enters text.
- `type` enters text into the focused field without clearing it.
- On Android, `fill` verifies the entered value.
- If the value does not match, agent-device clears the field and retries once with slower typing.
- This reduces IME-related character swaps on long strings (e.g. emails and IDs).
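The verify-and-retry behavior of Android `fill` can be expressed as a small loop. The sketch uses a hypothetical synchronous `TextField` interface (real device I/O would be asynchronous and go through adb), and the `slow` flag stands in for whatever mechanism agent-device actually uses to type more slowly.

```typescript
// Hypothetical device-facing field.
interface TextField {
  clear(): void;
  enter(text: string, opts: { slow: boolean }): void;
  read(): string;
}

// Clear, enter, verify; on mismatch clear and retry once with slower typing.
// Returns true if the final value matches the requested text.
function fillWithVerify(field: TextField, text: string): boolean {
  field.clear();
  field.enter(text, { slow: false });
  if (field.read() === text) return true;

  field.clear();
  field.enter(text, { slow: true });
  return field.read() === text;
}
```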
Settings helpers (simulators):
- `settings wifi on|off`
- `settings airplane on|off`
- `settings location on|off` (iOS uses per-app permission for the current session app)

Note: on iOS, `wifi`/`airplane` toggle status bar indicators, not the actual network state. `airplane off` clears status bar overrides.
App state:
- `appstate` shows the foreground app/activity (Android). On iOS, it uses the current session app when available; otherwise it falls back to a snapshot-based guess (AX first, XCTest if AX can’t identify the app).
- `apps --metadata` returns the app list with minimal metadata.
Tracing:

```bash
agent-device trace start
agent-device trace stop ./trace.log
```

- The trace log includes snapshot logs and XCTest runner logs for the session.
- Built-in retries cover transient runner connection failures, AX snapshot hiccups, and Android UI dumps.
- For snapshot issues (missing elements), compare against the unaltered output from the `--raw` flag, and scope with `-s "<label>"`.
Boot diagnostics:
- Boot failures include normalized reason codes in `error.details.reason` (JSON mode) and in verbose logs.
- Reason codes: `IOS_BOOT_TIMEOUT`, `IOS_RUNNER_CONNECT_TIMEOUT`, `ANDROID_BOOT_TIMEOUT`, `ADB_TRANSPORT_UNAVAILABLE`, `CI_RESOURCE_STARVATION_SUSPECTED`, `BOOT_COMMAND_FAILED`, `UNKNOWN`.
- Android boot waits fail fast on permission/tooling issues and do not always collapse into timeout errors.
- Use `agent-device boot --platform ios|android` when starting a new session only if `open` cannot find or connect to an available target.
- Set `AGENT_DEVICE_RETRY_LOGS=1` to print structured retry telemetry (attempt, phase, delay, elapsed/remaining deadline, reason).
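In JSON mode a script can branch on `error.details.reason`, for example to decide whether a retry is worthwhile. In the sketch below, only the field path and the reason codes come from this README; the surrounding envelope is read defensively, and the suggested next steps are an illustrative judgment call, not agent-device policy.

```typescript
const BOOT_REASONS = [
  "IOS_BOOT_TIMEOUT",
  "IOS_RUNNER_CONNECT_TIMEOUT",
  "ANDROID_BOOT_TIMEOUT",
  "ADB_TRANSPORT_UNAVAILABLE",
  "CI_RESOURCE_STARVATION_SUSPECTED",
  "BOOT_COMMAND_FAILED",
  "UNKNOWN",
] as const;

type BootReason = (typeof BOOT_REASONS)[number];

// Read error.details.reason from parsed --json output without trusting its shape.
function bootReason(output: unknown): BootReason | undefined {
  const reason = (output as { error?: { details?: { reason?: unknown } } } | null)?.error?.details?.reason;
  return typeof reason === "string" && (BOOT_REASONS as readonly string[]).includes(reason)
    ? (reason as BootReason)
    : undefined;
}

// Illustrative triage only.
function suggestNextStep(reason: BootReason): string {
  switch (reason) {
    case "IOS_BOOT_TIMEOUT":
      return "raise AGENT_DEVICE_IOS_BOOT_TIMEOUT_MS and retry";
    case "ANDROID_BOOT_TIMEOUT":
    case "IOS_RUNNER_CONNECT_TIMEOUT":
      return "retry with --verbose to inspect daemon and runner logs";
    case "CI_RESOURCE_STARVATION_SUSPECTED":
      return "reduce parallel load on the machine, then retry";
    case "ADB_TRANSPORT_UNAVAILABLE":
      return "check that `adb devices` lists the target";
    case "BOOT_COMMAND_FAILED":
    case "UNKNOWN":
      return "rerun with --verbose and AGENT_DEVICE_RETRY_LOGS=1";
  }
}
```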
App resolution:
- Bundle/package identifiers are accepted directly (e.g., `com.apple.Preferences`).
- Human-readable names are resolved when possible (e.g., `Settings`).
- Built-in aliases include `Settings` for both platforms.
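Identifier-versus-name resolution can be approximated as: reverse-DNS-looking strings pass through unchanged, and known aliases map per platform. This sketch is hypothetical: the dot heuristic and the alias table are assumptions. `com.apple.Preferences` is the iOS Settings bundle ID and `com.android.settings` is the stock Android Settings package, but agent-device's own table may differ, and names that are not aliases would still need a lookup against installed apps.

```typescript
type Platform = "ios" | "android";

// Illustrative alias table (keys are lowercased display names).
const ALIASES = new Map<string, Record<Platform, string>>([
  ["settings", { ios: "com.apple.Preferences", android: "com.android.settings" }],
]);

// Heuristic: dotted strings without spaces look like bundle/package ids.
const ID_LIKE = /^[A-Za-z][\w-]*(\.[A-Za-z0-9_-]+)+$/;

// Returns an identifier, or undefined when the name needs an installed-app lookup.
function resolveApp(name: string, platform: Platform): string | undefined {
  if (ID_LIKE.test(name)) return name;
  return ALIASES.get(name.trim().toLowerCase())?.[platform];
}
```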
iOS notes:
- Input commands (`press`, `type`, `scroll`, etc.) are supported only on simulators in v1 and use the XCTest runner.
- `alert` and `scrollintoview` use the XCTest runner and are simulator-only in v1.
- Real device support (including snapshots) is on the roadmap for iOS.
Development:

```bash
pnpm test
```

Useful local checks:

```bash
pnpm typecheck
pnpm test:unit
pnpm test:smoke
pnpm build
```

Environment selectors:
- `ANDROID_DEVICE=Pixel_9_Pro_XL` or `ANDROID_SERIAL=emulator-5554`
- `IOS_DEVICE="iPhone 17 Pro"` or `IOS_UDID=<udid>`
- `AGENT_DEVICE_IOS_BOOT_TIMEOUT_MS=<ms>` to adjust the iOS simulator boot timeout (default: `120000`, minimum: `5000`)
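Reading a setting like `AGENT_DEVICE_IOS_BOOT_TIMEOUT_MS` typically involves a default and a floor. The sketch below uses the documented values (default 120000 ms, minimum 5000 ms); whether agent-device clamps or rejects values below the minimum is not stated here, so clamping, and falling back to the default for unparsable input, are assumptions. The environment is passed in so the function can be tested without touching `process.env`.

```typescript
const DEFAULT_BOOT_TIMEOUT_MS = 120_000;
const MIN_BOOT_TIMEOUT_MS = 5_000;

// Assumptions: values below the minimum are clamped up; unparsable values
// fall back to the default.
function iosBootTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env.AGENT_DEVICE_IOS_BOOT_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_BOOT_TIMEOUT_MS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return DEFAULT_BOOT_TIMEOUT_MS;
  return Math.max(MIN_BOOT_TIMEOUT_MS, Math.floor(parsed));
}
```

In a Node script this would be called as `iosBootTimeoutMs(process.env)`.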
Test screenshots are written to:
- `test/screenshots/android-settings.png`
- `test/screenshots/ios-settings.png`
See CONTRIBUTING.md.
agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi.