Type System Reference

All types are defined in src/types.ts. Zod schemas provide runtime validation; TypeScript types are derived from them with z.infer<>.


Core Interfaces

ShellExecutor

The dependency injection boundary for all external process execution.

interface ShellExecutor {
  exec(command: string, options?: ShellExecOptions): Promise<string>;
}

interface ShellExecOptions {
  timeoutMs?: number;    // Default: 30,000ms (TIMEOUTS.SHELL_COMMAND_MS)
  signal?: AbortSignal;  // For cancellation
}

Implementations:

  • ProcessShellExecutor (production) — calls child_process.exec
  • Mock implementations (testing) — return pre-recorded responses
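A test double in the spirit of the mock implementations above might look like the following sketch. RecordedShellExecutor is a hypothetical name, and the project's own mock may be structured differently. The sketch assumes responses are keyed by the exact command string and that an unrecorded command should reject rather than return an empty string.

```typescript
interface ShellExecOptions {
  timeoutMs?: number;
  signal?: AbortSignal;
}

interface ShellExecutor {
  exec(command: string, options?: ShellExecOptions): Promise<string>;
}

// Hypothetical test double: replays canned stdout keyed by the exact command string.
class RecordedShellExecutor implements ShellExecutor {
  /** Every non-aborted command that reached the executor, in call order. */
  readonly calls: string[] = [];

  constructor(private readonly responses: Map<string, string>) {}

  async exec(command: string, options?: ShellExecOptions): Promise<string> {
    // Mirror a real executor: an already-aborted signal rejects immediately.
    if (options?.signal?.aborted) {
      throw new Error(`aborted: ${command}`);
    }
    this.calls.push(command);
    const stdout = this.responses.get(command);
    if (stdout === undefined) {
      // Fail loudly so tests notice commands they did not anticipate.
      throw new Error(`no recorded response for: ${command}`);
    }
    return stdout;
  }
}
```

Rejecting unknown commands makes a test fail at the first unexpected adb call, instead of producing a confusing parse error further downstream.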

HelperClient (Phase 3)

HTTP client for the on-device helper APK. Talks to http://127.0.0.1:8765 (forwarded to the device's 8765 via adb forward).

class HelperClient {
  constructor(hostPort: number, defaultTimeoutMs?: number);

  status(timeoutMs?: number): Promise<HelperStatus | null>;
  waitForReady(timeoutMs?: number, expectedProtocolVersion?: number): Promise<HelperStatus>;
  getTree(opts?: { compact?: boolean; packageName?: string }): Promise<HelperJson>;
  detectFramework(packageName?: string): Promise<HelperFrameworkInfo>;
  waitForIdle(opts?: { timeoutMs?: number; packageName?: string }): Promise<HelperIdleResult>;
  screenshot(): Promise<string>; // base64 PNG
  tap(x: number, y: number): Promise<void>;
  swipe(x1: number, y1: number, x2: number, y2: number, durationMs: number): Promise<void>;
  longPress(x: number, y: number, durationMs: number): Promise<void>;
  key(keycode: number): Promise<void>;
  typeText(text: string): Promise<void>;
  shutdown(): Promise<void>;
}

interface HelperStatus {
  ok: boolean;
  name: string;
  version: string;
  protocolVersion: number;
  sdkInt: number;
  device: string;
  manufacturer: string;
}

interface HelperFrameworkInfo {
  ok: boolean;
  packageName: string;
  primary: 'flutter' | 'react_native' | 'compose' | 'native';
  frameworks: string[];   // e.g. ['react_native', 'react_native_hermes']
  signals: string[];      // detection evidence: 'class:...', 'lib:...'
}

interface HelperIdleResult {
  ok: boolean;
  idle: boolean;
  reason: string;         // 'events_quiet' | 'timeout'
  events: string[];       // accessibility event names recorded
}
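The port forwarding and readiness check described for HelperClient can be sketched as follows. The `adb forward tcp:<local> tcp:<remote>` syntax is standard adb. Everything else (forwardCommand, statusUrl, isReady, pollUntilReady, and the 250ms default poll interval) is a hypothetical illustration of how waitForReady(timeoutMs, expectedProtocolVersion) might behave. It is not the client's actual code.

```typescript
// Minimal subset of HelperStatus needed for the readiness check.
interface HelperStatusLike {
  ok: boolean;
  protocolVersion: number;
}

// Builds the adb port-forward command (adb [-s serial] forward tcp:<local> tcp:<remote>).
function forwardCommand(hostPort: number, devicePort = 8765, deviceId?: string): string {
  const serial = deviceId ? `-s ${deviceId} ` : '';
  return `adb ${serial}forward tcp:${hostPort} tcp:${devicePort}`;
}

function statusUrl(hostPort: number): string {
  return `http://127.0.0.1:${hostPort}/status`;
}

// Ready = the helper answered, reported ok, and (optionally) speaks the expected protocol.
function isReady(
  status: HelperStatusLike | null,
  expectedProtocolVersion?: number,
): status is HelperStatusLike {
  if (status === null || !status.ok) return false;
  return expectedProtocolVersion === undefined || status.protocolVersion === expectedProtocolVersion;
}

// Polls `probe` (e.g. a status() call that yields null on failure) until ready or timed out.
async function pollUntilReady(
  probe: () => Promise<HelperStatusLike | null>,
  timeoutMs: number,
  intervalMs = 250,
  expectedProtocolVersion?: number,
): Promise<HelperStatusLike> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await probe();
    if (isReady(status, expectedProtocolVersion)) return status;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`helper not ready within ${timeoutMs}ms`);
}
```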

FrameworkSync (Phase 3.5 / 3.6 / 3.7 / 3.9 / 3.10)

Orchestrator that composes framework-specific sync backends behind a single interface, wired into DeviceClient.sync. Once the helper reports accessibility-event idle, waitForIdle (in idle.ts) calls FrameworkSync.waitForSync() as a tail probe. This augments the helper's idle signal with whichever checks are available: JS-side (Hermes), Dart-side (VM Service), and app-side (Espresso idling bridge).

FrameworkSync also exposes snapshotFiberLabels() (Phase 3.6), a one-shot React Fiber walker that extracts component names (ArrowLeft, Settings, Camera, etc.) for unlabeled icon buttons in React Native codegen apps.

interface FrameworkSyncInit {
  framework: 'flutter' | 'react_native' | 'compose' | 'native';
  packageName: string;
  adb: AdbClient;
}

class FrameworkSync {
  constructor(init: FrameworkSyncInit);

  /** The framework kind passed at construction. */
  get kind(): FrameworkKind;

  /** True iff any sync channel attached successfully. */
  get hasBackend(): boolean;
  /** True iff the Hermes CDP channel connected (RN debug + Metro running). */
  get hasHermes(): boolean;
  /** True iff the Dart VM Service channel connected (Flutter debug/profile). */
  get hasDartVm(): boolean;
  /** True iff the user's app declared the agentest-idling-bridge provider. */
  get hasIdlingBridge(): boolean;

  /**
   * In-band diagnostic trace surfaced through tool responses — one
   * human-readable line per channel event. Appended by each backend
   * during `attach()` + `waitForSync()` + `snapshotFiberLabels()`.
   * Tools slice out new lines after each call to return to the LLM.
   */
  get diagnostics(): string[];

  /** Open all applicable channels. Non-fatal per-channel; safe to call once. */
  attach(): Promise<void>;

  /** Tail probe after helper /wait-idle. Returns false only on channel timeout. */
  waitForSync(timeoutMs?: number): Promise<boolean>;

  /** Force Flutter semantics to build. Dart VM channel only. */
  refreshSemantics(): Promise<boolean>;

  /**
   * Extract React component names from the running Hermes JS runtime and
   * correlate them with `tree`'s accessibility nodes by bounds containment.
   * Returns a map of `nodeId → label` suitable for passing to
   * `serializeTreeCompact` as `externalLabels`. Cached by screen
   * fingerprint so multi-step flows don't re-extract on every snapshot.
   *
   * Silent no-op when framework !== 'react_native', Hermes is unattached,
   * or `AGENTEST_DISABLE_FIBER_INFERENCE=1`. Returns an empty map on
   * failure — callers fall through to Phase 3.5 hoisting.
   */
  snapshotFiberLabels(tree: UnifiedUINode): Promise<Map<string, string>>;

  /** Release all WebSockets. Safe to call multiple times. */
  close(): void;
}
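Tools are described as slicing out new diagnostics lines after each call. If diagnostics is append-only, that slicing needs nothing more than a remembered offset. DiagnosticsCursor below is a hypothetical helper and is not part of FrameworkSync.

```typescript
// Hypothetical helper: remembers how much of an append-only log was already returned.
class DiagnosticsCursor {
  private seen = 0;

  constructor(private readonly source: () => readonly string[]) {}

  /** Returns only the lines appended since the previous call. */
  takeNew(): string[] {
    const all = this.source();
    const fresh = all.slice(this.seen);
    this.seen = all.length;
    return fresh;
  }
}
```

Usage: construct it once with `() => sync.diagnostics`, then call `takeNew()` after each tool call and attach the result to that tool's response.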

Environment overrides:

  • AGENTEST_DISABLE_FRAMEWORK_SYNC=1: short-circuits attach() so no sync backend is opened. Used by the unit tests so that MockShellExecutor never sees sync-backend traffic.
  • AGENTEST_DISABLE_FIBER_INFERENCE=1: disables snapshotFiberLabels() even when Hermes is attached. Used by the unit tests to avoid any CDP traffic during handleConnect / handleRunFlow execution. Tools still return compact trees, just without fiber-derived labels.
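The gating described by the two environment overrides and by the snapshotFiberLabels() doc comment reduces to a pair of predicates. In this sketch the function names are hypothetical, and a flag counts as set only when its value is exactly "1".

```typescript
type FrameworkKind = 'flutter' | 'react_native' | 'compose' | 'native';
type Env = Record<string, string | undefined>;

// Assumption: only the exact value "1" enables a flag.
function flagSet(env: Env, name: string): boolean {
  return env[name] === '1';
}

function shouldAttachSync(env: Env): boolean {
  return !flagSet(env, 'AGENTEST_DISABLE_FRAMEWORK_SYNC');
}

// Mirrors the documented no-op conditions of snapshotFiberLabels().
function shouldInferFiberLabels(framework: FrameworkKind, hermesAttached: boolean, env: Env): boolean {
  return (
    framework === 'react_native' &&
    hermesAttached &&
    !flagSet(env, 'AGENTEST_DISABLE_FIBER_INFERENCE')
  );
}
```

In Node, `process.env` can be passed as the env argument. Disabling framework sync also means Hermes never attaches, so fiber inference is off as a consequence.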

HelperHandle

Returned from ensureHelper() (auto-install entry point). Lifecycle owner for the on-device helper process and its forwarded port.

interface HelperHandle {
  client: HelperClient;
  status: HelperStatus;
  shutdown(uninstall?: boolean): Promise<void>;
}

function ensureHelper(
  shell: ShellExecutor,
  deviceId?: string,
  options?: { startupTimeoutMs?: number },
): Promise<HelperHandle | null>;

ensureHelper never throws. It returns null if any of the following occurs:

  • AGENTEST_DISABLE_HELPER=1 is set
  • Prebuilt APKs not found in android-helper/prebuilt/
  • adb install fails
  • adb forward fails
  • am instrument spawn fails
  • Helper /status doesn't return ready within startupTimeoutMs (default 30s)

In all failure cases, the rest of the connect flow degrades to the ADB+gRPC path with a single stderr log. The user never sees a hard error from helper issues.
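The "never throws, returns null" contract can be sketched as a sequential runner that turns the first failure into a single log line. SetupStep and runOrNull are hypothetical names, and the log wording is invented for illustration. The real ensureHelper may be organized differently.

```typescript
// Hypothetical step shape: each step has a name used in the single log line.
interface SetupStep {
  name: string;
  run: () => Promise<void>;
}

// Runs steps in order. On the first failure it logs once and resolves to null instead of throwing.
async function runOrNull<T>(
  steps: SetupStep[],
  build: () => T,
  log: (line: string) => void = (line) => console.error(line),
): Promise<T | null> {
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      log(`[helper] ${step.name} failed (${reason}); falling back to ADB+gRPC`);
      return null;
    }
  }
  return build();
}
```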


Geometry Types

interface Bounds {
  left: number;    // Pixels, screen-absolute
  top: number;
  right: number;
  bottom: number;
}

interface Point {
  x: number;  // Pixels, screen-absolute
  y: number;
}

Bounds are in absolute screen pixels as reported by Android's AccessibilityNodeInfo.getBoundsInScreen(). These are the raw values from the bounds attribute in uiautomator dump XML (format: [left,top][right,bottom]).
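The following sketch parses the uiautomator bounds attribute and computes the center. BOUNDS_PATTERN is a hypothetical stand-in for the project's BOUNDS_REGEX constant, whose exact pattern is not shown in this document. Negative coordinates are accepted on the assumption that partially off-screen nodes can report them.

```typescript
interface Bounds { left: number; top: number; right: number; bottom: number; }
interface Point { x: number; y: number; }

// Hypothetical pattern for "[left,top][right,bottom]".
const BOUNDS_PATTERN = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/;

function parseBounds(raw: string): Bounds | null {
  const m = BOUNDS_PATTERN.exec(raw.trim());
  if (m === null) return null;
  const [left, top, right, bottom] = m.slice(1, 5).map(Number);
  return { left, top, right, bottom };
}

// Midpoint of each axis, as in the field-sources table: ((left+right)/2, (top+bottom)/2).
function centerOf(b: Bounds): Point {
  return { x: (b.left + b.right) / 2, y: (b.top + b.bottom) / 2 };
}
```

Usage: `centerOf(parseBounds('[0,63][1080,210]')!)` yields `{ x: 540, y: 136.5 }`. The center is not rounded here. Whether the real parser rounds it is not specified.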


UnifiedUINode

The core data structure representing a single UI element in the accessibility tree. Every node in the parsed tree is a UnifiedUINode.

interface UnifiedUINode {
  // Identity
  id: string;              // Generated path-based ID (e.g., "0.1.3")
  resourceId: string;      // Android resource-id (e.g., "com.example:id/email")
  className: string;       // Raw Android class name
  role: UnifiedRole;       // Abstract mapped role
  text: string;            // Visible text content
  description: string;     // Content description (accessibility label)
  packageName: string;     // App package name
  
  // Geometry
  bounds: Bounds;          // Screen-absolute pixel bounds
  center: Point;           // Center point (computed from bounds)
  index: number;           // Sibling index (0-based)
  
  // State flags (from Android AccessibilityNodeInfo)
  enabled: boolean;
  focused: boolean;
  selected: boolean;
  checked: boolean;
  checkable: boolean;
  clickable: boolean;
  scrollable: boolean;
  longClickable: boolean;
  password: boolean;

  // Phase 3.8 Compose extras — empty string when absent
  hintText: string;         // AccessibilityNodeInfo.hintText (API 26+)
  stateDescription: string; // AccessibilityNodeInfo.stateDescription (API 30+)
  paneTitle: string;        // AccessibilityNodeInfo.paneTitle (API 28+)
  tooltipText: string;      // AccessibilityNodeInfo.tooltipText (API 28+)
  
  // Derived
  actions: UnifiedAction[];    // Available interactions
  children: UnifiedUINode[];   // Child nodes
}

Phase 3.8 Compose extras are populated by the on-device helper's TreeDumper.kt when the running app uses Jetpack Compose. They surface as hint/state/pane/tooltip in the LLM tree (see LlmTreeNode below). They are the primary way to read Compose semantics from outside the app process, because the unmerged semantics tree is not reachable without in-app reflection. The XML parsing path (parseUiAutomatorXml) always sets them to empty strings, since uiautomator dump doesn't emit these fields.

Field sources:

UnifiedUINode field   Android XML attribute   Notes
id                    (none)                  Generated: dot-separated index path (e.g., "0.1.3")
resourceId            resource-id             Full ID: "com.example.myapp:id/email"
className             class                   Full class: "android.widget.EditText"
role                  class (mapped)          See Role Mapping
text                  text                    Visible text content
description           content-desc            Accessibility label
packageName           package                 App package
bounds                bounds                  Parsed from "[left,top][right,bottom]"
center                bounds (computed)       { x: (left+right)/2, y: (top+bottom)/2 }
index                 index                   Sibling position
enabled               enabled                 "true" → true
focused               focused                 "true" → true
selected              selected                "true" → true
checked               checked                 "true" → true
checkable             checkable               "true" → true
clickable             clickable               "true" → true
scrollable            scrollable              "true" → true
longClickable         long-clickable          "true" → true
password              password                "true" → true
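The following sketch covers the boolean rows of the field-sources table and the generated id column. readFlags, assignPathIds, and the minimal RawNode shape are hypothetical. It treats a missing attribute as false and gives the root node the path "0". That choice is consistent with the "0.1.3" example but is not stated explicitly.

```typescript
// Boolean node fields mapped to their uiautomator XML attribute names.
const FLAG_ATTRIBUTES = {
  enabled: 'enabled',
  focused: 'focused',
  selected: 'selected',
  checked: 'checked',
  checkable: 'checkable',
  clickable: 'clickable',
  scrollable: 'scrollable',
  longClickable: 'long-clickable',
  password: 'password',
} as const;

type FlagField = keyof typeof FLAG_ATTRIBUTES;

// Only the literal string "true" maps to true. Missing attributes (assumption) and other values map to false.
function readFlags(attrs: Record<string, string>): Record<FlagField, boolean> {
  const out = {} as Record<FlagField, boolean>;
  for (const field of Object.keys(FLAG_ATTRIBUTES) as FlagField[]) {
    out[field] = attrs[FLAG_ATTRIBUTES[field]] === 'true';
  }
  return out;
}

interface RawNode { children: RawNode[]; }

// Dot-separated path IDs: the root is "0", and child i of node p gets `${p}.${i}`.
function assignPathIds(root: RawNode): Map<RawNode, string> {
  const ids = new Map<RawNode, string>();
  const visit = (node: RawNode, id: string): void => {
    ids.set(node, id);
    node.children.forEach((child, i) => visit(child, `${id}.${i}`));
  };
  visit(root, '0');
  return ids;
}
```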

UnifiedRole

Abstract role mapped from Android class names. Defined as a const object with string literal values.

const UNIFIED_ROLES = {
  BUTTON: 'button',
  TEXT_FIELD: 'text_field',
  TEXT_VIEW: 'text_view',
  CHECK_BOX: 'check_box',
  SWITCH: 'switch',
  RADIO_BUTTON: 'radio_button',
  SLIDER: 'slider',
  SCROLL_VIEW: 'scroll_view',
  IMAGE: 'image',
  IMAGE_BUTTON: 'image_button',
  CONTAINER: 'container',
  LIST: 'list',
  LIST_ITEM: 'list_item',
  TAB: 'tab',
  TOOLBAR: 'toolbar',
  PROGRESS_BAR: 'progress_bar',
  SPINNER: 'spinner',
  WEB_VIEW: 'web_view',
  UNKNOWN: 'unknown',
} as const;

type UnifiedRole = (typeof UNIFIED_ROLES)[keyof typeof UNIFIED_ROLES];

Role Mapping

Android class                               Mapped role
android.widget.Button                       button
android.widget.ImageButton                  image_button
android.widget.EditText                     text_field
android.widget.CheckBox                     check_box
android.widget.Switch                       switch
android.widget.ToggleButton                 switch
android.widget.RadioButton                  radio_button
android.widget.SeekBar                      slider
android.widget.Spinner                      spinner
android.widget.TextView                     text_view
android.widget.ImageView                    image
android.widget.ProgressBar                  progress_bar
android.widget.ScrollView                   scroll_view
android.widget.HorizontalScrollView         scroll_view
android.widget.ListView                     list
androidx.recyclerview.widget.RecyclerView   list
android.webkit.WebView                      web_view
android.widget.TabWidget                    tab
android.widget.Toolbar                      toolbar
androidx.appcompat.widget.Toolbar           toolbar
Any class whose name contains Layout,       container
  ViewGroup, CardView, ComposeView,
  or ReactViewGroup
Anything else                               unknown
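The Role Mapping table translates into an exact-match lookup followed by the substring rule for containers. In this sketch, checking exact names before the substring rule is an assumption about precedence, and EXACT_ROLES and CONTAINER_MARKERS are hypothetical names.

```typescript
type UnifiedRole =
  | 'button' | 'text_field' | 'text_view' | 'check_box' | 'switch' | 'radio_button'
  | 'slider' | 'scroll_view' | 'image' | 'image_button' | 'container' | 'list'
  | 'list_item' | 'tab' | 'toolbar' | 'progress_bar' | 'spinner' | 'web_view' | 'unknown';

// Exact class-name matches from the Role Mapping table.
const EXACT_ROLES: Record<string, UnifiedRole> = {
  'android.widget.Button': 'button',
  'android.widget.ImageButton': 'image_button',
  'android.widget.EditText': 'text_field',
  'android.widget.CheckBox': 'check_box',
  'android.widget.Switch': 'switch',
  'android.widget.ToggleButton': 'switch',
  'android.widget.RadioButton': 'radio_button',
  'android.widget.SeekBar': 'slider',
  'android.widget.Spinner': 'spinner',
  'android.widget.TextView': 'text_view',
  'android.widget.ImageView': 'image',
  'android.widget.ProgressBar': 'progress_bar',
  'android.widget.ScrollView': 'scroll_view',
  'android.widget.HorizontalScrollView': 'scroll_view',
  'android.widget.ListView': 'list',
  'androidx.recyclerview.widget.RecyclerView': 'list',
  'android.webkit.WebView': 'web_view',
  'android.widget.TabWidget': 'tab',
  'android.widget.Toolbar': 'toolbar',
  'androidx.appcompat.widget.Toolbar': 'toolbar',
};

// Substrings that mark a class as a generic container.
const CONTAINER_MARKERS = ['Layout', 'ViewGroup', 'CardView', 'ComposeView', 'ReactViewGroup'];

function mapRole(className: string): UnifiedRole {
  const exact = EXACT_ROLES[className];
  if (exact !== undefined) return exact;
  if (CONTAINER_MARKERS.some((marker) => className.includes(marker))) return 'container';
  return 'unknown';
}
```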

UnifiedAction

Available interaction types, derived from the node's state flags.

const UNIFIED_ACTIONS = {
  TAP: 'tap',
  LONG_PRESS: 'long_press',
  TYPE: 'type',
  SCROLL: 'scroll',
  CHECK: 'check',
  ADJUST: 'adjust',
} as const;

Derivation rules:

Condition                             Action
clickable === "true"                  tap
long-clickable === "true"             long_press
scrollable === "true"                 scroll
checkable === "true"                  check
class === "android.widget.EditText"   type
class === "android.widget.SeekBar"    adjust
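The derivation rules are independent checks, so a single node can expose several actions at once. This is a minimal sketch that emits actions in table order, which is an assumption. ActionInputs is a hypothetical subset of UnifiedUINode.

```typescript
type UnifiedAction = 'tap' | 'long_press' | 'type' | 'scroll' | 'check' | 'adjust';

// The subset of node state the derivation rules read.
interface ActionInputs {
  className: string;
  clickable: boolean;
  longClickable: boolean;
  scrollable: boolean;
  checkable: boolean;
}

function deriveActions(n: ActionInputs): UnifiedAction[] {
  const actions: UnifiedAction[] = [];
  if (n.clickable) actions.push('tap');
  if (n.longClickable) actions.push('long_press');
  if (n.scrollable) actions.push('scroll');
  if (n.checkable) actions.push('check');
  if (n.className === 'android.widget.EditText') actions.push('type');
  if (n.className === 'android.widget.SeekBar') actions.push('adjust');
  return actions;
}
```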

Zod Schemas

ElementSelectorSchema

import { z } from 'zod';

const ElementSelectorSchema = z.object({
  /**
   * Short ref token from the last tree snapshot (e.g. "@b1", "@f2").
   * When set, ref takes priority over all other fields — they are ignored.
   * If the ref is stale, throws ElementNotFoundError with an actionable
   * message telling the LLM to call agentest_get_ui_tree for fresh refs.
   * No automatic fallback to other selector fields.
   */
  ref: z.string().optional(),
  id: z.string().optional(),
  text: z.string().optional(),
  textContains: z.string().optional(),
  className: z.string().optional(),
  description: z.string().optional(),
  index: z.number().int().nonnegative().optional(),
});

type ElementSelector = z.infer<typeof ElementSelectorSchema>;
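The ref-priority rule documented on ElementSelectorSchema can be sketched as follows. SelectorLike mirrors the schema fields without depending on zod. StaleRefError stands in for ElementNotFoundError, and the search callback is a hypothetical placeholder for field-based matching.

```typescript
// Mirrors ElementSelector's fields without zod.
interface SelectorLike {
  ref?: string;
  id?: string;
  text?: string;
  textContains?: string;
  className?: string;
  description?: string;
  index?: number;
}

// Stand-in for ElementNotFoundError with the stale-ref message.
class StaleRefError extends Error {
  constructor(ref: string) {
    super(`Ref ${ref} is not in the current snapshot; call agentest_get_ui_tree for fresh refs.`);
    this.name = 'StaleRefError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// A ref wins outright: other fields are ignored and there is no fallback search.
function resolveSelector<N>(
  selector: SelectorLike,
  refs: ReadonlyMap<string, N>,
  search: (s: SelectorLike) => N | undefined,
): N | undefined {
  if (selector.ref !== undefined) {
    const node = refs.get(selector.ref);
    if (node === undefined) throw new StaleRefError(selector.ref);
    return node;
  }
  return search(selector);
}
```

Because the stale-ref branch throws instead of calling search, a selector such as `{ ref: '@b9', text: 'Login' }` fails loudly even when a node with the text Login exists.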

ActionStepSchema

A discriminated union on the action field with 20 variants:

Action                  Required fields                              Optional fields                      Idle
tap                     target                                       -                                    Full
tap_coordinates         x, y                                         -                                    Lightweight
type                    target, value                                -                                    Lightweight
clear_text              target                                       -                                    Lightweight
swipe                   direction                                    target, durationMs                   Full
swipe_coordinates       x1, y1, x2, y2                               durationMs                           Lightweight
long_press              target                                       durationMs                           Full
long_press_coordinates  x, y                                         durationMs                           Lightweight
double_tap              target                                       -                                    Full
double_tap_coordinates  x, y                                         -                                    Lightweight
press_key               keycode                                      -                                    Lightweight
pinch                   cx, cy, startRadius, endRadius               durationMs                           Full (gRPC only)
rotate                  cx, cy, radius, startAngleDeg, endAngleDeg   durationMs                           Full (gRPC only)
scroll_to               target                                       scrollTarget, direction, maxScrolls  Internal
wait                    timeoutMs                                    -                                    Snapshot
wait_for_stable         -                                            timeoutMs                            Full+Loading
assert_visible          target                                       -                                    Snapshot
assert_not_visible      target                                       -                                    Snapshot
assert_text_equals      target, value                                -                                    Snapshot
assert_text_contains    target, value                                -                                    Snapshot

Idle column: Full = full idle detection with loading indicator awareness. Lightweight = single snapshot only (faster). Internal = handles its own polling. Snapshot = reads tree once.
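In TypeScript, a union discriminated on action lets a switch narrow each variant, and the compiler flags any variant that is not handled. The sketch covers only six of the 20 variants. The field types are assumptions, and idleModeFor is a hypothetical helper that encodes the Idle column.

```typescript
// Hypothetical selector subset; the real ElementSelector has more fields.
type Selector = { ref?: string; id?: string; text?: string };

// Six of the 20 variants, each carrying only the fields its action needs.
type ActionStep =
  | { action: 'tap'; target: Selector }
  | { action: 'tap_coordinates'; x: number; y: number }
  | { action: 'type'; target: Selector; value: string }
  | { action: 'swipe_coordinates'; x1: number; y1: number; x2: number; y2: number; durationMs?: number }
  | { action: 'scroll_to'; target: Selector; maxScrolls?: number }
  | { action: 'assert_visible'; target: Selector };

type IdleMode = 'full' | 'lightweight' | 'internal' | 'snapshot';

function assertNever(value: never): never {
  throw new Error(`unhandled action: ${JSON.stringify(value)}`);
}

// Encodes the Idle column for the variants above.
function idleModeFor(step: ActionStep): IdleMode {
  switch (step.action) {
    case 'tap':
      return 'full';
    case 'tap_coordinates':
    case 'type':
    case 'swipe_coordinates':
      return 'lightweight';
    case 'scroll_to':
      return 'internal';
    case 'assert_visible':
      return 'snapshot';
    default:
      // Adding a variant without a matching case turns this line into a compile error.
      return assertNever(step);
  }
}
```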


Result Types

StepResult

interface StepResult {
  stepIndex: number;       // 0-based position in the steps array
  action: ActionStep;      // The step that was executed
  success: boolean;        // Whether the step passed
  durationMs: number;      // Wall-clock time for this step
  error?: string;          // Human-readable error (only on failure)
  loadingDetected?: string; // Description of loading indicators waited out (if any)
}

FlowTrace

interface FlowTrace {
  success: boolean;            // true if ALL steps passed
  stepsCompleted: number;      // Number of steps executed (may be < totalSteps on failure)
  totalSteps: number;          // Total steps in the flow
  results: StepResult[];       // Per-step results
  /** 6-char screen fingerprint (always present). */
  screenFingerprint: string;
  /**
   * True when the screen meaningfully changed during the flow. False when
   * the UI is exactly where you started — typing into a field does NOT
   * flip this flag (in-place mutation, not a screen change).
   */
  screenChanged: boolean;
  /**
   * Compact text snapshot at the end (or at point of failure). **Omitted
   * entirely** when screenChanged is false AND success is true — when
   * nothing changed and everything passed, there's nothing new to report
   * and the tree would just burn tokens.
   */
  finalUiTree?: string;
  error?: string;              // Top-level error message (only on failure)
  systemDialogs?: SystemDialog[]; // Permission prompts, crash dialogs detected during flow
  appCrashDetected?: boolean;  // True if the app process died (root package changed to system)
}
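The finalUiTree omission rule reduces to one boolean condition: attach the tree when the screen changed or when any step failed. finalizeTrace and FlowOutcome are hypothetical names used for this sketch. The key is left out entirely rather than set to undefined, which matches the "omitted entirely" wording in the FlowTrace comment.

```typescript
interface FlowOutcome {
  success: boolean;
  screenChanged: boolean;
  finalUiTree?: string;
}

function finalizeTrace(success: boolean, screenChanged: boolean, compactTree: string): FlowOutcome {
  const outcome: FlowOutcome = { success, screenChanged };
  // Skip the tree only when nothing changed and every step passed.
  if (screenChanged || !success) outcome.finalUiTree = compactTree;
  return outcome;
}
```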

RefRegistry (compact text format)

Session-scoped map from short ref tokens (@b1, @f2, …) to UnifiedUINode instances. Rebuilt on every tree snapshot so refs always point at the live tree. Lives in src/android/ref-registry.ts.

class RefRegistry {
  /**
   * Walk a fresh tree, assign refs, compute fingerprint, and cache the
   * compact text result. Call this every time the tree changes.
   */
  rebuild(tree: UnifiedUINode, opts?: CompactSerializeOptions): CompactSerializeResult;

  /**
   * O(1) lookup. Throws ElementNotFoundError with stale-ref message if the
   * ref doesn't exist — tells the LLM to call agentest_get_ui_tree.
   * No automatic fallback to other selector fields.
   */
  resolve(ref: string): UnifiedUINode;

  get fingerprint(): string;  // 6-char hash from last rebuild
  get text(): string;         // compact text from last rebuild
  get size(): number;         // total refs in current map
  clear(): void;              // drop all state (called on connect)
}
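The staleness behavior follows from rebuilding the map from scratch on every snapshot. MiniRefRegistry is a hypothetical and heavily reduced sketch. The prefix letters (b for buttons, f for text fields) and the single running counter are assumptions chosen to resemble tokens like @b1 and @f2. They are not the registry's actual numbering scheme.

```typescript
interface NodeLike { role: string; children: NodeLike[]; }

// Hypothetical prefix scheme; roles without a prefix get no ref.
const PREFIX: Record<string, string> = { button: 'b', text_field: 'f' };

class MiniRefRegistry {
  private refs = new Map<string, NodeLike>();

  // Replaces the whole map, so refs from the previous snapshot become stale.
  rebuild(tree: NodeLike): number {
    this.refs = new Map();
    let counter = 0;
    const walk = (node: NodeLike): void => {
      const prefix = PREFIX[node.role];
      if (prefix !== undefined) this.refs.set(`@${prefix}${++counter}`, node);
      node.children.forEach(walk);
    };
    walk(tree);
    return this.refs.size;
  }

  // O(1) Map lookup; unknown refs throw instead of falling back to a search.
  resolve(ref: string): NodeLike {
    const node = this.refs.get(ref);
    if (node === undefined) {
      throw new Error(`Stale ref ${ref}: call agentest_get_ui_tree for fresh refs.`);
    }
    return node;
  }
}
```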

interface CompactSerializeOptions {
  /** Hard cap on emitted lines. Defaults to 200. */
  maxLines?: number;
  /** Max indent depth. 0 = unlimited (default). */
  maxDepth?: number;
  /** Drop plain text lines. Defaults to false. */
  onlyInteractive?: boolean;
  /**
   * External labels keyed by node id: fiber-derived component names
   * from `FrameworkSync.snapshotFiberLabels()` (Phase 3.6). Label
   * precedence is the external label first, then the hoisted label,
   * then the node's own label. This makes unlabeled icon buttons in RN
   * codegen apps resolvable by the LLM without any target-app code changes.
   */
  externalLabels?: Map<string, string>;
}
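Label precedence and the maxLines cap can be sketched as two small functions. chooseLabel and capLines are hypothetical. The sketch treats an empty string as "no label" and drops lines past the cap without emitting a marker line. Both behaviors are assumptions.

```typescript
// Precedence from the externalLabels comment: external, then hoisted, then own.
function chooseLabel(
  nodeId: string,
  external: ReadonlyMap<string, string> | undefined,
  hoisted: string | undefined,
  own: string,
): string {
  const ext = external?.get(nodeId);
  if (ext) return ext;
  if (hoisted) return hoisted;
  return own;
}

// Applies the maxLines cap (default 200) and reports whether it kicked in.
function capLines(lines: string[], maxLines = 200): { lines: string[]; truncated: boolean } {
  return lines.length > maxLines
    ? { lines: lines.slice(0, maxLines), truncated: true }
    : { lines, truncated: false };
}
```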

interface CompactSerializeResult {
  text: string;                            // indented text representation
  fingerprint: string;                     // 6-char screen fingerprint
  refMap: Map<string, UnifiedUINode>;      // @ref → node
  refCount: number;                        // total refs assigned
  lineCount: number;                       // lines emitted
  truncated: boolean;                      // true if maxLines kicked in
}

The compact text format is documented in docs/mcp-tools.md#compact-tree-format.

Tree Fingerprints

Two fingerprint functions, exposed from src/android/tree-parser.ts:

/**
 * Idle fingerprint: stable across cursor blink + bounds jitter + live
 * timestamps. Used by idle.ts. Includes EditText text content because
 * typing IS a UI change for idle detection purposes.
 */
function computeIdleFingerprint(node: UnifiedUINode): string;

/**
 * Screen fingerprint: 6-char hash. Used by run_flow's screenChanged gate.
 * Critical: EXCLUDES EditText text content — typing should not flip the
 * fingerprint (concern #1 from the design review). The LLM verifies type
 * success via assertions or the final tree on !success.
 */
function computeScreenFingerprint(node: UnifiedUINode): string;
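The two fingerprints differ in which text feeds the hash. The sketch below uses 32-bit FNV-1a as a stand-in hash, because the actual hash function is not documented here. It ignores bounds entirely so that jitter cannot affect either fingerprint, and it does not model the cursor-blink or timestamp handling of the real idle fingerprint. All names are hypothetical.

```typescript
interface FpNode {
  className: string;
  resourceId: string;
  text: string;
  description: string;
  children: FpNode[];
}

const EDIT_TEXT = 'android.widget.EditText';

// 32-bit FNV-1a, used here only as an illustrative hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Serializes identity-bearing fields recursively; EditText text is optionally blanked out.
function signature(node: FpNode, includeEditText: boolean): string {
  const text = node.className === EDIT_TEXT && !includeEditText ? '' : node.text;
  const own = JSON.stringify([node.className, node.resourceId, text, node.description]);
  const kids = node.children.map((child) => signature(child, includeEditText)).join(',');
  return `${own}[${kids}]`;
}

// Screen fingerprint: 6 hex chars, typing into fields does not change it.
function screenFingerprint(root: FpNode): string {
  return fnv1a(signature(root, false)).toString(16).padStart(8, '0').slice(0, 6);
}

// Idle fingerprint: EditText content included, so typing counts as a UI change.
function idleFingerprint(root: FpNode): string {
  return fnv1a(signature(root, true)).toString(16).padStart(8, '0');
}
```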

LlmTreeNode

Compact serialization of UnifiedUINode for LLM consumption. Omits default values to minimize token usage.

interface LlmTreeNode {
  id?: string;          // Only when resource-id is non-empty
  role: string;         // Always present
  text?: string;        // Only when non-empty
  desc?: string;        // Only when content description is non-empty
  cls?: string;         // Short class name for unlabeled elements (e.g. "ReactViewGroup")
  hint?: string;        // Phase 3.8: hintText (API 26+). Compose TextField placeholder etc.
  state?: string;       // Phase 3.8: stateDescription (API 30+). Compose Switch state etc.
  pane?: string;        // Phase 3.8: paneTitle (API 28+). Compose Scaffold / navigation panes.
  tooltip?: string;     // Phase 3.8: tooltipText (API 28+).
  bounds: string;       // Always present, format: "[left,top][right,bottom]"
  clickable?: true;     // Only for unlabeled tappable elements
  enabled?: false;      // Only when disabled (enabled is the default)
  checked?: true;       // Only when checked
  focused?: true;       // Only when focused
  selected?: true;      // Only when selected
  password?: true;      // Only for password fields
  scrollable?: true;    // Only for scrollable containers
  actions?: string[];   // Only when actions are available
  children?: LlmTreeNode[];  // Only when children exist
}
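The default-omission rule for LlmTreeNode can be sketched for a subset of fields. SourceNode and LlmNodeSketch are hypothetical reduced shapes. The real serializer also applies pruning and the cls/clickable rules, which this sketch does not model.

```typescript
interface SourceNode {
  resourceId: string;
  role: string;
  text: string;
  description: string;
  hintText: string;
  bounds: { left: number; top: number; right: number; bottom: number };
  enabled: boolean;
  checked: boolean;
  password: boolean;
  children: SourceNode[];
}

interface LlmNodeSketch {
  id?: string;
  role: string;
  text?: string;
  desc?: string;
  hint?: string;
  bounds: string;
  enabled?: false;
  checked?: true;
  password?: true;
  children?: LlmNodeSketch[];
}

// Emits a key only when its value differs from the default.
function toLlmNode(n: SourceNode): LlmNodeSketch {
  const b = n.bounds;
  const out: LlmNodeSketch = { role: n.role, bounds: `[${b.left},${b.top}][${b.right},${b.bottom}]` };
  if (n.resourceId) out.id = n.resourceId;
  if (n.text) out.text = n.text;
  if (n.description) out.desc = n.description;
  if (n.hintText) out.hint = n.hintText;
  if (!n.enabled) out.enabled = false;
  if (n.checked) out.checked = true;
  if (n.password) out.password = true;
  if (n.children.length > 0) out.children = n.children.map(toLlmNode);
  return out;
}
```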

Tree pruning: The serialized tree is pruned for LLM consumption:

  • Single-child wrapper containers (no label, not clickable/scrollable) are collapsed — their child replaces them
  • Zero-size and off-screen elements are removed
  • System UI (com.android.systemui) nodes are removed
  • The full tree is preserved internally for findElements() — pruning only affects the LLM output
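The pruning rules can be sketched as a recursive function that returns a pruned copy and leaves the input tree intact for findElements(). PruneNode, Screen, and the collapse-after-pruning order are assumptions of this sketch. The screen size has to be supplied explicitly because the node type carries no display dimensions.

```typescript
interface PruneNode {
  packageName: string;
  label: string; // text or description; empty when unlabeled
  clickable: boolean;
  scrollable: boolean;
  isContainer: boolean;
  bounds: { left: number; top: number; right: number; bottom: number };
  children: PruneNode[];
}

interface Screen { width: number; height: number; }

function isVisible(n: PruneNode, screen: Screen): boolean {
  const { left, top, right, bottom } = n.bounds;
  if (right <= left || bottom <= top) return false; // zero-size
  return right > 0 && bottom > 0 && left < screen.width && top < screen.height; // intersects screen
}

// Returns a pruned copy, or null when the node itself is dropped.
function prune(n: PruneNode, screen: Screen): PruneNode | null {
  if (n.packageName === 'com.android.systemui' || !isVisible(n, screen)) return null;
  const children = n.children
    .map((c) => prune(c, screen))
    .filter((c): c is PruneNode => c !== null);
  // Unlabeled, non-interactive wrapper with exactly one surviving child: the child replaces it.
  const isWrapper = n.isContainer && !n.label && !n.clickable && !n.scrollable;
  if (isWrapper && children.length === 1) return children[0];
  return { ...n, children };
}
```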

Constants (constants.ts)

All magic values are centralized in named constant objects:

Constant                                           Purpose
TOOL_NAMES, TOOL_NAMES_EXT                         MCP tool name strings (10 tools)
SERVER_NAME, SERVER_VERSION                        MCP server identity
ADB, ADB_COMMANDS, ADB_COMMANDS_EXT, MONKEY_FLAGS  ADB command fragments, incl. run-as, sqlite3, svc wifi/data, emu network
ANDROID_CLASSES                                    Fully qualified Android widget class names
ANDROID_PROPS                                      Device property keys (SDK version, model, etc.)
TIMEOUTS                                           All timeout/interval values, incl. KEYBOARD_SETTLE_MS, PINCH_DURATION_MS, ROTATE_DURATION_MS
BOUNDS_REGEX                                       Regex for parsing [left,top][right,bottom]
DIFF_THRESHOLDS                                    Noise-filtering thresholds
KEYCODES, KEYCODE_TO_W3C                           Android keycodes + W3C key string mapping for gRPC sendKey
SWIPE_OFFSETS                                      Swipe distance calculation constants
DOUBLE_TAP                                         Double tap interval (100ms)
CLEAR_TEXT                                         Clear-text key sequence constants
RETRY                                              ADB retry config (3 attempts, exponential backoff)
SCROLL_TO                                          Scroll-to-find defaults (10 max scrolls, 300ms settle)
LOADING_INDICATORS                                 Class names and text/desc patterns for spinner/shimmer detection
IDLE_LOADING                                       Loading wait config (8s max, 500ms poll)
LIGHTWEIGHT_ACTIONS                                Actions that skip full idle polling
SYSTEM_PACKAGES                                    System package names for dialog/crash detection
LOGCAT                                             Max logcat lines (200)
GRPC                                               gRPC config (port offset 3000, timeouts, swipe FPS, finger pressure)
NETWORK_SPEED_PRESETS, NETWORK_DELAY_PRESETS       Valid presets for agentest_set_network
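RETRY is described as 3 attempts with exponential backoff. The following is a sketch of that policy. RetryConfig, backoffDelays, withRetry, and the base delay are hypothetical, because the constant's field names and base delay are not listed here.

```typescript
interface RetryConfig {
  attempts: number;    // total tries, including the first
  baseDelayMs: number; // hypothetical; not documented for RETRY
}

// Waits between tries double each time: base, 2*base, 4*base, ...
function backoffDelays(cfg: RetryConfig): number[] {
  const delays: number[] = [];
  for (let k = 0; k < cfg.attempts - 1; k++) delays.push(cfg.baseDelayMs * 2 ** k);
  return delays;
}

async function withRetry<T>(op: () => Promise<T>, cfg: RetryConfig): Promise<T> {
  const delays = backoffDelays(cfg);
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= delays.length) throw err; // out of attempts: surface the last error
      await new Promise<void>((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}
```

With 3 attempts there are only two waits, so `backoffDelays({ attempts: 3, baseDelayMs: 100 })` is `[100, 200]`.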

Error Classes (errors.ts)

All errors extend AgenTestError, which extends Error and adds a code: string property.

Class                  Code                    Extra properties
AgenTestError          (base)                  code: string
AdbConnectionError     ADB_CONNECTION_ERROR
AdbCommandError        ADB_COMMAND_ERROR       command: string
ElementNotFoundError   ELEMENT_NOT_FOUND       selector: Record<string, unknown>
IdleTimeoutError       IDLE_TIMEOUT
AssertionFailedError   ASSERTION_FAILED        expected: string, actual: string
AppNotInstalledError   APP_NOT_INSTALLED
TreeParseError         TREE_PARSE_ERROR
GrpcConnectionError    GRPC_CONNECTION_ERROR
GrpcRpcError           GRPC_RPC_ERROR          rpcMethod: string
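Here is a minimal sketch of the hierarchy, with the code property and two subclasses that carry extra properties. The constructor signatures and describeFailure are assumptions. Object.setPrototypeOf is the usual workaround that keeps instanceof working when TypeScript compiles Error subclasses to ES5.

```typescript
class AgenTestError extends Error {
  constructor(message: string, readonly code: string) {
    super(message);
    this.name = new.target.name;
    // Restore the prototype chain so instanceof works under ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ElementNotFoundError extends AgenTestError {
  constructor(message: string, readonly selector: Record<string, unknown>) {
    super(message, 'ELEMENT_NOT_FOUND');
  }
}

class AssertionFailedError extends AgenTestError {
  constructor(message: string, readonly expected: string, readonly actual: string) {
    super(message, 'ASSERTION_FAILED');
  }
}

// Callers branch on the stable code instead of parsing messages.
function describeFailure(err: unknown): string {
  if (err instanceof ElementNotFoundError) return `${err.code}: ${JSON.stringify(err.selector)}`;
  if (err instanceof AgenTestError) return err.code;
  return 'UNKNOWN';
}
```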