parse() - TestDriver

Overview

Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates. This method analyzes the entire screen and returns every detected element. It’s useful for:

Understanding the full UI layout of a screen
Finding all clickable or interactive elements
Building custom element-based logic
Debugging what elements TestDriver can detect
Accessibility auditing

Availability: parse() requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.

Syntax

const result = await testdriver.parse()

Parameters

None.

Returns

Promise<ParseResult> - Object containing detected UI elements

ParseResult

Property	Type	Description
`elements`	`ParsedElement[]`	Array of detected UI elements
`annotatedImageUrl`	`string`	URL of the annotated screenshot with bounding boxes
`imageWidth`	`number`	Width of the analyzed screenshot
`imageHeight`	`number`	Height of the analyzed screenshot

ParsedElement

Property	Type	Description
`index`	`number`	Element index
`type`	`string`	Element type (e.g. `"text"`, `"icon"`, `"button"`)
`content`	`string`	Text content or description of the element
`interactivity`	`string`	Interactivity level (e.g. `"clickable"`, `"non-interactive"`)
`bbox`	`object`	Bounding box in pixel coordinates `{x0, y0, x1, y1}`
`boundingBox`	`object`	Bounding box as `{left, top, width, height}`

Examples

Get All Elements on Screen

const result = await testdriver.parse();
console.log(`Found ${result.elements.length} elements`);

result.elements.forEach((el, i) => {
  console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
});

Find Clickable Elements

const result = await testdriver.parse();

const clickable = result.elements.filter(e => e.interactivity === 'clickable');
console.log(`Found ${clickable.length} clickable elements`);

clickable.forEach(el => {
  console.log(`- "${el.content}" at (${el.bbox.x0}, ${el.bbox.y0})`);
});

Find and Click an Element by Content

const result = await testdriver.parse();

// Find a "Submit" button
const submitBtn = result.elements.find(e => 
  e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
);

if (submitBtn) {
  // Calculate center of the bounding box
  const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
  const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
  
  await testdriver.click({ x, y });
}

Filter by Element Type

const result = await testdriver.parse();

// Get all text elements
const textElements = result.elements.filter(e => e.type === 'text');
textElements.forEach(e => console.log(`Text: "${e.content}"`));

// Get all icons
const icons = result.elements.filter(e => e.type === 'icon');
console.log(`Found ${icons.length} icons`);

// Get all buttons
const buttons = result.elements.filter(e => e.type === 'button');
console.log(`Found ${buttons.length} buttons`);

Build Custom Assertions

import { describe, expect, it } from "vitest";
import { TestDriver } from "testdriverai/vitest/hooks";

describe("Login Page", () => {
  it("should have expected form elements", async (context) => {
    const testdriver = TestDriver(context);
    
    await testdriver.provision.chrome({
      url: 'https://myapp.com/login',
    });

    const result = await testdriver.parse();
    
    // Assert expected elements exist
    const textContent = result.elements.map(e => e.content.toLowerCase());
    expect(textContent).toContain('email');
    expect(textContent).toContain('password');
    
    // Assert there are clickable elements
    const clickable = result.elements.filter(e => e.interactivity === 'clickable');
    expect(clickable.length).toBeGreaterThan(0);
  });
});

Use Bounding Box Coordinates

const result = await testdriver.parse();

result.elements.forEach(el => {
  // Pixel coordinates
  console.log(`Element "${el.content}":`);
  console.log(`  bbox: (${el.bbox.x0}, ${el.bbox.y0}) to (${el.bbox.x1}, ${el.bbox.y1})`);
  console.log(`  size: ${el.boundingBox.width}x${el.boundingBox.height}`);
  console.log(`  position: left=${el.boundingBox.left}, top=${el.boundingBox.top}`);
});

View Annotated Screenshot

const result = await testdriver.parse();

// The annotated image shows all detected elements with bounding boxes
console.log('Annotated screenshot:', result.annotatedImageUrl);
console.log(`Image dimensions: ${result.imageWidth}x${result.imageHeight}`);

How It Works

TestDriver captures a screenshot of the current screen
The image is sent to the TestDriver API
OmniParser v2 analyzes the image to detect all UI elements
Each element is classified by type (text, icon, button, etc.) and interactivity
Bounding box coordinates are returned in pixel coordinates matching the screen resolution

OmniParser detects elements visually — it works with any UI framework, native apps, and even non-standard interfaces. It does not rely on DOM or accessibility trees.

Best Practices

Use find() for targeting specific elements

For locating and interacting with a specific element, prefer find() which uses AI vision. Use parse() when you need a complete inventory of all elements on screen.

// Prefer this for clicking a specific element
await testdriver.find("Submit button").click();

// Use parse() for full UI analysis
const result = await testdriver.parse();
const allButtons = result.elements.filter(e => e.type === 'button');

Filter by interactivity

Use the interactivity field to distinguish between clickable and non-interactive elements.

const result = await testdriver.parse();
const interactive = result.elements.filter(e => e.interactivity === 'clickable');
const static_ = result.elements.filter(e => e.interactivity === 'non-interactive');

Wait for content to load

If elements aren’t being detected, the page may not be fully loaded. Add a wait first.

// Wait for page to stabilize
await testdriver.wait(2000);

// Then parse
const result = await testdriver.parse();

Use the annotated image for debugging

The annotatedImageUrl provides a visual overlay showing all detected elements with their bounding boxes — great for debugging.

const result = await testdriver.parse();
console.log('View annotated screenshot:', result.annotatedImageUrl);

find() - AI-powered element location
assert() - Make AI-powered assertions about screen state
screenshot() - Capture screenshots
Elements Reference - Complete Element API

​Overview

​Syntax

​Parameters

​Returns

​ParseResult

​ParsedElement

​Examples

​Get All Elements on Screen

​Find Clickable Elements

​Find and Click an Element by Content

​Filter by Element Type

​Build Custom Assertions

​Use Bounding Box Coordinates

​View Annotated Screenshot

​How It Works

​Best Practices

​Related

Overview

Syntax

Parameters

Returns

ParseResult

ParsedElement

Examples

Get All Elements on Screen

Find Clickable Elements

Find and Click an Element by Content

Filter by Element Type

Build Custom Assertions

Use Bounding Box Coordinates

View Annotated Screenshot

How It Works

Best Practices

Related