ocr()

Overview

Extract all visible text from the current screen using Tesseract OCR. Returns structured data including each word’s text content, bounding box coordinates, and confidence scores. This method runs OCR on-demand and returns the results immediately. It’s useful for:

Verifying text content on screen
Finding elements by their text when visual matching alone isn’t enough
Debugging what text TestDriver can “see”
Building custom text-based assertions

Performance: OCR runs server-side using Tesseract.js with a worker pool for fast extraction. A typical screenshot processes in 200-500ms.

Syntax

const result = await testdriver.ocr()

Parameters

None.

Returns

Promise<OCRResult> - Object containing extracted text data

OCRResult

Property	Type	Description
`words`	`OCRWord[]`	Array of extracted words with positions
`fullText`	`string`	All text concatenated with spaces
`confidence`	`number`	Overall OCR confidence (0-100)
`imageWidth`	`number`	Width of the analyzed screenshot
`imageHeight`	`number`	Height of the analyzed screenshot

OCRWord

Property	Type	Description
`content`	`string`	The word’s text content
`confidence`	`number`	Confidence score for this word (0-100)
`bbox.x0`	`number`	Left edge X coordinate
`bbox.y0`	`number`	Top edge Y coordinate
`bbox.x1`	`number`	Right edge X coordinate
`bbox.y1`	`number`	Bottom edge Y coordinate

Examples

Get All Text on Screen

const result = await testdriver.ocr();
console.log(result.fullText);
// "Welcome to TestDriver Sign In Email Password Submit"

console.log(`Found ${result.words.length} words with ${result.confidence}% confidence`);

Check if Text Exists

const result = await testdriver.ocr();

// Check for error message
const hasError = result.words.some(w => 
  w.content.toLowerCase().includes('error')
);

if (hasError) {
  console.log('Error message detected on screen');
}

Find and Click Text

const result = await testdriver.ocr();

// Find the "Submit" button text
const submitWord = result.words.find(w => w.content === 'Submit');

if (submitWord) {
  // Calculate center of the word's bounding box
  const x = Math.round((submitWord.bbox.x0 + submitWord.bbox.x1) / 2);
  const y = Math.round((submitWord.bbox.y0 + submitWord.bbox.y1) / 2);
  
  // Click at those coordinates
  await testdriver.click({ x, y });
}

Filter Words by Confidence

const result = await testdriver.ocr();

// Only use high-confidence words (90%+)
const reliableWords = result.words.filter(w => w.confidence >= 90);

console.log('High confidence words:', reliableWords.map(w => w.content));

Build Custom Assertions

import { describe, expect, it } from "vitest";
import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";

describe("Login Page", () => {
  it("should show form labels", async (context) => {
    const testdriver = TestDriver(context);
    
    await testdriver.provision.chrome({
      url: 'https://myapp.com/login',
    });

    const result = await testdriver.ocr();
    
    // Assert expected labels are present
    expect(result.fullText).toContain('Email');
    expect(result.fullText).toContain('Password');
    expect(result.fullText).toContain('Sign In');
  });
});

Debug Screen Content

// Useful for debugging what TestDriver can see
const result = await testdriver.ocr();

console.log('=== Screen Text ===');
console.log(result.fullText);
console.log('');

console.log('=== Word Details ===');
result.words.forEach((word, i) => {
  console.log(`${i + 1}. "${word.content}" at (${word.bbox.x0}, ${word.bbox.y0}) - ${word.confidence}% confidence`);
});

Find Multiple Instances

const result = await testdriver.ocr();

// Find all instances of "Button" text
const buttons = result.words.filter(w => 
  w.content.toLowerCase() === 'button'
);

console.log(`Found ${buttons.length} buttons on screen`);

buttons.forEach((btn, i) => {
  console.log(`Button ${i + 1} at position (${btn.bbox.x0}, ${btn.bbox.y0})`);
});

How It Works

TestDriver captures a screenshot of the current screen
The image is sent to the TestDriver API
Tesseract.js processes the image server-side with multiple workers
The API returns structured data with text and positions
Bounding box coordinates are scaled to match the original screen resolution

OCR works best with clear, readable text. Very small text, unusual fonts, or low-contrast text may have lower confidence scores or be missed entirely.

Best Practices

Use find() for element location

For locating elements, prefer find() which uses AI vision. Use ocr() when you need raw text data or want to build custom text-based logic.

// Prefer this for clicking elements
await testdriver.find("Submit button").click();

// Use ocr() for text verification or custom logic
const result = await testdriver.ocr();
expect(result.fullText).toContain('Success');

Filter by confidence

OCR can sometimes misread characters. Filter by confidence score when accuracy is critical.

const result = await testdriver.ocr();
const reliable = result.words.filter(w => w.confidence >= 85);

Handle case sensitivity

Text matching should usually be case-insensitive since OCR capitalization can vary.

const result = await testdriver.ocr();
const hasLogin = result.words.some(w => 
  w.content.toLowerCase() === 'login'
);

Wait for content to load

If text isn’t being found, the page may not be fully loaded. Add a wait or use waitForText().

// Wait for specific text to appear
await testdriver.waitForText("Welcome");

// Then run OCR
const result = await testdriver.ocr();

find() - AI-powered element location
assert() - Make AI-powered assertions about screen state
waitForText() - Wait for text to appear on screen
screenshot() - Capture screenshots

Overview

Creating Tests

Running Tests

Scaling

Actions

SDK

Overview

Syntax

Parameters

Returns

OCRResult

OCRWord

Examples

Get All Text on Screen

Check if Text Exists

Find and Click Text

Filter Words by Confidence

Build Custom Assertions

Debug Screen Content

Find Multiple Instances

How It Works

Best Practices

Overview

Creating Tests

Running Tests

Scaling

Actions

SDK

​Overview

​Syntax

​Parameters

​Returns

​OCRResult

​OCRWord

​Examples

​Get All Text on Screen

​Check if Text Exists

​Find and Click Text

​Filter Words by Confidence

​Build Custom Assertions

​Debug Screen Content

​Find Multiple Instances

​How It Works

​Best Practices

​Related

Overview

Syntax

Parameters

Returns

OCRResult

OCRWord

Examples

Get All Text on Screen

Check if Text Exists

Find and Click Text

Filter Words by Confidence

Build Custom Assertions

Debug Screen Content

Find Multiple Instances

How It Works

Best Practices

Related