Overview
Extract all visible text from the current screen using Tesseract OCR. Returns structured data including each word’s text content, bounding box coordinates, and confidence scores. This method runs OCR on-demand and returns the results immediately. It’s useful for:
- Verifying text content on screen
- Finding elements by their text when visual matching alone isn’t enough
- Debugging what text TestDriver can “see”
- Building custom text-based assertions
Performance: OCR runs server-side using Tesseract.js with a worker pool for fast extraction. A typical screenshot processes in 200-500ms.
Syntax
Parameters
None.
Returns
Promise<OCRResult> - Object containing extracted text data
OCRResult
| Property | Type | Description |
|---|---|---|
| words | OCRWord[] | Array of extracted words with positions |
| fullText | string | All text concatenated with spaces |
| confidence | number | Overall OCR confidence (0-100) |
| imageWidth | number | Width of the analyzed screenshot |
| imageHeight | number | Height of the analyzed screenshot |
OCRWord
| Property | Type | Description |
|---|---|---|
| content | string | The word’s text content |
| confidence | number | Confidence score for this word (0-100) |
| bbox.x0 | number | Left edge X coordinate |
| bbox.y0 | number | Top edge Y coordinate |
| bbox.x1 | number | Right edge X coordinate |
| bbox.y1 | number | Bottom edge Y coordinate |
Examples
Get All Text on Screen
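A minimal sketch of reading a result. The field names come from the OCRResult table above; the hand-written sample stands in for a live result so the snippet runs on its own, and the `testdriver` object name and the `summarize` helper are assumptions made for illustration.

```javascript
// Hand-written sample shaped like OCRResult. In a real test this would
// come from something like `await testdriver.ocr()` (object name assumed).
const result = {
  words: [
    { content: "Sign", confidence: 96, bbox: { x0: 100, y0: 40, x1: 140, y1: 60 } },
    { content: "in", confidence: 94, bbox: { x0: 146, y0: 40, x1: 162, y1: 60 } },
  ],
  fullText: "Sign in",
  confidence: 95,
  imageWidth: 1280,
  imageHeight: 720,
};

// Hypothetical helper: one-line summary of an OCR result for logging.
function summarize(ocrResult) {
  const count = ocrResult.words.length;
  return `${count} words, ${ocrResult.confidence}% confidence: "${ocrResult.fullText}"`;
}

console.log(summarize(result));
```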
Check if Text Exists
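One way to check for text is a case-insensitive substring test against `fullText`. This is a sketch: `hasText` is a hypothetical helper, and the sample object only mimics the documented result shape.

```javascript
// Sample shaped like OCRResult (only the field used here is filled in).
const sampleResult = { fullText: "Welcome back, Alice", words: [], confidence: 93 };

// Hypothetical helper: true when `needle` appears anywhere in the
// extracted text, ignoring case.
function hasText(ocrResult, needle) {
  return ocrResult.fullText.toLowerCase().includes(needle.toLowerCase());
}

console.log(hasText(sampleResult, "welcome BACK"));
console.log(hasText(sampleResult, "Goodbye"));
```

Because `fullText` joins words with spaces, a phrase that wraps across lines on screen should still match as long as the words were read in order.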
Find and Click Text
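To click a word, you first need a point inside its bounding box. The sketch below finds the first matching word and returns the center of its `bbox`. `centerOfWord` is a hypothetical helper, and the click call itself is left out because its signature is not shown in this section.

```javascript
// Sample shaped like OCRResult, with two words and their boxes.
const screenText = {
  words: [
    { content: "Cancel", confidence: 95, bbox: { x0: 100, y0: 300, x1: 170, y1: 330 } },
    { content: "Submit", confidence: 97, bbox: { x0: 200, y0: 300, x1: 280, y1: 330 } },
  ],
  fullText: "Cancel Submit",
};

// Hypothetical helper: center point of the first word equal to `text`
// (case-insensitive), or null when no word matches.
function centerOfWord(ocrResult, text) {
  const target = text.toLowerCase();
  const word = ocrResult.words.find((w) => w.content.toLowerCase() === target);
  if (!word) return null;
  const { x0, y0, x1, y1 } = word.bbox;
  return { x: Math.round((x0 + x1) / 2), y: Math.round((y0 + y1) / 2) };
}

const point = centerOfWord(screenText, "submit");
console.log(point); // pass these coordinates to whichever click call your setup provides
```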
Filter Words by Confidence
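Per-word `confidence` makes it possible to drop likely misreads before matching. In this sketch the threshold of 80 is an arbitrary illustration rather than a recommended value, and `confidentWords` is a hypothetical helper.

```javascript
// Sample shaped like OCRResult; the middle word simulates a misread.
const receipt = {
  words: [
    { content: "Total", confidence: 97, bbox: { x0: 10, y0: 10, x1: 60, y1: 30 } },
    { content: "$4O.00", confidence: 52, bbox: { x0: 70, y0: 10, x1: 130, y1: 30 } },
    { content: "Checkout", confidence: 91, bbox: { x0: 10, y0: 50, x1: 90, y1: 70 } },
  ],
};

// Hypothetical helper: keep only words at or above `minConfidence` (0-100).
function confidentWords(ocrResult, minConfidence = 80) {
  return ocrResult.words.filter((w) => w.confidence >= minConfidence);
}

console.log(confidentWords(receipt).map((w) => w.content));
```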
Build Custom Assertions
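A custom assertion can be a small function that throws with a useful message. The sketch below checks several phrases at once and reports the ones that are missing. `assertAllVisible` is a hypothetical name, and the sample object stands in for a real result.

```javascript
// Sample shaped like OCRResult (only fullText is needed here).
const confirmation = { fullText: "Order #1042 confirmed Thank you", words: [] };

// Hypothetical assertion: throws if any phrase is absent from the OCR text.
// Matching is case-insensitive.
function assertAllVisible(ocrResult, phrases) {
  const haystack = ocrResult.fullText.toLowerCase();
  const missing = phrases.filter((p) => !haystack.includes(p.toLowerCase()));
  if (missing.length > 0) {
    throw new Error(
      `Missing on screen: ${missing.join(", ")} (OCR saw: "${ocrResult.fullText}")`
    );
  }
}

assertAllVisible(confirmation, ["order", "confirmed"]); // passes silently
```

Including the full OCR text in the error makes a failure easier to diagnose, since it shows what was actually read.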
Debug Screen Content
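When a match fails unexpectedly, dumping every word with its confidence and box usually shows why. A sketch, with `formatWords` as a hypothetical helper and a hand-written sample result:

```javascript
// Sample shaped like OCRResult.
const menuBar = {
  words: [
    { content: "File", confidence: 98, bbox: { x0: 10, y0: 5, x1: 40, y1: 20 } },
    { content: "Edit", confidence: 71, bbox: { x0: 50, y0: 5, x1: 80, y1: 20 } },
  ],
};

// Hypothetical helper: one aligned line per word for console output.
function formatWords(ocrResult) {
  return ocrResult.words.map((w) => {
    const { x0, y0, x1, y1 } = w.bbox;
    const conf = String(w.confidence).padStart(3);
    return `${w.content.padEnd(12)} conf=${conf} box=(${x0},${y0})-(${x1},${y1})`;
  });
}

formatWords(menuBar).forEach((line) => console.log(line));
```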
Find Multiple Instances
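When the same word appears more than once, `filter` returns every occurrence instead of only the first. A sketch with a hypothetical `findAll` helper and a sample result:

```javascript
// Sample shaped like OCRResult: "Delete" appears on two rows.
const tableRows = {
  words: [
    { content: "Delete", confidence: 95, bbox: { x0: 400, y0: 120, x1: 450, y1: 140 } },
    { content: "Edit", confidence: 96, bbox: { x0: 340, y0: 120, x1: 380, y1: 140 } },
    { content: "delete", confidence: 90, bbox: { x0: 400, y0: 180, x1: 450, y1: 200 } },
  ],
};

// Hypothetical helper: every word equal to `text`, ignoring case.
function findAll(ocrResult, text) {
  const target = text.toLowerCase();
  return ocrResult.words.filter((w) => w.content.toLowerCase() === target);
}

const matches = findAll(tableRows, "Delete");
console.log(matches.map((w) => w.bbox.y0));
```

Sorting the matches by `bbox.y0` gives a top-to-bottom order, which helps when you need to target a specific row.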
How It Works
- TestDriver captures a screenshot of the current screen
- The image is sent to the TestDriver API
- Tesseract.js processes the image server-side with multiple workers
- The API returns structured data with text and positions
- Bounding box coordinates are scaled to match the original screen resolution
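The final scaling step can be pictured as a linear mapping from image coordinates to screen coordinates. The sketch below is only an illustration of that arithmetic, assuming simple proportional scaling. The steps above say that scaling happens but not how, and `scaleBox` is a hypothetical name. The API performs this step for you, so callers should not need to do it themselves.

```javascript
// Hypothetical illustration: map a box from the analyzed image's
// coordinate space to the screen's, assuming proportional scaling.
function scaleBox(bbox, imageSize, screenSize) {
  const sx = screenSize.width / imageSize.width;
  const sy = screenSize.height / imageSize.height;
  return {
    x0: Math.round(bbox.x0 * sx),
    y0: Math.round(bbox.y0 * sy),
    x1: Math.round(bbox.x1 * sx),
    y1: Math.round(bbox.y1 * sy),
  };
}

const scaled = scaleBox(
  { x0: 100, y0: 50, x1: 200, y1: 100 },
  { width: 1280, height: 720 },  // size of the analyzed image
  { width: 2560, height: 1440 }  // size of the original screen
);
console.log(scaled);
```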
OCR works best with clear, readable text. Very small text, unusual fonts, or low-contrast text may have lower confidence scores or be missed entirely.
Best Practices
Use find() for element location
For locating elements, prefer find(), which uses AI vision. Use ocr() when you need raw text data or want to build custom text-based logic.
Filter by confidence
OCR can sometimes misread characters. Filter by confidence score when accuracy is critical.
Handle case sensitivity
Text matching should usually be case-insensitive since OCR capitalization can vary.
Wait for content to load
If text isn’t being found, the page may not be fully loaded. Add a wait or use waitForText().
Related
- find() - AI-powered element location
- assert() - Make AI-powered assertions about screen state
- waitForText() - Wait for text to appear on screen
- screenshot() - Capture screenshots

