Interactive Mode
These are the most common issues encountered with our agent, along with some possible reasons for each.
The Computer-Use Agent uses the context from your prompt and the computer screen to decide which commands to run. You should only prompt the AI to interact with elements it can currently see.
A common example of this is interacting with a dropdown. We often see users ask the agent, in a single prompt, to open a dropdown and choose a state, even though the options are not visible until the dropdown has opened.
Instead, treat these as two separate prompts. This allows the UI to render and gives the AI the opportunity to parse the new screen data.
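The idea of one prompt per visible screen state can be sketched in a few lines of Python. This is only an illustration: `split_prompt` is a hypothetical helper, not part of the agent, and it naively splits on connector words such as "and" or "then", so it would wrongly split a label that itself contains one of those words (for example 'Save and close').

```python
import re

# Hypothetical helper for illustration only; the agent does not ship this.
# Connector words that usually join two separate UI actions in one prompt.
_CONNECTORS = re.compile(r"\s+(?:and\s+then|then|and)\s+", flags=re.IGNORECASE)


def split_prompt(prompt: str) -> list[str]:
    """Split a compound prompt into single-action prompts.

    Naive by design: it does not look inside quotes, so review the
    result before sending the steps to the agent one at a time.
    """
    parts = _CONNECTORS.split(prompt.strip())
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    # Each step would be sent as its own prompt, letting the UI render
    # (and the agent re-read the screen) between steps.
    for step in split_prompt("Click on 'options' and select 'edit'"):
        print(">", step)
```

Sending the resulting steps one at a time gives the dropdown a chance to open before the agent is asked to pick an entry from it.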
The Computer-Use Agent relies on visual understanding, not functional understanding. Like any user, the AI does not know what a button will do before clicking it. It can only guess from what the button looks like.
If you're unsure how to describe an icon, ask ChatGPT-4o what it would call it, and use that as your input.
Small, isolated images under 15x15px look like "noise" to the AI and may not be clickable. However, you can use the match-image command to select them using manually captured screenshots.
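To judge whether an icon falls under that size, you can read its dimensions from a cropped PNG. The sketch below is a standalone check, not part of the agent: `png_size` and `is_too_small` are hypothetical names, and reading "smaller than 15x15px" as "both sides under 15 pixels" is an assumption.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE_PX = 15  # threshold taken from the guidance above


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Width and height are two big-endian 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_too_small(width: int, height: int, min_side: int = MIN_SIDE_PX) -> bool:
    """Assumption: an image is 'too small' when both sides are under min_side."""
    return width < min_side and height < min_side
```

For a cropped icon saved to disk, `is_too_small(*png_size(open(path, "rb").read()))` returning `True` suggests using the match-image command rather than a text prompt.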
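Instead of combining two actions in one prompt:
> Click on 'options' and select 'edit'
Use two separate prompts:
> click on options
> select edit
Instead of describing an icon by its function:
> click on the "new task icon"
Describe what it looks like:
> click on the "plus icon"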