An open protocol for AI agents to operate existing application user interfaces.
| Protocol | Origin | Scope |
|---|---|---|
| MCP | Anthropic | LLM ↔ Data / Tools |
| A2A | Google | Agent ↔ Agent |
| AG-UI | CopilotKit | Agent → Frontend streaming |
| A2UI | Google | Agent → Generated UI |
| ACP | | Agent ↔ Existing Application UI |
Existing protocols let agents access data, coordinate with other agents, stream events to frontends, and generate new UI components. None of them allow an agent to operate an existing application's interface. ACP fills this gap.
Your app sends a manifest describing its screens, fields, actions, and modals. This is the agent's map of the interface.
The user sends natural language. The agent interprets intent using the manifest, knowing exactly what fields and actions are available.
The agent sends commands -- fill, click, navigate. The SDK executes them against the live UI and reports results back.
Three agents (Gemini, DeepSeek, Haiku) completing a full task in parallel using ACP.
```json
{
  "type": "manifest",
  "app": "contact-portal",
  "currentScreen": "contact",
  "screens": {
    "contact": {
      "id": "contact",
      "label": "Contact Form",
      "fields": [
        { "id": "name", "type": "text", "label": "Full Name", "required": true },
        { "id": "email", "type": "email", "label": "Email", "required": true },
        { "id": "message", "type": "textarea", "label": "Message" }
      ],
      "actions": [
        { "id": "submit", "label": "Send Message" }
      ]
    }
  }
}
```
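The manifest maps naturally onto a typed schema. A minimal TypeScript sketch of what that might look like, with an example helper an agent could use; the type and function names here are illustrative, not part of the spec:

```typescript
// Illustrative types mirroring the manifest message; names are not normative.
interface Field {
  id: string;
  type: string;        // e.g. "text", "email", "textarea"
  label: string;
  required?: boolean;
}

interface Action {
  id: string;
  label: string;
}

interface Screen {
  id: string;
  label: string;
  fields: Field[];
  actions: Action[];
}

interface Manifest {
  type: "manifest";
  app: string;
  currentScreen: string;
  screens: Record<string, Screen>;
}

// Hypothetical helper: list the required field ids of a screen, so the
// agent knows what it must collect before clicking an action.
function requiredFields(manifest: Manifest, screenId: string): string[] {
  const screen = manifest.screens[screenId];
  if (!screen) return [];
  return screen.fields.filter((f) => f.required).map((f) => f.id);
}
```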
```json
{
  "type": "command",
  "seq": 1,
  "actions": [
    { "do": "fill", "field": "name", "value": "Alice Park" },
    { "do": "fill", "field": "email", "value": "[email protected]" },
    { "do": "fill", "field": "message", "value": "Hello, I need help resetting my account." },
    { "do": "click", "action": "submit" }
  ]
}
```
```json
{
  "type": "result",
  "seq": 1,
  "results": [
    { "index": 0, "success": true },
    { "index": 1, "success": true },
    { "index": 2, "success": true },
    { "index": 3, "success": true }
  ]
}
```
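On the SDK side, executing a command message reduces to dispatching each action and recording a per-index outcome. A hedged sketch against an in-memory field store; the real SDKs operate on live widgets, and `FieldStore`, `execute`, and the error strings are invented for illustration:

```typescript
// Hypothetical in-memory stand-in for live UI state.
type FieldStore = Map<string, string>;

interface CommandAction {
  do: string;          // "fill", "click", "clear", ...
  field?: string;
  action?: string;
  value?: string;
}

interface CommandMessage {
  type: "command";
  seq: number;
  actions: CommandAction[];
}

interface ResultMessage {
  type: "result";
  seq: number;
  results: { index: number; success: boolean; error?: string }[];
}

function execute(
  cmd: CommandMessage,
  store: FieldStore,
  knownActions: Set<string>
): ResultMessage {
  const results = cmd.actions.map((a, index) => {
    switch (a.do) {
      case "fill":
        if (a.field === undefined || a.value === undefined) {
          return { index, success: false, error: "fill needs field and value" };
        }
        store.set(a.field, a.value);
        return { index, success: true };
      case "clear":
        if (a.field === undefined) {
          return { index, success: false, error: "clear needs field" };
        }
        store.set(a.field, "");
        return { index, success: true };
      case "click":
        if (!a.action || !knownActions.has(a.action)) {
          return { index, success: false, error: `unknown action ${a.action}` };
        }
        return { index, success: true };
      default:
        return { index, success: false, error: `unsupported verb ${a.do}` };
    }
  });
  return { type: "result", seq: cmd.seq, results };
}
```

Echoing `seq` in the result is what lets the agent match each outcome to the command it sent.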
`navigate`, `fill`, `click`, `select`, `clear`, `highlight`, `focus`, `scroll_to`, `show_toast`, `ask_confirm`, `open_modal`, `close_modal`, `enable`, `disable`
`text`, `number`, `currency`, `date`, `datetime`, `email`, `phone`, `masked`, `select`, `autocomplete`, `checkbox`, `radio`, `textarea`, `file`, `hidden`
Screens, fields, actions, modals -- everything the agent needs to understand the application's UI and its current state.
The agent sends commands with sequence IDs. The SDK reports success or failure per action, enabling reliable multi-step workflows.
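Per-action results are what make partial failure recoverable: the agent can correlate a result with the `seq` it sent and resume or re-plan from the first failed index. A small illustrative helper (the function and type names are assumptions, not spec):

```typescript
interface ActionResult { index: number; success: boolean; error?: string }
interface ResultMsg { type: "result"; seq: number; results: ActionResult[] }

// Returns the index of the first failed action, or -1 if all succeeded,
// so the agent knows where a multi-step workflow broke.
function firstFailure(expectedSeq: number, msg: ResultMsg): number {
  if (msg.seq !== expectedSeq) {
    throw new Error(`out-of-order result: expected seq ${expectedSeq}, got ${msg.seq}`);
  }
  const failed = msg.results.find((r) => !r.success);
  return failed ? failed.index : -1;
}
```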
| Implementation | Type | Platform | Status |
|---|---|---|---|
| Vocall Engine by Primoia | Server | Go | Production |
| vocall_sdk by Primoia | SDK | Flutter | Production |
| vocall-react by Primoia | SDK | React / Next.js | Production |
The spec and conformance tests are all you need to build your own implementation. List yours here.