What it does
This MCP server bridges Claude and Midscene.js to automate iOS apps through vision-driven UI testing. Instead of relying on fragile selectors or accessibility trees, it interprets screenshots to locate and interact with elements on screen. Any UI state a human can see (buttons, text fields, modal dialogs, even custom controls) becomes directly automatable. The server lets Claude run multi-step iOS workflows such as ordering coffee, filling in forms, and verifying rendered screens, including in native apps where traditional accessibility automation falls short.
Who it's for
iOS QA engineers and mobile app developers who want regression testing without maintaining brittle selectors. Teams automating iOS user flows—like mobile-first companies testing core workflows or enterprises validating cross-platform release candidates. Anyone using Claude for autonomous task execution who needs iOS app control.
Common use cases
- Automate multi-step iOS user flows (sign-up, checkout, form entry) without selectors.
- Verify rendered output and app state visually (colors, highlights, modal presence), not just DOM nodes.
- Run regression tests on native iOS apps and cross-origin web views within iOS apps.
- Autonomous iOS app testing via Claude agents, triggered on schedule or on demand.
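As a rough illustration of what "without selectors" means in the use cases above, a flow can be written as a list of plain-language steps and handed to a vision-driven agent. The `VisionAgent` interface, the `Step` type, and `runFlow` below are hypothetical stand-ins invented for this sketch. The method names are modeled on Midscene.js agent methods (`aiAction`, `aiAssert`), but the tools this server actually exposes may differ.

```typescript
// Hypothetical stand-in for a vision-driven agent; not this server's real API.
interface VisionAgent {
  aiAction(instruction: string): Promise<void>; // perform a step described in plain language
  aiAssert(expectation: string): Promise<void>; // check a visual condition; rejects if it fails
}

type Step =
  | { kind: "act"; text: string }
  | { kind: "assert"; text: string };

// A sign-up flow described by what is visible on screen, with no selectors.
const signUpFlow: Step[] = [
  { kind: "act", text: "Tap the 'Sign up' button" },
  { kind: "act", text: "Type 'test@example.com' into the email field" },
  { kind: "act", text: "Tap 'Continue'" },
  { kind: "assert", text: "A confirmation dialog is visible" },
];

// Run steps in order; stop at the first failure and report which step failed.
async function runFlow(agent: VisionAgent, steps: Step[]): Promise<number> {
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    try {
      if (step.kind === "act") {
        await agent.aiAction(step.text);
      } else {
        await agent.aiAssert(step.text);
      }
    } catch (err) {
      throw new Error(`Step ${i + 1} failed (${step.text}): ${String(err)}`);
    }
  }
  return steps.length; // number of steps completed
}
```

Because each step names what a person would see rather than an element ID, the same list should survive layout or view-hierarchy changes that would break selector-based tests, as long as the visible labels stay the same.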
Setup pitfalls
- Requires an iOS simulator or a connected device reachable from the machine where the server runs.
- One secret was detected in the codebase. Store API keys and credentials in environment variables, not in version control.
- Depends on multimodal model availability (Qwen, Doubao, Gemini, or self-hosted UI-TARS). Verify your chosen model is configured and reachable.
- Needs filesystem write access for screenshots and outbound network access for model inference calls. Sandbox accordingly.
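One way to catch the credential and model-configuration pitfalls early is a preflight check that fails fast when required environment variables are missing. The sketch below is a minimal example. The variable names in `REQUIRED_VARS` are placeholders invented for illustration, not the names Midscene.js or this server actually reads, so substitute whatever your chosen model setup documents.

```typescript
// Placeholder names for illustration only; replace with the variables
// your model provider and Midscene.js configuration actually require.
const REQUIRED_VARS = ["MODEL_API_KEY", "MODEL_BASE_URL", "MODEL_NAME"];

type Env = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function missingVars(env: Env, required: string[] = REQUIRED_VARS): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Fail fast with a message that lists variable names only, never their values.
function assertConfigured(env: Env, required: string[] = REQUIRED_VARS): void {
  const missing = missingVars(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node.js process this could be called once at startup as `assertConfigured(process.env)`. The check only confirms that values are present. It does not verify that the model endpoint is reachable.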