Ever wished you could tell an AI to “book me a flight” and watch it actually navigate websites, fill in forms, and complete the task? TARS makes this a reality with multimodal AI agents that can see your screen and interact with any GUI - desktop apps, web browsers, or the terminal. Instead of wrestling with brittle automation scripts that break whenever the UI changes, you get human-like task completion that adapts to what it sees.

The stack delivers two tools: Agent TARS for terminal and web workflows, with both CLI and Web UI interfaces, and UI-TARS Desktop for native GUI control of your entire desktop. Both are driven by vision-language models and integrate with MCP (Model Context Protocol) tools. The 27k+ stars and ByteDance backing suggest this isn’t just another AI experiment - it’s production-grade infrastructure for the next wave of AI-human collaboration.
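To make the MCP integration concrete, here is a minimal sketch of an MCP tool server using the official TypeScript SDK (`@modelcontextprotocol/sdk`). The server name and the `echo` tool are purely illustrative; how Agent TARS discovers and connects to such servers is configured per the project’s docs.

```typescript
// Minimal MCP tool server (illustrative). Requires:
//   npm install @modelcontextprotocol/sdk zod
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Server name/version are arbitrary labels for this sketch.
const server = new McpServer({ name: "demo-tools", version: "1.0.0" });

// A toy "echo" tool; any MCP-compatible agent connected to this server
// can call it by name with a typed argument.
server.tool("echo", { text: z.string() }, async ({ text }) => ({
  content: [{ type: "text", text: `You said: ${text}` }],
}));

// Speak MCP over stdio so an agent can spawn this as a subprocess.
await server.connect(new StdioServerTransport());
```

Pointing an MCP-aware agent at this process hands it a new callable capability without touching the agent’s own code.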

Perfect for developers building AI assistants, automating repetitive GUI tasks, or exploring multimodal AI capabilities. The TypeScript codebase makes it hackable, and both local and remote deployment options give you control over your data. With comprehensive docs and active development, you can go from clone to controlling your desktop in minutes.
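As a taste of that hackability, the repo also publishes an SDK. The sketch below follows the documented `@ui-tars/sdk` usage pattern; the model endpoint, API key, model name, and task instruction are all placeholder values.

```typescript
// Driving the desktop programmatically via the published @ui-tars/sdk.
// Sketch only: endpoint, key, and model name below are placeholders.
import { GUIAgent } from "@ui-tars/sdk";
import { NutJSOperator } from "@ui-tars/operator-nut-js";

const agent = new GUIAgent({
  model: {
    baseURL: "http://localhost:8000/v1", // placeholder: your VLM endpoint
    apiKey: "your-api-key",              // placeholder
    model: "ui-tars-7b",                 // placeholder: your UI-TARS checkpoint
  },
  // The operator executes the model's actions (mouse, keyboard, screenshots).
  operator: new NutJSOperator(),
  onData: ({ data }) => console.log(data), // streamed agent steps
  onError: ({ error }) => console.error(error),
});

// A natural-language task; the agent sees the screen and acts on it.
await agent.run('open a browser and search for "weather today"');
```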


⭐ Stars: 27,380
💻 Language: TypeScript
🔗 Repository: bytedance/UI-TARS-desktop