Xu0307 committed on
Commit
8a3ce70
·
verified ·
1 Parent(s): 204f3a9

Update README.md

  1. README.md +26 -3
README.md CHANGED
@@ -1,3 +1,26 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+ # MAI-UI: Real-World Centric Foundation GUI Agents
5
+ ![overview](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/_ibfeHy_31ZRanQ3xxlnn.png)
6
+
7
+ ## 📖 Background
8
+ The development of GUI agents could revolutionize the next generation of human-computer interaction. Motivated by this vision, we present MAI-UI, a family of foundation GUI agents spanning the full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants. We identify four key challenges to realistic deployment: the lack of native agent–user interaction, the limits of UI-only operation, the absence of a practical deployment architecture, and brittleness in dynamic environments. MAI-UI addresses these issues with a unified methodology: a self-evolving data pipeline that expands the navigation data to include user interaction and MCP tool calls, a native device–cloud collaboration system that routes execution by task state, and an online RL framework with advanced optimizations to scale parallel environments and context length.
9
+
10
+ ## 🏆 Results
11
+
12
+ MAI-UI establishes a new state of the art across GUI grounding and mobile navigation.
13
+
14
+ - On grounding benchmarks, it reaches 73.5% on ScreenSpot-Pro, 91.3% on MMBench GUI L2, 70.9% on OSWorld-G, and 49.2% on UI-Vision, surpassing Gemini-3-Pro and Seed1.8 on ScreenSpot-Pro.
15
+ ![sspro](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/oNggS-XoVZSiiLitn8nD3.jpeg)
16
+ ![uivision](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/8t_oRozJ51aw5rKojqlOu.jpeg)
17
+ ![mmbench](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/dwc8np4JLxTWoNS8wu18p.jpeg)
18
+ ![osworld-g](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/lff2k7ZkMgCrgt3lPZChb.jpeg)
19
+
20
+ - On mobile GUI navigation, it sets a new SOTA of 76.7% on AndroidWorld, surpassing UI-TARS-2, Gemini-2.5-Pro, and Seed1.8. On MobileWorld, MAI-UI obtains a 41.7% success rate, significantly outperforming end-to-end GUI models and remaining competitive with Gemini-3-Pro-based agentic frameworks.
21
+ ![aw](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/EUXl1osfQ26WGYcV4hhsG.jpeg)
22
+ ![mw](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/gsbjUtIoUFWyR6Nw3qRz6.jpeg)
23
+
24
+ - Our online RL experiments show significant gains from scaling parallel environments from 32 to 512 (+5.2 points) and from increasing the environment step budget from 15 to 50 (+4.3 points).
25
+ ![rl](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/BO8GnYXsJ51ZTvVRLxn6u.jpeg)
26
+ ![rl_env](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/39mhKfUvt7159dKM_oeGa.jpeg)
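
The grounding numbers reported in the Results section are accuracies. The README does not state the scoring rule, so the following is only a minimal sketch, assuming the point-in-box convention commonly used by ScreenSpot-style benchmarks: a prediction counts as correct when the predicted click point falls inside the ground-truth element's bounding box. `Box` and `grounding_accuracy` are hypothetical names introduced for illustration and are not part of MAI-UI; the coordinates are toy data, not benchmark samples.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges are treated as inside the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(
    predictions: Sequence[Tuple[float, float]],
    targets: Sequence[Box],
) -> float:
    """Fraction of predicted (x, y) clicks that land inside their target box.

    Assumed scoring rule (point-in-box); not confirmed by the README.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


# Toy data: four target elements and four predicted click points.
targets = [
    Box(0, 0, 10, 10),
    Box(20, 20, 40, 30),
    Box(5, 5, 6, 6),
    Box(100, 100, 120, 110),
]
predictions = [(5, 5), (25, 35), (5.5, 5.5), (110, 105)]

accuracy = grounding_accuracy(predictions, targets)
print(f"{accuracy:.1%}")  # the second click misses its box, so 3 of 4 hit
```

Under this rule, a headline figure such as 73.5% would mean that roughly three out of four predicted click points landed inside the intended element.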